Pandas DataFrame.interpolate() Function

Author: 迹忆客 | Last updated: 2024/04/22

The pandas DataFrame.interpolate() method fills the NaN values in a DataFrame using interpolation.

`pandas.DataFrame.interpolate()` Syntax

DataFrame.interpolate(
    method="linear",
    axis=0,
    limit=None,
    inplace=False,
    limit_direction="forward",
    limit_area=None,
    downcast=None,
    **kwargs
)

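Parameters


`method`	One of `linear`, `time`, `index`, `values`, `nearest`, `zero`, `slinear`, `quadratic`, `cubic`, `barycentric`, `krogh`, `polynomial`, `spline`, `piecewise_polynomial`, `from_derivatives`, `pchip`, `akima`. The interpolation technique used to fill `NaN` values. The default is `linear`.
`axis`	Axis to interpolate along: `axis=0` fills each column going down the rows, `axis=1` fills each row going across the columns.
`limit`	Maximum number of consecutive `NaN` values to fill.
`inplace`	Boolean. If `True`, modifies the calling `DataFrame` in place.
`limit_direction`	`forward`, `backward`, or `both`. The direction in which consecutive `NaN` values are filled. It matters most when `limit` is specified.
`limit_area`	`None`, `inside`, or `outside`. Restricts which `NaN` values are filled: `inside` fills only `NaN` values surrounded by valid values, `outside` fills only `NaN` values outside the valid values, and `None` applies no restriction.
`downcast`	Dictionary. Specifies how to downcast the data types. Deprecated since pandas 2.1.
`**kwargs`	Keyword arguments passed on to the interpolating function, such as `order` for the `polynomial` method.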

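Return Value

If inplace is False (the default), it returns a new DataFrame in which the NaN values have been interpolated with the given method. If inplace is True, it modifies the calling DataFrame instead and returns None.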
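The effect of inplace on the calling object can be checked with a small sketch. The column name A and the variable names are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, None, 3.0]})

# inplace=False (default): a new DataFrame is returned, the caller keeps its NaN
new_df = df.interpolate()
nan_left_in_caller = int(df["A"].isna().sum())

# inplace=True: the caller itself is modified
df.interpolate(inplace=True)

print(nan_left_in_caller)    # 1
print(new_df["A"].tolist())  # [1.0, 2.0, 3.0]
print(df["A"].tolist())      # [1.0, 2.0, 3.0]
```

In most code the default is easier to reason about, because the original data stays available for comparison.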

Example Code: Interpolate All `NaN` Values in a `DataFrame` With `DataFrame.interpolate()`

import pandas as pd

# None entries become NaN in the float columns
df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})
print("DataFrame:")
print(df)

# Default: linear interpolation down each column (axis=0)
filled_df = df.interpolate()

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X    Y
0  1.0  4.0
1  2.0  6.0
2  3.0  8.0
3  3.0  5.5
4  3.0  3.0

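It fills all the NaN values in the DataFrame using the default linear method: each NaN is replaced by a value on the straight line between its nearest valid neighbors in the same column. For example, Y at index 3 lies halfway between 8.0 and 3.0, so it becomes 5.5.

Unlike pandas.DataFrame.fillna(), which replaces the NaN values in a DataFrame with a fixed value, interpolate() estimates each missing value from the surrounding data.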
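A side-by-side sketch of the two approaches on the same data. The fill value 0 is an arbitrary choice made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3, None, 3],
                   "Y": [4, None, 8, None, 3]})

constant_fill = df.fillna(0)    # every NaN becomes the same fixed value
linear_fill = df.interpolate()  # every NaN is estimated from its neighbors

print(constant_fill["Y"].tolist())  # [4.0, 0.0, 8.0, 0.0, 3.0]
print(linear_fill["Y"].tolist())    # [4.0, 6.0, 8.0, 5.5, 3.0]
```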

Example Code: `DataFrame.interpolate()` With the `method` Parameter

We can also set the method parameter of DataFrame.interpolate() to fill the NaN values in a DataFrame with a different interpolation technique.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})
print("DataFrame:")
print(df)

# Second-order polynomial interpolation; this method requires SciPy
filled_df = df.interpolate(method='polynomial', order=2)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
          X      Y
0  1.000000  4.000
1  2.000000  7.125
2  3.000000  8.000
3  3.368421  6.625
4  3.000000  3.000

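This fills all the NaN values in the DataFrame using second-order polynomial interpolation.

Here, order=2 is a keyword argument passed through **kwargs to the polynomial method, which requires an order to be given. The polynomial method relies on SciPy, so SciPy must be installed.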
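Not every method needs extra keyword arguments or SciPy. As a minimal sketch, the index method listed among the parameters can be compared with linear on a Series whose index is unevenly spaced. The Series and its index values are made up for illustration.

```python
import pandas as pd

# Unevenly spaced index: the gap 1 -> 4 is three times the gap 0 -> 1
s = pd.Series([0.0, None, 4.0], index=[0, 1, 4])

by_position = s.interpolate(method="linear")  # ignores the index, treats rows as equally spaced
by_index = s.interpolate(method="index")      # uses the index values as x-coordinates

print(by_position.tolist())  # [0.0, 2.0, 4.0]
print(by_index.tolist())     # [0.0, 1.0, 4.0]
```

With linear, the missing value sits halfway between its neighbors. With index, it sits a quarter of the way, because index 1 is a quarter of the distance from 0 to 4.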

Example Code: Interpolate Along the Row Axis With the `axis` Parameter of `DataFrame.interpolate()`

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})
print("DataFrame:")
print(df)

# axis=1: interpolate across the columns within each row
filled_df = df.interpolate(axis=1)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X    Y
0  1.0  4.0
1  2.0  2.0
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0

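Here, we set axis=1 to interpolate the NaN values across the columns within each row. In the second row (index 1), Y is the last value in the row and has no valid value to its right, so it is filled with the nearest valid value to its left, 2.0.

In the fourth row (index 3), however, both values are NaN, so there is nothing to interpolate from and the NaN values remain after interpolation.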
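With only two columns, a row never has a missing value between two valid ones. The following sketch uses a hypothetical three-column DataFrame (columns A, B, C are assumptions for illustration) so that the row-wise linear interpolation is visible.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"A": [1.0, 2.0],
                   "B": [nan, nan],
                   "C": [3.0, 6.0]})

# Each B value is filled from A and C in the same row
filled = df.interpolate(axis=1)

print(filled["B"].tolist())  # [2.0, 4.0]
```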

Example Code: `DataFrame.interpolate()` With the `limit` Parameter

The limit parameter of DataFrame.interpolate() sets the maximum number of consecutive NaN values the method will fill.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, None, None, 3]})
print("DataFrame:")
print(df)

# Fill at most one NaN in each run of consecutive NaN values
filled_df = df.interpolate(limit=1)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  NaN
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X     Y
0  1.0  4.00
1  2.0  3.75
2  3.0   NaN
3  3.0   NaN
4  3.0  3.00

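Here, filling proceeds from top to bottom, and only the first NaN in each run of consecutive NaN values in a column is filled. The remaining NaN values in that run are left unchanged.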
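To see how the result changes with the value of limit, the Y column can be interpolated on its own as a Series. This is a sketch, and the variable names are illustrative.

```python
import pandas as pd

y = pd.Series([4, None, None, None, 3])

limit_1 = y.interpolate(limit=1)   # fills only the first NaN of the run
limit_2 = y.interpolate(limit=2)   # fills the first two
no_limit = y.interpolate()         # fills the whole run

print(limit_1.tolist())   # [4.0, 3.75, nan, nan, 3.0]
print(limit_2.tolist())   # [4.0, 3.75, 3.5, nan, 3.0]
print(no_limit.tolist())  # [4.0, 3.75, 3.5, 3.25, 3.0]
```

The filled values are the same in every case. The limit only decides how many of them are written.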

Example Code: `DataFrame.interpolate()` With the `limit_direction` Parameter

The limit_direction parameter of DataFrame.interpolate() controls the direction along the given axis in which consecutive NaN values are filled.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, None, None, 3]})
print("DataFrame:")
print(df)

# Fill from the bottom up, at most one NaN per run
filled_df = df.interpolate(limit_direction='backward', limit=1)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  NaN
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X     Y
0  1.0  4.00
1  2.0   NaN
2  3.0   NaN
3  3.0  3.25
4  3.0  3.00

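Here, filling proceeds from bottom to top, so only the last NaN in each run of consecutive NaN values in a column is filled. The NaN values above it in the same run are left unchanged.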
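The limit_area parameter from the parameter list has no example above. The following sketch combines it with limit_direction='both' on a made-up Series that has NaN values at both ends as well as in the middle.

```python
import pandas as pd

s = pd.Series([None, 1.0, None, None, 4.0, None])

# No area restriction: interior NaNs are interpolated, edge NaNs take the nearest valid value
both = s.interpolate(limit_direction="both")
# inside: only NaNs surrounded by valid values are filled
inside = s.interpolate(limit_direction="both", limit_area="inside")
# outside: only NaNs before the first or after the last valid value are filled
outside = s.interpolate(limit_direction="both", limit_area="outside")

print(both.tolist())     # [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
print(inside.tolist())   # [nan, 1.0, 2.0, 3.0, 4.0, nan]
print(outside.tolist())  # [1.0, 1.0, nan, nan, 4.0, 4.0]
```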

Interpolate Time Series Data With `DataFrame.interpolate()`

import pandas as pd

dates = ['April-10', 'April-11', 'April-12', 'April-13']
fruits = ['Apple', 'Papaya', 'Banana', 'Mango']
prices = [3, None, 2, 4]

df = pd.DataFrame({'Date': dates,
                   'Fruit': fruits,
                   'Price': prices})

print(df)

# Modify df in place; only the numeric Price column has a NaN to fill
df.interpolate(inplace=True)

print("Interpolated DataFrame:")
print(df)

Output:

       Date   Fruit  Price
0  April-10   Apple    3.0
1  April-11  Papaya    NaN
2  April-12  Banana    2.0
3  April-13   Mango    4.0
Interpolated DataFrame:
       Date   Fruit  Price
0  April-10   Apple    3.0
1  April-11  Papaya    2.5
2  April-12  Banana    2.0
3  April-13   Mango    4.0

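Because inplace=True, the original DataFrame is modified by the call to interpolate(). The Date and Fruit columns hold strings and are left unchanged. Note that pandas 2.1 and later emit a deprecation warning when interpolate() is called on object-dtype columns, so on recent versions it is safer to interpolate only the numeric column, for example by assigning df['Price'].interpolate() back to df['Price'].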
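The example above stores the dates as plain strings and uses the default linear method, so the dates play no part in the result. As a sketch of time-aware interpolation, the time method from the parameter list can be used when the index is a DatetimeIndex. The dates and prices below are hypothetical and chosen so that the gaps between observations differ.

```python
import pandas as pd

# Observations on April 10, 11, and 14: the second gap is three days long
idx = pd.to_datetime(["2024-04-10", "2024-04-11", "2024-04-14"])
prices = pd.Series([1.0, None, 5.0], index=idx)

by_position = prices.interpolate(method="linear")  # treats the rows as equally spaced
by_time = prices.interpolate(method="time")        # weights by the elapsed time

print(by_position.round(6).tolist())  # [1.0, 3.0, 5.0]
print(by_time.round(6).tolist())      # [1.0, 2.0, 5.0]
```

April 11 is one day into a four-day interval, so the time method places the missing price a quarter of the way from 1.0 to 5.0.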
