
Python Pandas DataFrame

A Pandas DataFrame is a two-dimensional data structure in which data is aligned in rows and columns in tabular form. You can think of it as a table.

Figure: structure of a Pandas DataFrame

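In the Pandas data structures section, we said that in Pandas a higher-dimensional data structure is a container for its lower-dimensional one. For example, a DataFrame is a container for Series, so a DataFrame can be thought of as being made up of multiple Series.

Figure: Pandas Series making up a DataFrame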

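Features of a DataFrame

Columns can be of different types
The size of the data structure is mutable
Both rows and columns can be indexed by label
Arithmetic operations can be performed on rows and columns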
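
The features listed above can be illustrated with a minimal sketch. The sample data and the column names name, age, score and passed are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type.
df = pd.DataFrame(
    {"name": ["Tom", "Jack"], "age": [28, 34], "score": [88.5, 92.0]},
    index=["r1", "r2"],
)

# Columns may have different dtypes (object/string, int64, float64).
print(df.dtypes)

# The size is mutable: a new column can be added in place.
df["passed"] = df["score"] > 90

# Rows and columns can both be addressed by label.
jack_age = df.loc["r2", "age"]

# Arithmetic can run down the columns (axis=0) or across the rows (axis=1).
col_sums = df[["age", "score"]].sum(axis=0)
row_sums = df[["age", "score"]].sum(axis=1)
print(col_sums)
print(row_sums)
```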

Pandas DataFrame

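A DataFrame can be created with the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)

The parameters are as follows:

data: the input data (ndarray, Series, map, list, dict, and so on).
index: the index values, also called the row labels.
columns: the column labels; defaults to a RangeIndex (0, 1, 2, …, n).
dtype: the data type to force on the data.
copy: whether to copy the input data. The default is False (in recent pandas versions the default is None, which behaves like False for DataFrame or 2-D ndarray input and like True for dict input).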

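Creating a DataFrame

A Pandas DataFrame can be created from several kinds of input:

Lists
Dictionaries
Series
NumPy arrays
Another DataFrame

The rest of this chapter shows how to create a DataFrame from these inputs.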
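
As a quick sketch of two of the inputs listed above, the following builds a DataFrame from a NumPy array and then from another DataFrame. The labels x, y, z, a and b are arbitrary, and passing copy=True is a choice made here so that the second DataFrame is independent of the first.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array: each inner row becomes a DataFrame row.
arr = np.array([[1, 2], [3, 4], [5, 6]])
array_df = pd.DataFrame(arr, index=["x", "y", "z"], columns=["a", "b"])
print(array_df)

# Build a new DataFrame from an existing one.
# copy=True requests an independent copy of the data.
copy_df = pd.DataFrame(array_df, copy=True)
copy_df.loc["x", "a"] = 100

# The original DataFrame is unaffected by the change to the copy.
print(array_df.loc["x", "a"])
print(copy_df.loc["x", "a"])
```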

Creating an empty DataFrame

The most basic DataFrame you can create is an empty one.

import pandas as pd
df = pd.DataFrame()
print(df)

Output:

Empty DataFrame
Columns: []
Index: []

Creating a DataFrame from lists

A DataFrame can be created from a single list or from a list of lists.

import pandas as pd

# Create a DataFrame from a single list
data = [1, 2, 3, 4, 5]
single_df = pd.DataFrame(data)
print(single_df)

# Create a DataFrame from a list of lists
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
group_df = pd.DataFrame(data, columns=['Name', 'Age'])
print(group_df)

Output:

   0
0  1
1  2
2  3
3  4
4  5
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

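Next, we create a DataFrame from a list of lists and convert the Age column to floating point.

import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Jiyik', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Cast only the numeric column; dtype=float in the constructor would apply to every column
df = df.astype({'Age': float})
print(df)

Output:

    Name   Age
0   Alex  10.0
1    Bob  12.0
2  Jiyik  13.0

Note: The dtype parameter of the constructor applies to all columns. Passing dtype=float here would also target the string column Name: older pandas versions left that column unchanged and emitted a warning, while recent versions raise an error. Casting only the Age column with astype avoids the problem.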

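Creating a DataFrame from a dict of ndarrays/lists

All of the ndarrays must have the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, the index defaults to range(n), where n is the array length.

import pandas as pd

# Without an explicit index
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)

# With an explicit index
index_df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(index_df)

Output: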

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
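
The example above uses plain lists. The sketch below does the same with NumPy arrays and also shows what happens when the arrays differ in length. The variable names bad and mismatch_error are only illustrative.

```python
import numpy as np
import pandas as pd

# A dict of equal-length ndarrays works the same way as a dict of lists.
data = {"Name": np.array(["Tom", "Jack", "Steve"]), "Age": np.array([28, 34, 29])}
df = pd.DataFrame(data)
print(df)

# Arrays of different lengths are rejected with a ValueError.
bad = {"Name": np.array(["Tom", "Jack", "Steve"]), "Age": np.array([28, 34])}
try:
    pd.DataFrame(bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```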

Creating a DataFrame from a list of dicts

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are used as the column names.

import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# Without an explicit index
df = pd.DataFrame(data)
print(df)

# With an explicit row index
row_index_df = pd.DataFrame(data, index=['first', 'second'])
print(row_index_df)

# With both a row index and column labels
column_index_df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
print(column_index_df1)

# With both a row index and column labels; a column that does not exist in the data is filled with NaN
column_index_df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b', 'c1'])
print(column_index_df2)

Output:

   a   b     c
0  1   2   NaN
1  5  10  20.0
        a   b     c
first   1   2   NaN
second  5  10  20.0
        a   b
first   1   2
second  5  10
        a   b  c1
first   1   2 NaN
second  5  10 NaN

Note: column_index_df2 is created with a column label, c1, that is not a key in any of the dictionaries, so that column contains only NaN.
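
Missing keys also produce NaN inside an otherwise populated column, as with column c in the first output. A minimal sketch of detecting and filling such values follows; the fill value 0 is an arbitrary choice for this illustration.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

# The first dict has no key 'c', so that cell is NaN
# and the whole column is stored as float64.
missing_mask = df['c'].isna()
print(missing_mask)
print(df['c'].dtype)

# Replace missing values with a placeholder (0 is arbitrary here).
filled = df.fillna(0)
print(filled)
```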

Creating a DataFrame from a dict of Series

A dictionary of Series can be passed to create a DataFrame. The resulting index is the union of the indexes of all the Series passed.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

Output:

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

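Note: Series one has no label 'd', so in the result the value at label d in column one is NaN. Because NaN is a floating-point value, column one is displayed as float.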
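
The index union described above can be checked directly. The sketch below also uses dropna to keep only the labels present in every Series, which is one possible way to get the intersection instead; the variable names are illustrative.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row index is the union of both Series indexes.
all_labels = df.index.tolist()
print(all_labels)

# Column 'one' was upcast to float64 to hold NaN; 'two' stays int64.
print(df.dtypes)

# Dropping rows with any NaN keeps only labels shared by both Series.
complete_rows = df.dropna()
print(complete_rows)
```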

Accessing columns

A column is selected by its label, which returns it as a Series.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])

Output:

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
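
Selecting with a single label returns a Series, while selecting with a list of labels returns a DataFrame. A short sketch of the difference, with illustrative variable names:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

one_series = df['one']        # single label: a Series
one_frame = df[['one']]       # list with one label: a one-column DataFrame
both = df[['two', 'one']]     # list of labels: columns in the requested order

print(type(one_series).__name__)
print(type(one_frame).__name__)
print(both)
```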

Adding columns

We illustrate this by adding new columns to an existing DataFrame.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Add a new column to the existing DataFrame by passing a Series

print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)

# Add a new column computed from existing columns
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']

print(df)

Output:

Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
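
Besides assigning a Series, a column can be added from a scalar, with assign, or at a chosen position with insert. The following is a small sketch on hypothetical data; the column names flag, total and zero are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

# A scalar is broadcast to every row.
df['flag'] = True

# assign returns a new DataFrame and leaves df unchanged.
df2 = df.assign(total=df['one'] + df['two'])

# insert places the new column at a given position, in place.
df.insert(0, 'zero', [0, 0, 0])

print(df)
print(df2)
```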

Deleting columns

Columns can be deleted or popped from a DataFrame. Let's look at an example.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
   'three' : pd.Series([10, 20, 30], index=['a', 'b', 'c'])}

df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)

# Delete a column with the del statement
print("Deleting the first column using DEL function:")
del df['one']
print(df)

# Remove a column with pop
print("Deleting another column using POP function:")
df.pop('two')
print(df)

Output:

Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using POP function:
   three
a   10.0
b   20.0
c   30.0
d    NaN
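
Two details are worth a short sketch: pop returns the removed column as a Series, and drop with the columns argument returns a new DataFrame instead of modifying the original. The data below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop removes the column in place and returns it as a Series.
popped = df.pop('two')
print(popped)

# drop(columns=...) returns a new DataFrame; df itself keeps column 'one'.
trimmed = df.drop(columns=['one'])
print(df)
print(trimmed)
```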

Accessing, adding, and deleting rows

We now walk through row retrieval, addition, and deletion with examples.

Retrieving rows

Rows can be retrieved by label, by position, or by slicing.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
# Retrieve by label
print("Access By Label:")
print(df.loc['b'])

# Retrieve by integer position
print("Access By Position:")
print(df.iloc[2])

# Retrieve by slicing
print("Access By Slicing:")
print(df[2:4])

Output:

Access By Label:
one    2.0
two    2.0
Name: b, dtype: float64
Access By Position:
one    3.0
two    3.0
Name: c, dtype: float64
Access By Slicing:
   one  two
c  3.0    3
d  NaN    4
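
Slices behave differently for labels and positions: a loc slice includes its end label, while an iloc slice excludes its end position. The sketch below shows both, plus row selection with a boolean condition; the variable names are illustrative.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Label slice: both endpoints 'b' and 'c' are included.
by_label = df.loc['b':'c']

# Position slice: position 3 is excluded, so rows 1 and 2 are returned.
by_pos = df.iloc[1:3]

# Boolean mask: keep rows where column 'two' is greater than 2.
mask_rows = df[df['two'] > 2]

print(by_label)
print(by_pos)
print(mask_rows)
```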

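Adding rows

Use pd.concat to add new rows to a DataFrame; the rows are appended at the end. (The DataFrame.append method used in older tutorials was removed in pandas 2.0.)

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# Concatenate the rows of df2 after the rows of df; original labels are kept
df = pd.concat([df, df2])
print(df)

Output:

   a  b
0  1  2
1  3  4
0  5  6
1  7  8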

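Deleting rows

Use an index label to delete rows from a DataFrame. If the label is duplicated, multiple rows are deleted.

In the example above, the labels are duplicated. Let's drop one label and see how many rows are deleted.

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# pd.concat keeps the original labels, so labels 0 and 1 each appear twice
df = pd.concat([df, df2])

# Drop every row whose label is 0
df = df.drop(0)

print(df)

Output: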

   a  b
1  3  4
1  7  8

In the example above, two rows were deleted because both rows carry the same label, 0.
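
If duplicate labels are not wanted, one option is to renumber the rows while combining the DataFrames. The sketch below assumes pd.concat with ignore_index=True is acceptable for the use case; dropping label 0 then removes a single row.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and renumbers rows 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)

# Label 0 is now unique, so only one row is removed.
after_drop = combined.drop(0)
print(after_drop)
```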
