Pandas DataFrame DataFrame.sample() Function

Author: 迹忆客 | Last updated: 2024/04/22

The Python Pandas DataFrame.sample() function returns a random sample of rows or columns from a DataFrame. The sample can contain one or more rows or columns.

`pandas.DataFrame.sample()` Syntax

DataFrame.sample(
    n=None, frac=None, replace=False, weights=None, random_state=None, axis=None
)

Parameters


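`n`	An integer. The number of rows or columns to select at random from the `DataFrame`. It cannot be combined with `frac`; if neither is given, one item is returned
`frac`	A float. The fraction of rows or columns to select at random from the `DataFrame`. For example, `frac=0.45` means the sample will contain 45% of the rows or columns of the original data
`replace`	A boolean. If set to `True`, sampling is done with replacement, so the same row or column can be selected more than once
`weights`	A string or an array-like object. When called on a `DataFrame` with axis 0, it accepts the name of a column; rows with larger values in the weight column are more likely to be returned in the sample. If omitted, every row or column is equally likely to be selected
`random_state`	An integer or a `numpy.random.RandomState` object. An integer is used as the seed of the random number generator, so the same call returns the same sample every time. A `RandomState` object is used directly as the generator
`axis`	An integer or a string. It specifies whether rows or columns are sampled. It can be 0 or `index` for rows, 1 or `columns` for columns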
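The `n` and `frac` parameters are two alternative ways of setting the sample size. The following is a minimal sketch on a small hypothetical DataFrame (the `score` column is invented for illustration); it shows each parameter used alone and then the `ValueError` that pandas raises when both are passed together.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"score": [56, 75, 82, 64, 67]})

by_count = df.sample(n=2)          # exactly 2 rows
by_fraction = df.sample(frac=0.4)  # 40% of 5 rows, i.e. 2 rows
print(len(by_count), len(by_fraction))

# Passing both n and frac is rejected.
try:
    df.sample(n=2, frac=0.4)
    both_rejected = False
except ValueError as err:
    both_rejected = True
    print("ValueError:", err)
```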

Return Value

It returns a Series or a DataFrame. The returned object has the same type as the caller and contains n items randomly selected from the original object.
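As a quick illustration of the return type, the sketch below samples once from a DataFrame and once from a single column (a Series). The data is a shortened, hypothetical version of the table used in the examples of this article.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

row_sample = df.sample(n=2)                      # caller is a DataFrame
marks_sample = df["Obtained Marks"].sample(n=2)  # caller is a Series

print(type(row_sample).__name__)    # DataFrame
print(type(marks_sample).__name__)  # Series
```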

Example Code: `DataFrame.sample()`

By default, the function returns a sample of rows, i.e. axis=0.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
print(dataframe)

Our DataFrame is:

   Attendance    Name  Obtained Marks
0          60  Olivia              56
1         100    John              75
2          80   Laura              82
3          75     Ben              64
4          95   Kevin              67

All parameters of this function are optional. If we call it without passing any arguments, it returns a single random row.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample()
print(dataframe1)

Output 1:

   Attendance Name  Obtained Marks
3          75  Ben              64

Output 2:

   Attendance   Name  Obtained Marks
4          95  Kevin              67

Output 1 and Output 2 show two runs of the same program. Each call draws a new random row from the given DataFrame, so the result can differ from run to run.
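When a repeatable result is needed, for example in tests, the `random_state` parameter can fix the seed. The sketch below assumes an arbitrary seed value of 42; any integer would do, and the same seed gives the same rows on every call.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)

# The seed 42 is an arbitrary choice made for this illustration.
first = dataframe.sample(n=2, random_state=42)
second = dataframe.sample(n=2, random_state=42)

print(first.equals(second))  # True
```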

Example Code: `DataFrame.sample()` to Sample Columns

To sample columns instead of rows, we simply set axis to 1.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(n=1, axis=1)
print(dataframe1)

Output:

     Name
0  Olivia
1    John
2   Laura
3     Ben
4   Kevin

The function has returned a sample consisting of a single column. The number of columns is set by the parameter n=1.
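The same call also works for more than one column. The sketch below requests two columns and uses the string form `axis="columns"`, which is equivalent to `axis=1`. Which two columns come back depends on the random draw, so no particular output is shown.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)

# Pick two of the three columns at random; all five rows are kept.
two_columns = dataframe.sample(n=2, axis="columns")
print(two_columns.columns.tolist())
print(two_columns.shape)  # (5, 2)
```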

Example Code: `DataFrame.sample()` to Sample a Fraction of the Data

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=0.5)
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
3          75    Ben              64
4          95  Kevin              67
1         100   John              75

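The returned sample contains roughly 50% of the rows of the original data. Because the result must be a whole number of rows, the size is frac multiplied by the number of rows, rounded to an integer.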
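Two common uses of `frac` can be sketched as follows: taking a fixed share of the rows, and shuffling the whole DataFrame with `frac=1`. The seed value and the `reset_index(drop=True)` step are choices made for this illustration, not something the function requires.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)

# 60% of 5 rows gives 3 rows.
part = dataframe.sample(frac=0.6, random_state=1)
print(len(part))  # 3

# frac=1 returns every row exactly once, in random order.
# reset_index(drop=True) then renumbers the shuffled rows from 0.
shuffled = dataframe.sample(frac=1, random_state=1).reset_index(drop=True)
print(shuffled)
```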

Example Code: `DataFrame.sample()` to Oversample a DataFrame

If frac > 1, the parameter replace must be True so that the same row can be sampled more than once; otherwise, the function raises a ValueError.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=1.5, replace=True)
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
3          75     Ben              64
0          60  Olivia              56
1         100    John              75
2          80   Laura              82
1         100    John              75
2          80   Laura              82
0          60  Olivia              56
4          95   Kevin              67

If replace is set to False while frac is greater than 1, a ValueError is raised.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=1.5, replace=False)
print(dataframe1)

Output:

Traceback (most recent call last):
  File "..\test.py", line 6, in <module>
    dataframe1 = dataframe.sample(frac=1.5, replace=False)
  File "..\lib\site-packages\pandas\core\generic.py", line 5044, in sample
    raise ValueError(
ValueError: Replace has to be set to `True` when upsampling the population `frac` > 1.
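In a script, this error can be handled instead of letting the program stop. The following is one possible sketch, in which the fallback to `replace=True` is an assumption about what the caller wants rather than a recommended default.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)

try:
    # frac > 1 without replacement cannot be satisfied.
    oversampled = dataframe.sample(frac=1.5, replace=False)
except ValueError as err:
    print("Sampling without replacement failed:", err)
    # Hypothetical fallback: allow rows to repeat.
    oversampled = dataframe.sample(frac=1.5, replace=True)

print(len(oversampled))
```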

Example Code: `DataFrame.sample()` With `weights`

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(n=2, weights="Attendance")
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
1         100   John              75
4          95  Kevin              67

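Here, rows with larger values in the Attendance column are more likely to be selected for the returned sample. The selection is still random, so rows with smaller values can also appear.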
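The effect of the weights is easier to see over many draws than in a single two-row sample. The sketch below draws 10000 rows with replacement and counts how often each name appears; the draw count and seed are arbitrary values chosen for illustration. It also passes weights as a list, where a weight of zero means a row is never selected.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)

# Many draws with replacement, weighted by the Attendance column.
draws = dataframe.sample(
    n=10000, replace=True, weights="Attendance", random_state=0
)
counts = draws["Name"].value_counts()
print(counts)  # names with higher Attendance appear more often

# Weights given as a list: only the last row has a non-zero weight.
only_kevin = dataframe.sample(n=1, weights=[0, 0, 0, 0, 1])
print(only_kevin["Name"].iloc[0])  # Kevin
```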
