Pandas Applying Transformations with Groupby

Author: JIYIK Last Updated: 2025/04/13

groupby() is a powerful pandas method that lets us split data into groups based on certain criteria. We can then run calculations on each group separately and analyze the data more effectively.
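As a minimal sketch of this splitting step, the snippet below builds a small DataFrame by hand and groups it by one column. The sample rows are made up for illustration; the Department and Marks column names mirror the ones used later in this article.

```python
import pandas as pd

# Hypothetical sample data (not the article's Student.csv)
df = pd.DataFrame(
    {
        "Department": ["CS", "CS", "Math", "Math", "Math"],
        "Marks": [80, 90, 70, 60, 95],
    }
)

# Iterating over a groupby object yields (group key, sub-DataFrame) pairs
groups = {name: group["Marks"].tolist() for name, group in df.groupby("Department")}
print(groups)

# size() counts the rows that fell into each group
sizes = df.groupby("Department").size()
print(sizes)
```

Each group is an ordinary DataFrame containing only the rows that share the same key, which is what the methods discussed below operate on.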

Difference between `apply()` and `transform()` in Python

apply() and transform() are two methods used in conjunction with groupby(). The difference between them lies in the arguments passed to the function and the values returned.

The apply() method passes each group to the function as a DataFrame, and the function can return a DataFrame, a scalar, or a Series. Thus, it allows us to operate on columns, rows, and complete DataFrames for each group.

The transform() method passes the function only a Series representing one column of each group and returns a Series of the same length as the input Series. Therefore, we can only operate on one specific column within each group at a time.
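The difference in return shape can be seen in a small sketch. The data below is hypothetical, and the "range of marks" calculation is only an illustrative choice. The column is selected explicitly before apply() so that only Marks is handed to the function.

```python
import pandas as pd

# Hypothetical sample data (not the article's Student.csv)
df = pd.DataFrame(
    {
        "Department": ["CS", "CS", "Math", "Math", "Math"],
        "Marks": [80, 90, 70, 60, 95],
    }
)

# apply(): the function receives a DataFrame per group and returns a scalar,
# so the result has one entry per department
per_group = df.groupby("Department")[["Marks"]].apply(
    lambda g: g["Marks"].max() - g["Marks"].min()
)

# transform(): the function receives the Marks Series per group, and the
# result is broadcast back so that it has one entry per original row
per_row = df.groupby("Department")["Marks"].transform(lambda s: s.max() - s.min())

print(per_group)
print(per_row)
```

Here apply() collapses the five rows into two values, while transform() keeps all five rows and repeats each department's value on every row of that department.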

Using the `apply()` Method in Python Pandas

In the following code, we load a CSV file containing student records. We use the apply() method to display the highest marks in each department.

First, we group the records by department using the groupby() method. Then we find the highest score in each department using the max() function.

The output is returned in the form of a Series. We can also perform operations on multiple columns or on the entire DataFrame.

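# Python 3.x
import pandas as pd

# Load the student records (expects Department and Marks columns)
df = pd.read_csv("Student.csv")
print(df)


def f(my_df):
    # my_df holds the rows of one department
    return my_df["Marks"].max()


# Highest marks in each department, returned as a Series
print(df.groupby("Department").apply(f))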

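Since Student.csv is not included here, the following sketch repeats the idea with a hand-built DataFrame. The rows and the Name column are assumptions made for illustration. It also shows a function that uses two columns at once, which is possible because apply() hands over the whole group.

```python
import pandas as pd

# Hypothetical records standing in for Student.csv
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
        "Department": ["CS", "CS", "Math", "Math", "Math"],
        "Marks": [80, 90, 70, 60, 95],
    }
)


def highest_mark(group):
    # group is a DataFrame holding one department's rows
    return group["Marks"].max()


def top_student(group):
    # Uses two columns: find the row with the highest mark, return its Name
    return group.loc[group["Marks"].idxmax(), "Name"]


# Select the columns the functions need, then apply each function per group
by_department = df.groupby("Department")[["Name", "Marks"]]
highest = by_department.apply(highest_mark)
best = by_department.apply(top_student)

print(highest)
print(best)
```

Both results are Series indexed by department, with one value per group.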

Using the `transform()` Method in Python Pandas

In the next example, we group the records by department using the groupby() method, add another column, Mean_Marks, to the DataFrame, and calculate the average marks of each of the two departments using the mean keyword.

The output shows the average scores of the two departments.

Here, the transform() method operates on a single column, in our case Marks.

# Python 3.x
import pandas as pd

df = pd.read_csv("Student.csv")
print(df)

# Add each department's mean marks as a new column aligned with the original rows
df["Mean_Marks"] = df.groupby("Department")["Marks"].transform("mean")
print(df)

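The same step can be tried without Student.csv using a small hand-built DataFrame. The rows below and the Diff_From_Mean column are assumptions added for illustration; they show why a result of the same length as the input is convenient.

```python
import pandas as pd

# Hypothetical records standing in for Student.csv
df = pd.DataFrame(
    {
        "Department": ["CS", "CS", "Math", "Math", "Math"],
        "Marks": [80, 90, 70, 60, 95],
    }
)

# Each row receives the mean of its own department
df["Mean_Marks"] = df.groupby("Department")["Marks"].transform("mean")

# The transformed column lines up with the original rows, so it can be
# combined with other columns directly
df["Diff_From_Mean"] = df["Marks"] - df["Mean_Marks"]

print(df)
```

Because transform() keeps the original index, no merge is needed to attach the group means to the individual rows.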
