Groupby Index Column in Pandas

Author: JIYIK Last Updated: 2025/04/13

This tutorial shows how to use groupby to categorize data and apply functions to the categories in Python Pandas. It uses the groupby() function to group by multiple index columns, with examples.

Group by index column using the `groupby()` function in Python Pandas

In this post, the Pandas DataFrame.groupby() function groups data based on specific criteria. Pandas objects can be split into any number of groups along any of their axes.

Abstractly, a grouping is a mapping of labels to group names. A groupby operation splits the object, applies a function to each piece, and combines the results.
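The split, apply, and combine steps can be spelled out by hand to see what groupby does. The sketch below uses a small made-up DataFrame (the team and points columns are illustrative and not part of this tutorial's data) and builds the same result once with an explicit loop and once with a single groupby call.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame({"team": ["x", "y", "x", "y", "x"], "points": [1, 2, 3, 4, 5]})

# Split: one sub-DataFrame per distinct label in "team".
pieces = {label: part for label, part in df.groupby("team")}

# Apply: reduce each piece to a single number.
totals = {label: part["points"].sum() for label, part in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(totals, name="points").rename_axis("team")

# groupby performs all three steps in a single expression.
automatic = df.groupby("team")["points"].sum()

print(manual)
print(automatic)
```

Both printed Series should carry the same totals per team; the one-line form is what the rest of this tutorial uses.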

This is useful for grouping and performing operations on large amounts of data. By default, groupby converts the grouping columns into the index of the result and removes them from the DataFrame's column list.
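A minimal sketch of this default, assuming a made-up DataFrame with hypothetical key and value columns: the grouping column moves into the index unless as_index=False is passed, and reset_index() turns it back into a regular column afterwards.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [10, 20, 30]})

# Default (as_index=True): "key" becomes the index of the result.
indexed = df.groupby("key").sum()
print(indexed.index.name)     # key
print(list(indexed.columns))  # ['value']

# as_index=False keeps "key" as an ordinary column.
flat = df.groupby("key", as_index=False).sum()
print(list(flat.columns))     # ['key', 'value']

# reset_index() on the default result restores "key" as a column too.
print(list(indexed.reset_index().columns))  # ['key', 'value']
```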

Syntax:

DataFrame.groupby(
    by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True
)

Parameters:


`by`	mapping, function, string or iterable
`axis`	Integer, default value 0
`level`	For a multi-index axis, group by a specific level or levels (hierarchical).
`as_index`	For aggregated output, return an object with the group labels as the index. This only applies to DataFrame input. With `as_index=False`, the output is effectively "SQL-style" grouped output.
`sort`	Sort the group keys. Turn this off to improve performance. This does not affect the order of observations within each group; `groupby` preserves the order of rows within each group.
`group_keys`	When calling apply, add the group keys to the index to identify the pieces.
`squeeze`	Reduce the dimensionality of the return type if possible; otherwise, return a consistent type. This parameter does not appear in the signature above; it was deprecated in pandas 1.1 and removed in pandas 2.0.
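The level and sort parameters are easiest to see on a tiny example. The following is a sketch on made-up data (the index name key and the value column are assumptions): it groups by an index level rather than a column, then repeats the grouping with sort=False so the groups keep their order of first appearance.

```python
import pandas as pd

# Illustrative data; the names are assumptions for this sketch.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.Index(["b", "a", "b", "a"], name="key"),
)

# level: group by an index level instead of a column.
by_level = df.groupby(level="key").sum()
print(list(by_level.index))  # ['a', 'b'] (group keys are sorted by default)

# sort=False: groups appear in order of first occurrence.
unsorted = df.groupby(level="key", sort=False).sum()
print(list(unsorted.index))  # ['b', 'a']
```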

Take a DataFrame with two columns: date and item_sell. Group by both date and item_sell and get the count for each item.

First, import the necessary libraries, pandas and numpy. Then create three columns, ct, date, and item_sell, and assign a set of values to each.

import pandas as pd
import numpy as np

data = pd.DataFrame()
data["date"] = ["a", "a", "a", "b"]
data["item_sell"] = ["z", "z", "a", "a"]
data["ct"] = 1
print(data)

Output:

  date item_sell  ct
0    a         z   1
1    a         z   1
2    a         a   1
3    b         a   1

Use the date and item_sell columns for grouping. Here, pivot_table takes both columns as the index and sums ct for each combination.

import pandas as pd
import numpy as np

data = pd.DataFrame()
data["date"] = ["a", "a", "a", "b"]
data["item_sell"] = ["z", "z", "a", "a"]
data["ct"] = 1
# Passing the string "sum" avoids the FutureWarning that recent pandas versions raise for np.sum.
output = pd.pivot_table(data, values="ct", index=["date", "item_sell"], aggfunc="sum")
print(output)

Output:

                ct
date item_sell
a    a           1
     z           2
b    a           1
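The pivot_table call above can also be written with groupby directly. The sketch below rebuilds the same DataFrame and groups by the two columns; selecting [["ct"]] before summing keeps the result a one-column DataFrame, so it should have the same layout as the pivot table.

```python
import pandas as pd

data = pd.DataFrame()
data["date"] = ["a", "a", "a", "b"]
data["item_sell"] = ["z", "z", "a", "a"]
data["ct"] = 1

# Group by both columns and sum "ct" within each (date, item_sell) pair.
grouped = data.groupby(["date", "item_sell"])[["ct"]].sum()
print(grouped)
```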

The by argument of groupby() can refer to either column names or index level names.

import pandas as pd
import numpy as np

arrays = [
    ["rar", "raz", "bal", "bac", "foa", "foa", "qus", "qus"],
    ["six", "seven", "six", "seven", "six", "seven", "six", "seven"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
data = pd.DataFrame({"C": [1, 1, 1, 1, 2, 2, 3, 3], "D": np.arange(8)}, index=index)
print(data)

Output:

              C  D
first second
rar   six     1  0
raz   seven   1  1
bal   six     1  2
bac   seven   1  3
foa   six     2  4
      seven   2  5
qus   six     3  6
      seven   3  7

Group by second (an index level) and C (a column), and then use the sum function to calculate the sum.

import pandas as pd
import numpy as np

arrays = [
    ["rar", "raz", "bal", "bac", "foa", "foa", "qus", "qus"],
    ["six", "seven", "six", "seven", "six", "seven", "six", "seven"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
data = pd.DataFrame({"C": [1, 1, 1, 1, 2, 2, 3, 3], "D": np.arange(8)}, index=index)
output = data.groupby(["second", "C"]).sum()
print(output)

Output:

          D
second C
seven  1  4
       2  5
       3  7
six    1  2
       2  4
       3  6

Using the `groupby()` function on CSV file data in Python Pandas

Now use the groupby() function on a CSV file. The CSV file used in the code can be downloaded from Kaggle ([Students’ Performance on the Exam | Kaggle]).

The CSV file contains student performance data. To group the data by gender, use the groupby() function.

The read_csv function of the Pandas library reads the file from the drive. Store the result in the data variable.

import pandas as pd

data = pd.read_csv("/content/drive/MyDrive/CSV/StudentsPerformance.csv")
print(data)

Apply the groupby() function. Calling first() on the grouped object returns the first row of each group.

import pandas as pd

data = pd.read_csv("StudentsPerformance.csv")
std = data.groupby("gender")
print(std.first())
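For readers without the CSV file at hand, the following sketch uses a tiny hand-made stand-in. The gender column mirrors the one used above, while the score column and all values are made up for illustration. It shows what first() returns and that other reductions, such as mean(), can be applied to the grouped object in the same way.

```python
import pandas as pd

# Hand-made stand-in for the CSV; "score" and all values are hypothetical.
data = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "score": [70, 60, 90, 80],
    }
)
std = data.groupby("gender")

# first() returns the first non-null value of each column per group.
firsts = std.first()
print(firsts)

# Other reductions work the same way, for example the mean per group.
means = std["score"].mean()
print(means)
```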

Let's print the values in a particular group. For this, use the name of the group.

The get_group function retrieves the entries of a given group. Find the values contained in the female group.

import pandas as pd

data = pd.read_csv("StudentsPerformance.csv")
std = data.groupby("gender")
print(std.get_group("female"))
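A minimal sketch of get_group on a hand-made stand-in for the CSV (the score column and all values are hypothetical): the groups attribute lists the available group names, and get_group returns the same rows as a plain boolean filter on the grouping column.

```python
import pandas as pd

# Hand-made stand-in for the CSV; "score" and all values are hypothetical.
data = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "score": [70, 60, 90, 80],
    }
)
std = data.groupby("gender")

# .groups maps each group name to the row labels it contains.
print(sorted(std.groups))

# get_group returns the rows that belong to one group.
female = std.get_group("female")
print(female)

# The same rows can be selected with a boolean filter.
same = data[data["gender"] == "female"]
print(female.equals(same))
```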

Use the groupby() function to create groups from multiple categories. To split on several columns, pass a list of column names.

import pandas as pd

data = pd.read_csv("StudentsPerformance.csv")
std_per = data.groupby(["gender", "lunch"])
print(std_per.first())
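With several grouping columns, each group is identified by a tuple of values. The sketch below uses a hand-made stand-in for the CSV (the gender and lunch column names follow the code above; the score column and all values are made up) to count the rows per combination and to fetch a single combination.

```python
import pandas as pd

# Hand-made stand-in for the CSV; "score" and all values are hypothetical.
data = pd.DataFrame(
    {
        "gender": ["female", "female", "male", "male", "male"],
        "lunch": ["standard", "free/reduced", "standard", "standard", "free/reduced"],
        "score": [70, 65, 80, 90, 60],
    }
)
std_per = data.groupby(["gender", "lunch"])

# size() counts the rows in each (gender, lunch) combination.
counts = std_per.size()
print(counts)

# A single group is addressed by a tuple with one entry per grouping column.
male_standard = std_per.get_group(("male", "standard"))
print(male_standard)
```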

groupby() is a general-purpose function with many variations. It makes splitting a DataFrame on some criteria simple and efficient.
