Converting Categorical Variables to Numerical in Pandas

Author: JIYIK   Last Updated: 2025/04/12

This tutorial explores how to convert categorical variables to numerical variables in Pandas.


Converting Categorical Variables to Numerical Variables in Pandas

This tutorial explains how and why to convert a variable from one data type to another, and in particular how to convert a categorical variable to a numerical one.

Such a conversion may be needed when a column's data type does not suit the analyst's analysis or interpretation task. In such cases, Pandas helps convert a column from one type to another.

Let us see how to perform this operation.

Before we start, we will create a dummy DataFrame to work with. Here we create a DataFrame named df.

We add some columns and some data to this DataFrame using the following code.

import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)

The above code creates a DataFrame with some entries. To see the entries in the data, we use the following code.

print(df)

The above code gives the following output.

   col1 col2 col3
0     1    a    a
1     2    b    b
2     3    c    a
3     4    a    b
4     5    b    b

As we can see, we have 3 columns and 5 rows indexed from 0 to 4. Looking at our DataFrame, col1 holds numerical values, while col2 and col3 hold letters.

Our job now is to convert these letter values into numerical values.


Use the apply function in Pandas to convert categorical variables into numerical variables

After setting up the data, let's jump right into our task. The first step is to convert the letter columns to the category data type and check the data type of each column.

In Pandas, the data type of a column is called its dtype. Let's use the following code to convert col2 and col3 to the category dtype and then see the dtype associated with each column.

df["col2"] = df["col2"].astype("category")
df["col3"] = df["col3"].astype("category")
print(df.dtypes)

The output of the code is shown below.

col1       int64
col2    category
col3    category
dtype: object

As we can see, the output above lists the data type of each column. col1 has the data type int64, while col2 and col3 both have the data type category.
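A column with the category dtype keeps its distinct labels separately from the data. The following is a minimal sketch, rebuilding the same DataFrame so that it runs on its own, of how those labels can be inspected through the cat.categories accessor. The names col2_labels and col3_labels are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)
df["col2"] = df["col2"].astype("category")
df["col3"] = df["col3"].astype("category")

# Distinct labels that pandas recorded for each categorical column
col2_labels = list(df["col2"].cat.categories)
col3_labels = list(df["col3"].cat.categories)

print(col2_labels)  # ['a', 'b', 'c']
print(col3_labels)  # ['a', 'b']
```

The order of these labels matters later, because the position of a label in this list is the integer code it receives.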

Now that we know the data type of each column, we can move on to the next step.

The next step is to find the categorical columns and list them together. This is not difficult in our case, but it is an extremely important step because it helps us understand which columns will be converted to numerical variables.

cat_columns = df.select_dtypes(["category"]).columns

As shown in the code, we get all the columns whose dtype is equal to category. Similarly, we can select columns of any other dtype according to our requirement.
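To illustrate selecting by other dtypes, here is a small sketch that uses the include and exclude parameters of select_dtypes. It converts only col2 to category so that the difference is visible. The variable names num_columns and non_cat_columns are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)
# Only col2 is converted here; col3 stays a plain string column
df["col2"] = df["col2"].astype("category")

# Columns holding any numeric dtype
num_columns = df.select_dtypes(include="number").columns
# Every column that is NOT categorical
non_cat_columns = df.select_dtypes(exclude="category").columns

print(list(num_columns))      # ['col1']
print(list(non_cat_columns))  # ['col1', 'col3']
```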

Now that we have found all the categorical columns, let's print them. We can do this using the following code.

print(cat_columns)

The code obtains the following output.

Index(['col2', 'col3'], dtype='object')

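This output lists the names of the categorical columns. Note that dtype='object' here describes the Index that holds the column names, not the columns themselves.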

The final step is to convert these categorical variables into numerical variables. We can do this using the following code.

df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

Printing the DataFrame with print(df) gives the following output.

   col1  col2  col3
0     1     0     0
1     2     1     1
2     3     2     0
3     4     0     1
4     5     1     1

As shown in the above output, each letter has been replaced by an integer code, so the categorical variables are now numerical variables.
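Because the assignment overwrites the letters, it can be useful to record which code stands for which label before converting. The sketch below is one possible way to do that and is not part of the original steps. The dictionary name code_to_label and the variable restored are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)
df["col2"] = df["col2"].astype("category")
df["col3"] = df["col3"].astype("category")
cat_columns = df.select_dtypes(["category"]).columns

# Record code -> label per column before the labels are overwritten
code_to_label = {
    col: dict(enumerate(df[col].cat.categories)) for col in cat_columns
}

df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

# Translate the codes of col2 back into the original letters
restored = df["col2"].map(code_to_label["col2"]).tolist()

print(code_to_label["col2"])  # {0: 'a', 1: 'b', 2: 'c'}
print(restored)               # ['a', 'b', 'c', 'a', 'b']
```

The code of each label is its position in cat.categories, which is why a maps to 0, b to 1, and c to 2 in the output above.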

Thus, by selecting the categorical columns and using the apply function with cat.codes, we have converted the variables in the DataFrame from categorical to numerical.
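One caveat worth knowing when applying this to real data: the dummy DataFrame has no missing values, but cat.codes assigns the code -1 to any missing entry. A brief sketch, using a hypothetical Series named s that is not part of the tutorial's data:

```python
import pandas as pd

# A small Series with one missing value (illustrative data only)
s = pd.Series(["a", None, "b", "a"]).astype("category")

codes = s.cat.codes.tolist()
print(codes)  # [0, -1, 1, 0]
```

If -1 is not meaningful for the analysis, the missing values may need to be handled before or after the conversion.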
