Scatter Matrix in Pandas

Author: JIYIK, Last Updated: 2025/04/13

This tutorial explores how to use scatter matrices in Pandas to create pair plots.


Scatter Matrix in Pandas

Checking the correlation between the independent variables used in a regression analysis is an important part of data preprocessing. Scatter plots make the correlation between features easy to see.

Pandas provides the scatter_matrix() function to produce these plots in a practical way. The plots also show whether a correlation is positive or negative.

Consider an example with n variables: the function arranges the plots in n rows and n columns, giving an n x n matrix of plots.
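As a minimal sketch of the n x n layout, the snippet below builds a hypothetical DataFrame with four made-up columns (the names a to d and the random data are assumptions for illustration only) and inspects the array of subplots that scatter_matrix() returns. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: four independent normal columns, so n = 4.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])

# scatter_matrix() returns a 2-D NumPy array of matplotlib Axes.
axes = scatter_matrix(demo)

# One subplot per (row variable, column variable) pair.
print(axes.shape)  # (4, 4)
```

Each entry axes[i, j] is a regular matplotlib Axes, so individual panels can be customized after the call if needed.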

Given below are three simple steps to draw a scatter matrix.

  1. Load the necessary libraries.
  2. Import the appropriate data.
  3. Use the scatter_matrix method to draw the plots.

Syntax:

pandas.plotting.scatter_matrix(dataframe)

This tutorial will teach us how to use scatter_matrix() effectively as an analyst.


Using the scatter_matrix() Method in Pandas

This example uses the scatter_matrix() method with no additional parameters.

Here, we use the numpy module to create dummy data. Three variables are created: x1, x2, and x3.

import numpy as np
import pandas as pd

np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

Create a Pandas DataFrame using a dictionary:

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
print(df.head())

Output:

         x1        x2         x3
0 -0.224315 -8.840152  10.145993
1  1.337257  2.383882  -1.854636
2  0.882366  3.544989  -1.117054
3  0.295153 -3.844863   3.634823
4  0.780587 -0.465342   2.121288

The data is now ready, and the complete script below draws the scatter matrix.

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
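# Print the first rows (a bare df.head() shows nothing when run as a script):
print(df.head())

# Creating the scatter matrix with the function imported above:
scatter_matrix(df)
plt.show()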

As we can see, these plots are easy to generate. But what makes them so useful?

  1. The histograms on the diagonal depict the distribution of the variables x1, x2, and x3 in our dummy data.
  2. The off-diagonal scatter plots show the correlations between the variables.
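The visual impression from the scatter plots can be cross-checked numerically. The sketch below rebuilds the same dummy data and prints the correlation matrix with DataFrame.corr(). By construction, x2 is built by adding noise to x1, so a positive correlation is expected between them, while x3 subtracts x2, so a negative correlation is expected there.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the tutorial.
np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))

# Only the sign (positive or negative) of each correlation.
print(np.sign(corr))
```

A positive entry corresponds to a point cloud that rises from left to right in the matching panel of the scatter matrix, and a negative entry to one that falls.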

Using the scatter_matrix() Method with the hist_kwds Parameter in Pandas

The next example uses the hist_kwds parameter. It accepts a Python dictionary of keyword arguments for the histograms on the diagonal, through which we can, for example, change the total number of bins.

# Changing the number of bins of the scatter matrix in Python:
pd.plotting.scatter_matrix(df, hist_kwds={"bins": 30})
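The following self-contained sketch shows one way to confirm that the dictionary reaches the histograms. It assumes that the entries of hist_kwds are forwarded to matplotlib's hist call, so other histogram options such as edgecolor can be passed the same way. The bar count is read from the first diagonal subplot, and the Agg backend is used only so that the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Same dummy data as in the tutorial.
np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pass histogram options as a dictionary: 30 bins with a black outline.
axes = scatter_matrix(df, hist_kwds={"bins": 30, "edgecolor": "black"})

# Each histogram bar is a rectangle patch on the diagonal subplot.
n_bars = len(axes[0, 0].patches)
print(n_bars)
```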

Using the scatter_matrix() Method with the diagonal = 'kde' Parameter in Pandas

In this final example, we will replace the histograms with kde distributions.

KDE stands for Kernel Density Estimation. It is a basic tool that smooths data so that inferences about the underlying distribution can be made from a limited sample.
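To make the idea of smoothing a limited sample concrete, here is a small sketch that uses SciPy's gaussian_kde directly. The sample of 50 points and the evaluation grid are assumptions chosen for illustration. The estimate is a smooth, non-negative curve whose area should be close to 1, like any probability density.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical small sample from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=50)

# Fit the estimator with its default bandwidth rule.
kde = gaussian_kde(sample)

# Evaluate the smooth density on a grid that extends past the sample range.
grid = np.linspace(sample.min() - 3, sample.max() + 3, 2001)
density = kde(grid)

# Approximate the area under the curve with a simple Riemann sum.
dx = grid[1] - grid[0]
area = float((density * dx).sum())
print(round(area, 3))
```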

Drawing a scatter matrix with kde is just as easy as drawing one with histograms. To do this, we just need to replace hist_kwds with diagonal = 'kde'.

The diagonal parameter takes one of two values, hist or kde, and cannot use both at once. Make sure that only one of them is used in your code.

The code to get kde changes as follows. Pandas computes the density estimate with SciPy, so SciPy needs to be installed.

# Scatter matrix with Pandas and density plots:
pd.plotting.scatter_matrix(df, diagonal="kde")
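The difference between the two diagonal options can also be checked in code. This sketch draws the matrix once with each option and counts the artists on the first diagonal subplot: histogram bars are rectangle patches, while a density curve is a line. It assumes SciPy is installed and uses the Agg backend only so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Same dummy data as in the tutorial.
np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Draw the matrix once per diagonal option.
hist_axes = scatter_matrix(df, diagonal="hist")
kde_axes = scatter_matrix(df, diagonal="kde")

# Count bars and lines on the top-left (x1) diagonal subplot.
hist_bars = len(hist_axes[0, 0].patches)
kde_bars = len(kde_axes[0, 0].patches)
kde_lines = len(kde_axes[0, 0].lines)
print(hist_bars, kde_bars, kde_lines)
```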

To work with a real dataset instead of dummy data, we just need to import the CSV file with the read_csv method of the Pandas module.

csv_file = "URL for the dataset"

# Reading the CSV file from the URL
df_s = pd.read_csv(csv_file, index_col=0)

# Checking the data quickly (first 5 rows):
df_s.head()
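Since the dataset URL above is only a placeholder, the sketch below stands in for it with a small in-memory CSV. The column names and values are made up for illustration. It reads the text with read_csv, keeps the numeric columns explicitly, and passes them to scatter_matrix().

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical CSV content standing in for a real file or URL.
csv_text = """id,height,weight,label
1,1.62,58.0,a
2,1.75,72.5,b
3,1.80,80.1,a
4,1.68,65.3,b
5,1.90,88.7,a
"""

# The first column becomes the index, as in the read_csv call above.
df_s = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Keep only numeric columns; the text column cannot be drawn as a scatter plot.
numeric = df_s.select_dtypes(include="number")

axes = scatter_matrix(numeric)
print(numeric.columns.tolist())  # ['height', 'weight']
```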

As with scatter_matrix() in Pandas, you can also use the pairplot method available through the seaborn package.

A good understanding of these modules helps in drawing scatter matrices and in making the resulting visualizations more readable and attractive.
