Scatter Matrix in Pandas

Author: JIYIK Last Updated: 2025/04/13

This tutorial explores how to use scatter matrices in Pandas to draw pair plots.

Scatter Matrix in Pandas

When preprocessing data for regression analysis, it is important to check the correlation between the independent variables. Scatter plots make the correlation between features easy to see.

Pandas provides the scatter_matrix() function to draw these plots in a practical way. The plots also show whether a correlation is positive or negative.

Consider an example with n variables: this function creates a grid of n rows and n columns, that is, an n x n matrix of plots.

Given below are three simple steps to draw a scatter matrix.

Load the necessary libraries.
Import the appropriate data.
Use the scatter_matrix method to draw the plots.

Syntax:

pandas.plotting.scatter_matrix(dataframe)
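
As a minimal sketch of the n x n layout, the snippet below builds a small DataFrame with three hypothetical columns (a, b, and c are not part of this tutorial's data) and inspects what scatter_matrix() returns. The Agg backend is selected only on the assumption that the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: three numeric columns, so n = 3.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])

# scatter_matrix() returns a NumPy array of Axes, one per pair of columns.
axes = scatter_matrix(demo)
print(axes.shape)
```

The returned array has one row and one column per numeric variable, so individual panels can be reached by index, for example axes[0, 1].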

This tutorial shows how an analyst can use scatter_matrix() effectively.

Using the `scatter_matrix()` Method in Pandas

This example uses the scatter_matrix() method with no additional parameters.

Here, we use the numpy module to create dummy data. Three variables are created: x1, x2, and x3.

import numpy as np
import pandas as pd

np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

Create a Pandas DataFrame using a dictionary:

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
print(df.head())

Output:

         x1        x2         x3
0 -0.224315 -8.840152  10.145993
1  1.337257  2.383882  -1.854636
2  0.882366  3.544989  -1.117054
3  0.295153 -3.844863   3.634823
4  0.780587 -0.465342   2.121288

Finally, the data is ready for us to plot a chart.

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

np.random.seed(134)
N = 1000

x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
print(df.head())

# Creating the scatter matrix:
scatter_matrix(df)
plt.show()

As we can see, these plots take very little code to generate. But what makes them so interesting?

The histograms on the diagonal depict the distribution of the variables x1, x2, and x3 in our dummy data.
The scatter plots off the diagonal let us observe the correlations between the variables.
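
To back up the visual impression with numbers, one option is to print the correlation matrix of the same dummy data. This is a sketch that uses DataFrame.corr(), which the tutorial itself does not call; it computes Pearson correlations by default.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the example above.
np.random.seed(134)
N = 1000
x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pearson correlation for every pair of columns; the sign gives the direction.
corr = df.corr()
print(corr.round(2))
```

Because x3 is built by subtracting x2, the x2/x3 entry should be negative, while x1 and x2 should be positively correlated. These signs should match the slopes of the corresponding scatter panels.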

Using the `scatter_matrix()` Method With the `hist_kwds` Parameter in Pandas

The next example uses the hist_kwds parameter, which controls the histograms. We pass it a Python dictionary, through which we can change, for example, the total number of bins of each histogram.

# Changing the number of bins of the scatter matrix in Python:
pd.plotting.scatter_matrix(df, hist_kwds={"bins": 30})
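
The dictionary is forwarded to Matplotlib's histogram call, so other histogram options can go in it as well. The sketch below uses hypothetical two-column data, adds an edgecolor entry as an illustration, and counts the bars drawn on one diagonal panel. The Agg backend is again an assumption for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data with two numeric columns.
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(200, 2)), columns=["a", "b"])

# Every key in hist_kwds is passed on to the histogram on the diagonal.
axes = scatter_matrix(demo, hist_kwds={"bins": 30, "edgecolor": "black"})

# Each bar of a diagonal histogram is one rectangle patch on that Axes.
n_bars = len(axes[0, 0].patches)
print(n_bars)
```

The bar count should equal the requested number of bins, which is a quick way to confirm that the setting reached the histogram.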

Using the `scatter_matrix()` Method With the `diagonal = 'kde'` Parameter in Pandas

In this final example, we will replace the histogram with a kde distribution.

KDE stands for Kernel Density Estimation. It is a basic tool that can smooth data so that inferences can be made based on a limited sample of data.
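
To make the idea of smoothing concrete, here is a minimal NumPy-only sketch of a Gaussian kernel density estimate. It is an illustration of the technique, not the code Pandas runs; the function name gaussian_kde_sketch and the bandwidth of 0.4 are arbitrary choices made for this example.

```python
import numpy as np


def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated on `grid`."""
    # Scaled distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.mean(axis=1) / bandwidth


# Hypothetical sample: 200 draws from a standard normal distribution.
rng = np.random.default_rng(2)
samples = rng.normal(0, 1, 200)
grid = np.linspace(-6, 6, 601)

density = gaussian_kde_sketch(samples, grid, bandwidth=0.4)

# A density should integrate to roughly 1 over a wide enough grid.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth follows the individual samples more closely, while a larger one gives a smoother curve.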

Drawing a scatter matrix using kde is just as easy as making a histogram. To do this, we just need to replace hist_kwds with diagonal = 'kde'.

The diagonal parameter accepts one of two values, hist or kde, and cannot take both at once. Make sure that only one of them is used in your code.

The code to get the kde plots changes as follows.

# Scatter matrix with Pandas and density plots:
pd.plotting.scatter_matrix(df, diagonal="kde")

To work with a dataset of our own, we just need to import the CSV file through the read_csv method of the Python Pandas module.

csv_file = "URL for the dataset"

# Reading the CSV file from the URL
df_s = pd.read_csv(csv_file, index_col=0)

# Checking the data quickly (first 5 rows):
df_s.head()
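
Since the dataset URL above is left as a placeholder, the following sketch reads a small, made-up CSV from an in-memory string instead. The column names and values are hypothetical; the point is only to show what index_col=0 does and how to see which columns are numeric before plotting.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the downloaded file.
csv_text = """id,height,weight,label
1,1.70,65.0,a
2,1.82,80.5,b
3,1.65,58.2,a
"""

# index_col=0 turns the first column ("id") into the row index.
df_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Keep only numeric columns, since a scatter matrix plots numeric data.
numeric = df_demo.select_dtypes(include="number")
print(list(numeric.columns))
```

The text column is left out of the numeric selection, which makes explicit which columns would appear in the scatter matrix.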

As with scatter_matrix() in Pandas, you can also use the pairplot method available through the seaborn package.

A deep understanding of these modules helps in drawing these scatter plots; it also helps in making them more user-friendly and creating more attractive visualizations.
