Scatter Matrix in Pandas
This tutorial explores how to use scatter matrices in Pandas to create pair plots.
Scatter Matrix in Pandas
During data preprocessing, it is important to check the correlation between the independent variables used for regression analysis. Scatter plots make it easy to see the correlation between features.
Pandas provides the scatter_matrix() function to draw these plots in a practical way. It can also be used to determine whether a correlation is positive or negative.
Let us consider an example with n variables; this function in Pandas gives us n rows and n columns of plots, that is, an n x n matrix.
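The n x n layout can be checked directly, because scatter_matrix() returns the array of subplots it draws. The following is a minimal sketch on made-up data; the DataFrame name demo, the column names, and n = 4 are assumptions chosen only for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: n = 4 numeric columns of random values
rng = np.random.default_rng(0)
n = 4
demo = pd.DataFrame(rng.normal(size=(50, n)), columns=[f"v{i}" for i in range(n)])

# scatter_matrix returns a NumPy array of matplotlib axes, one per pair of columns
axes = scatter_matrix(demo)
print(axes.shape)  # (4, 4)
```

Remove the matplotlib.use("Agg") line and call plt.show() if you want to view the figure interactively.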
Given below are three simple steps to implement a scatter plot.
- Load the necessary libraries.
- Import the appropriate data.
- Use the scatter_matrix method to draw the graphics.
Syntax:
pandas.plotting.scatter_matrix(dataframe)
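Beyond the DataFrame itself, the function accepts optional keyword arguments. The sketch below shows three of them (alpha, figsize, and marker) on made-up data; the DataFrame demo and the chosen values are assumptions for illustration, not recommendations.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: three numeric columns of random values
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

axes = scatter_matrix(
    demo,
    alpha=0.3,       # transparency of the scatter points
    figsize=(8, 8),  # figure size in inches
    marker="o",      # matplotlib marker style for the points
)

# All subplots share one figure, reachable from any of the axes
fig = axes[0, 0].figure
print(fig.get_size_inches())
```

A lower alpha tends to help when many points overlap, since dense regions then appear darker than sparse ones.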
This tutorial will teach us how to use scatter_matrix() effectively as an analyst.
Using the scatter_matrix() method in Pandas
This example uses the scatter_matrix() method with no additional parameters.
Here, we use the numpy module to create dummy data. Three variables are created: x1, x2, and x3.
import numpy as np
import pandas as pd
np.random.seed(134)
N = 1000
x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)
Create a Pandas DataFrame using a dictionary:
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
print(df.head())
Output:
         x1        x2         x3
0 -0.224315 -8.840152  10.145993
1  1.337257  2.383882  -1.854636
2  0.882366  3.544989  -1.117054
3  0.295153 -3.844863   3.634823
4  0.780587 -0.465342   2.121288
Finally, the data is ready for us to plot a chart.
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
np.random.seed(134)
N = 1000
x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
# Creating the scatter matrix with the function imported above:
scatter_matrix(df)
# Displaying the figure:
plt.show()
As we can see, these plots are easy to generate. But what makes them so interesting?
- The plots on the diagonal depict the distribution of the variables x1, x2, and x3 in our dummy data.
- Correlations between the variables can be observed.
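The correlations seen in the plots can be cross-checked numerically with DataFrame.corr(). The sketch below rebuilds the same dummy data; since x2 is x1 plus noise, a positive correlation between them is expected, and since x3 subtracts x2, a negative correlation between x2 and x3 is expected.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the example above
np.random.seed(134)
N = 1000
x1 = np.random.normal(0, 1, N)
x2 = x1 + np.random.normal(0, 3, N)
x3 = 2 * x1 - x2 + np.random.normal(0, 2, N)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pairwise Pearson correlation coefficients, one row and column per variable
corr = df.corr()
print(corr.round(2))
```

The sign of each coefficient should match the direction of the point cloud in the corresponding off-diagonal plot.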
Using the scatter_matrix() method with the hist_kwds parameter in Pandas
The next example uses the hist_kwds parameter. It takes a Python dictionary of options for the histograms, through which we can change, for example, the total number of bins.
# Changing the number of bins of the scatter matrix in Python:
pd.plotting.scatter_matrix(df, hist_kwds={"bins": 30})
Output:
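One way to confirm that the bin count took effect is to count the bars drawn on each diagonal subplot. This is a minimal sketch on made-up data; the DataFrame demo and the extra edgecolor option are assumptions added for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: three numeric columns of random values
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a", "b", "c"])

# The dictionary is forwarded to the histogram call on each diagonal subplot
axes = scatter_matrix(demo, hist_kwds={"bins": 30, "edgecolor": "black"})

# Each histogram bar is one patch on its diagonal subplot
bars_on_diagonal = [len(axes[i, i].patches) for i in range(3)]
print(bars_on_diagonal)
```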
Using the scatter_matrix() method with the diagonal = 'kde' parameter in Pandas
In this final example, we will replace the histogram with a kde distribution.
KDE stands for Kernel Density Estimation. It is a basic data-smoothing technique that lets us make inferences about a population based on a finite sample of data.
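To make the idea concrete, here is a minimal NumPy sketch of a one-dimensional Gaussian kernel density estimate. It is not how Pandas computes the curve (Pandas relies on SciPy for that); the helper gaussian_kde_1d, the sample data, and the rule-of-thumb bandwidth are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


# Hypothetical sample: 200 draws from a standard normal distribution
rng = np.random.default_rng(0)
samples = rng.normal(0, 1, 200)

# Rule-of-thumb bandwidth (an assumption; other choices are possible)
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(-5, 5, 501)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 over the grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother curve, while a smaller one follows the individual samples more closely.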
Drawing a scatter matrix using kde is just as easy as making one with histograms. To do this, we just need to replace hist_kwds with diagonal = 'kde'.
The diagonal parameter accepts one of two values, 'hist' or 'kde', and cannot take both at once. Make sure that only one of them is used in your code.
The code to get the kde plots changes as follows.
# Scatter matrix with Pandas and density plots:
pd.plotting.scatter_matrix(df, diagonal="kde")
Output:
To work with a real dataset instead of dummy data, we just need to import the CSV file through the read_csv method of the Python Pandas module.
csv_file = "URL for the dataset"
# Reading the CSV file from the URL
df_s = pd.read_csv(csv_file, index_col=0)
# Checking the data quickly (first 5 rows):
df_s.head()
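Since the dataset URL is not given here, the following sketch uses a small inline CSV to show the same flow end to end. The column names and values are made up for illustration. Non-numeric columns are dropped explicitly before plotting, since a scatter matrix only makes sense for numeric data.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical inline CSV standing in for a real dataset URL
csv_text = """id,height,weight,age,group
1,170,65,30,a
2,182,80,41,b
3,165,58,25,a
4,175,72,35,b
5,190,95,50,a
"""

# index_col=0 uses the first column ("id") as the row index
df_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Keep only numeric columns before plotting
numeric = df_demo.select_dtypes(include="number")
axes = scatter_matrix(numeric)
print(list(numeric.columns))
```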
As with scatter_matrix() in Pandas, you can also use the pairplot function available through the seaborn package.
A good understanding of these modules helps in drawing these scatter plots; it also helps in making them more user-friendly and in creating more attractive visualizations.