Executing SQL Queries in Pandas
SQL stands for Structured Query Language; it is a well-known language for interacting with relational databases. There are many ways to run SQL queries in Python.
pandasql
Run SQL queries in Python using
This package has a method similar to R
that in . It provides a more familiar way to perform CRUD operations on a DataFrame.sqldf
sqldf
pandasql
Before we use pandasql
, we must first install it using the following command.
#Python 3.x
pip install -U pandasql
We will pandasql
import sqldf
the method from the module to run the query. We will then call the method with two parameters sqldf
.
The first argument is a SQL query in string format. The second argument is a set of session/environment variables ( globals()
or locals()
).
In the following code, first, we Student.csv
upload the data to the current working directory of the notebook. Then we read the student records Jupyter
using a traditional SQL query on the DataFrame ; it will display all the records in the DataFrame.Select
# Python 3.x
import pandas as pd
from pandasql import sqldf
def mysql(q):
return sqldf(q, globals())
df = pd.read_csv("Student.csv")
mysql("SELECT * FROM df")
Output:
We have used the clause in the SQL query WHERE
to display only the selected records that meet the given conditions in the following code.
# Python 3.x
import pandas as pd
from pandasql import sqldf
def mysql(q):
return sqldf(q, globals())
df = pd.read_csv("Student.csv")
mysql("SELECT * FROM df WHERE Department = 'SE'")
Output:
DuckDB
Run SQL queries in Python using
DuckDB
Is a Python API and a database management system that uses SQL queries to interact with the database.
To use DuckDB
, we should first install it using the following command.
#Python 3.x
pip install duckdb
In the following code, we import duckdb
the and Pandas packages, read the CSV file and run the query by using duckdb
the call query()
method. We pass the query (as a parameter) to query()
the method.
The code returns the result as a DataFrame. We can write any SQL query of our choice against the DataFrame.
# Python 3.x
import pandas as pd
import duckdb
df = pd.read_csv("Student.csv")
duckdb.query("SELECT * FROM df").df()
Output:
Fugue
Run SQL queries in Python using
Fugue
It is a unified interface for distributed computing that allows users to run Python, Pandas, and SQL code on Spark and Dask without rewriting it.
We must install it before we can use it using the following command fugue
.
#Python 3.x
pip install fugue[sql]
We have imported the Pandas and packages in the following code fugue
and loaded the DataFrame with the CSV file. We then passed our SQL query to fsql()
the method and called run()
the method with it.
# Python 3.x
import pandas as pd
from fugue_sql import fsql
df = pd.read_csv("Student.csv")
query = "SELECT * FROM df PRINT"
fsql(query).run()
Output:
For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.
Related Articles
Finding the installed version of Pandas
Publish Date:2025/04/12 Views:190 Category:Python
-
Pandas is one of the commonly used Python libraries for data analysis, and Pandas versions need to be updated regularly. Therefore, other Pandas requirements are incompatible. Let's look at ways to determine the Pandas version and dependenc
KeyError in Pandas
Publish Date:2025/04/12 Views:81 Category:Python
-
This tutorial explores the concept of KeyError in Pandas. What is Pandas KeyError? While working with Pandas, analysts may encounter multiple errors thrown by the code interpreter. These errors are wide ranging and can help us better invest
Grouping and Sorting in Pandas
Publish Date:2025/04/12 Views:90 Category:Python
-
This tutorial explored the concept of grouping data in a DataFrame and sorting it in Pandas. Grouping and Sorting DataFrame in Pandas As we know, Pandas is an advanced data analysis tool or package extension in Python. Most of the companies
Plotting Line Graph with Data Points in Pandas
Publish Date:2025/04/12 Views:65 Category:Python
-
Pandas is an open source data analysis library in Python. It provides many built-in methods to perform operations on numerical data. Data visualization is very popular nowadays and is used to quickly analyze data visually. We can visualize
Converting Timedelta to Int in Pandas
Publish Date:2025/04/12 Views:123 Category:Python
-
This tutorial will discuss converting a to a using dt the attribute in Pandas . timedelta int Use the Pandas dt attribute to timedelta convert int To timedelta convert to an integer value, we can use the property pandas of the library dt .
Pandas fill NaN values
Publish Date:2025/04/12 Views:93 Category:Python
-
This tutorial explains how we can use DataFrame.fillna() the method to fill NaN values with specified values. We will use the following DataFrame in this article. import numpy as np import pandas as pd roll_no = [ 501 , 502 , 503 , 50
Pandas Convert String to Number
Publish Date:2025/04/12 Views:147 Category:Python
-
This tutorial explains how to pandas.to_numeric() convert string values of a Pandas DataFrame into numeric type using the method. import pandas as pd items_df = pd . DataFrame( { "Id" : [ 302 , 504 , 708 , 103 , 343 , 565 ], "Name" :
How to Change the Data Type of a Column in Pandas
Publish Date:2025/04/12 Views:139 Category:Python
-
We will look at methods for changing the data type of columns in a Pandas Dataframe, as well as options like to_numaric , , as_type and infer_objects . We will also discuss how to to_numaric use downcasting the option in . to_numeric Method
Get the first row of Dataframe Pandas
Publish Date:2025/04/12 Views:78 Category:Python
-
This tutorial explains how to use the get_first_row pandas.DataFrame.iloc attribute and pandas.DataFrame.head() get_first_row method from a Pandas DataFrame. We will use the following DataFrame in the following example to explain how to get