Running SQL Queries from Pandas
Execute arbitrary SELECT statements with pd.read_sql_query and parameterise queries safely to avoid SQL injection.
pd.read_sql: The Unified Interface
Pandas provides three SQL reading functions: pd.read_sql() (generic wrapper), pd.read_sql_table() (reads a full table by name), and pd.read_sql_query() (executes arbitrary SQL). For most analytical workflows, pd.read_sql_query() is the most powerful because it lets you write any SELECT statement with filtering, joining, and aggregating before data reaches Pandas. Using SQL for heavy lifting and Pandas for the final analysis is often more efficient than loading everything and filtering in Python.
import pandas as pd
import sqlalchemy as sa
engine = sa.create_engine('sqlite:///ecommerce.db')
# Three equivalent patterns
df1 = pd.read_sql('SELECT * FROM orders LIMIT 100', con=engine)
df2 = pd.read_sql_table('orders', con=engine) # full table
df3 = pd.read_sql_query('SELECT * FROM orders LIMIT 100', con=engine)
print(df3.head())
print(df3.columns.tolist())Filtering at the Database Level
Always filter data in SQL rather than loading everything and filtering in Pandas. A database with proper indexes can execute a WHERE clause on millions of rows and return only thousands in milliseconds, while Pandas would need to load gigabytes of data first. The golden rule: push predicates to the database. Use WHERE for row filters, SELECT col1, col2 for column selection, and LIMIT during development to preview results quickly.
import pandas as pd
import sqlalchemy as sa
engine = sa.create_engine('sqlite:///sales.db')
# Filter and project at SQL level — only fetch what you need
query = '''
SELECT order_id, customer_id, amount, status
FROM orders
WHERE status = 'completed'
AND order_date >= '2024-01-01'
AND amount > 50
LIMIT 1000
'''
df = pd.read_sql_query(query, con=engine)
print(f'Rows: {len(df)}, Columns: {list(df.columns)}')All lessons in this course
- Connecting to a Database with SQLAlchemy
- Running SQL Queries from Pandas
- Writing DataFrames to Database Tables
- Pandas vs. SQL: Choosing the Right Tool