Excel is perfect for quick analysis: filters, pivots, charts—right there in front of you. But once your data gets larger or more important, you start to feel the pain: slow files, version chaos, copy–paste formulas, and endless copies named Final_v7_REAL.xlsx.
In this tutorial, we’ll take a realistic scenario—a simple Excel sales log—and turn it into a small analytics engine using Python + SQLite. By the end you’ll have:
- a clean SQLite database file
- a repeatable Python script you can run every month
- automatic summary tables (top customers, monthly revenue, top products)
This quick start assumes:
- You’re comfortable working with Excel.
- You have Python 3 installed.
- You can run a Python script from the command line.
No prior SQL knowledge is required—the script handles the SQL for you, and we’ll walk through each query.
1. Why bother with a database at all?
Excel stores data in a visual grid and lets you manipulate values directly. A database stores data in structured tables and lets you ask powerful, repeatable questions using queries.
| Question | Excel experience | Python + SQLite experience |
|---|---|---|
| How many rows can I store? | Hard cap of 1,048,576 rows per sheet, and it slows down well before that | Handles millions of rows comfortably |
| Can I repeat the same analysis every month? | Copy sheets, update formulas, hope nothing breaks | Run the same script; results are reproducible |
| Can multiple tools use the same data? | Hard; usually copied to many files | Easy; SQLite file can be read by Python, BI tools, apps |
| Can I enforce consistent structure? | Not really; any cell can change | Yes; schemas and constraints keep data clean |
| Can I automate? | Limited or fragile | First-class; Python, cron, CI, pipelines |
The key idea of this tutorial: keep Excel as the input surface, but let Python + SQLite do the heavy lifting.
2. Our example: a simple sales log
Imagine an Excel file sales_log.xlsx with a single sheet containing these columns:
- date – date of the sale
- customer – customer name
- region – e.g. AU, US, UK
- product – product name or SKU
- quantity – units sold
- unit_price – price per unit
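For orientation, a few rows might look like this (values invented purely for illustration):

| date | customer | region | product | quantity | unit_price |
|---|---|---|---|---|---|
| 2024-01-05 | Acme Pty Ltd | AU | SKU-001 | 3 | 19.99 |
| 2024-01-06 | Globex Corp | US | SKU-014 | 1 | 249.00 |
| 2024-01-09 | Initech Ltd | UK | SKU-001 | 10 | 19.99 |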
You might currently answer questions like:
- “Who are my top 10 customers this quarter?”
- “What is revenue by month and by region?”
- “Which products drive most of our revenue?”
In Excel, you can do this with pivot tables—but it’s manual every time. We’re going to:
- Load this Excel file with Python.
- Store it in a SQLite database called sales_analytics.db.
- Run SQL queries to answer those questions.
- Export the answers as clean CSV summaries.
2.1 Don’t have a dataset? Use the demo CSV
If you don’t already have a sales spreadsheet, this tutorial comes with a small demo dataset you can start with. Download the CSV, save it somewhere convenient (for example in a data/ folder), open it in Excel, and use “Save As…” to create sales_log.xlsx next to the script you’ll run.
In the examples below, we’ll assume the demo file lives at data/demo_sales_log.csv and that you’ve saved a copy as sales_log.xlsx alongside the script.
3. Install Python packages
Install the two extra packages we need: pandas (for reading Excel and writing CSV) and openpyxl (the engine pandas uses to open .xlsx files):
pip install pandas openpyxl
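Optionally, if you prefer to keep dependencies isolated, create a virtual environment first and install the packages inside it (standard Python tooling, not specific to this tutorial):

```
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install pandas openpyxl
```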
4. Download the “Excel → Analytics Engine” script
This tutorial comes with a ready-to-run Python script, excel_sales_analytics.py. It:
- reads your .xlsx file into a DataFrame
- cleans column names for database use
- creates (or replaces) a sales table in a SQLite database
- runs several analytics queries (top customers, monthly revenue, top products)
- exports each result as a CSV in a summaries/ folder
Save excel_sales_analytics.py into a working folder together with
sales_log.xlsx before running the commands below.
5. How to run it
- Make sure excel_sales_analytics.py and sales_log.xlsx are in the same folder.
- Open a terminal or command prompt in that folder.
- Run:
python excel_sales_analytics.py sales_log.xlsx \
--db sales_analytics.db \
--out summaries
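If all goes well, the console output will look roughly like this (the row count and column list will depend on your file; abridged here):

```
[info] Loading Excel file: sales_log.xlsx
[info] Loaded 1000 rows with columns: ['date', 'customer', 'region', 'product', 'quantity', 'unit_price']
[info] Writing data to SQLite database: sales_analytics.db (table: sales)
[info] Running query: row_count
...
[saved] summaries/row_count.csv
...
[done] Analytics complete.
```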
After it runs, you will have:
- sales_analytics.db – your new SQLite database
- summaries/row_count.csv
- summaries/date_range.csv
- summaries/revenue_by_customer.csv
- summaries/monthly_revenue.csv
- summaries/top_products.csv
Each of those CSVs can be:
- opened back in Excel for presentation
- pulled into Power BI / Tableau
- used as the basis for dashboards or reporting emails
If you’re using the demo dataset, make sure you’ve already saved demo_sales_log.csv as sales_log.xlsx (see section 2.1) before running the command above.
6. Walkthrough: what the script actually does
6.1 Load and clean the Excel data
First we load the Excel sheet into a DataFrame and normalise column names to
snake_case so they work cleanly in SQL.
df = pd.read_excel(excel_path)
df.columns = (
df.columns
.str.strip()
.str.lower()
.str.replace(" ", "_")
.str.replace(r"[^0-9a-zA-Z_]", "", regex=True)
)
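To see what this normalisation does in practice, here is a standalone sketch with made-up header names:

```python
import pandas as pd

# Hypothetical messy headers, as they might appear in a real Excel export
df = pd.DataFrame(columns=["Date ", "Customer Name", "Unit Price ($)"])
df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_")
    .str.replace(r"[^0-9a-zA-Z_]", "", regex=True)
)
print(list(df.columns))  # ['date', 'customer_name', 'unit_price_']
```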
Then we enforce that the sheet actually has the columns we expect (date, customer, product, etc.). Instead of silently failing, the script will tell you what’s missing.
required_cols = {"date", "customer", "region", "product", "quantity", "unit_price"}
missing = required_cols - set(df.columns)
if missing:
raise ValueError(f"Missing expected columns in Excel file: {missing}")
6.2 Store data in SQLite
Instead of leaving the data trapped in Excel, we store it in a structured table called
sales inside a SQLite database file.
conn = sqlite3.connect(db_path)
df.to_sql("sales", conn, if_exists="replace", index=False)
conn.commit()
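If you want reassurance that the load worked, a quick check against the new table (not part of the script) looks like this:

```python
# Optional sanity check: count the rows that landed in the sales table
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(f"sales table now holds {row_count} rows")
```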
The power move here: once the data is in SQLite, anything that speaks SQL can use it, whether Python scripts, BI tools, or even other languages.
6.3 Ask powerful questions with SQL
Now we let SQLite do the heavy lifting. Here are some of the queries embedded in the script:
| Question | SQL sketch |
|---|---|
| How many rows are in my dataset? | SELECT COUNT(*) FROM sales; |
| What date range does this file cover? | SELECT MIN(date), MAX(date) FROM sales; |
| Who are my top 10 customers by revenue? | SELECT customer, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY customer ORDER BY revenue DESC LIMIT 10; |
| What is my revenue per month? | SELECT strftime('%Y-%m', date) AS year_month, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY year_month; |
| Which products drive the most revenue? | SELECT product, SUM(quantity) AS units_sold, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY product ORDER BY revenue DESC LIMIT 10; |
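You don’t have to wait for the full script to try these. Once sales_analytics.db exists, you can experiment interactively; a minimal sketch, assuming the database sits in the current folder:

```python
import sqlite3
import pandas as pd

# Open the database the script created and run one of the queries by hand
conn = sqlite3.connect("sales_analytics.db")
top_customers = pd.read_sql_query(
    """
    SELECT customer, SUM(quantity * unit_price) AS revenue
    FROM sales
    GROUP BY customer
    ORDER BY revenue DESC
    LIMIT 10;
    """,
    conn,
)
print(top_customers)
conn.close()
```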
In Excel, each of these might be a separate pivot table, chart, or formula. In Python + SQLite, they’re just queries you can run over and over, on each new file.
6.4 Export clean summary CSVs
The script doesn’t just print the results; it writes them out into tidy CSV files that you can feed into other systems:
for name, df_result in results.items():
out_path = out_dir / f"{name}.csv"
df_result.to_csv(out_path, index=False)
print(f"[saved] {out_path}")
This is where the “engine” part really shows: monthly reports, dashboards, or emails can all be powered by these outputs, with Excel acting purely as the data entry layer.
7. Full script (for reference)
#!/usr/bin/env python3
"""
excel_sales_analytics.py
Turn a simple Excel sales log into a reusable analytics engine using SQLite.
Usage:
python excel_sales_analytics.py sales_log.xlsx --db sales_analytics.db --out summaries
"""
import argparse
from pathlib import Path
import sqlite3
import pandas as pd
REQUIRED_COLUMNS = {"date", "customer", "region", "product", "quantity", "unit_price"}
def load_and_clean_excel(excel_path: Path) -> pd.DataFrame:
if not excel_path.exists():
raise FileNotFoundError(f"Excel file not found: {excel_path}")
print(f"[info] Loading Excel file: {excel_path}")
df = pd.read_excel(excel_path)
# Normalise column names to snake_case
df.columns = (
df.columns
.str.strip()
.str.lower()
.str.replace(" ", "_")
.str.replace(r"[^0-9a-zA-Z_]", "", regex=True)
)
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
raise ValueError(
f"Missing expected columns in Excel file: {missing}\n"
f"Columns present: {list(df.columns)}"
)
print(f"[info] Loaded {len(df)} rows with columns: {list(df.columns)}")
return df
def write_to_sqlite(df: pd.DataFrame, db_path: Path, table_name: str = "sales") -> sqlite3.Connection:
if db_path.parent and not db_path.parent.exists():
db_path.parent.mkdir(parents=True, exist_ok=True)
print(f"[info] Writing data to SQLite database: {db_path} (table: {table_name})")
conn = sqlite3.connect(db_path)
df.to_sql(table_name, conn, if_exists="replace", index=False)
conn.commit()
return conn
def run_analytics_queries(conn: sqlite3.Connection) -> dict:
queries = {
"row_count": """
SELECT COUNT(*) AS rows
FROM sales;
""",
"date_range": """
SELECT MIN(date) AS start_date, MAX(date) AS end_date
FROM sales;
""",
"revenue_by_customer": """
SELECT
customer,
ROUND(SUM(quantity * unit_price), 2) AS revenue
FROM sales
GROUP BY customer
ORDER BY revenue DESC
LIMIT 10;
""",
"monthly_revenue": """
SELECT
strftime('%Y-%m', date) AS year_month,
ROUND(SUM(quantity * unit_price), 2) AS revenue
FROM sales
GROUP BY year_month
ORDER BY year_month;
""",
"top_products": """
SELECT
product,
SUM(quantity) AS units_sold,
ROUND(SUM(quantity * unit_price), 2) AS revenue
FROM sales
GROUP BY product
ORDER BY revenue DESC
LIMIT 10;
""",
}
results = {}
for name, sql in queries.items():
print(f"[info] Running query: {name}")
results[name] = pd.read_sql_query(sql, conn)
return results
def save_summaries(results: dict, out_dir: Path) -> None:
out_dir.mkdir(parents=True, exist_ok=True)
for name, df_result in results.items():
out_path = out_dir / f"{name}.csv"
df_result.to_csv(out_path, index=False)
print(f"[saved] {out_path}")
def main() -> None:
parser = argparse.ArgumentParser(
description="Convert an Excel sales log into a SQLite database and summary analytics CSVs."
)
parser.add_argument("excel_file", help="Path to the Excel sales log (.xlsx)")
parser.add_argument("--db", default="sales_analytics.db", help="Path to the SQLite DB file to create")
parser.add_argument("--out", default="summaries", help="Directory to write summary CSV files")
args = parser.parse_args()
excel_path = Path(args.excel_file)
db_path = Path(args.db)
out_dir = Path(args.out)
df = load_and_clean_excel(excel_path)
conn = write_to_sqlite(df, db_path)
try:
results = run_analytics_queries(conn)
finally:
conn.close()
save_summaries(results, out_dir)
print("[done] Analytics complete.")
if __name__ == "__main__":
main()
8. Where to go next
- Add more queries: revenue by region (see the sketch after this list), margin analysis, repeat customers.
- Schedule the script (cron, Task Scheduler) to run automatically when a new Excel file arrives.
- Point a BI tool (e.g. Power BI) at sales_analytics.db for live dashboards.
- Split additional Excel sheets into multiple tables and join them with SQL.
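For the first idea, a revenue-by-region query follows the same pattern as the queries already in the script; a sketch:

```sql
-- Sketch: revenue by region, mirroring the script's existing queries
SELECT
    region,
    ROUND(SUM(quantity * unit_price), 2) AS revenue
FROM sales
GROUP BY region
ORDER BY revenue DESC;
```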
The big shift is mindset: Excel becomes your input UI, while Python + SQLite become your analytics engine—automated, repeatable, and ready to scale.