2025-11-28

Excel vs Database: Turn a Spreadsheet into an Analytics Engine with Python + SQLite

Data, Python, Systems · PLEX Lab

Excel is perfect for quick analysis: filters, pivots, charts—right there in front of you. But once your data gets larger or more important, you start to feel the pain: slow files, version chaos, fragile copy–paste formulas, and endless files named Final_v7_REAL.xlsx.

In this tutorial, we’ll take a realistic scenario—a simple Excel sales log—and turn it into a small analytics engine using Python + SQLite. By the end you’ll have:

  • a clean SQLite database file
  • a repeatable Python script you can run every month
  • automatic summary tables (top customers, monthly revenue, top products)

What you’ll need

This quick start assumes:

  • You’re comfortable working with Excel.
  • You have Python 3 installed.
  • You can run a Python script from the command line.

No prior SQL knowledge is required—the script handles the SQL for you, and we’ll walk through each query.

1. Why bother with a database at all?

Excel stores data in a visual grid and lets you manipulate values directly. A database stores data in structured tables and lets you ask powerful, repeatable questions using queries.

Excel vs SQLite for growing data

  • How many rows can I store?
    Excel: slows down after ~100k rows. Python + SQLite: handles millions of rows comfortably.
  • Can I repeat the same analysis every month?
    Excel: copy sheets, update formulas, hope nothing breaks. Python + SQLite: run the same script; results are reproducible.
  • Can multiple tools use the same data?
    Excel: hard; usually copied to many files. Python + SQLite: easy; the SQLite file can be read by Python, BI tools, and apps.
  • Can I enforce consistent structure?
    Excel: not really; any cell can change. Python + SQLite: yes; schemas and constraints keep data clean.
  • Can I automate?
    Excel: limited or fragile. Python + SQLite: first-class; Python, cron, CI, pipelines.

The key idea of this tutorial: keep Excel as the input surface, but let Python + SQLite do the heavy lifting.
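
To make the “schemas and constraints” row concrete, here is a minimal sketch of what an explicitly defined sales table could look like. The script in this tutorial lets pandas create the table automatically, so treat this as illustration only:

PYTHON

import sqlite3

# A minimal sketch of a sales table with constraints Excel cannot enforce.
# (Illustration only; the tutorial script lets pandas create the table.)
conn = sqlite3.connect("example.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        date       TEXT    NOT NULL,
        customer   TEXT    NOT NULL,
        region     TEXT    NOT NULL,
        product    TEXT    NOT NULL,
        quantity   INTEGER NOT NULL CHECK (quantity > 0),
        unit_price REAL    NOT NULL CHECK (unit_price >= 0)
    )
""")
conn.commit()
conn.close()

With a schema like this, a negative quantity is rejected at insert time instead of silently living in a cell.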

2. Our example: a simple sales log

Imagine an Excel file sales_log.xlsx with a single sheet containing these columns:

  • date – date of the sale
  • customer – customer name
  • region – e.g. AU, US, UK
  • product – product name or SKU
  • quantity – units sold
  • unit_price – price per unit
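
A few rows, invented purely for illustration, might look like this in the sheet:

date        customer    region  product   quantity  unit_price
2025-01-15  Acme Pty    AU      Widget A  3         49.95
2025-01-16  Globex Inc  US      Widget B  1         129.00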

You might currently answer questions like:

  • “Who are my top 10 customers this quarter?”
  • “What is revenue by month and by region?”
  • “Which products drive most of our revenue?”

In Excel, you can do this with pivot tables—but it’s manual every time. We’re going to:

  1. Load this Excel file with Python.
  2. Store it in a SQLite database called sales_analytics.db.
  3. Run SQL queries to answer those questions.
  4. Export the answers as clean CSV summaries.

2.1 Don’t have a dataset? Use the demo CSV

If you don’t already have a sales spreadsheet, this tutorial comes with a small demo dataset you can start with.

Download the demo sales dataset

Download demo_sales_log.csv

Save it somewhere convenient (for example in a data/ folder), open it in Excel, and use “Save As…” to create sales_log.xlsx next to the script you’ll run.

In the examples below, we’ll assume the demo file lives at data/demo_sales_log.csv and that you’ve saved a copy as sales_log.xlsx alongside the script.
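
If you’d rather not round-trip through Excel at all, a two-line pandas sketch (assuming the paths above, and the packages from section 3) does the same conversion:

PYTHON

import pandas as pd

# Convert the demo CSV into the workbook the script expects.
df = pd.read_csv("data/demo_sales_log.csv")
df.to_excel("sales_log.xlsx", index=False)  # uses openpyxl under the hood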

3. Install Python packages

Install the two extra packages we need: pandas (for reading Excel and writing CSV) and openpyxl (the engine pandas uses to open .xlsx files):

BASH

pip install pandas openpyxl
    
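
To confirm both packages are importable, you can run a quick check (version numbers will vary):

PYTHON

import pandas as pd
import openpyxl

# If this runs without an ImportError, the install worked.
print("pandas", pd.__version__)
print("openpyxl", openpyxl.__version__)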

4. Download the “Excel → Analytics Engine” script

This tutorial comes with a ready-to-run Python script, excel_sales_analytics.py. It:

  • reads your .xlsx file into a DataFrame
  • cleans column names for database use
  • creates (or replaces) a sales table in a SQLite database
  • runs several analytics queries (top customers, monthly revenue, top products)
  • exports each result as a CSV in a summaries/ folder

Download the Python script

Download excel_sales_analytics.py

Save excel_sales_analytics.py into a working folder together with sales_log.xlsx before running the commands below.

5. How to run it

  1. Make sure excel_sales_analytics.py and sales_log.xlsx are in the same folder.
  2. Open a terminal or command prompt in that folder.
  3. Run:
BASH

python excel_sales_analytics.py sales_log.xlsx \
  --db sales_analytics.db \
  --out summaries
    

After it runs, you will have:

  • sales_analytics.db – your new SQLite database
  • summaries/row_count.csv
  • summaries/date_range.csv
  • summaries/revenue_by_customer.csv
  • summaries/monthly_revenue.csv
  • summaries/top_products.csv

Each of those CSVs can be:

  • opened back in Excel for presentation
  • pulled into Power BI / Tableau
  • used as the basis for dashboards or reporting emails
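
If you want to peek inside the database itself, a short sketch (assuming the default sales_analytics.db name) lists its tables and a few rows:

PYTHON

import sqlite3

# Quick sanity check on the freshly created database.
conn = sqlite3.connect("sales_analytics.db")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print("tables:", tables)
print(conn.execute("SELECT * FROM sales LIMIT 3").fetchall())
conn.close()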

If you’re using the demo dataset, remember to convert demo_sales_log.csv into sales_log.xlsx first (see section 2.1), then run the command above.

6. Walkthrough: what the script actually does

6.1 Load and clean the Excel data

First we load the Excel sheet into a DataFrame and normalise column names to snake_case so they work cleanly in SQL.

PYTHON

df = pd.read_excel(excel_path)

df.columns = (
    df.columns
      .str.strip()
      .str.lower()
      .str.replace(" ", "_")
      .str.replace(r"[^0-9a-zA-Z_]", "", regex=True)
)
    

Then we check that the sheet actually has the columns we expect (date, customer, product, and so on). Instead of failing later with a confusing SQL error, the script tells you exactly what’s missing.

PYTHON

required_cols = {"date", "customer", "region", "product", "quantity", "unit_price"}
missing = required_cols - set(df.columns)
if missing:
    raise ValueError(f"Missing expected columns in Excel file: {missing}")
    
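
One subtlety worth guarding against: SQLite’s strftime() (used by the monthly revenue query in section 6.3) only understands ISO-8601 text. pandas normally parses Excel dates into datetimes, which are stored in an ISO format, but if your date column arrives as free-form strings it is worth normalising explicitly. A sketch you could add after the cleaning step:

PYTHON

# Optional hardening: store dates as ISO-8601 strings so SQLite's
# strftime() can group them by month. errors="coerce" turns values it
# cannot parse into NaT, which makes bad rows easy to spot.
df["date"] = pd.to_datetime(df["date"], errors="coerce").dt.strftime("%Y-%m-%d")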

6.2 Store data in SQLite

Instead of leaving the data trapped in Excel, we store it in a structured table called sales inside a SQLite database file.

PYTHON

conn = sqlite3.connect(db_path)
df.to_sql("sales", conn, if_exists="replace", index=False)
conn.commit()
    

The power move here: once the data is in SQLite, anything that can speak SQL can use it—Python scripts, BI tools, even other languages.
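
One optional tweak, sketched below: because if_exists="replace" drops and recreates the table on every run, any index has to be recreated after each load. On bigger files it can make the customer-level queries noticeably faster:

PYTHON

import sqlite3

conn = sqlite3.connect("sales_analytics.db")
# Optional: an index for customer-level queries on large tables.
# Recreate it after every load, since the table is replaced each run.
conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_customer ON sales (customer)")
conn.commit()
conn.close()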

6.3 Ask powerful questions with SQL

Now we let SQLite do the heavy lifting. Here are some of the queries embedded in the script:

Example analytics queries (simplified)

  • How many rows are in my dataset?
    SELECT COUNT(*) FROM sales;
  • What date range does this file cover?
    SELECT MIN(date), MAX(date) FROM sales;
  • Who are my top 10 customers by revenue?
    SELECT customer, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY customer ORDER BY revenue DESC LIMIT 10;
  • What is my revenue per month?
    SELECT strftime('%Y-%m', date) AS year_month, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY year_month;
  • Which products drive the most revenue?
    SELECT product, SUM(quantity) AS units_sold, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY product ORDER BY revenue DESC LIMIT 10;

In Excel, each of these might be a separate pivot table, chart, or formula. In Python + SQLite, they’re just queries you can run over and over, on each new file.
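
All of these live in a plain Python dict inside the script, so adding your own question is just adding an entry. As a sketch, here is a revenue-by-region query (one of the ideas from section 8) run directly against the database:

PYTHON

import sqlite3
import pandas as pd

# Sketch: an extra query in the same style as the script's `queries` dict.
sql = """
    SELECT region,
           ROUND(SUM(quantity * unit_price), 2) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC;
"""
conn = sqlite3.connect("sales_analytics.db")
print(pd.read_sql_query(sql, conn))
conn.close()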

6.4 Export clean summary CSVs

The script doesn’t just print the results; it writes them out into tidy CSV files that you can feed into other systems:

PYTHON

for name, df_result in results.items():
    out_path = out_dir / f"{name}.csv"
    df_result.to_csv(out_path, index=False)
    print(f"[saved] {out_path}")
    

This is where the “engine” part really shows: monthly reports, dashboards, or emails can all be powered by these outputs, with Excel acting purely as the data entry layer.
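
If your consumers would rather receive one workbook than a folder of CSVs, a small variation on save_summaries (a sketch, not part of the downloadable script) writes each result to its own sheet:

PYTHON

import pandas as pd

def save_summary_workbook(results: dict, path: str = "summaries.xlsx") -> None:
    # Sketch: one sheet per query result instead of one CSV per result.
    with pd.ExcelWriter(path) as writer:
        for name, df_result in results.items():
            # Excel caps sheet names at 31 characters.
            df_result.to_excel(writer, sheet_name=name[:31], index=False)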

7. Full script (for reference)

PYTHON

#!/usr/bin/env python3
"""
excel_sales_analytics.py

Turn a simple Excel sales log into a reusable analytics engine using SQLite.

Usage:
    python excel_sales_analytics.py sales_log.xlsx --db sales_analytics.db --out summaries
"""

import argparse
from pathlib import Path
import sqlite3

import pandas as pd


REQUIRED_COLUMNS = {"date", "customer", "region", "product", "quantity", "unit_price"}


def load_and_clean_excel(excel_path: Path) -> pd.DataFrame:
    if not excel_path.exists():
        raise FileNotFoundError(f"Excel file not found: {excel_path}")

    print(f"[info] Loading Excel file: {excel_path}")
    df = pd.read_excel(excel_path)

    # Normalise column names to snake_case
    df.columns = (
        df.columns
        .str.strip()
        .str.lower()
        .str.replace(" ", "_")
        .str.replace(r"[^0-9a-zA-Z_]", "", regex=True)
    )

    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(
            f"Missing expected columns in Excel file: {missing}\n"
            f"Columns present: {list(df.columns)}"
        )

    print(f"[info] Loaded {len(df)} rows with columns: {list(df.columns)}")
    return df


def write_to_sqlite(df: pd.DataFrame, db_path: Path, table_name: str = "sales") -> sqlite3.Connection:
    if db_path.parent and not db_path.parent.exists():
        db_path.parent.mkdir(parents=True, exist_ok=True)

    print(f"[info] Writing data to SQLite database: {db_path} (table: {table_name})")
    conn = sqlite3.connect(db_path)
    df.to_sql(table_name, conn, if_exists="replace", index=False)
    conn.commit()
    return conn


def run_analytics_queries(conn: sqlite3.Connection) -> dict:
    queries = {
        "row_count": """
            SELECT COUNT(*) AS rows
            FROM sales;
        """,
        "date_range": """
            SELECT MIN(date) AS start_date, MAX(date) AS end_date
            FROM sales;
        """,
        "revenue_by_customer": """
            SELECT
                customer,
                ROUND(SUM(quantity * unit_price), 2) AS revenue
            FROM sales
            GROUP BY customer
            ORDER BY revenue DESC
            LIMIT 10;
        """,
        "monthly_revenue": """
            SELECT
                strftime('%Y-%m', date) AS year_month,
                ROUND(SUM(quantity * unit_price), 2) AS revenue
            FROM sales
            GROUP BY year_month
            ORDER BY year_month;
        """,
        "top_products": """
            SELECT
                product,
                SUM(quantity) AS units_sold,
                ROUND(SUM(quantity * unit_price), 2) AS revenue
            FROM sales
            GROUP BY product
            ORDER BY revenue DESC
            LIMIT 10;
        """,
    }

    results = {}
    for name, sql in queries.items():
        print(f"[info] Running query: {name}")
        results[name] = pd.read_sql_query(sql, conn)

    return results


def save_summaries(results: dict, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, df_result in results.items():
        out_path = out_dir / f"{name}.csv"
        df_result.to_csv(out_path, index=False)
        print(f"[saved] {out_path}")


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Convert an Excel sales log into a SQLite database and summary analytics CSVs."
    )
    parser.add_argument("excel_file", help="Path to the Excel sales log (.xlsx)")
    parser.add_argument("--db", default="sales_analytics.db", help="Path to the SQLite DB file to create")
    parser.add_argument("--out", default="summaries", help="Directory to write summary CSV files")

    args = parser.parse_args()

    excel_path = Path(args.excel_file)
    db_path = Path(args.db)
    out_dir = Path(args.out)

    df = load_and_clean_excel(excel_path)
    conn = write_to_sqlite(df, db_path)
    try:
        results = run_analytics_queries(conn)
    finally:
        conn.close()

    save_summaries(results, out_dir)
    print("[done] Analytics complete.")


if __name__ == "__main__":
    main()
    

8. Where to go next

  • Add more queries: revenue by region, margin analysis, repeat customers.
  • Schedule the script (cron, Task Scheduler) to run automatically when a new Excel file arrives.
  • Point a BI tool (e.g. Power BI) at sales_analytics.db for live dashboards.
  • Split additional Excel sheets into multiple tables and join them with SQL (see the sketch below).
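
As a taste of that last idea, here is a sketch. It assumes a hypothetical second sheet called customers, with customer and segment columns, which does not exist in the demo data:

PYTHON

import sqlite3
import pandas as pd

# Hypothetical: a second sheet "customers" with columns customer, segment.
customers = pd.read_excel("sales_log.xlsx", sheet_name="customers")

conn = sqlite3.connect("sales_analytics.db")
customers.to_sql("customers", conn, if_exists="replace", index=False)

# Join sales to the new table: revenue per customer segment.
print(pd.read_sql_query("""
    SELECT c.segment,
           ROUND(SUM(s.quantity * s.unit_price), 2) AS revenue
    FROM sales AS s
    JOIN customers AS c ON c.customer = s.customer
    GROUP BY c.segment
    ORDER BY revenue DESC;
""", conn))
conn.close()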

The big shift is mindset: Excel becomes your input UI, while Python + SQLite become your analytics engine—automated, repeatable, and ready to scale.