Null hypothesis testing sounds like something reserved for statisticians, but it is really a simple decision framework you can use every single day. Any time you compare “before vs after”, “A vs B”, or “control vs experiment”, you are implicitly making a hypothesis about whether something changed.
In this article we’ll treat the null hypothesis as a pragmatic tool, not a textbook formula. You’ll see:
- What the null hypothesis really means in plain language.
- Why it is essential for anyone who works with data.
- How to structure everyday questions as hypotheses without doing any calculations yourself.
- A complete Python workflow over a small CSV dataset to test whether a change in your system “actually did something”.
The null hypothesis is the assumption that nothing special is happening. You only abandon that assumption when the data provides strong evidence against it. Python handles the math; your job is to frame the question and interpret the result.
What is the null hypothesis?
Formally, the null hypothesis, usually written as H₀, states that there is no effect, no difference, or no relationship in the population you care about.
In practice, you can read it as:
“If I assume nothing has changed, could this data just be random noise?”
Everyday translations
| Question | Null hypothesis (H₀) | Alternative (H₁) |
|---|---|---|
| Did the new landing page increase conversions? | Conversion rate is the same as before. | Conversion rate is different (usually: higher). |
| Does a new model reduce processing time? | Average processing time is unchanged. | Average processing time changed. |
| Do Region A and Region B customers spend differently? | Mean spend is the same in both regions. | Mean spend differs between regions. |
We then ask: “If the null were true, how likely is it that we would see data this extreme by chance?” That likelihood is summarised in the p-value.
- Low p-value (typically < 0.05) → data is unlikely under the null → we reject H₀.
- High p-value (≥ 0.05) → data is compatible with the null → we fail to reject H₀.
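If "how likely is data this extreme under the null" feels abstract, a tiny simulation can make it concrete. The sketch below is purely illustrative and unrelated to the sales example later in this article: it assumes the null "the coin is fair" and asks how often chance alone produces a result at least as extreme as 60 heads in 100 flips.

```python
import numpy as np

rng = np.random.default_rng(42)

observed_heads = 60   # what we actually saw
n_flips = 100
n_simulations = 10_000

# Simulate the null hypothesis: a fair coin flipped 100 times, repeated 10,000 times.
simulated_heads = rng.binomial(n=n_flips, p=0.5, size=n_simulations)

# Two-sided check: how often is chance at least as far from 50 as our observation?
p_value = np.mean(np.abs(simulated_heads - 50) >= abs(observed_heads - 50))
print(f"Approximate p-value under the null: {p_value:.3f}")
```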
Why null hypothesis thinking matters
Even without formulas, thinking in terms of the null helps you:
- Resist false patterns. Our brains like stories: “Traffic is up after the redesign, so the new UI works!” The null forces you to ask whether this might simply be random variation.
- Avoid cargo-cult changes. If a feature rollout shows no statistically meaningful effect, you may decide not to keep investing in it.
- Communicate evidence clearly. Saying “We cannot rule out randomness” is more honest than “It looks better to me”.
The good news is that you do not need to compute anything by hand. Python, together with libraries like scipy, will do the heavy lifting. Your main responsibilities are:
- Frame the question as a comparison (before vs after, A vs B).
- Collect sensible data for each group.
- Run one or two lines of statistical code.
- Decide how to act based on whether you can reject the null.
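To make those four responsibilities concrete, here is a minimal sketch using the processing-time question from the table above. The numbers are invented purely for illustration:

```python
from scipy.stats import ttest_ind

# 1. Frame the question: does the new model change average processing time?
# 2. Collect sensible data for each group (hypothetical values, in seconds).
old_model = [2.4, 2.6, 2.5, 2.7, 2.5, 2.6, 2.4, 2.8]
new_model = [2.2, 2.3, 2.1, 2.4, 2.2, 2.3, 2.5, 2.2]

# 3. Run one or two lines of statistical code.
t_stat, p_value = ttest_ind(old_model, new_model)

# 4. Decide how to act based on whether you can reject the null.
print(f"p-value: {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```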
The example dataset: before vs after sales
To make this concrete, we will use a small synthetic dataset for a before/after experiment. Imagine you changed the pricing layout on your product page, and you recorded daily sales for 10 days before the change and 10 days after the change.
The CSV structure is:
day,sales_before,sales_after
1,200,210
2,190,205
3,205,215
4,198,199
5,202,220
6,210,230
7,195,205
8,205,211
9,207,225
10,199,218
Save this as data/null_hypothesis_sales.csv inside your project.
| Day | Sales (before) | Sales (after) |
|---|---|---|
| 1 | 200 | 210 |
| 2 | 190 | 205 |
| 3 | 205 | 215 |
| 4 | 198 | 199 |
| 5 | 202 | 220 |
| 6 | 210 | 230 |
| 7 | 195 | 205 |
| 8 | 205 | 211 |
| 9 | 207 | 225 |
| 10 | 199 | 218 |
You can download the exact CSV powering the example above as: null_hypothesis_sales.csv
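If you would rather create the file from Python than save it by hand, this sketch writes exactly the values shown above (it assumes the data/ directory already exists):

```python
import pandas as pd

# Recreate the example dataset and save it where the workflow below expects it.
data = {
    "day": list(range(1, 11)),
    "sales_before": [200, 190, 205, 198, 202, 210, 195, 205, 207, 199],
    "sales_after": [210, 205, 215, 199, 220, 230, 205, 211, 225, 218],
}
pd.DataFrame(data).to_csv("data/null_hypothesis_sales.csv", index=False)
```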
Python workflow: test “before vs after”
Our goal is to test whether the change in your pricing page has truly increased sales, or whether the apparent difference could just be random day-to-day noise.
We frame this as a two-sample problem: one sample for sales_before, one for sales_after. The null hypothesis:
H₀: The mean daily sales before and after the change are equal.
Step 1: Load the data
import pandas as pd
# Adjust the path to match your project layout if needed
df = pd.read_csv("data/null_hypothesis_sales.csv")
print(df.head())
print(df.describe())
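Before moving on, it can be worth a quick sanity check that the file loaded as expected. This optional snippet assumes the column names shown in the CSV above:

```python
# Optional sanity checks: expected columns present and no missing values.
expected = {"day", "sales_before", "sales_after"}
assert expected.issubset(df.columns), f"Missing columns: {expected - set(df.columns)}"
assert not df[["sales_before", "sales_after"]].isna().any().any(), "Found missing values"
print(f"Loaded {len(df)} days of data")
```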
Step 2: Visualise before you test
Before running any hypothesis test, get a feel for the data. Plots make anomalies or odd distributions obvious.
import matplotlib.pyplot as plt
plt.plot(df["day"], df["sales_before"], marker="o", label="Before")
plt.plot(df["day"], df["sales_after"], marker="o", label="After")
plt.xlabel("Day")
plt.ylabel("Daily sales")
plt.title("Daily sales before vs after pricing change")
plt.legend()
plt.tight_layout()
plt.show()
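A line chart shows day-by-day movement; if you also want a quick view of the two distributions side by side, a boxplot is an optional complement:

```python
import matplotlib.pyplot as plt

# Optional: compare the spread of the two groups directly.
plt.boxplot([df["sales_before"], df["sales_after"]])
plt.xticks([1, 2], ["Before", "After"])
plt.ylabel("Daily sales")
plt.title("Distribution of daily sales before vs after")
plt.tight_layout()
plt.show()
```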
Step 3: Run the hypothesis test
For two independent samples (before vs after) with numeric data, a common choice is the two-sample t-test, implemented as scipy.stats.ttest_ind().
from scipy.stats import ttest_ind
before = df["sales_before"]
after = df["sales_after"]
t_stat, p_value = ttest_ind(before, after)
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
Step 4: Interpret the p-value
Once you have a p-value, you compare it to a significance level α (alpha). A common default is α = 0.05:
alpha = 0.05

if p_value < alpha:
    print("Reject the null hypothesis: sales changed significantly.")
else:
    print("Fail to reject the null hypothesis: the difference may be random.")
You don’t need to remember how t-tests are derived. The decision logic is all you need most of the time:
- p < 0.05: the observed difference is unlikely under H₀ → treat the change as real.
- p ≥ 0.05: the data are compatible with H₀ → you cannot rule out randomness.
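Significance tells you the difference is unlikely to be noise; it does not tell you how big it is. A small sketch for putting a number on the size of the effect, using the mean difference in sales plus Cohen's d as a scale-free measure (what counts as "big enough" is a business judgement, not a statistical one):

```python
import numpy as np

# How large is the change, in the units you actually care about?
mean_diff = after.mean() - before.mean()
print(f"Average change in daily sales: {mean_diff:.1f} units per day")

# Cohen's d: the mean difference scaled by the pooled standard deviation.
# (This simple pooling assumes both groups have the same number of observations.)
pooled_std = np.sqrt((before.var(ddof=1) + after.var(ddof=1)) / 2)
print(f"Cohen's d: {mean_diff / pooled_std:.2f}")
```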
Using null hypothesis thinking day-to-day
Even if you rarely write the Python yourself, this mental model changes how you discuss experiments and metrics with others. A simple checklist:
- Always ask: what is the null hypothesis for this decision?
- Check sample sizes: do we have enough data for the comparison to be meaningful?
- Demand a p-value or equivalent: is this difference statistically supported?
- Separate size from significance: is the effect big enough to matter operationally, even if it is “real”?
You can apply this to marketing experiments, operational tuning, UX changes, pricing tests, and even internal process changes (like switching tooling or workflow).
Downloads and methodology
The following files can be exposed under static/ or data/ and wired into your Flask app via download_data():
- null_hypothesis_sales.csv — sample before/after dataset used in the examples.
- null_hypothesis_tutorial.py — optional helper script that loads the CSV, runs the t-test, and prints a human-readable interpretation.
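The helper script is not reproduced in full here, but a minimal version might look something like the sketch below; the structure is illustrative and not necessarily identical to the distributed file.

```python
# null_hypothesis_tutorial.py (illustrative sketch)
import pandas as pd
from scipy.stats import ttest_ind

def main(path="data/null_hypothesis_sales.csv", alpha=0.05):
    df = pd.read_csv(path)
    t_stat, p_value = ttest_ind(df["sales_before"], df["sales_after"])
    print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")
    if p_value < alpha:
        print("Reject H0: the before/after difference is unlikely to be random noise.")
    else:
        print("Fail to reject H0: the difference could plausibly be random noise.")

if __name__ == "__main__":
    main()
```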
Example methodology
- Source: Synthetic data generated by PLEX Lab for demonstration purposes.
- Structure: Daily aggregates of sales volume before and after a hypothetical product page change.
- Processing: CSV file loaded into Python via pandas, analysed using scipy.stats.ttest_ind.
- Visualisation: Line charts created with matplotlib; a static SVG is used in this article as a conceptual placeholder.
- Integrity: Values are generated programmatically and not hand-edited. You can regenerate or extend the dataset by following the structure shown in the CSV snippet.
Final checklist for using null hypothesis tests
Keep this near your notebook or in your repo when you’re evaluating experiments or A/B tests.
NULL HYPOTHESIS TESTING CHECKLIST
1. Write down H₀ explicitly ("nothing changed").
2. Collect data for two groups (before vs after, A vs B).
3. Plot the data to check for obvious issues or outliers.
4. Run a simple test like ttest_ind() in Python.
5. Compare the p-value to your chosen alpha (e.g. 0.05).
6. Decide what to do next: roll out, roll back, or keep testing.
You do not need to memorise formulas to benefit from statistical thinking. By combining a clear null hypothesis, a small CSV file, and a few lines of Python, you can make decisions that are less biased, more robust, and easier to explain to anyone who asks, “How do you know this really works?”