What Is Hypothesis Testing and How Do You Interpret p-Values?


Difficulty: Medium · Frequency: Common · Major: Data Science · Asked at: Deloitte, PwC

Concept

Hypothesis testing is a cornerstone of inferential statistics — it helps determine whether patterns observed in sample data are statistically significant or likely due to random chance.
It provides a structured framework for decision-making under uncertainty, widely used in data science, A/B testing, and scientific research.


1. The Statistical Framework

At its core, hypothesis testing compares two competing statements about a population parameter:

  • Null Hypothesis (H₀): There is no effect or difference.
    Example: “The new marketing campaign does not change conversion rate.”

  • Alternative Hypothesis (H₁ or Hₐ): There is an effect or difference.
    Example: “The new campaign increases conversion rate.”

These hypotheses form the foundation of all statistical tests — the goal is to determine whether observed evidence is strong enough to reject H₀.


2. The General Procedure

Step 1: Define the Hypotheses

Formulate:


H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂

Depending on context, H₁ can be:

  • Two-tailed: tests for any difference.
  • One-tailed: tests for a specific direction (e.g., μ₁ > μ₂).
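The two- vs one-tailed distinction can be made concrete in code. A minimal stdlib-only sketch (the normal CDF is built from the error function; the value 1.96 is the conventional two-tailed 5% boundary):

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def p_value(z, tailed="two"):
    # Convert a z statistic into a p-value for the chosen alternative
    if tailed == "two":
        return 2 * (1 - normal_cdf(abs(z)))  # any difference
    return 1 - normal_cdf(z)                 # one-tailed: mu1 > mu2

print(round(p_value(1.96), 3))         # ≈ 0.05
print(round(p_value(1.96, "one"), 3))  # ≈ 0.025
```

Note that the same statistic is twice as "significant" under a one-tailed alternative, which is why the direction must be chosen before seeing the data.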

Step 2: Choose a Significance Level (α)

  • Commonly set at 0.05 (5%).
  • Represents the threshold for the probability of making a Type I error (rejecting a true null hypothesis).

Step 3: Select the Appropriate Test Statistic

Depends on data type and assumptions:

| Test | Use Case | Distribution |
| --- | --- | --- |
| Z-test | Known population variance, large samples | Normal |
| t-test | Unknown variance, small samples | Student’s t |
| χ²-test | Categorical data | Chi-square |
| ANOVA | Comparing more than two means | F-distribution |

Compute a test statistic that quantifies how far the observed sample deviates from the null expectation.
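To make Step 3 concrete, here is a pooled-variance two-sample t statistic computed by hand with only the standard library; the sample values are made up for illustration:

```python
import statistics

def two_sample_t(a, b):
    # Pooled-variance two-sample t statistic: how far apart the sample
    # means are, measured in units of the estimated standard error
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

group_a = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical measurements
group_b = [4.6, 4.8, 4.5, 4.7, 4.9]
t = two_sample_t(group_a, group_b)
```

The resulting t is then compared against the Student's t distribution with n₁ + n₂ − 2 degrees of freedom to obtain a p-value (in practice via `scipy.stats.ttest_ind`).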

Step 4: Compute the p-Value

The p-value is the probability of obtaining results at least as extreme as those observed, assuming H₀ is true.

Interpretation:

  • Low p-value (< α): Evidence against H₀ — reject the null.
  • High p-value (≥ α): Insufficient evidence — fail to reject H₀.

Example:

If p = 0.02 and α = 0.05, the probability of observing data at least this extreme under H₀ is 2%. Since 0.02 < 0.05, we reject H₀.
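The decision rule in this example is mechanical and reduces to a one-line comparison:

```python
def decide(p, alpha=0.05):
    # Reject H0 only when the p-value falls below the significance level
    return "reject H0" if p < alpha else "fail to reject H0"

print(decide(0.02))  # reject H0
print(decide(0.30))  # fail to reject H0
```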


3. Common Misinterpretations

| Misinterpretation | Correction |
| --- | --- |
| “A p-value of 0.05 means there’s a 5% chance H₀ is true.” | ❌ Incorrect. The p-value is the probability of the observed data (or more extreme) given H₀, not the probability of H₀ given the data. |
| “A non-significant result proves H₀ is true.” | ❌ It only indicates insufficient evidence to reject H₀. |
| “A smaller p-value always means a larger effect.” | ❌ It depends on sample size; even small effects can be significant with large N. |
| “Statistical significance implies practical importance.” | ❌ Always interpret results in context; practical or business impact matters. |

4. Effect Sizes and Confidence Intervals

To complement p-values:

  • Effect size (Cohen’s d, odds ratio): Measures magnitude of difference, not just existence.
  • Confidence intervals (CIs): Provide range estimates for population parameters.
    • If the CI excludes the null value (e.g., 0 for a mean difference, 1 for an odds ratio), the result is statistically significant at the corresponding level.

This combination provides a more nuanced understanding than p-values alone.
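Both quantities are cheap to compute alongside a test. A stdlib sketch of Cohen's d and a normal-approximation 95% CI for the mean difference (the data are hypothetical; small samples would normally use a t critical value rather than 1.96):

```python
import statistics
from math import sqrt

def cohens_d(a, b):
    # Cohen's d: mean difference in units of the pooled standard deviation
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(sp2)

def mean_diff_ci(a, b, z=1.96):
    # Approximate 95% CI for the difference in means (normal approximation)
    diff = statistics.mean(a) - statistics.mean(b)
    se = sqrt(statistics.variance(a) / len(a)
              + statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

group_a = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical data
group_b = [4.6, 4.8, 4.5, 4.7, 4.9]
d = cohens_d(group_a, group_b)
lo, hi = mean_diff_ci(group_a, group_b)  # CI excluding 0 -> significant
```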


5. Example Use Cases

A. A/B Testing

  • H₀: Conversion rate (A) = Conversion rate (B)
  • H₁: Conversion rate (A) ≠ Conversion rate (B)
    A z-test or chi-square test assesses whether the observed difference is statistically significant.
    If p < 0.05, the new variant likely performs differently.
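The A/B comparison above maps directly onto a two-proportion z-test. A stdlib-only sketch with made-up traffic numbers (2,000 visitors per variant):

```python
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    # Two-proportion z-test under H0: the conversion rates are equal
    pa, pb = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (pa - pb) / se
    # Two-tailed p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical split: 200/2000 conversions for A vs 260/2000 for B
z, p = two_proportion_z(200, 2000, 260, 2000)
```

Here p falls below 0.05, so the variants likely perform differently; the sign of z indicates the direction.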

B. Clinical Trials

Used to test whether a new treatment outperforms a control.
Example: “Drug B reduces blood pressure more than placebo.”
t-tests or ANOVA are used to confirm effects before regulatory approval.

C. Survey or Poll Analysis

Chi-square tests evaluate independence between demographic factors and survey responses.
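For a 2×2 contingency table, the chi-square statistic follows directly from the observed and expected counts. A minimal sketch with hypothetical survey counts:

```python
def chi_square_2x2(table):
    # Chi-square statistic for a 2x2 contingency table
    (a, b), (c, d) = table
    n = a + b + c + d
    # Expected count in each cell under independence: row total * col total / n
    expected = [[(a + b) * (a + c) / n, (a + b) * (b + d) / n],
                [(c + d) * (a + c) / n, (c + d) * (b + d) / n]]
    observed = [[a, b], [c, d]]
    return sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

# Hypothetical survey: yes/no responses by demographic group
stat = chi_square_2x2([[30, 20], [20, 30]])
```

With 1 degree of freedom the 5% critical value is about 3.841, so a statistic above that rejects independence.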


6. Real-World Perspective

Hypothesis testing enables evidence-based decision-making:

  • Data Science: Feature selection, algorithm benchmarking.
  • Business: Marketing A/B experiments, pricing decisions.
  • Science: Testing causal relationships under controlled uncertainty.

However, overreliance on p-values without considering context can lead to “p-hacking” or misinformed conclusions — hence the growing emphasis on Bayesian inference and reproducibility.


7. Best Practices

  • Always define hypotheses before viewing data (to avoid bias).
  • Report both p-values and effect sizes.
  • Use multiple-testing corrections (Bonferroni, FDR) when running many tests.
  • Visualize results — e.g., boxplots or confidence interval charts.
  • Consider practical significance alongside statistical significance.
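The multiple-testing correction in the list above is straightforward to apply. A sketch of the Bonferroni rule, which compares each p-value against α divided by the number of tests:

```python
def bonferroni(p_values, alpha=0.05):
    # Bonferroni correction: each test must clear alpha / (number of tests)
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Three tests share a corrected threshold of 0.05 / 3 ≈ 0.0167,
# so only the first result survives
print(bonferroni([0.01, 0.03, 0.20]))  # [True, False, False]
```

Bonferroni is conservative; false-discovery-rate procedures such as Benjamini–Hochberg are less strict when many tests are run.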

Tips for Application

  • When to discuss:
    In any interview question on experimental design, data analytics, or inferential reasoning.

  • Interview Tip:
    Combine rigor with intuition:

    “In our A/B experiment, the p-value was 0.03 (α = 0.05), so we rejected H₀.
    However, the effect size was small (Cohen’s d = 0.15), meaning the improvement was statistically significant but not practically impactful.”


Key takeaway:
Hypothesis testing is a decision-making framework under uncertainty, and p-values quantify surprise, not truth.
Effective data scientists interpret them in context, balancing statistical significance with practical relevance.