A/B Testing

Posted Nov. 24, 2025, 2:31 a.m. by andyadmin

This A/B test explores whether a redesigned sign-up button improves conversions. I validated the setup with an AA test, determined the required sample size with a power analysis, ran a two-week experiment, and computed the final p-value with a chi-square test of proportions. The new design shows a statistically significant improvement (p ≈ 0.0005).

🧪 A/B Testing Case Study: Measuring the Impact of a New Sign-Up Button Design

This project demonstrates a full end-to-end A/B testing workflow using real user-behavior logs. It covers hypothesis design, power analysis, AA validation, sample-size determination, and statistical testing.

The goal is to determine whether a new green sign-up button improves conversion compared to the existing blue button.



Step 1 — Understanding the Data

We begin with 31 days of traffic data from December 2021. Visits average around 10,000 per day, for a total of roughly 310,000.

The experiment includes two variants:

  • 🟦 Control: Original blue sign-up button

  • 🟩 Treatment: New green sign-up button

Key considerations when designing the test:

  • Group allocation and daily unique visitors

  • Baseline conversion rates

  • Required sample size to detect meaningful differences


Step 2 — State the Hypotheses

Null Hypothesis (H₀)
The sign-up rates of the blue and green buttons are the same.

Alternative Hypothesis (H₁)
The sign-up rates of the blue and green buttons are different.

Experiment parameters

  • Significance level (α): 0.05

  • Target power: 0.80

  • Minimum Detectable Effect (MDE): 1 percentage point absolute lift (10% → 11%)

  • Test type: Two-sample, two-tailed
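As a rough illustration of how these parameters feed the later calculations, the sketch below converts α and the target power into the standard normal quantiles that a two-tailed sample-size formula relies on. The constant and variable names are illustrative and not taken from the original analysis.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level from the parameters above
POWER = 0.80  # target power from the parameters above

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Two-tailed test: alpha is split across both tails.
z_alpha = std_normal.inv_cdf(1 - ALPHA / 2)
# Quantile corresponding to the target power.
z_beta = std_normal.inv_cdf(POWER)

print(f"z_alpha/2 = {z_alpha:.3f}, z_beta = {z_beta:.3f}")
```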


Step 3 — Power Analysis & Sample Size Determination

We assume the sign-up rate increases from 10% to 11%. The effect size is the difference between the two rates divided by their pooled standard deviation:

  • p₁ = 0.10 (control rate)

  • p₂ = 0.11 (treatment rate)

Pooled standard deviation:

$$ s_{\text{pooled}} = \sqrt{\frac{p_1(1 - p_1) + p_2(1 - p_2)}{2}} $$

Effect size (Cohen's d):

$$ d = \frac{p_2 - p_1}{s_{\text{pooled}}} $$
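
Plugging the two assumed rates into these formulas gives, approximately:

$$ s_{\text{pooled}} = \sqrt{\frac{0.10 \times 0.90 + 0.11 \times 0.89}{2}} = \sqrt{0.09395} \approx 0.3065 $$

$$ d = \frac{0.11 - 0.10}{0.3065} \approx 0.0326 $$

This is far below Cohen's conventional "small" threshold of 0.2, which is why a large sample is needed to detect it.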

Using the effect size d and the noncentral t distribution, we compute power across a range of per-group sample sizes and pick the smallest n that reaches the 80% target.

Result

  • Required sample per group ≈ 15,000

  • Total required sample ≈ 30,000 visitors
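
The sample-size search can be approximated with the standard library alone. This is a minimal sketch that swaps in the normal approximation for the noncentral t distribution mentioned above; at sample sizes in the thousands the two should agree closely. The function names are illustrative and not from the original analysis.

```python
import math
from statistics import NormalDist


def cohens_d(p1: float, p2: float) -> float:
    """Effect size for two proportions using the pooled standard deviation."""
    s_pooled = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p2 - p1) / s_pooled


def power_two_sided(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sample, two-tailed test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)  # expected test statistic under H1
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def required_n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Closed-form per-group n, ignoring the negligible far-tail rejection region."""
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)


d = cohens_d(0.10, 0.11)
n_required = required_n_per_group(d)
power_at_15k = power_two_sided(d, 15_000)
print(f"d = {d:.4f}, n per group = {n_required}, power at 15,000 = {power_at_15k:.3f}")
```

Under this approximation the required n comes out slightly below 15,000 per group, so rounding up to 15,000 leaves a small safety margin above 80% power.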


Step 4 — Experiment Duration

Daily unique visitors ≈ 10,000. Allocating 20–30% of traffic to the experiment, at the lower end of that range, gives:

  • ≈ 2,100+ visitors per day entering the experiment

  • ≈ 15,000 visitors per group over 14 days (2,100+ × 14 ≈ 30,000 in total, split evenly)

Chosen experiment duration: 14 days.
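
The duration arithmetic can be checked with a short sketch. It assumes traffic is flat at the daily average and that every allocated visitor counts toward the sample; the function names are illustrative.

```python
import math


def days_needed(total_sample: int, daily_visitors: int, allocation: float) -> int:
    """Whole days required to collect total_sample at a given traffic share."""
    daily_entrants = daily_visitors * allocation
    return math.ceil(total_sample / daily_entrants)


def required_allocation(total_sample: int, daily_visitors: int, days: int) -> float:
    """Traffic share needed to collect total_sample within a fixed number of days."""
    return total_sample / (daily_visitors * days)


share_for_14_days = required_allocation(30_000, 10_000, 14)
print(f"Traffic share needed for a 14-day test: {share_for_14_days:.1%}")
print("Days needed at 25% allocation:", days_needed(30_000, 10_000, 0.25))
```

A 14-day run needs a little over 21% of traffic, which is consistent with the 2,100+ daily entrants quoted above. A higher allocation would finish sooner but would no longer cover two full weeks.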

This duration balances:

  • Statistical reliability

  • Coverage of weekly traffic patterns

  • Reasonable time to deliver insights to the product team


Step 5 — AA Test Validation (Randomization Check)

Before analyzing the A/B test, an AA test is performed to validate:

  • Randomization

  • Identity consistency

  • Correct instrumentation
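
A minimal sketch of the identity and label checks might look like the following. The row layout (`visitor_id`, `email`, `group`), the metric definitions, and the sample rows are all assumptions made for illustration; the real log schema is not shown in this post.

```python
from collections import defaultdict


def integrity_report(rows, valid_groups=(0, 1)):
    """Summarise label validity and email/visitor-ID consistency for log rows."""
    # A row is valid if it has a known group label plus non-empty identifiers.
    valid = [
        r for r in rows
        if r.get("group") in valid_groups and r.get("email") and r.get("visitor_id")
    ]
    ids_by_email = defaultdict(set)
    for r in valid:
        ids_by_email[r["email"].strip().lower()].add(r["visitor_id"])

    n_valid = len(valid)
    n_emails = len(ids_by_email)
    multi_id = sum(len(ids) > 1 for ids in ids_by_email.values())
    return {
        "invalid_rows": len(rows) - n_valid,
        # Share of valid rows that repeat an email already seen.
        "duplicate_email_rate": 1 - n_emails / n_valid if n_valid else 0.0,
        # Share of distinct emails attached to more than one visitor ID.
        "multi_id_email_rate": multi_id / n_emails if n_emails else 0.0,
        "labels_seen": sorted({r["group"] for r in valid}),
    }


# Made-up rows, only to exercise the function.
sample_rows = [
    {"visitor_id": "v1", "email": "a@example.com", "group": 0},
    {"visitor_id": "v2", "email": "b@example.com", "group": 1},
    {"visitor_id": "v3", "email": "a@example.com", "group": 0},  # same email, new ID
    {"visitor_id": "v4", "email": "c@example.com", "group": 2},  # invalid label
]
report = integrity_report(sample_rows)
print(report)
```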

AA Test Key Findings

  • ✔ Valid group labels only (0 and 1)

  • ✔ Balanced split: 14,982 (Group 0) vs 15,057 (Group 1)

  • ⚠ ~0.66% duplicate emails, which is normal for high-traffic logs

  • ⚠ ~0.66% of emails map to multiple visitor IDs, which is worth monitoring

  • ✔ No invalid rows

  • ✔ Chi-square p ≈ 0.459 → fail to reject H₀ → AA test PASSED

The AA test confirms that sampling and randomization are healthy.
It is safe to proceed with the A/B test.
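
One complementary check is whether the group sizes themselves are consistent with a 50/50 assignment. The sketch below runs a chi-square goodness-of-fit test on the reported split using only the standard library. It looks at group sizes alone, so it is not necessarily the same chi-square test behind the p-value listed in the findings, and its output should not be expected to match that figure.

```python
import math


def split_balance_p_value(n_a: int, n_b: int) -> float:
    """Chi-square goodness-of-fit p-value against an expected 50/50 split (df = 1)."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # With one degree of freedom, the chi-square upper tail is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p_split = split_balance_p_value(14_982, 15_057)
print(f"Split-balance p-value: {p_split:.3f}")
```

A p-value well above 0.05 here means the observed imbalance of 75 visitors is easily explained by chance.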


Step 6 — A/B Test Statistical Results

Experiment period: 2022-02-01 → 2022-02-14

Group sizes

  • Control: 14,942 visitors

  • Treatment: 15,139 visitors

Conversion outcomes

  • Control: 1,428 successes, 13,514 failures → 9.56% conversion

  • Treatment: 1,632 successes, 13,507 failures → 10.78% conversion

A chi-square test on the 2×2 contingency table yields:

  • p-value ≈ 0.0005

Since p < 0.05, we reject H₀ and conclude that the difference in sign-up rates is statistically significant, with the treatment converting at the higher rate.
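
The test can be reproduced from the four counts above with a short standard-library sketch. Whether the original analysis applied Yates' continuity correction is not stated, so the function exposes it as an option; the function name is illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (df = 1)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink the deviation by n / 2, not below zero.
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the chi-square upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: control, treatment. Columns: successes, failures.
chi2, p_value = chi_square_2x2(1_428, 13_514, 1_632, 13_507)
chi2_yates, p_yates = chi_square_2x2(1_428, 13_514, 1_632, 13_507, yates=True)
print(f"uncorrected: chi2 = {chi2:.2f}, p = {p_value:.5f}")
print(f"with Yates:  chi2 = {chi2_yates:.2f}, p = {p_yates:.5f}")
```

Both variants land close to the 0.0005 reported above, so the conclusion does not depend on the correction.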

Interpretation

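  • The green button delivers a lift of about 1.2 percentage points absolute (9.56% → 10.78%), roughly a 13% relative increase.

  • The lift is statistically significant and exceeds the 1-point MDE set in Step 2, so it is operationally meaningful.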

  • The treatment is a strong candidate for rollout.
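
To put a range around the lift, the sketch below computes the absolute and relative lift from the reported counts together with a 95% confidence interval for the difference. The post itself does not report an interval; the Wald (normal approximation) interval used here is an assumption chosen for simplicity, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def lift_with_ci(x_c: int, n_c: int, x_t: int, n_t: int, confidence: float = 0.95):
    """Absolute lift, relative lift, and a Wald interval for the absolute lift."""
    p_c, p_t = x_c / n_c, x_t / n_t
    diff = p_t - p_c
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff / p_c, (diff - z * se, diff + z * se)


abs_lift, rel_lift, (ci_low, ci_high) = lift_with_ci(1_428, 14_942, 1_632, 15_139)
print(f"absolute lift = {abs_lift:.4f}, relative lift = {rel_lift:.1%}")
print(f"95% CI for absolute lift: [{ci_low:.4f}, {ci_high:.4f}]")
```

An interval that excludes zero agrees with the chi-square result, and its width shows how precisely the two-week sample pins down the size of the improvement.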


Final Conclusion

The green sign-up button should be adopted.

This decision is supported by:

  • Clear hypotheses and experimental design

  • Correct sample-size planning via power analysis

  • 14-day duration covering two full weekly traffic cycles

  • AA test confirming data integrity

  • Statistically significant result (p = 0.0005)

This pipeline follows the standard experimentation workflow used by data-driven product teams.
