Sample size


Sample size is an important issue in marketing testing because it has such a large impact on the validity of your results and is so often misunderstood. Without sufficient sample size (i.e., sufficient data), your test will be meaningless—little more than a guess. Greater sample size increases your confidence that any change in response or sales is a real difference and not just random chance.

Calculating Sample Size

Sample size should be based on an equation, not a rule of thumb. General rules like “each test cell should have at least 100 orders” oversimplify the issue and often result in a weak test with few significant results.

Different equations are appropriate for different types of data, but all sample size calculations are based on the same general equation:

$$N = \frac{4\,(t_{\alpha/2} + t_{\beta})^{2}\,\sigma^{2}}{\text{effect}^{2}}$$

Sample size, N, should be based on:

1. How much variation there is in your data, measured by the standard deviation (symbol σ, “sigma”)

2. How small an effect you want to be able to see (the change in response rate for the test versus the control)

Two standard deviations are an important threshold for separating significant market changes from natural variation. Statistically, if nothing has changed, about 95% of results will fall within ±2σ of the average. So an increase (or decrease) beyond 2σ means something has changed, something that you should be able to identify. Also, for response rate (and other yes/no data), the standard deviation is determined by the average: σ = √(p̄(1 − p̄)). As a result, a higher response rate has a lower standard deviation relative to the average.
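A quick numerical check of both claims, as a minimal Python sketch: the share of a normal distribution that falls within ±2σ, and how the relative standard deviation of a yes/no outcome shrinks as the response rate rises. It assumes one yes/no response per piece, so σ = √(p̄(1 − p̄)); the function names are illustrative, not from any tool mentioned on this page.

```python
from math import sqrt
from statistics import NormalDist


def response_sigma(p: float) -> float:
    """Standard deviation of a single yes/no outcome with response rate p."""
    return sqrt(p * (1 - p))


def relative_sigma(p: float) -> float:
    """Standard deviation expressed as a multiple of the average response rate."""
    return response_sigma(p) / p


# Share of results within ±2 standard deviations of the average when nothing has changed.
std_normal = NormalDist()
coverage = std_normal.cdf(2) - std_normal.cdf(-2)

for p in (0.01, 0.03):
    print(f"p={p:.0%}: sigma={response_sigma(p):.4f}, relative sigma={relative_sigma(p):.2f}")
print(f"share within ±2 sigma: {coverage:.1%}")
```

The relative standard deviation at a 3% response rate comes out well below the value at 1%, which is why lower response rates need larger samples.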

Generally, the smaller the change you want to see and/or the lower your response rate, the larger the sample size you need.

For example:

  • If the response rate for your direct mail campaigns is usually about 1%, you’ll need about three times the sample size of someone who normally sees a 3% response.
  • If you want to see any variable that increases response by 5% (e.g., from 1% to 1.05%), you’ll need four times the sample size you would need to see a 10% or larger effect.
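Both examples follow from how sample size scales: with the confidence terms held fixed, N is proportional to p̄(1 − p̄)/effect², where the absolute effect equals the relative lift times p̄. The minimal Python sketch below compares scenarios through that ratio alone; the helper name is illustrative, and the 10% lift used in the first comparison is an assumed common value (any shared lift gives the same ratio).

```python
def relative_sample_size(p: float, lift: float) -> float:
    """Quantity proportional to the required sample size.

    The constant 4 * (t_alpha/2 + t_beta)**2 cancels when two scenarios
    are compared, so it is omitted here.
    """
    effect = lift * p  # absolute change in response rate
    return p * (1 - p) / effect ** 2


# 1% vs. 3% baseline response, same 10% relative lift
ratio_rate = relative_sample_size(0.01, 0.10) / relative_sample_size(0.03, 0.10)

# 5% vs. 10% relative lift, same 1% baseline
ratio_lift = relative_sample_size(0.01, 0.05) / relative_sample_size(0.01, 0.10)

print(round(ratio_rate, 2), round(ratio_lift, 2))
```

The first ratio is just over 3, and halving the detectable effect exactly quadruples the sample size.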

 

Sample Size for Scientific Testing
(no matter how many variables are tested)

Sample size is a big issue in marketing and advertising testing, making it difficult to test very many variables or see any but the largest effects. Here’s where scientific testing has an immense advantage. With the right techniques, you can test any number of variables at once with the same sample size.

If all the variables are a part of the same statistical test design, the sample size gives equal statistical confidence whether you’re testing two or two-dozen variables. For a scientific test measuring response rate, use the sample size equation:

$$N = \frac{4\,(t_{\alpha/2} + t_{\beta})^{2}\,\bar{p}\,(1-\bar{p})}{\text{effect}^{2}}$$

Though this looks a bit complex, it’s similar to standard equations (but more accurate) and is fairly easy to calculate if you can answer three questions:

1.  What is your average response rate?
2.  How small an effect would you like to be able to see?
3.  How confident do you want to be that you'll see an effect of that size?
 

The symbols and terms are:

N = overall sample size
(equally divided among all recipes, no matter how many elements or recipes are tested)

p = historical or expected average response rate
(written p̄, “p-bar”, in the equation; the bar over the top means “average”)

effect = how large a change you want to be able to detect
(e.g., the difference in response rate between the test and control)

tα/2 = a fancy statistical way of saying “about 2” standard deviations

  • It’s perfectly fine to use 2, since tα/2 = 1.96 for almost every calculation at 95% confidence (meaning that if a variable has no real effect, there is only a 5% chance it will appear significant through random chance). Using 2 gives a slightly larger, more conservative sample size.
  • Always use alpha = 5% for tests; confidence levels below 95% (alpha > 5%) are just an excuse for using too small a sample size.

tβ = how confident you want to be in seeing the selected effect (set by the beta error; power = 1 − beta)

  • tβ = 0 for a 50-50 chance of seeing the "effect" of the chosen size (50% power)
  • tβ = 0.674 for only a 25% chance of missing the effect (75% power)
  • tβ = 0.841 for only a 20% chance of missing the effect (80% power)
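The t values listed above can be reproduced from alpha and the desired power with the inverse of the standard normal distribution. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming the normal approximation is adequate at marketing-test sample sizes; the helper names are illustrative. Rounded to three places, 80% power gives 0.842; the 0.841 above is the same value truncated.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def t_alpha_half(alpha: float = 0.05) -> float:
    """Two-sided critical value for significance level alpha (normal approximation)."""
    return std_normal.inv_cdf(1 - alpha / 2)


def t_beta(power: float) -> float:
    """Extra term for the desired chance of detecting the effect (power = 1 - beta)."""
    return std_normal.inv_cdf(power)


print(round(t_alpha_half(0.05), 3))  # about 1.96
for power in (0.50, 0.75, 0.80):
    print(power, round(t_beta(power), 3))
```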


Alpha and Beta: two types of error

One focus of statistics is reducing error so you can make the right decisions. Two types of error are important to consider before every test:

  • “Alpha” is your chance of seeing a significant effect that really doesn’t exist
  • “Beta” is your chance of missing an effect that should be significant

The “beta” term is often not used for sample size calculations, but it’s always part of the mix. Leaving it out is the same as setting tβ = 0, which means that you have only a 50-50 chance of seeing the desired effect. (Note: if you remove the “4” and the “+tβ”, you get the commonly used, usually wrong, sample size calculation for one test cell.)

This equation gives you a sample size about four times what you get with more common calculations, for two reasons:

(a) Because this equation combines the sample size for both the “test” and “control” versions (the control is not separated out in scientific testing), change the 4 to a 2 to get the best equation for one test cell in a split-run test.

(b) Unless you have just one test cell versus a large control cell, the common equation without the 4 is wrong: it gives half the sample size you truly need. You need a 2 in the equation because every test compares two groups of data, each with its own source of variation. When the control cell is very large, its variation becomes very small, so dropping the 2 is appropriate only for one test cell versus a large control cell.
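One standard way to see where the 4 comes from, sketched under the usual assumptions (normal approximation; a test cell and a control cell of n names each, sharing the same average response rate p̄): the difference between the two cells' response rates carries variation from both cells, and the effect must clear (tα/2 + tβ) standard errors of that difference.

$$
\begin{aligned}
\operatorname{Var}\left(\hat{p}_{\text{test}} - \hat{p}_{\text{control}}\right) &= \frac{\bar{p}(1-\bar{p})}{n} + \frac{\bar{p}(1-\bar{p})}{n} = \frac{2\,\bar{p}(1-\bar{p})}{n} \\
\text{effect} &= (t_{\alpha/2} + t_{\beta})\sqrt{\frac{2\,\bar{p}(1-\bar{p})}{n}} \\
n &= \frac{2\,(t_{\alpha/2} + t_{\beta})^{2}\,\bar{p}(1-\bar{p})}{\text{effect}^{2}} \qquad \text{(one cell)} \\
N = 2n &= \frac{4\,(t_{\alpha/2} + t_{\beta})^{2}\,\bar{p}(1-\bar{p})}{\text{effect}^{2}} \qquad \text{(test plus control)}
\end{aligned}
$$

When the control cell is very large, its variance term shrinks toward zero, the 2 under the square root drops out, and a single test cell needs (tα/2 + tβ)² p̄(1 − p̄)/effect², one-fourth of N.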


Try out the equation with the following two examples (and e-mail us if you would like to get a sample size calculator in Excel):

  • A credit card marketer was planning a scientific test of 19 direct mail elements using a 20-recipe test design. The control response rate was only 0.5% and they wanted to see any elements that increased response by 10%, or 0.05 percentage points. With such a small response rate, they calculated that the test would need a sample size of 305,791 for a 50-50 chance of seeing a 10% increase and 624,510 to be 80% confident in seeing a 10% lift [p=0.005, tα/2=1.96, tβ=0 and 0.841, effect=0.0005, solving for N].

Now remember, these sample size calculations are for the overall test: all 19 elements and all 20 recipes can be tested with a total of 305,791 names, or just about 15,290 names per recipe. Split-run tests of all 19 elements would require half that number for each test cell, plus a control cell at least four times the size of a test cell, totaling more than 3.5 million names for equal confidence using split-run tests!
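The credit card figures can be reproduced directly from the response-rate equation. This minimal Python sketch assumes the inputs given in the example (p = 0.005, effect = 0.0005, tα/2 = 1.96, tβ = 0 or 0.841) and reads “four times that for the control” as four times one split-run test cell; the function name `sample_size` is illustrative.

```python
def sample_size(p: float, effect: float, t_alpha_half: float = 1.96, t_beta: float = 0.0) -> float:
    """Overall sample size N = 4 (t_alpha/2 + t_beta)^2 p (1 - p) / effect^2."""
    return 4 * (t_alpha_half + t_beta) ** 2 * p * (1 - p) / effect ** 2


# Credit card example: 0.5% response, detect a 0.05 percentage-point (10%) lift
n_5050 = sample_size(p=0.005, effect=0.0005)              # 50-50 chance of seeing the lift
n_80 = sample_size(p=0.005, effect=0.0005, t_beta=0.841)  # 80% power

recipes, elements = 20, 19
per_recipe = n_5050 / recipes  # names per recipe in the scientific test

# Split-run alternative: each test cell gets half of N, control is 4x a test cell
split_cell = n_5050 / 2
split_total = elements * split_cell + 4 * split_cell

print(round(n_5050), round(n_80), round(per_recipe), round(split_total))
```

In practice you would round a planned sample size up rather than to the nearest name; rounding is used here only to match the figures quoted above.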

  • With a list of only 35,000 e-mail addresses across three customer segments, a conversion rate of 1%, and 12 new ideas she wanted to test, a marketing director wanted to calculate how large an effect she could expect to see [N=35,000, p=0.01, tα/2=1.96, tβ=0 and then 0.841, solving for “effect”]. She calculated a 50-50 chance of seeing any effect that increased conversion rate by at least 20.8% (0.208 percentage points) and an 80% chance of seeing effects that increased (or decreased) conversion by 29.8%.

Even with a small e-mail list and less power than she would have liked, the marketing director went ahead with the test and ended up seeing four significant main effects and one significant interaction, which together increased conversion rate by 54% (details are explained in the E-mail case study).
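The marketing director's calculation rearranges the same response-rate equation to solve for the effect: effect = (tα/2 + tβ) · √(4 p̄(1 − p̄)/N). A minimal Python sketch with the example's inputs (N = 35,000, p̄ = 1%); the function name is illustrative.

```python
from math import sqrt


def detectable_effect(n: float, p: float, t_alpha_half: float = 1.96, t_beta: float = 0.0) -> float:
    """Smallest absolute effect detectable with overall sample size n.

    Rearranges N = 4 (t_alpha/2 + t_beta)^2 p (1 - p) / effect^2 to solve for effect.
    """
    return (t_alpha_half + t_beta) * sqrt(4 * p * (1 - p) / n)


p = 0.01
for t_b in (0.0, 0.841):
    effect = detectable_effect(n=35_000, p=p, t_beta=t_b)
    print(f"t_beta={t_b}: {effect:.5f} absolute, {effect / p:.1%} relative change")
```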

The full sample size equation, above, is all you need. However, you can adjust it slightly in two cases:

  • If you are running a few split-run tests, you can use one-half the calculated sample size for each test cell.

  • If you are running one test cell against a very large control, you can use one-fourth of the calculated number for the test cell.
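The two adjustments amount to simple scale factors on the overall N. A minimal Python sketch; the helper name and the design labels are hypothetical, and the 305,791 input is the credit card example's overall sample size.

```python
def adjusted_sample_size(n_overall: float, design: str) -> float:
    """Per-cell sample size for the two special cases (hypothetical helper).

    Raises KeyError for a design label that is not listed.
    """
    factors = {
        "split_run_cell": 0.5,           # each test cell in a few split-run tests
        "one_cell_large_control": 0.25,  # one test cell against a very large control
    }
    return n_overall * factors[design]


n_overall = 305_791
for design in ("split_run_cell", "one_cell_large_control"):
    print(design, round(adjusted_sample_size(n_overall, design)))
```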

Note: you may see some equations with a term for the “total universe” of names. Since you never know the full universe of potential buyers, this term should not be included. Also, for you statistical gurus, the z value should be used instead of the t value, but with large sample sizes the t distribution and the normal (z) distribution are nearly identical.

 

Sample Size for Sales Data

One more equation is important. The above equation is used for response rate, conversion rate, and other types of yes/no data. For retail and other sales data (dollar sales, average order size, percent change in sales versus baseline), the term for “standard deviation” is different, so you need the equation:

$$N = \frac{4\,(t_{\alpha/2} + t_{\beta})^{2}\,\sigma^{2}}{\text{effect}^{2}}$$

Unlike response data, sales data has no fixed relationship between the average and the standard deviation, so p̄(1 − p̄) is replaced by the standard deviation squared, σ². The standard deviation must be calculated from all of the individual sales numbers. For example, if five people buy clothes from your website and their orders are $54, $20, $160, $95, and $76, then the average order size is $81.00 and the standard deviation (the sample standard deviation, as returned by a spreadsheet’s STDEV function) is $52.23.
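The five-order example and the sales-data equation can be checked with Python's standard-library `statistics` module, whose `stdev` returns the sample standard deviation. The $5 target change in average order size below is a hypothetical goal chosen only for illustration, and a real calculation should use σ from all individual orders, not just five.

```python
from statistics import mean, stdev

orders = [54, 20, 160, 95, 76]  # the five example orders, in dollars

avg = mean(orders)     # average order size
sigma = stdev(orders)  # sample standard deviation (n - 1 in the denominator)


def sales_sample_size(sigma: float, effect: float, t_alpha_half: float = 1.96, t_beta: float = 0.0) -> float:
    """Overall sample size N = 4 (t_alpha/2 + t_beta)^2 sigma^2 / effect^2."""
    return 4 * (t_alpha_half + t_beta) ** 2 * sigma ** 2 / effect ** 2


# Hypothetical goal: detect a $5 change in average order size with 80% power
n_needed = sales_sample_size(sigma, effect=5.0, t_beta=0.841)
print(avg, round(sigma, 2), round(n_needed))
```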

Unfortunately, most companies—including some of the biggest database service providers—measure average sales data without providing the standard deviation. Without “sigma,” you have no measure of variation and can do little statistical analysis. If one catalog has an average order size of $75 and another $65, are these statistically different? Who knows… unless you calculate the standard deviation of all the individual orders.
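Once the standard deviations are known, the $75 versus $65 question becomes a routine two-sample comparison. This minimal sketch uses a large-sample z statistic for the difference in means; the standard deviations ($50) and order counts (100 and 1,000 per catalog) are hypothetical values, and only the two averages come from the text.

```python
from math import sqrt


def two_sample_z(mean_a: float, sd_a: float, n_a: int, mean_b: float, sd_b: float, n_b: int) -> float:
    """Approximate z statistic for the difference between two average order sizes."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)  # standard error of the difference
    return (mean_a - mean_b) / se


# Hypothetical spreads and order counts; only the $75 and $65 averages come from the text.
z_small = two_sample_z(75, 50, 100, 65, 50, 100)
z_large = two_sample_z(75, 50, 1000, 65, 50, 1000)

for z in (z_small, z_large):
    print(round(z, 2), "significant" if abs(z) > 1.96 else "not significant")
```

The same $10 gap is not significant with 100 orders per catalog but is with 1,000, which is exactly the judgment an average alone cannot support.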

 

And a final note on sample size…

We find that most marketers test with too small a sample size. Using the accurate sample size equation, the “100 orders” rule of thumb means there’s a 50-50 chance that a test cell will not be statistically significant unless it increases response rate by about 27%. For an 80% chance of seeing the effect, it must increase response by 39% or more! These are big hurdles to overcome.
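The “100 orders” figures can be checked with the split-run form of the equation (the 4 changed to a 2), setting the cell size to 100/p̄ names. This minimal sketch assumes a 1% response rate; because (1 − p̄) is close to 1, the result barely depends on that choice. The function name is illustrative.

```python
from math import sqrt


def min_detectable_lift(orders_per_cell: float, p: float, t_alpha_half: float = 1.96, t_beta: float = 0.0) -> float:
    """Relative lift a split-run test cell can detect, given its expected order count.

    Uses the per-cell form n = 2 (t_alpha/2 + t_beta)^2 p (1 - p) / effect^2,
    with n = orders_per_cell / p names in the cell.
    """
    n = orders_per_cell / p
    effect = (t_alpha_half + t_beta) * sqrt(2 * p * (1 - p) / n)
    return effect / p


for t_b in (0.0, 0.841):
    print(t_b, f"{min_detectable_lift(100, p=0.01, t_beta=t_b):.1%}")
```

This gives roughly 28% and 39%, in line with the hurdles quoted above.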

Natural variation in the marketplace remains a big challenge for marketing and testing. If you don't believe it, here’s a good “test” to try: take your control, give it five different keycodes, and split your list into five random groups. Measure response for each of the five controls. Since each mailing is exactly the same, these five data points give you a sense of how much variation you can expect to see. The results may surprise you.
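Before mailing, you can get a feel for the same effect with a simulation. This minimal sketch splits identical mailings into five random groups that all share one true response rate, so any spread is pure chance. The 20,000 pieces per group, the 1% rate, and the random seed are hypothetical values; the ±2σ band is the range a single group's rate should usually fall within.

```python
import random


def simulate_identical_controls(n_per_group: int, p: float, groups: int = 5, seed: int = 1) -> list:
    """Observed response rates for identical mailings split into random groups.

    Every group has the same true response rate p, so differences between
    groups are natural variation only.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(groups):
        responses = sum(1 for _ in range(n_per_group) if rng.random() < p)
        rates.append(responses / n_per_group)
    return rates


rates = simulate_identical_controls(n_per_group=20_000, p=0.01)
band = 2 * (0.01 * 0.99 / 20_000) ** 0.5  # ±2 sigma for one group's response rate
print([f"{r:.2%}" for r in rates], f"±2 sigma band: ±{band:.2%}")
```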
 

Contact us for more information or if you would like a sample size calculator in Excel. Next, you can learn more about real-world case studies and articles showing the power of cutting-edge testing techniques.


 

© LucidView 2007. All rights reserved. Contact: 888-LucidView (888-582-4384), info@lucidview.com