Basic marketing testing can seem so simple...
There is no “test design”
when each variable is simply another test cell run against
the control. But this simple approach has some major
drawbacks:
- Few variables can be tested
- Sample size can be enormous for a few statistically-valid tests
- Results are difficult to compare and interactions are
  impossible to see
- Testing is slow and inefficient
Scientific testing increases
your speed, power, accuracy, and insights by combining
variables into one statistical test design. Where common
techniques result in a random collection of unrelated tests,
LucidView’s scientific tests create a unified strategy for
more effective testing.
What is a “designed” test?
Scientific test designs, also
called matrix tests, let you change many variables at once
and still isolate the impact of each. With a structured test
design, you still have a number of test cells, or “recipes,”
but each recipe contains a unique combination of all your
test elements. Instead of one test cell providing one
data point on one variable, each test cell provides unique
information on every variable in the test. When the recipes
are combined together, you get a greater depth and variety
of data about the impact of each variable on its own and in
combination with others. Plus, since each recipe is part of
the larger test design, tests are far more efficient, so
sample size for each recipe can be very small.
How can you test many variables at once?
The “scientific method” has been drilled into us since
elementary school: for a valid test, you hold everything
constant and change just one variable at a time.
The key to the scientific
method is that you must be able to isolate the effect of the
variables under study in order to prove a cause-and-effect
relationship. If a lot of things change together or
inconsistently, then the effect of each variable cannot be
isolated. Back-end statistical analyses can show a
correlation between variables, but cause-and-effect proof
comes only from a careful test design.
However, changing one
variable at a time is not the only way to test…
With the right techniques, you can change many variables at
once and still isolate the impact of each.
How? Let’s take it
step-by-step…
Say you want to test two
elements of your e-mail:
(A) Price: the current $24.99 price versus $29.99
(B) Offer: the current 10% discount versus a free gift
You can test these two
elements easily enough as two separate split-run tests with
three different e-mails:
1. Control ($24.99 + 10% off)
2. Higher price ($29.99 + 10% off)
3. New offer ($24.99 + free gift)
If your conversion rate
averages about 2% and you have a total list of 400,000
names, you might randomly select 200,000 people to receive
the control e-mail and 100,000 to receive each of the test
cells. As two separate split-run tests, you can calculate
the impact of each variable on its own, the “main effect” of
(A) Price and (B) Offer. With these sample sizes, you have a
50-50 chance of seeing a statistically significant effect if
either variable increases conversion rate by 6.1% or more
(from 2% to 2.12%; see
sample size for an
explanation of the equation).
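The equation itself lives on the sample size page and is not reproduced here, but a common normal approximation lands on the same 6.1% figure. The sketch below is an illustration, not necessarily the author's formula. It assumes a two-sided 5% significance level (z = 1.96) and that each split-run test compares 100,000 test names against 100,000 control names, that is, the 200,000 control names are shared half and half between the two tests. The function name `min_detectable_lift` is made up for this example.

```python
from math import sqrt

def min_detectable_lift(base_rate, n_per_group, z=1.96):
    """Relative lift that is detected about 50% of the time.

    Assumptions (not stated in the text): equal group sizes, a normal
    approximation, and z = 1.96 for a two-sided 5% significance level.
    """
    # Standard error of the difference between two conversion rates
    se_diff = sqrt(2 * base_rate * (1 - base_rate) / n_per_group)
    # With a 50-50 chance of significance, the true difference sits
    # exactly at the significance threshold, z * se_diff
    return z * se_diff / base_rate

lift = min_detectable_lift(base_rate=0.02, n_per_group=100_000)
print(f"{lift:.1%}")  # prints 6.1%
```

A 6.1% lift on a 2% base rate is the move from 2% to roughly 2.12% quoted above.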
For a more efficient and
powerful test, you can combine both variables in the same
scientific test design with just one more test cell, where
both A and B are at the test level:
4. Higher price and new offer ($29.99 + free gift)
Statistically, this one
additional test cell instantly gives you much more
information at a lower cost. Now you have a balanced test
design where every test cell gives you an additional piece
of data about every variable.
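Let’s rewrite the new test
design as a matrix of the two variables, or “test elements,”
and the four test cells, or “recipes,” using a “–” (or –1)
to represent the control and a “+” (or +1) for the new
level, as shown below:

Recipe   A (Price)     B (Offer)
  1      – ($24.99)    – (10% off)
  2      + ($29.99)    – (10% off)
  3      – ($24.99)    + (free gift)
  4      + ($29.99)    + (free gift)
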
The first three
recipes are just like the control and two splits. The 4th recipe
provides unique information on the combination of both new
levels together. In addition, now two recipes, 2 and 4, give
information about the higher price (A+) and both recipes 3 and 4
give information about the free gift offer (B+), so all four
recipes can be used to compare both the price and offer change.
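If you would rather generate the matrix than write it out by hand, a few lines of Python will do it. This is a minimal sketch, and `full_factorial` is a name made up for this example; it lists every combination of –1 and +1 in the same order as recipes 1 through 4.

```python
from itertools import product

def full_factorial(elements):
    """Return every combination of -1/+1 levels, one dict per recipe."""
    recipes = []
    for levels in product((-1, +1), repeat=len(elements)):
        # product() varies the last position fastest; reversing makes the
        # first element alternate fastest, matching recipes 1-4 in the text.
        recipes.append(dict(zip(elements, reversed(levels))))
    return recipes

recipes = full_factorial(["A", "B"])
for number, recipe in enumerate(recipes, start=1):
    print(number, recipe)
```

Passing three or more element names gives the 8-, 16-, or 32-recipe matrix in the same pattern.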
In a scientific test, every recipe provides
additional information about every test element, since half the
recipes have each element at the test level and half have each
element at the control level (but a different half and half for
each variable).
The perfect balance of the test matrix lets
you separate out the main effect of each test element. For
the two A+ recipes (2 and 4), you have one B+ and one B-.
Statistically, when you average recipes 2 and 4, the effect of B
“averages out,” so you end up with A+, the high price,
independent of the impact of changing the offer. The same is
true when averaging both A- recipes (1 and 3).
Similar to a split-run test, the effect of A
is simply the test average versus the control average, or, in
this case, the average of recipes 2 and 4 versus the average of
recipes 1 and 3. The effect of B is also calculated as the
average of all + recipes (3, 4) minus the average of all –
recipes (1, 2). This means that, statistically, …
You can use the same sample size no matter
how many elements and recipes are in the test.
Whatever sample size you need can be split
evenly among all recipes. Whether you have 2 or 22 elements, 4
or 24 recipes, half of all recipes are “+” and half are “–” so
all recipes are used to calculate every effect.
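The “half and half” balance is easy to check for yourself. The sketch below builds full-factorial matrices only, which is an assumption made to keep the example short; the 22-element, 24-recipe case mentioned above would need a different, fractional design. It confirms two things: every column has as many “+” as “–” recipes, and every pair of columns is orthogonal, which is why the other elements average out of each effect. The helper name `check_balance` is hypothetical.

```python
from itertools import combinations, product

def check_balance(k):
    """Check a 2-level full factorial with k elements.

    Returns (number of recipes, columns balanced?, columns orthogonal?).
    """
    design = list(product((-1, +1), repeat=k))
    # Balanced: each column sums to zero, so half the recipes are "+"
    balanced = all(sum(row[j] for row in design) == 0 for j in range(k))
    # Orthogonal: for any two columns, the sign products also sum to zero
    orthogonal = all(
        sum(row[i] * row[j] for row in design) == 0
        for i, j in combinations(range(k), 2)
    )
    return len(design), balanced, orthogonal

for k in (2, 3, 5):
    print(k, check_balance(k))
```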
Scientific tests also provide information
about interactions between test elements.
Unlike any split-run test cell, recipe #4 shows the
impact of changing both elements together. You can compare this
combined effect with each main effect to see if the impact of
each element changes depending on how the other elements are
set, creating an interaction. For example:
- The interaction between
short-term and long-term interest rates in a credit card
test showed the marketing team how they could increase
profitability without hurting response.
- A direct mail test
showed that certain variables had a much greater impact if
they could be seen through the envelope window.
- An Internet test showed
how two graphics had a greater impact if both were moved to
new locations; moving just one or the other had little
effect.
Analyzing interactions between test elements
Let’s expand this example to show the
interaction and conversion data:
Each row (recipes 1 through 4) shows one of the four
versions of the e-mail, each sent to 100,000 randomly
selected names. Each is tagged so every purchase can be tracked
to the correct version of the e-mail.
The AB column represents the interaction: the
incremental change in conversion rate when both elements are
changed together. In other words, how much each main effect may
change, depending on how the other element is set. The AB column
is used only for the calculation of the interaction effect. Each
plus and minus in this column is calculated by multiplying the
signs in columns A and B. For example, (-1)x(-1)=(+1) in recipe
#1 and (+1)x(-1)=(-1) for recipe #2.
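The sign multiplication can be sketched in a couple of lines. The design list simply restates recipes 1 through 4 as (A, B) pairs; the variable names are chosen for this example.

```python
# Recipes 1-4 as (A, B) levels: -1 = control, +1 = new level
design = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]

# The AB column is the product of the A and B signs in each recipe
ab_column = [a * b for a, b in design]

for recipe, ((a, b), ab) in enumerate(zip(design, ab_column), start=1):
    print(f"recipe {recipe}: A={a:+d} B={b:+d} AB={ab:+d}")
```

Like the A and B columns, the AB column ends up with two “+” and two “–” recipes, so it can be analyzed the same way.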
The calculated effects are the main effect of Price (A), the
main effect of Offer (B), and the interaction effect from
changing both elements together. All effects are calculated as
the average of all “+” levels minus the average of all “–”
levels. The negative
main effect of A means that the control price of $24.99 results
in a 0.7 percentage point average increase in conversion rate
over the higher price (+ level). With a positive main effect of
B, the new offer (free gift) has a 0.9 percentage point higher
conversion rate than the 10% discount offer.
These main effects are comparative. The change
in offer has a greater impact than the change in price (0.9
versus 0.7 percentage points).
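To see the arithmetic behind these effects, here is a short sketch. One loud caveat: the four conversion rates below are back-calculated from the effects, the overall average, and the split-run differences quoted in this article, because the data table itself is not reproduced here. Treat them as an assumption rather than as the published data. The helper name `effect` is made up for this example.

```python
# Conversion rates in percent, keyed by (A, B) level.
# ASSUMPTION: back-calculated from the effects quoted in the text,
# not copied from the original data table.
rates = {(-1, -1): 3.6, (+1, -1): 2.4, (-1, +1): 4.0, (+1, +1): 3.8}

def effect(sign_of):
    """Average of all '+' recipes minus average of all '-' recipes."""
    plus = [y for levels, y in rates.items() if sign_of(*levels) > 0]
    minus = [y for levels, y in rates.items() if sign_of(*levels) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

effect_a = effect(lambda a, b: a)       # main effect of Price
effect_b = effect(lambda a, b: b)       # main effect of Offer
effect_ab = effect(lambda a, b: a * b)  # AB interaction
overall = sum(rates.values()) / len(rates)

print(f"A={effect_a:+.1f} B={effect_b:+.1f} AB={effect_ab:+.1f} "
      f"average={overall:.2f}")
```

With these rates the sketch gives –0.7 for Price, +0.9 for Offer, +0.5 for the interaction, and an overall average of 3.45, in line with the figures discussed above.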
The interaction effect is more difficult to
interpret. The AB interaction effect of +0.5% means that
conversion rate can be 0.5 percentage points higher or lower,
depending on how A and B are set. A simple graph can be helpful.
In this chart, the top line shows both recipes
with the free gift (B+) and the bottom line is recipes with the
discount (B-). The left side shows both recipes with the lower
price (A-) and the right points are recipes at the higher price
(A+). An interaction is present when the lines are not parallel.
Looking at this plot, the lower price and free
gift pull best (upper left), but the higher price leads to a
negligible drop in response when the free gift is offered (upper
right). Therefore, the team should ask, “Is the $29.99 price
more profitable if we offer a free gift?”
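The two lines of the plot can also be read as numbers: the slope of each line is the effect of price at one setting of the offer. The sketch below uses conversion rates back-calculated from the effects quoted in this article (an assumption, since the data table is not reproduced here), and the variable names are chosen for this example.

```python
# Conversion rates in percent, keyed by (A, B) level.
# ASSUMPTION: back-calculated from the effects quoted in the text.
rates = {(-1, -1): 3.6, (+1, -1): 2.4, (-1, +1): 4.0, (+1, +1): 3.8}

# Effect of the higher price along each line of the plot
price_effect_with_gift = rates[(+1, +1)] - rates[(-1, +1)]      # top line
price_effect_with_discount = rates[(+1, -1)] - rates[(-1, -1)]  # bottom line

# Half the gap between the two slopes is the AB interaction effect
interaction = (price_effect_with_gift - price_effect_with_discount) / 2

print(f"price effect with free gift:    {price_effect_with_gift:+.1f}")
print(f"price effect with 10% discount: {price_effect_with_discount:+.1f}")
print(f"AB interaction:                 {interaction:+.1f}")
```

The slopes come out at about –0.2 with the free gift and –1.2 with the discount. Parallel lines would give two equal slopes and an interaction of zero.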
Calculating the optimal combination of elements
In this case, where all combinations of the elements are tested,
the optimal combination is simply the recipe with the largest
conversion rate. However, most scientific tests do not test all
combinations, so the optimal combination is often much better
than the best of all the test recipes!
The optimal combination of elements can also
be calculated with the equation:
y = overall average + ½
[(effect of A)(A)+(effect of B)(B)+(effect of AB)(A)(B)]
This equation means that the predicted average
conversion rate = the overall average for the test + one-half of
each significant effect x the selected level (–1 or +1) for each
element.
So the equation becomes:
y = 3.45% + ½
[(-0.7%)(-1)+(+0.9%)(+1)+(+0.5%)(-1)(+1)] = 4.0%
This basically says that we choose A- (the low
$24.99 price) and B+ (the free gift), but the interaction
detracts from the sum of the main effects. With A- and B+
multiplied by a positive AB interaction, the combined effect is
0.5 percentage points less than just adding both main effects
together. As the graph shows:
- With the free gift (top line), the
impact of price is not too large, or…
- With the lower price (points on left),
the free gift does not increase conversion rate as much
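The prediction equation is simple enough to sketch as a function and try on all four combinations. The overall average and the three effects are the figures quoted above; the names `predict`, `OVERALL`, and `EFFECTS` are made up for this example.

```python
from itertools import product

OVERALL = 3.45                                 # overall average, in percent
EFFECTS = {"A": -0.7, "B": +0.9, "AB": +0.5}   # effects, in percentage points

def predict(a, b):
    """Predicted conversion rate (percent) for levels a, b in {-1, +1}."""
    return OVERALL + 0.5 * (
        EFFECTS["A"] * a + EFFECTS["B"] * b + EFFECTS["AB"] * a * b
    )

for a, b in product((-1, +1), repeat=2):
    print(f"A={a:+d} B={b:+d} -> {predict(a, b):.1f}%")

# Pick the combination with the highest predicted conversion rate
best = max(product((-1, +1), repeat=2), key=lambda levels: predict(*levels))
print("best (A, B):", best)
```

The best combination comes out as A– and B+ at 4.0%, the same answer as the worked equation.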
Even this simple test shows the
advantages of a “designed” multivariable scientific test.
Compared to common split-run techniques, this two-variable test
required one more test cell (recipe 4), but results were more
powerful, in-depth, and accurate:
#1: Sample size is reduced by 50%
- With both variables in the same
statistical test design, all recipes are used to calculate
all effects, so the per-recipe sample size can be very small
- Effects in a split-run test would have to
be about 50% larger to be statistically significant
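One way to see where the 50% saving comes from is the rough sketch below. It rests on assumptions not stated in the article: a normal approximation, a two-sided 5% significance level (z = 1.96), a 2% base conversion rate, and split-run tests that each compare 100,000 test names with 100,000 control names. The function name `min_detectable_lift` is hypothetical.

```python
from math import sqrt

def min_detectable_lift(base_rate, n_plus, n_minus, z=1.96):
    """Relative lift detected about 50% of the time (normal approximation)."""
    se_diff = sqrt(base_rate * (1 - base_rate) * (1 / n_plus + 1 / n_minus))
    return z * se_diff / base_rate

# Two split-run tests: 100,000 test vs 100,000 control each,
# 400,000 names in total
split_run = min_detectable_lift(0.02, 100_000, 100_000)

# Designed test: 4 recipes of 50,000 names, 200,000 names in total.
# Every effect compares the two "+" recipes with the two "-" recipes.
designed = min_detectable_lift(0.02, 2 * 50_000, 2 * 50_000)

print(f"split-run, 400,000 names: {split_run:.1%}")
print(f"designed,  200,000 names: {designed:.1%}")
```

Under these assumptions both setups detect the same size of effect, but the designed test gets there with half the names.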
#2: Interactions show how the effect of
variables can change
- It’s impossible to analyze interactions
with split-run tests
- The interaction clearly shows the
relationship between variables—valuable information on the
fluid impact of marketing-mix elements
- Interactions often present new
opportunities, like in this case, where the price increase
may be more profitable with the free gift offer
#3: Main effects and interactions provide
more accurate real-world data
- Split-run tests of price and offer (using
just the first three recipes here) would have shown that the
higher price reduced conversion by 1.2 percentage points
(recipe 2 versus 1) and the free gift increased conversion by
just 0.4 percentage points (recipe 3 versus 1). The split-run
conclusions are nearly the opposite of what we see when the
interaction is analyzed as well: the free gift is much more
beneficial and the higher price may have minimal impact if the
free gift is offered.
- Main effects are more robust, averaged
across changes in the other elements (often split-run
results are contingent upon variables not tested).
- Main effects are comparative, so you can
see which changes are most important and how all the effects
compare.
- Interaction effects add greater accuracy
to the calculation of expected results.
With such significant benefits with a simple
two-variable test, just imagine what you can gain from testing
5, 10, or 20 variables all at once in a scientific test. For
example, a scientific test of 20 elements requires just 8% of
the sample size required for split-run tests!
Almost every scientific test is based on the
same principles as the two-element test above. One big
difference is that larger tests use only a small fraction of all
possible combinations of test elements. For example, there are
32,768 possible combinations of 15 elements at two levels each. You can test all possible combinations if you would like,
or you can test all 15 with as few as 16 recipes.
The two-variable test, above, is called a
“full-factorial” design, testing all combinations of the
selected variables. The drawback of this approach is that every
additional variable doubles the number of recipes required for
the test. Fortunately, advanced statistical techniques offer
other test designs with the same advantages, yet far fewer
recipes than full-factorial tests.
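To make “15 elements in 16 recipes” a little more concrete, here is one textbook construction, offered as an illustration only; the article does not say which designs LucidView actually uses. It starts with a full factorial of 4 base columns (16 recipes) and adds a column for every product of two, three, or four base columns, giving 15 balanced, mutually orthogonal columns. The trade-off is that main effects can only be separated cleanly if interactions are assumed to be negligible. The function name `saturated_design` is made up for this example.

```python
from itertools import combinations, product
from math import prod

def saturated_design(n_base=4):
    """Build 2**n_base recipes with 2**n_base - 1 orthogonal columns."""
    # Every non-empty subset of base columns defines one design column
    subsets = [
        subset
        for size in range(1, n_base + 1)
        for subset in combinations(range(n_base), size)
    ]
    design = []
    for base_levels in product((-1, +1), repeat=n_base):
        # Each column is the product of the signs of its base columns
        design.append(
            [prod(base_levels[i] for i in subset) for subset in subsets]
        )
    return design

design = saturated_design()
n_recipes, n_elements = len(design), len(design[0])
columns = list(zip(*design))
balanced = all(sum(col) == 0 for col in columns)
orthogonal = all(
    sum(x * y for x, y in zip(c1, c2)) == 0
    for c1, c2 in combinations(columns, 2)
)
print(n_recipes, n_elements, balanced, orthogonal, 2 ** 15)
# prints: 16 15 True True 32768
```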
If you want to learn more about the statistics
behind large test designs, read over
creating large test
designs. You can also look at the
variety of test designs
and techniques to learn more about the wide array of options
for scientific testing. If you don’t care for all the theory and
statistics, read over some
case studies
& articles to see how marketers have successfully used
scientific testing in fast-moving, real-world marketing programs.