How can you go from a test of two elements to a test of
two dozen?
The simple two-variable test
(explained in
how to test many variables at once) reduced sample size
by 50%, quantified the interaction between variables, and
gave more accurate and robust information—all from just one
more test cell and a structured test design.
However, a “full-factorial”
test design of three variables requires twice the total
number of recipes (8 instead of 4). Testing four variables
requires 16 and five variables require 32 recipes. For 2-4
variables these designs can be valuable, but with 5 or more,
we need another option.
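The recipe counts above follow from the fact that each added two-level variable doubles the number of combinations. A minimal sketch of that arithmetic (the function name is ours, purely for illustration):

```python
def full_factorial_recipes(num_variables: int) -> int:
    """Recipes needed to test every combination of two-level variables."""
    return 2 ** num_variables


# Print the recipe count for 2 through 7 variables.
for k in range(2, 8):
    print(f"{k} variables -> {full_factorial_recipes(k)} recipes")
```

The count doubles with every variable, which is why a full-factorial design becomes impractical at around five variables.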
Adding more test elements
within the same number of recipes
More advanced scientific test
designs retain the power of full-factorial designs, but with
greater efficiency and fewer recipes. The 16 recipes
required for testing all combinations of 4 variables can
instead be used to test up to 15 variables all at once.
Note: If
you would like to learn more about the statistics behind
scientific testing, please read on. But if this statistical
stuff starts to confuse you, just skip it—move on and read
some of the real-world
case
studies & articles. With an experienced guide, the
statistical complexity can remain in the background.
OK, if you’re still with us, let’s see how it works…
Let’s start with a full-factorial design of three
variables, each at two levels:
A full-factorial test of all combinations
gives the following matrix.
This matrix shows the eight recipes you need
to create. Yet statistically, you can also add the four possible
interactions—three 2-way and one 3-way interaction—that can be
analyzed independently of these three main effects. The matrix
below shows all of the effects.
The A, B, and C columns (in blue) are what
your creative team uses to create the eight different e-mails
(or catalogs, ads, store layouts). The AB, AC, BC, and ABC
columns (in orange) are used for analyses. With 8 recipes,
all 7 effects (the 7 columns) can be analyzed separately, so you
can compare every possible combination of elements and
interactions.
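The seven-column matrix described above can be regenerated in a few lines. This is an illustrative sketch only: it codes “+” as +1 and “–” as -1, and it assumes the common "standard order" of recipes (A alternating fastest), which may not match the row order of the original table.

```python
from itertools import product

# Standard order (an assumption): A changes fastest, then B, then C.
recipes = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

columns = ("A", "B", "C", "AB", "AC", "BC", "ABC")

# Each interaction column is the product of its parent columns.
matrix = [
    {"A": a, "B": b, "C": c,
     "AB": a * b, "AC": a * c, "BC": b * c, "ABC": a * b * c}
    for a, b, c in recipes
]


def sign(value: int) -> str:
    return "+" if value > 0 else "-"


print("Recipe  " + "  ".join(f"{name:>3}" for name in columns))
for number, row in enumerate(matrix, start=1):
    print(f"{number:>6}  " + "  ".join(f"{sign(row[n]):>3}" for n in columns))

# Every pair of distinct columns is balanced: their products sum to zero,
# which is why all 7 effects can be analyzed separately.
all_independent = all(
    sum(row[x] * row[y] for row in matrix) == 0
    for i, x in enumerate(columns) for y in columns[i + 1:]
)
print("All 7 columns mutually independent:", all_independent)
```

Only the A, B, and C columns are needed to build the recipes; the other four columns are derived from them for analysis.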
All effects are still calculated as the
average of the “+” levels minus the average of the “–” levels in
each column. Among the four recipes with a “+” in any one
column, every other column has two pluses and two minuses (and the
same holds for the four “–” recipes), so every other effect
averages out, the same as for the 2-element test.
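To make the calculation concrete, here is a small sketch of the "average of plus minus average of minus" rule. The response numbers are invented for illustration (they are not from any real test): they are generated from assumed effects of 2.0 for A, 1.0 for B, 0.5 for the AB interaction, and zero for everything else, so we can check that each column recovers exactly its own effect.

```python
from itertools import product
from statistics import mean

# Standard order (an assumption): A changes fastest, then B, then C.
recipes = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

matrix = [
    {"A": a, "B": b, "C": c,
     "AB": a * b, "AC": a * c, "BC": b * c, "ABC": a * b * c}
    for a, b, c in recipes
]

# Hypothetical response per recipe (say, response rate in %).
# Each +/-1 coefficient is half of the effect it produces.
responses = [10 + 1.0 * a + 0.5 * b + 0.25 * a * b for a, b, c in recipes]


def effect(name: str) -> float:
    """Average response at '+' minus average response at '-' for one column."""
    plus = [y for row, y in zip(matrix, responses) if row[name] == 1]
    minus = [y for row, y in zip(matrix, responses) if row[name] == -1]
    return mean(plus) - mean(minus)


effects = {name: effect(name) for name in matrix[0]}
for name, value in effects.items():
    print(f"{name:>3}: {value:+.2f}")
```

Because every other column is balanced within the “+” and “–” groups, the C, AC, BC, and ABC columns come out at zero even though A, B, and AB are all active in the same data.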
However, three important statistical
principles offer a way to test more variables:
1. Effect Dissipation - Main
effects tend to be larger than two-factor interactions and
higher-order interactions are very unlikely.
2. Effect Sparsity - Few, if any,
interactions are usually significant.
3. Effect Heredity - Interactions tend
to result from large main effects.
All possible effects are seldom significant. A
few variables may each have a big impact on their own. Changing
two variables together may slightly alter these main effects.
But changing all three together in just the right
combination will probably have little incremental impact versus
the sum of the main effects and 2-way interactions.
Both the process of defining test elements and
the nature of test elements support these principles. First of
all, we work diligently to define independent test elements, so
they can be changed on their own without affecting other
elements. Therefore, well-defined test elements minimize the
possibility of interactions. Secondly, large interactions are
like the planets aligning—seldom does everything come together
in just the right way to create a completely different outcome.
For the example above, this means that the
3-way interaction (ABC) is probably very close to zero and at
most one or two 2-way interactions may be significant, but even
those are probably smaller than any main effects.
Now here’s a big step towards more advanced
designs that took statisticians about a decade to take:
If an interaction is unlikely, then that
column can often be better used to test another main effect.
Since the +/- combination in each column is
unique, anything in that column can be analyzed independently of
all other columns, so you can add another test element that follows
the same +/- scheme.
For example, even without knowing anything
about this e-mail marketing program, we can assume the ABC 3-way
interaction will be non-existent or very close to zero.
Therefore, we can add a 4th element into that column.
We can test a new element following the same
+/- scheme as the ABC interaction. So along with price, offer,
and copy, we can test a new subject line (in all recipes where
ABC is “+”) versus the control (where ABC is “–”).
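As a rough sketch of this step, the 4-element design is the same eight recipes with one extra column whose signs are copied from ABC. The +1/-1 coding and the standard row order are assumptions made for illustration.

```python
from itertools import product

# Eight recipes for A (price), B (offer), C (copy); A changes fastest.
base = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# G (subject line) follows the ABC sign pattern:
# new subject line where ABC is "+", control where ABC is "-".
half_fraction = [(a, b, c, a * b * c) for a, b, c in base]

# The full factorial for four elements, for comparison.
full_factorial = set(product((-1, 1), repeat=4))

print("Recipes in full factorial:", len(full_factorial))
print("Recipes in this design:   ", len(half_fraction))

# The G column is still balanced against A, B, and C,
# so the subject line can be analyzed independently of them.
g_is_independent = all(
    sum(row[i] * row[3] for row in half_fraction) == 0 for i in range(3)
)
print("G independent of A, B, C:", g_is_independent)
```

The eight recipes are a carefully chosen half of the sixteen possible combinations, not an arbitrary subset.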
This simple change cuts the number of test
recipes in half
Instead of using 16 recipes to test 4
variables (2^4), we can now use only eight. However,
there is some risk. The 3-way interaction doesn’t just
disappear. It’s now “confounded” with subject line (a fancy
statistical term for “mixed together”).
The calculated effect from that last column is
now the sum of both the main effect of “subject line” and the
ABC interaction. Yet in reality, it’s no big deal, since the
interaction is likely very close to zero. Any small change due
to “confounding error” is usually much less than the normal
experimental error from natural market variation. In other
words, any error in the main effect of “subject line” is likely
small and certainly worth a 50% reduction in the number of test
recipes.
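A small numerical sketch shows what "the sum of both" means in practice. The two effect sizes below are invented purely for illustration: a subject-line effect of 1.5 and a tiny ABC interaction of 0.125.

```python
from itertools import product
from statistics import mean

# Eight-recipe design with G (subject line) placed in the ABC column.
base = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
design = [(a, b, c, a * b * c) for a, b, c in base]

# Hypothetical "true" effects, made up for this example.
SUBJECT_LINE_EFFECT = 1.5
ABC_INTERACTION = 0.125


def response(a: int, b: int, c: int, g: int) -> float:
    # An effect is (average at +) minus (average at -), so each
    # +/-1 term contributes half of its effect to the response.
    return 10 + (SUBJECT_LINE_EFFECT / 2) * g + (ABC_INTERACTION / 2) * a * b * c


responses = [response(*row) for row in design]

plus = [y for row, y in zip(design, responses) if row[3] == 1]
minus = [y for row, y in zip(design, responses) if row[3] == -1]
estimated = mean(plus) - mean(minus)

print("True subject-line effect:", SUBJECT_LINE_EFFECT)
print("Estimated from column G: ", estimated)
print("Confounding error:       ", estimated - SUBJECT_LINE_EFFECT)
```

The estimate is off by exactly the size of the ABC interaction, so the approach is only as safe as the assumption that this interaction is close to zero.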
This type of test matrix is called a
“fractional-factorial” design, since it requires just a
fraction of the recipes required for a full-factorial test.
Taking this concept a step further, if
interactions tend to be small or non-existent, then why not use
each interaction column for another test element? Let’s do it…
If you place three additional test elements in
the three columns with 2-way interactions, then you can create a
matrix with 7 different variables within the same 8 recipes as the
3-element full-factorial test design:
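The matrix can be regenerated from the confounding scheme given at the end of this article (D = AB, E = AC, F = BC, G = ABC). The sketch below assumes +1/-1 coding and the standard row order, with A changing fastest.

```python
from itertools import product

names = ("A", "B", "C", "D", "E", "F", "G")

# Start from the eight recipes of the 3-element full factorial.
base = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# Place the four extra elements in the interaction columns:
# D = AB, E = AC, F = BC, G = ABC.
design = [(a, b, c, a * b, a * c, b * c, a * b * c) for a, b, c in base]


def sign(value: int) -> str:
    return "+" if value > 0 else "-"


print("Recipe  " + "  ".join(names))
for number, row in enumerate(design, start=1):
    print(f"{number:>6}  " + "  ".join(sign(v) for v in row))

# All seven columns remain balanced against each other.
orthogonal = all(
    sum(row[i] * row[j] for row in design) == 0
    for i in range(7) for j in range(i + 1, 7)
)
print("All 7 main-effect columns independent:", orthogonal)
```

Each printed row is one recipe: a “+” means the new version of that element, a “-” means the control version.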
Testing all combinations of 7 elements would
require 128 recipes (2^7) versus the eight recipes
above—quite significant savings, especially if you’re testing a
catalog, print ad, TV spot, or direct mail package, where each
recipe adds to your marketing costs. In addition, as for all
scientific tests, sample size is unrelated to the number of test
elements. So sample size for this 7-element test can be just as
small as you would need to test 3 elements (or one new
version against your control).
Confounding spreads a number of interactions
throughout the matrix, but if the three statistical principles
hold—few interactions exist and the interaction effects are
related to and smaller than the significant main effects—then this test design works well.
Creating each test recipe—each version of the
e-mail—now requires attention to all seven columns. For example,
in recipe #1, the price, offer, copy, and subject line are kept
the same as in the control (A, B, C, and G are “–”), but the
starburst, more links, and more products are added (D, E, and F
are set at the “+” level).
Recipe #2 requires A and G to be changed to
the “+” level and D and E to be changed back to the control, and
so on. When test elements are clearly defined, all recipes are
basically cut-and-paste combinations of all the elements. Every
element is present in every recipe, either at the plus or minus
level.
This large test has enormous benefits over
split-run tests:
- The impact of each
element can be accurately quantified independently of all
other effects. Even though many variables are changed
simultaneously, all main effects are independent.
- Sample size is
reduced by 80% versus seven different split-run tests
- Key interactions can
still be analyzed
- The optimal result
is often even greater than the best test recipe,
since only a small portion of all possible combinations are
tested. You can pinpoint the elements to change, those to keep
at the control level, and those that make no difference.
This “fractional-factorial” scientific test
design is just one of many options for testing more variables,
more rapidly, at lower cost, and with more accurate and
profitable results. The next section,
variety of test designs
and techniques, discusses a few important types of test
designs and the best applications of each.
But remember—the
statistics are just a tool. What you test, how you test it, and
what you do with the information you learn is what determines
your long-term success. You need a focused marketing objective,
a good bunch of ideas, and some guidance in finding the best
strategy to maximize your results.
Just as skill using a hammer does not mean you
can build a house, basic knowledge of statistical testing does
not translate into profitable testing. Thousands of details come
together to determine the ultimate outcome. That’s where the
LucidView strategy comes in. With the right statistical methods,
a streamlined approach, and specialized skill integrating
scientific testing within on-going marketing programs, LucidView
consultants can help you hit the ground running.
Now that you understand the basics, glance
over the variety of test designs and techniques, learn
more about managing a scientific test, or look over some
real-world examples in the
case studies
& articles.
And for you statisticians…
In reality, you need to be careful with this
approach using fractional-factorial designs. As we saw with the
2-element test, interactions can be significant and valuable.
Therefore, most tests are designed between the extremes—testing
many elements with few recipes, but saving room to analyze the
most-likely interactions.
Also, please note that, as is often the case,
what looks so easy on the surface can have a lot of underlying
complexity. Confounding creates a snowball effect, where
interactions start popping up all over the place. For those of
you who love matrix algebra…
(Anyone left?) OK, when you add a new element,
you also create other interactions. In the above example, when
you confound “G: Subject line” with ABC then you get:
- All main effects confounded with 3-way
interactions
G = ABC
A = BCG
B = ACG
C = ABG
- 2-way interactions confounded with other
2-way interactions
AB = CG
AC = BG
BC = AG
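These pairings can be derived mechanically. Since G = ABC, multiplying both sides by G gives I = ABCG, and any effect multiplied by ABCG yields the effect it is confounded with (a letter that appears twice cancels out). A minimal sketch, with a helper name of our own choosing:

```python
def multiply(word1: str, word2: str) -> str:
    """Multiply two effect 'words'; any letter appearing in both cancels."""
    letters = set(word1) ^ set(word2)  # symmetric difference
    return "".join(sorted(letters)) or "I"


DEFINING_WORD = "ABCG"  # from G = ABC, i.e. I = ABCG

aliases = {
    effect: multiply(effect, DEFINING_WORD)
    for effect in ("G", "A", "B", "C", "AB", "AC", "BC")
}

for effect, alias in aliases.items():
    print(f"{effect} = {alias}")
```

Every main effect pairs with a 3-way interaction, and every 2-way interaction pairs with another 2-way interaction.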
Adding three more elements, for a total of
seven, the confounding scheme becomes: D = AB, E = AC, F = BC, G
= ABC. The overall defining relation then becomes:
I = ABD = ACE = BCDE = BCF = ACDF = ABEF = DEF
= ABCG = CDG = BEG = ADEG = AFG = BDFG = CEFG = ABCDEFG
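The full defining relation is every possible product of the four generator words (ABD, ACE, BCF, ABCG, obtained from D = AB, E = AC, F = BC, G = ABC). The sketch below, with helper names of our own, enumerates them and then lists which 2-way interactions end up mixed with main effect A.

```python
from itertools import combinations


def multiply(word1: str, word2: str) -> str:
    """Multiply two effect 'words'; any letter appearing in both cancels."""
    letters = set(word1) ^ set(word2)  # symmetric difference
    return "".join(sorted(letters)) or "I"


generators = ["ABD", "ACE", "BCF", "ABCG"]

# Multiply together every non-empty subset of the generators.
words = set()
for size in range(1, len(generators) + 1):
    for subset in combinations(generators, size):
        word = ""
        for generator in subset:
            word = multiply(word, generator)
        words.add(word)

ordered = sorted(words, key=lambda w: (len(w), w))
print("I = " + " = ".join(ordered))
print("Number of words:", len(words))
print("Shortest word length:", min(len(w) for w in words))

# Two-way interactions confounded with main effect A.
a_aliases = sorted(
    alias for alias in (multiply("A", w) for w in words) if len(alias) == 2
)
print("A is confounded with:", ", ".join(a_aliases))
```

The shortest words have three letters, which is why each main effect here is confounded with 2-way interactions and why this design relies so heavily on interactions being small.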
This is shown not to push you away from
scientific testing forever, but to show how much goes on in the
background of every test. With guidance, you can ignore all of
this stuff, but someone needs to understand it.
Confounding offers immense opportunity to test many elements in
few recipes and a few techniques can help minimize any
confounding error.
Ultimately, large test designs simply codify the complexity
inherent in the marketplace, removing the blinders of opinion
and unproven beliefs.
The immediate illumination of your marketing
programs can be disconcerting at first. It’s so much easier to
keep the blinders on. But the reality exposed in scientific
testing simply mirrors the complex reality of your marketplace.
The answers may not always be what you wanted to see, but they
more accurately reflect the truth.
Every issue to consider in scientific testing
is also an issue that should be considered with simple split-run
tests. But with split-run testing, it’s so easy to overlook
mistakes and the complexity of testing. One data point cannot
show if interactions exist, or if you’re truly measuring what
you wanted to test, or if the results have any relationship with
future performance. Scientific testing forces you to ask the
right questions and create an effective test… or else the
results will show you the error of your ways.
If testing is important, then scientific
testing should be an integral part of your marketing and
advertising programs.
Now learn more about the
variety of
test designs and techniques, or sit back and read a few good
stories in
case studies & articles.