A wide array of designs lets you create
the best test for your objectives
Whether you are testing a new merchandise mix, many
different price points and offers, dozens of creative changes,
or numerous marketing-mix elements as quickly as possible,
scientific testing and the LucidView strategy offer a range of
test designs. Each has its own requirements and
advantages, so it’s important to get some guidance. The major
classes of test designs are summarized below.
In selecting the most appropriate test design,
you should:
1. Let your marketing objective drive the
decision – find a design to give you the answers you want.
2. Create a testing plan – a
screening-refining test cycle over a series of campaigns
throughout the year.
3. Simple is better – use a powerful
scientific design, but avoid unnecessary complexity.
4. Increase your focus over time – understand
the big picture before digging into the details (cast a wide net
for new ideas and find out what’s important before wasting
resources splitting hairs).
Academic tomes cover various test designs in
detail, but here’s a basic overview of most of your options:
The strategy for marketing and
advertising testing usually entails:
1. Test many elements in a large
"screening" design to quickly identify the important variables
to change or test further
- Fractional-factorial (if up to 15 test
elements) – see the E-mail case study for an example
- Plackett-Burman design (if 16 or more
test elements) – see the Direct mail subscriptions screening
test
2. Test a few important variables from
screening in a smaller "refining" test
- Full-factorial test (of 2-5 elements) –
as in the DM Subscriptions refining test or first example in
Beyond the A/B Split
- Fractional-factorial test (of about 5-7
elements)
3. Once you know which variables have the
greatest impact on sales, run smaller tests of 2-5 elements
while continuing with larger tests. This lets you
continually optimize important elements, identify market shifts,
and never stop searching for new opportunities.
For example, you can run a separate
price/offer test of two or more levels for each element, using
a:
- Fractional-factorial test design
- Full-factorial design – possibly with
advanced techniques for multiple levels
– Centerpoints
– Box-Behnken design
– Central Composite design
4. Use a split-run test to “try out” a new
product, vehicle, program, or marketing channel. If
the new version gives acceptable results, run a scientific test to
rapidly optimize the program (starting with #1, above).
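To make the large screening designs in step 1 more concrete, here is a minimal sketch that builds a 20-recipe, two-level design for up to 19 test elements. It uses a Paley-type cyclic construction, which is the family that Plackett-Burman designs of this size belong to. The function name `paley_screening_design` is hypothetical, and the construction only works when the number of recipes minus one is a prime of the form 4k + 3 (12, 20, or 24 recipes, for instance). The matrix from a particular software package may order rows and columns differently.

```python
def paley_screening_design(n_recipes):
    """Return an n_recipes x (n_recipes - 1) matrix of +1/-1 settings.

    Assumption: n_recipes - 1 is a prime p with p % 4 == 3.
    """
    p = n_recipes - 1
    is_prime = p >= 3 and all(p % d for d in range(2, int(p ** 0.5) + 1))
    if not is_prime or p % 4 != 3:
        raise ValueError("n_recipes - 1 must be a prime of the form 4k + 3")

    # Quadratic residues mod p decide which positions of the first row are +1.
    residues = {(i * i) % p for i in range(1, p)}
    first_row = [1 if (j == 0 or j in residues) else -1 for j in range(p)]

    # Each further row is a cyclic shift of the first row; the last row is all -1.
    rows = [first_row[p - s:] + first_row[:p - s] for s in range(p)]
    rows.append([-1] * p)
    return rows


design = paley_screening_design(20)
print(len(design), "recipes for up to", len(design[0]), "test elements")
```

Each column is one test element (+1 = new version, -1 = control) and each row is one recipe. Every column is balanced and orthogonal to every other column, which is what lets you estimate all the main effects from so few recipes.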
And now for more details…
At first glance, all these test designs may
seem fairly simple. However, each has numerous technical issues
and considerations that impact your decision. For example, an
expert should understand:
- The pros and cons of testing 15 elements
in a 16-recipe fractional-factorial versus a 20-recipe
Plackett-Burman design
- Where multiple levels may be better than
two-level designs
- Why a 24-recipe retail test may be better
than a 12-recipe test
- When an e-mail and landing page should be
tested together or separately
An expert should also understand how the marketing channel
affects test design. Direct mail and retail channels, for
example, differ in:
- Selection of test elements (broader scope
for direct mail, yet narrow scope for retail)
- Minimum acceptable resolution (III for
direct mail, but IV for retail)
- Sample size (why replication and
stability are an issue for retail but not direct mail)
- Cost of testing (direct mail production
versus retail test execution)
Additional techniques and issues are important
beyond the choice of a type of test design, including:
alternatives for testing multiple levels and segmentation
variables, the test design’s resolution and confounding scheme,
and ways to increase statistical power without increasing the
cost of testing.
Testing multiple levels
The most efficient
way to test many variables at once is with two levels for each,
usually the control level and a bold change that someone thinks
will increase sales or profitability. But 3 or more levels are
fine to test, if the value of multiple levels is worth the cost
of running additional test recipes. Testing 3 elements each at 3
levels requires 27 recipes in a full-factorial design (3³),
versus 8 recipes for testing 3 elements at 2 levels each (2³).
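The recipe counts above are easy to check by listing every combination. This is a small sketch; `full_factorial` is a hypothetical helper name, and levels are simply numbered 0, 1, 2.

```python
from itertools import product


def full_factorial(levels_per_element):
    """List every recipe, given the number of levels for each test element."""
    return list(product(*(range(n) for n in levels_per_element)))


three_level = full_factorial([3, 3, 3])  # 3 elements at 3 levels each
two_level = full_factorial([2, 2, 2])    # 3 elements at 2 levels each
print(len(three_level), len(two_level))  # 27 8
```

Adding a fourth 3-level element would triple the count again (81 recipes), which is why multiple levels get expensive quickly in a full-factorial design.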
But a few novel techniques are available—in
addition to using centerpoints, Box-Behnken, or central
composite designs—that let you test multiple levels with few
additional recipes:
- For one 3-level test element, you can use
a standard 2-level design plus one additional one-variable
recipe (using more recipes for additional levels and
analyzing this variable as a series of split-run tests)
- For one 4-level test element, you can use
three columns in the fractional-factorial test design
- To test 2-4 elements at 3+ levels, create
a separate multi-level test design apart from all other
2-level elements
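One common way to fit a single 4-level element into a two-level design, sketched below, is to code its four levels from two columns and leave their interaction column unused. That accounts for the three columns mentioned above. The element ("price") and the level names are hypothetical.

```python
from itertools import product

# 8-recipe, two-level design in three base columns A, B, C (values -1/+1).
base = list(product([-1, 1], repeat=3))

# Assumption: "price" is a 4-level element coded from columns A and B.
# The interaction column A*B is used up as well, so no other element goes there.
level_of = {
    (-1, -1): "price 1",
    (-1, 1): "price 2",
    (1, -1): "price 3",
    (1, 1): "price 4",
}

recipes = [
    {"price": level_of[(a, b)], "A*B (reserved)": a * b, "C": c}
    for a, b, c in base
]

for recipe in recipes:
    print(recipe)
```

Each price level appears in two recipes, once with C at its low setting and once at its high setting, so the 4-level element stays balanced against the remaining two-level element.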
Testing customer/segmentation variables
along with creative elements
Testing is not a substitute for customer
modeling and segmentation analysis, but a number of customer
segments or variables can be tested at once. A few alternatives
for testing marketing programs across different segments
include:
1. Run the full test design within each
customer group.
If you have a few customer segments and 19
elements to test, you can run the same 20-recipe test design
within each segment. This allows you to analyze each segment
alone, plus all together for a larger overall sample size.
2. Add “customer segment” into the design as a
separate test element.
If your available sample size is small,
this approach lets you use your full customer database for
one test, but is best if you expect all segments to respond
similarly to each element in the test (see the E-mail case
study for an example of a 4-level segmentation variable).
3. Run a screening test in 1-2 segments,
followed by a refining test in others.
If you can select a couple of
representative customer segments or marketing programs for a
large screening test, then you can test the significant
elements in a smaller test design across more segments.
4. Create a “double” test design with segment
variables tested on top of a standard design of creative
elements.
Called a “segmentation-crossed” design,
this approach lets you test a few customer
characteristics—in a full or fractional-factorial design—at
the same time as you test marketing program elements, where
each program recipe is tested across the full segmentation
test design. This requires numerous key-codes or tracking
numbers, but minimizes the number of production cells.
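A rough sketch of the bookkeeping behind the fourth alternative: every program recipe is paired with every cell of the segment design, and each pairing gets its own key-code. The recipe labels, segment variables, and key-code format are all hypothetical.

```python
from itertools import product

# Hypothetical program recipes (creative elements), e.g. from an 8-recipe design.
program_recipes = [f"P{i}" for i in range(1, 9)]

# Hypothetical segment design: a full factorial of two 2-level customer variables.
segment_cells = list(product(["new", "existing"], ["low spend", "high spend"]))

# Every program recipe is tested in every segment cell, each with its own key-code.
key_codes = {
    f"K{n:03d}": (recipe, segment)
    for n, (recipe, segment) in enumerate(
        product(program_recipes, segment_cells), start=1
    )
}

print(len(program_recipes), "production cells,", len(key_codes), "key-codes")
# 8 production cells, 32 key-codes
```

Only the 8 program recipes need to be produced. The 32 key-codes exist purely to track which segment cell each response came from.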
Resolution
Resolution is a statistical term summarizing
which effects are confounded. In Resolution III designs, some
2-way interactions are confounded with main effects (as in a
standard Plackett-Burman design). This means that significant
interactions, if you cannot detect them, will add some error to the
main-effect estimates. Often it’s no big deal, but if there is a large
interaction, or a few significant interactions, then the error
can lead to some wrong decisions.
If 2-way interactions are expected to be
significant, then you should use a design of Resolution IV or
higher. This means some 2-way interactions may be confounded
with other 2-way interactions, but main effects are only
confounded with 3-way (or higher-order) interactions.
Considering potential interactions is an
important step before creating the test design. If interactions
are expected to be large, then you should either:
(a) Redefine, combine, or remove test
elements to eliminate the potential interaction(s), or
(b) Create a test design that separates the potential
interactions from other important effects
For example, if you want to test 7 elements of
your newspaper ad, you can use an 8-recipe resolution III
fractional-factorial design (sounds impressive, doesn’t it?), or
you can use a 16-recipe resolution IV fractional-factorial
design that places all 2-way interactions into the additional
columns in the test matrix, away from the seven main effects. In
this case, the statistical benefits of a 16-recipe test would
probably outweigh the additional effort required to create eight
more ads.
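The difference between the two options can be checked directly. The sketch below builds both designs from one standard choice of generator columns (an assumption; other generators give equivalent designs) and lists every case where a main-effect column is identical, up to sign, to a 2-way interaction column. The helper names `build` and `aliased_main_effects` are hypothetical.

```python
from itertools import combinations, product


def build(n_base, generators):
    """Two-level design: full factorial in the base columns plus generated columns."""
    design = []
    for base in product([-1, 1], repeat=n_base):
        row = list(base)
        for gen in generators:  # each generator is a tuple of base-column indexes
            value = 1
            for i in gen:
                value *= base[i]
            row.append(value)
        design.append(row)
    return design


def aliased_main_effects(design):
    """Pairs (main, (i, j)) where main-effect column == +/- the i*j interaction column."""
    cols = list(zip(*design))
    found = []
    for i, j in combinations(range(len(cols)), 2):
        inter = [a * b for a, b in zip(cols[i], cols[j])]
        for m, col in enumerate(cols):
            overlap = abs(sum(x * y for x, y in zip(col, inter)))
            if m not in (i, j) and overlap == len(design):
                found.append((m, (i, j)))
    return found


# 7 elements in 8 recipes: A, B, C plus AB, AC, BC, ABC (Resolution III).
res3 = build(3, [(0, 1), (0, 2), (1, 2), (0, 1, 2)])
# 7 elements in 16 recipes: A, B, C, D plus ABC, BCD, ACD (Resolution IV).
res4 = build(4, [(0, 1, 2), (1, 2, 3), (0, 2, 3)])

print(len(res3), "recipes:", len(aliased_main_effects(res3)), "confounded pairs")
print(len(res4), "recipes:", len(aliased_main_effects(res4)), "confounded pairs")
```

In the 8-recipe design every 2-way interaction lands on top of some main effect. In the 16-recipe design none do, which is the statistical benefit you buy with the eight extra ads.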
Confounding Scheme
The confounding scheme shows exactly which
interactions are mixed together with which main effects. Though
your marketing team can ignore the statistical details, your
test expert—whether an outside consultant or in-house
statistician—needs to understand the confounding scheme of every
test you run. No one should just assume that all interactions
will be zero (ignoring them doesn’t make them go away).
Your test expert should look over the complete
confounding scheme and consider “what-if” scenarios:
- Looking at all the test elements, what
interactions could be important?
- How could potential interactions affect
our results?
- Should I change the elements or test
design to guard against possible confounding error?
Confounding is the price you pay for more
variables in fewer test cells. It adds some uncertainty, but can
be managed effectively with careful up-front planning and a good
understanding of the marketing-mix elements and the technical
issues of test design.
Reflection
Reflection, also called full foldover, is a
simple way to increase resolution and reduce confounding. No
need to explain the details here, except to say that reflection
doubles the number of test recipes to increase resolution.
- In programs with a high per-recipe
cost—like catalog and direct mail—reflection is good to
consider, but may add too much additional cost
- Almost every retail test should use
reflection, or another method for increasing resolution.
Generally, retail tests require the same effort no matter
how many test recipes there are, so a higher-resolution
design is almost always better.
- Internet tests can often benefit from
reflection (see the E-mail case study)
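For readers who do want a peek at the details, reflection can be sketched in a few lines: repeat every recipe with all of its settings reversed. The example below starts from an 8-recipe Resolution III design for 7 elements (one standard construction, used here as an assumption) and measures how strongly any main-effect column overlaps any 2-way interaction column before and after. `worst_overlap` is a hypothetical helper.

```python
from itertools import combinations, product

# 8-recipe Resolution III design for 7 elements: A, B, C, then AB, AC, BC, ABC.
original = [
    [a, b, c, a * b, a * c, b * c, a * b * c]
    for a, b, c in product([-1, 1], repeat=3)
]

# Reflection (full foldover): add a mirror image of every recipe.
reflected = original + [[-x for x in row] for row in original]


def worst_overlap(design):
    """Largest |dot product| between a main-effect column and a 2-way interaction column."""
    cols = list(zip(*design))
    worst = 0
    for i, j in combinations(range(len(cols)), 2):
        inter = [x * y for x, y in zip(cols[i], cols[j])]
        for col in cols:
            worst = max(worst, abs(sum(x * y for x, y in zip(col, inter))))
    return worst


print(len(original), "recipes, worst overlap:", worst_overlap(original))
print(len(reflected), "recipes, worst overlap:", worst_overlap(reflected))
```

An overlap equal to the number of recipes means a main effect is completely confounded with an interaction. An overlap of zero means the main effects are clear of all 2-way interactions. Reversing every setting flips each main-effect column but leaves each 2-way interaction column unchanged, so the confounding cancels across the two halves.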
Replication and Randomization
Two other requirements for every test are:
- Replication = multiple measurements of
each test recipe
- Randomization = randomly assigning “test
units” (people, stores, phone reps, etc.) to prevent unknown
differences from affecting results
Replication is no problem for most direct
marketing tests. You divide the list among test recipes and
thousands of people receive each version. For retail tests where
the “test unit” is a store rather than a person, replication
requires that you test every recipe in at least two stores. So
if you have 20 recipes, you need at least 40 stores and often
have many more to achieve the required sample size. With a few
stores in each recipe, you can analyze store-to-store
differences and eliminate stores that are outliers because of
outside causes (for example, low sales in Chicago due to a
snowstorm, higher sales in a new store with growing awareness,
or bad data from a store where the test recipe was executed
incorrectly).
Randomization is also easier for Internet,
e-mail, direct mail, and catalog tests—just randomly assign
names to each recipe or run each webpage at random when someone
clicks on a banner.
In advertising and retail tests, randomization
requires more careful attention. The key is to avoid linking a
test element with some variable outside of the test. For
example, do not give all the largest stores the A+ recipes and
all the smallest stores the A- recipes (unless “store size” is
the test element). If you do, store size may influence the
effect of A.
Overall:
- Test elements should be executed
consistently (every + and – done the same way every time)
- All non-test variables should be kept
constant
- Randomize all test units and the
assignment of test units to recipes in order to minimize
unknown or uncontrollable sources of error
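As an illustration of replication and randomization together, the sketch below shuffles a list of stores and deals them out so that each of 20 recipes gets the same number of stores. The store names, the function `assign_stores`, and the seed value are hypothetical. The fixed seed is there only so the plan can be reproduced and audited later.

```python
import random


def assign_stores(stores, n_recipes, seed=None):
    """Randomly assign stores to recipes, keeping recipe counts as equal as possible."""
    rng = random.Random(seed)
    shuffled = stores[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    assignment = {recipe: [] for recipe in range(1, n_recipes + 1)}
    for position, store in enumerate(shuffled):
        assignment[position % n_recipes + 1].append(store)
    return assignment


stores = [f"store_{i:02d}" for i in range(1, 41)]  # hypothetical 40-store list
plan = assign_stores(stores, n_recipes=20, seed=7)

for recipe, assigned in plan.items():
    print(recipe, assigned)
```

With 40 stores and 20 recipes, every recipe ends up with two stores, the minimum for replication. Because the shuffle ignores store size, region, and everything else, no test element is systematically linked to an outside variable.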
A note about “Taguchi Methods”
Just over the last few years, Taguchi
Methods (named after the famous Japanese manufacturing expert, Genichi Taguchi) have become a popular term for certain
scientific testing techniques. These methods are not separated
out in our discussions because:
- Some Taguchi designs are exactly the same
as standard fractional-factorial designs (so whatever you
call them is OK).
- Taguchi does diverge from standard
practice in his selection process and for some multi-level
test designs. For these, marketers must be very careful
since Taguchi can use more complex confounding schemes and
often assumes all interactions can be ignored.
- The term “Taguchi methods” commonly
covers a number of statistical concepts and techniques for
manufacturing applications unrelated to testing.
- Overall, numerous statisticians have
advanced the field of scientific testing throughout the
twentieth century (and beyond) and each provides a valuable
addition to the vast realm of testing theory and techniques.
Taguchi is but one of many.
Next, you can learn more about the details of
managing a test or look over some interesting real-world
examples in the case studies & articles.