Choosing from a variety of test designs lets you create a customized test that meets your objectives.

Whether testing a new merchandise mix, price points and offers, creative changes, or numerous marketing-mix elements, scientific testing and the LucidView strategy offer a range of test designs. Because each design has unique requirements and advantages, expert direction and guidance are important.

When selecting the best design, it is important to:

  1. Let your marketing objective drive the decision – find a test design that provides the answers you are seeking.
  2. Create a testing plan – a screening-refining test cycle over a series of campaigns throughout the year.
  3. Choose the simplest test design that meets your needs – use a powerful scientific design, but avoid unnecessary complexity.
  4. Increase the focus over time – understand the big picture before delving into the details.


The basic options include:

Split-run (one variable at a time)
  • Benefits: simple for one variable
  • Constraints: minimal information; no analysis of interactions; large sample size required for many variables
  • Applications: a totally new creative, program, or channel (to see if it works before testing more extensively)

Full-factorial (all combinations)
  • Benefits: all interactions seen clearly, along with all main effects
  • Constraints: many test recipes (may be costly)
  • Applications: testing a few important elements when large interactions are expected; a good "refining" test design

Fractional-factorial (some combinations)
  • Benefits: many test elements in few recipes; quantifies all main effects and some interactions; medium efficiency (recipes/elements)
  • Constraints: needs careful planning to control interactions
  • Applications: a good "screening" design for many test elements (especially if recipes are not costly)

Plackett-Burman (few combinations)
  • Benefits: many test elements in the fewest recipes; quantifies all main effects and few interactions; very powerful and efficient
  • Constraints: needs careful planning to control interactions
  • Applications: a good "screening" design for many test elements (especially if recipes are costly)

Specialized Designs

Centerpoints (between + and – levels)
  • Benefits: adds one recipe to check for "curvature" in effects
  • Constraints: needs midpoints for all test elements; if curvature exists, additional testing is needed
  • Applications: price/offer testing (where elements are numbers)

Box-Behnken (some combinations of 3 levels)
  • Benefits: accurate analysis of curvature with 3 levels; fewer recipes than full-factorial designs
  • Constraints: more recipes than 2-level or centerpoint designs
  • Applications: price/offer testing (etc.) with 3-level test elements

Central composite (full-factorial with centerpoints plus additional "axial" test recipes)
  • Benefits: greater accuracy of curvature analysis; more levels of each element are tested
  • Constraints: many recipes for few elements
  • Applications: sequential testing (if curvature is found, then run the "axial" recipes)

Latin square and Graeco-Latin square (seldom used in marketing)
  • Benefits: multiple levels for each element with few recipes
  • Constraints: must not have interactions
  • Applications: 1-2 variable tests with other "blocking" variables (e.g., different lists, regions, retail chains, mail drops)

Optimal designs (computer-generated, omitting some recipes)
  • Benefits: flexible designs that avoid certain combinations
  • Constraints: very complex (often unnecessary)
  • Applications: tests where some combinations cannot be run

Other techniques: see below.
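The centerpoint design described above compares one midpoint recipe with the usual + and – recipes. A minimal sketch of that comparison, assuming each recipe's result has already been summarized as a single response value (such as a response rate); the function name is hypothetical, and deciding whether the difference is significant would still require an estimate of experimental error, which this sketch does not compute.

```python
from statistics import mean


def curvature_estimate(factorial_responses, center_responses):
    """Average response at the factorial (+/-) recipes minus the average
    response at the centerpoint recipe(s).

    With purely linear effects the two averages should be close, so a value
    near zero suggests no curvature; a large value in either direction
    suggests additional testing (for example, with a 3-level design).
    """
    if not factorial_responses or not center_responses:
        raise ValueError("need at least one factorial and one centerpoint response")
    return mean(factorial_responses) - mean(center_responses)


# Hypothetical response rates (%): a 2-element, 4-recipe test plus one centerpoint.
factorial = [2.0, 2.4, 2.2, 2.6]
center = [2.8]
print(round(curvature_estimate(factorial, center), 2))
```

Here the centerpoint outperforms the average of the corner recipes, which is the pattern that would prompt a follow-up multi-level test.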

Generally, the strategy for marketing and advertising testing entails:

  1. Testing many elements in a large "screening" design to quickly identify the important variables to change or test further
    • Fractional-factorial (if up to 15 test elements) – see the E-mail case study for an example
    • Plackett-Burman design (if 16 or more test elements) – see the Direct mail subscriptions screening test
  2. Testing a few important variables from screening in a smaller "refining" test
    • Full-factorial test (of 2-5 elements) – as in the DM Subscriptions refining test or the first example in Beyond the A/B Split
    • Fractional-factorial test (of about 5-7 elements)
  3. Identifying the variables that have the greatest impact on sales and running smaller tests of 2-5 elements while continuing with larger tests. This process allows for continued optimization of important elements, identification of market shifts, and a continuous search for new opportunities.
  4. Using a split-run test to experiment with a new product, vehicle, program, or marketing channel. If the new version produces acceptable results, running a scientific test can rapidly optimize the program (starting with #1, above).
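The element-count thresholds in the strategy above (fractional-factorial up to 15 elements, Plackett-Burman for 16 or more) follow from how many recipes each design family needs. A rough sketch, assuming two-level elements and the smallest saturated (resolution III) design in each family; higher-resolution designs need more recipes, and the function names are illustrative only. It relies on the fact that an N-recipe two-level design has N - 1 columns for effects, that fractional factorials come in powers of two, and that Plackett-Burman designs come in multiples of four.

```python
def full_factorial_runs(k: int) -> int:
    """Recipes needed to test every combination of k two-level elements."""
    return 2 ** k


def fractional_factorial_runs(k: int) -> int:
    """Smallest two-level fractional factorial (a power of two) with room for
    k main effects: an N-recipe design has N - 1 effect columns."""
    n = 2
    while n - 1 < k:
        n *= 2
    return n


def plackett_burman_runs(k: int) -> int:
    """Smallest Plackett-Burman design (a multiple of four recipes) with room
    for k main effects."""
    n = 4
    while n - 1 < k:
        n += 4
    return n


for k in (7, 15, 16, 19):
    print(f"{k:>2} elements: full {full_factorial_runs(k):>6}, "
          f"fractional {fractional_factorial_runs(k):>2}, "
          f"Plackett-Burman {plackett_burman_runs(k):>2}")
```

At 15 elements both families fit in 16 recipes; at 16 elements the smallest fractional factorial jumps to 32 recipes while a Plackett-Burman design needs only 20, which is consistent with the 15/16-element dividing line above.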


If you would like to learn more about test designs and techniques, please see below. If you prefer to review real-world applications of test designs, please see our case studies.

At first glance, these test designs may seem fairly simple. However, each has numerous technical issues and considerations that impact your decision. For example, an expert should understand:

  • The pros and cons of testing 15 elements in a 16-recipe fractional-factorial versus a 20-recipe Plackett-Burman design
  • Where multiple levels may be better than two-level designs
  • Why a 24-recipe retail test may be better than a 12-recipe test
  • When an e-mail and landing page should be tested together or separately


The marketing channel also affects test design. Direct mail and retail channels, for instance, differ in:

  • Selection of test elements (broader scope for direct mail, yet narrow scope for retail)
  • Minimum acceptable resolution (III for direct mail, but IV for retail)
  • Sample size (why replication and stability are an issue for retail but not direct mail)
  • Cost of testing (direct mail production versus retail test execution)


Beyond the choice of a type of test design, several additional techniques and issues are important, including alternatives for testing multiple levels and segmentation variables, the test design’s resolution and confounding scheme, and ways to increase statistical power without increasing the cost of testing.

Testing multiple levels

The most efficient way to test many variables at once is with two levels for each, usually the control level and a bold change that someone thinks will increase sales or profitability. But three or more levels are fine to test, if the value of multiple levels is worth the cost of running additional test recipes. Testing three elements each at three levels requires 27 recipes in a full-factorial design (3³), versus eight recipes for testing three elements at two levels each (2³).
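The recipe counts above come from enumerating every combination of levels. A minimal sketch using Python's standard `itertools.product`; the level labels are hypothetical placeholders, since only the number of levels per element affects the count.

```python
from itertools import product


def full_factorial(levels_per_element):
    """Every recipe (one level per element) in a full-factorial design."""
    return list(product(*levels_per_element))


# Hypothetical level labels; only how many levels each element has matters.
three_elements_three_levels = full_factorial([["low", "mid", "high"]] * 3)
three_elements_two_levels = full_factorial([["-", "+"]] * 3)

print(len(three_elements_three_levels))  # 27
print(len(three_elements_two_levels))    # 8
```

Each additional three-level element multiplies the recipe count by three rather than two, which is why multi-level full factorials grow expensive quickly.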

In addition to centerpoint, Box-Behnken, and central composite designs, a few novel techniques let you test multiple levels with few additional recipes:

  • For one 3-level test element, you can use a standard 2-level design plus one additional one-variable recipe (using more recipes for additional levels and analyzing this variable as a series of split-run tests)
  • For one 4-level test element, you can use three columns in the fractional-factorial test design
  • To test 2-4 elements at 3+ levels, create a separate multi-level test design apart from all other 2-level elements
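The 4-level technique above works because two 2-level columns (call them A and B) have four sign combinations, and a 4-level element also uses up their interaction column A×B, three columns in all. A minimal sketch in an 8-recipe design, assuming base columns A, B, C and a hypothetical level mapping (L1-L4); it is one standard way to do this, not necessarily the exact layout the case studies used.

```python
from collections import Counter
from itertools import product

# Hypothetical mapping from the signs of base columns A and B to four levels.
FOUR_LEVELS = {(-1, -1): "L1", (-1, 1): "L2", (1, -1): "L3", (1, 1): "L4"}


def eight_recipe_design_with_four_level_element():
    """8-recipe two-level design (coded -1/+1) with one 4-level element X."""
    recipes = []
    for a, b, c in product((-1, 1), repeat=3):
        recipes.append({
            # X consumes three columns: A, B, and their interaction A*B.
            "X": FOUR_LEVELS[(a, b)],
            # The remaining columns can each carry one 2-level element.
            "C": c, "AC": a * c, "BC": b * c, "ABC": a * b * c,
        })
    return recipes


design = eight_recipe_design_with_four_level_element()
print(Counter(r["X"] for r in design))
```

Each of the four levels appears in two recipes, once at each setting of every remaining column, so up to four 2-level elements can share the design with X without being tied to any one of its levels.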


Testing customer/segmentation variables along with creative elements

Testing is not a substitute for customer modeling and segmentation analysis, but a number of customer segments or variables can be tested at once. Options for testing marketing programs across different segments include:

  1. Run the full test design within each customer group.

    If you have a few customer segments and 19 elements to test, you can run the same 20-recipe test design within each segment. This allows you to analyze each segment alone, as well as all segments together for a larger overall sample size.

  2. Add “customer segment” into the design as a separate test element.

    If your available sample size is small, this approach lets you use your full customer database for one test, but it works best when you expect all segments to respond similarly to each element in the test (see the E-mail case study for an example of a 4-level segmentation variable).

  3. Run a screening test in 1-2 segments, followed by a refining test in others.

    If you can select a couple of representative customer segments or marketing programs for a large screening test, then you can test the significant elements in a smaller test design across more segments.

  4. Create a “double” test design with segment variables tested on top of a standard design of creative elements.

    Called a “segmentation-crossed” design, this approach lets you test a few customer characteristics—in a full or fractional-factorial design—at the same time as you test marketing program elements, where each program recipe is tested across the full segmentation test design. This requires numerous key-codes or tracking numbers, but minimizes the number of production cells.
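The key-code arithmetic of a segmentation-crossed design is simply the product of the two designs. A minimal sketch, assuming a hypothetical 8-recipe program design crossed with a 2×2 full-factorial design of two hypothetical customer characteristics (new vs. existing, east vs. west); the key-code format is illustrative.

```python
from itertools import product


def segmentation_crossed(program_recipes, segment_cells):
    """Cross every program recipe with every cell of the segment design.
    Each (recipe, segment cell) pair needs its own key code or tracking number."""
    return [
        {"recipe": recipe, "segment": cell, "key_code": f"{recipe}-{'-'.join(cell)}"}
        for recipe, cell in product(program_recipes, segment_cells)
    ]


# Hypothetical 8-recipe program design and 2x2 segment design.
program = [f"R{i}" for i in range(1, 9)]
segments = list(product(("new", "existing"), ("east", "west")))

cells = segmentation_crossed(program, segments)
print("production cells:", len(program))  # 8
print("key codes:", len(cells))           # 32
print(cells[0]["key_code"])               # R1-new-east
```

Only eight versions are produced, but results must be tracked separately for all 32 recipe-segment combinations.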


Resolution

Resolution is a statistical term summarizing which effects are confounded. In Resolution III designs, some 2-way interactions are confounded with main effects (as in a standard Plackett-Burman design). This means that any significant interactions, which cannot be separated from the main effects, will add some error to the main-effect estimates. Often it’s no big deal, but if there is a large interaction, or a few significant interactions, then the error can lead to some wrong decisions.

If 2-way interactions are expected to be significant, then you should use a design of Resolution IV or higher. This means some 2-way interactions may be confounded with other 2-way interactions, but main effects are only confounded with 3-way (or higher-order) interactions.

Considering potential interactions is an important step before creating the test design. If interactions are expected to be large, then you should either:

(a) Redefine, combine, or remove test elements to eliminate the potential interaction(s), or

(b) Create a test design that separates the potential interactions from other important effects.

For example, if you want to test seven elements of your newspaper ad, you can use an 8-recipe resolution III fractional-factorial design (sounds impressive, doesn’t it?), or you can use a 16-recipe resolution IV fractional-factorial design that places all 2-way interactions into the additional columns in the test matrix, away from the seven main effects. In this case, the statistical benefits of a 16-recipe test would probably outweigh the additional effort required to create eight more ads.
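The difference between the two newspaper-ad designs can be checked directly by comparing columns. A minimal sketch, assuming elements labeled A-G and one standard textbook set of generators (D = AB, E = AC, F = BC, G = ABC for 8 recipes; E = ABC, F = BCD, G = ACD for 16); these generators are an assumption, not necessarily the ones a given test would use.

```python
from itertools import combinations, product

LETTERS = "ABCDEFG"


def build_design(n_base, generators):
    """Two-level fractional factorial coded -1/+1.

    n_base: number of base elements run as a full factorial (2**n_base recipes).
    generators: for each added element, the base-column indices whose product
    defines its column, e.g. (0, 1) means "new element = A * B".
    """
    design = []
    for base in product((-1, 1), repeat=n_base):
        row = list(base)
        for gen in generators:
            value = 1
            for i in gen:
                value *= base[i]
            row.append(value)
        design.append(row)
    return design


def main_effect_aliases(design):
    """Every case where a main-effect column equals a 2-way interaction column (up to sign)."""
    n = len(design[0])
    found = []
    for m in range(n):
        me = [row[m] for row in design]
        others = [j for j in range(n) if j != m]
        for a, b in combinations(others, 2):
            inter = [row[a] * row[b] for row in design]
            if inter == me or inter == [-v for v in me]:
                found.append(f"{LETTERS[m]} = {LETTERS[a]}{LETTERS[b]}")
    return found


# 8 recipes, resolution III: D = AB, E = AC, F = BC, G = ABC
res3 = build_design(3, [(0, 1), (0, 2), (1, 2), (0, 1, 2)])
# 16 recipes, resolution IV: E = ABC, F = BCD, G = ACD
res4 = build_design(4, [(0, 1, 2), (1, 2, 3), (0, 2, 3)])

print(len(res3), "recipes:", main_effect_aliases(res3)[:3], "...")
print(len(res4), "recipes:", main_effect_aliases(res4))
```

In the 8-recipe design every main effect shares its column with three 2-way interactions (21 cases in all, starting with A = BD, A = CE, A = FG); in the 16-recipe design the list is empty, which is what resolution IV guarantees.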

Confounding Scheme

The confounding scheme shows exactly which interactions are mixed together with which main effects. Though your marketing team can ignore the statistical details, your test expert—whether an outside consultant or in-house statistician—needs to understand the confounding scheme of every test you run. No one should just assume that all interactions will be zero (ignoring them doesn’t make them go away).

Your test expert should look over the complete confounding scheme and consider “what-if” scenarios:

  • Looking at all the test elements, what interactions could be important?
  • How could potential interactions affect our results?
  • Should I change the elements or test design to guard against possible confounding error?

Confounding is the price you pay for more variables in fewer test cells. It adds some uncertainty, but can be managed effectively with careful up-front planning and a good understanding of the marketing-mix elements and the technical issues of test design.
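Part of reviewing a confounding scheme is seeing which 2-way interactions share a column with each other, since those cannot be told apart. A minimal sketch for the 16-recipe, seven-element newspaper-ad design, assuming elements A-G and the standard generators E = ABC, F = BCD, G = ACD (an assumption; other generator choices give different chains).

```python
from itertools import combinations, product

LETTERS = "ABCDEFG"


def sixteen_recipe_res_iv():
    """16 recipes, 7 elements: base A-D full factorial, E = ABC, F = BCD, G = ACD."""
    return [[a, b, c, d, a * b * c, b * c * d, a * c * d]
            for a, b, c, d in product((-1, 1), repeat=4)]


def two_way_alias_chains(design):
    """Group 2-way interactions whose columns are identical up to sign."""
    chains = {}
    for i, j in combinations(range(len(design[0])), 2):
        col = tuple(row[i] * row[j] for row in design)
        key = col if col[0] == 1 else tuple(-v for v in col)  # ignore the sign
        chains.setdefault(key, []).append(LETTERS[i] + LETTERS[j])
    return [" = ".join(names) for names in chains.values()]


for chain in two_way_alias_chains(sixteen_recipe_res_iv()):
    print(chain)
```

The 21 possible 2-way interactions fall into seven chains of three (the first is AB = CE = FG). If two interactions in the same chain both seem plausible, that is a signal to reassign elements to different columns before running the test.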

Reflection

Reflection, also called full foldover, is a simple way to increase resolution and reduce confounding. Without going into the details, reflection adds a mirror image of the original design (every + level becomes – and every – becomes +), which doubles the number of test recipes and increases resolution.

  • In programs with a high per-recipe cost—like catalog and direct mail—reflection is good to consider, but may add too much additional cost
  • Nearly every retail test should use reflection or another method for increasing resolution. Generally, retail tests require the same effort no matter how many test recipes there are, so a higher-resolution design is almost always better.
  • Internet tests can often benefit from reflection (see the E-mail case study)
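To see why reflection raises resolution, here is a minimal sketch that folds over an 8-recipe, seven-element resolution III design (assuming the standard generators D = AB, E = AC, F = BC, G = ABC) and counts main effects that share a column with a 2-way interaction, before and after.

```python
from itertools import combinations, product


def eight_recipe_res_iii():
    """7 elements in 8 recipes: base A, B, C with D = AB, E = AC, F = BC, G = ABC."""
    return [[a, b, c, a * b, a * c, b * c, a * b * c]
            for a, b, c in product((-1, 1), repeat=3)]


def reflect(design):
    """Full foldover: append a copy of the design with every sign reversed."""
    return design + [[-v for v in row] for row in design]


def count_main_effect_2fi_aliases(design):
    """Count (main effect, 2-way interaction) pairs that share a column up to sign."""
    n = len(design[0])
    count = 0
    for m in range(n):
        me = [row[m] for row in design]
        neg = [-v for v in me]
        for a, b in combinations([j for j in range(n) if j != m], 2):
            if [row[a] * row[b] for row in design] in (me, neg):
                count += 1
    return count


base = eight_recipe_res_iii()
folded = reflect(base)
print(len(base), count_main_effect_2fi_aliases(base))      # 8 21
print(len(folded), count_main_effect_2fi_aliases(folded))  # 16 0
```

The reversed copy flips the sign of every main-effect column but leaves every 2-way interaction column unchanged (two sign flips cancel), so after reflection no main effect can coincide with a 2-way interaction.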


Replication and Randomization

Two other requirements for every test are:

  • Replication = multiple measurements of each test recipe
  • Randomization = randomly assigning “test units” (people, stores, phone reps, etc.) to prevent unknown differences from affecting results


Replication is not a problem for most direct marketing tests. The list is divided among test recipes and thousands of people receive each version. For retail tests where the “test unit” is a store rather than a person, replication requires that every recipe be tested in at least two stores. If there are 20 recipes, at least 40 stores are needed, and many more are often used to achieve the required sample size. With a few stores in each recipe, it is possible to analyze store-to-store differences and eliminate stores that are outliers because of outside causes (for example, low sales in Chicago due to a snowstorm, higher sales in a new store with growing awareness, or bad data from a store where the test recipe was executed incorrectly).

Randomization is also easier for Internet, e-mail, direct mail, and catalog tests: names can be randomly assigned to each recipe, or each web page version can be served at random when someone clicks on a banner.

In advertising and retail tests, randomization requires more careful attention. The key is to avoid linking a test element with some variable outside of the test. For example, do not give all the largest stores the A+ recipes and all the smallest stores the A- recipes (unless “store size” is the test element). If you do, the effect of store size will be mixed in with the measured effect of A.

Overall:

  • Test elements should be executed consistently (every + and – done the same way every time)
  • All non-test variables should be kept constant
  • Randomize all test units and the assignment of test units to recipes in order to minimize unknown or uncontrollable sources of error
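Replication and randomization for a store test can be combined in one step: shuffle the stores, then deal them out to recipes so every recipe gets at least two. A minimal sketch using Python's standard `random` module, assuming hypothetical store IDs and the 20-recipe, 40-store case described above; the function name and the seed parameter (for a reproducible plan) are illustrative choices.

```python
import random


def randomize_assignment(units, n_recipes, min_replicates=2, seed=None):
    """Randomly assign test units (e.g. stores) to recipes 1..n_recipes.

    Units are shuffled, then dealt out in turn so recipe sizes differ by at
    most one. Raises ValueError if there are too few units for the requested
    replication.
    """
    if len(units) < n_recipes * min_replicates:
        raise ValueError(
            f"need at least {n_recipes * min_replicates} units, got {len(units)}"
        )
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {recipe: [] for recipe in range(1, n_recipes + 1)}
    for i, unit in enumerate(shuffled):
        assignment[i % n_recipes + 1].append(unit)
    return assignment


stores = [f"store-{i:03d}" for i in range(1, 41)]  # hypothetical store IDs
plan = randomize_assignment(stores, n_recipes=20, seed=7)
print(plan[1])
```

Because the shuffle ignores store size, large and small stores end up spread across A+ and A- recipes by chance rather than by choice, which is what keeps an outside variable from lining up with a test element.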


A note about “Taguchi Methods”

In the last few years, Taguchi Methods (named after the famous Japanese manufacturing expert, Genichi Taguchi) have become a popular term for certain scientific testing techniques. These methods are not specifically listed in our discussions because:

  • Some Taguchi designs are exactly the same as standard fractional-factorial designs
  • Taguchi does diverge from standard practice in his selection process and in some multi-level test designs. For these, marketers must be very careful, since Taguchi can use more complex confounding schemes and often assumes all interactions can be ignored.
  • The term “Taguchi methods” commonly covers a number of statistical concepts and techniques for manufacturing applications unrelated to testing.
  • Overall, numerous statisticians have advanced the field of scientific testing throughout the twentieth century (and beyond), and each has made a valuable contribution to the vast realm of testing theory and techniques. Taguchi is but one of many.