
 



Variety of test designs and techniques
 


A wide array of designs lets you create the best test for your objectives

Whether testing a new merchandise mix, many different price points and offers, dozens of creative changes, or numerous marketing-mix elements as quickly as possible, scientific testing and the LucidView strategy offer a range of different test designs. Each has unique requirements and advantages, so it’s important to get some guidance. The major classes of test designs are summarized below.
 

In selecting the most appropriate test design, you should:

1. Let your marketing objective drive the decision – find a design to give you the answers you want.

2. Create a testing plan – a screening-refining test cycle over a series of campaigns throughout the year.

3. Simple is better – use a powerful scientific design, but avoid unnecessary complexity.

4. Increase your focus over time – understand the big picture before digging into the details (cast a wide net for new ideas and find out what’s important before wasting resources splitting hairs).

Academic tomes cover various test designs in detail, but here’s a basic overview of most of your options:

The strategy for marketing and advertising testing usually entails:

1. Test many elements in a large "screening" design to quickly identify the important variables to change or test further

  • Fractional-factorial (if up to 15 test elements) – see the E-mail case study for an example
  • Plackett-Burman design (if 16 or more test elements) – see the Direct mail subscriptions screening test

2. Test a few important variables from screening in a smaller "refining" test

  • Full-factorial test (of 2-5 elements) – as in the DM Subscriptions refining test or first example in Beyond the A/B Split
  • Fractional-factorial test (of about 5-7 elements)

3. Once you know which variables have the greatest impact on sales, run smaller tests of 2-5 elements while continuing with larger tests. This lets you continually optimize important elements, identify market shifts, and never stop searching for new opportunities.

For example, you can run a separate price/offer test of two or more levels for each element, using a:

  • Fractional-factorial test design
  • Full-factorial design – possibly with advanced techniques for multiple levels
    –  Centerpoints
    –  Box-Behnken design
    –  Central Composite design

4. Use a split-run test to “try out” a new product, vehicle, program, or marketing channel. If the new version gives acceptable results, run a scientific test to rapidly optimize the program (starting with #1, above).
 

And now for more details…

At first glance, all these test designs may seem fairly simple. However, each has numerous technical issues and considerations that impact your decision. For example, an expert should understand:

  • The pros and cons of testing 15 elements in a 16-recipe fractional-factorial versus a 20-recipe Plackett-Burman design
  • Where multiple levels may be better than two-level designs
  • Why a 24-recipe retail test may be better than a 12-recipe test
  • When an e-mail and landing page should be tested together or separately

An expert should also understand how the marketing channel affects test design. Direct mail and retail channels, for example, differ in:

  • Selection of test elements (broader scope for direct mail, yet narrow scope for retail)
  • Minimum acceptable resolution (III for direct mail, but IV for retail)
  • Sample size (why replication and stability are an issue for retail but not direct mail)
  • Cost of testing (direct mail production versus retail test execution)

Additional techniques and issues are important beyond the choice of a type of test design, including: alternatives for testing multiple levels and segmentation variables, the test design’s resolution and confounding scheme, and ways to increase statistical power without increasing the cost of testing.


Testing multiple levels

The most efficient way to test many variables at once is with two levels for each, usually the control level and a bold change that someone thinks will increase sales or profitability. But 3 or more levels are fine to test, if the value of multiple levels is worth the cost of running additional test recipes. Testing 3 elements each at 3 levels requires 27 recipes in a full-factorial design (3³), versus 8 recipes for testing 3 elements at 2 levels each (2³).
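As a rough illustration of that arithmetic, the sketch below enumerates every recipe of a full-factorial design. The function name `full_factorial` is a hypothetical helper introduced here, not part of any LucidView tool.

```python
from itertools import product


def full_factorial(levels_per_element):
    """Return every recipe of a full-factorial design.

    Each recipe is a tuple holding one level index per test element.
    """
    return list(product(*(range(n) for n in levels_per_element)))


# Three elements at three levels each, versus three elements at two levels each.
three_level = full_factorial([3, 3, 3])
two_level = full_factorial([2, 2, 2])
print(len(three_level), len(two_level))
```

The recipe count is simply the product of the number of levels of each element, which is why every extra level on every element gets expensive quickly.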

But a few novel techniques are available—in addition to using centerpoints, Box-Behnken, or central composite designs—that let you test multiple levels with few additional recipes:

  • For one 3-level test element, you can use a standard 2-level design plus one additional one-variable recipe (using more recipes for additional levels and analyzing this variable as a series of split-run tests)
  • For one 4-level test element, you can use three columns in the fractional-factorial test design
  • To test 2-4 elements at 3+ levels, create a separate multi-level test design apart from all other 2-level elements
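One common way to read the "three columns" technique for a 4-level element is sketched below, assuming the three columns are two ordinary 2-level columns plus their interaction column. The source does not say which columns are used, so the column choice and the `LEVEL_MAP` coding are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumed coding: the sign pattern of two 2-level columns (A, B) selects one of
# four levels. The A*B interaction column is then also used up by this element,
# so no other test element may be assigned to it.
LEVEL_MAP = {(-1, -1): 1, (-1, 1): 2, (1, -1): 3, (1, 1): 4}

# An 8-recipe base design in columns A, B, C.
base = list(product([-1, 1], repeat=3))

# Columns A, B (and implicitly A*B) carry the 4-level element; C stays 2-level.
recipes = [(LEVEL_MAP[(a, b)], c) for a, b, c in base]
counts = Counter(level for level, _ in recipes)

for level, c in recipes:
    print(f"4-level element = {level}, 2-level element = {c:+d}")
```

Each of the four levels appears in the same number of recipes and is paired with both settings of the remaining 2-level element, so the design stays balanced.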

Testing customer/segmentation variables along with creative elements

Testing is not a substitute for customer modeling and segmentation analysis, but a number of customer segments or variables can be tested at once. A few alternatives for testing marketing programs across different segments include:

1. Run the full test design within each customer group.

If you have a few customer segments and 19 elements to test, you can run the same 20-recipe test design within each segment. This allows you to analyze each segment alone, plus all together for a larger overall sample size.

2. Add “customer segment” into the design as a separate test element.

If your available sample size is small, this approach lets you use your full customer database for one test, but is best if you expect all segments to respond similarly to each element in the test (see the E-mail case study for an example of a 4-level segmentation variable).

3. Run a screening test in 1-2 segments, followed by a refining test in others.

If you can select a couple of representative customer segments or marketing programs for a large screening test, then you can test the significant elements in a smaller test design across more segments.

4. Create a “double” test design with segment variables tested on top of a standard design of creative elements.

Called a “segmentation-crossed” design, this approach lets you test a few customer characteristics—in a full or fractional-factorial design—at the same time as you test marketing program elements, where each program recipe is tested across the full segmentation test design. This requires numerous key-codes or tracking numbers, but minimizes the number of production cells.
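A minimal sketch of how a segmentation-crossed layout multiplies key-codes but not production cells is shown below. The segment names, the 4-recipe program design, and the key-code format are all hypothetical, chosen only to keep the example small.

```python
from itertools import product

# Hypothetical segmentation design: two customer characteristics, two levels each.
segment_design = list(product(["new", "repeat"], ["low_value", "high_value"]))

# Hypothetical program design: three creative elements in four recipes
# (a half-fraction where the third column is the product of the first two).
program_design = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

# Every program recipe is tested across the full segmentation design.
cells = [
    (recipe_id, segment)
    for recipe_id, _ in enumerate(program_design, start=1)
    for segment in segment_design
]

# One tracking code per (recipe, segment) cell.
key_codes = {cell: f"R{cell[0]}-{cell[1][0]}-{cell[1][1]}" for cell in cells}

print("production cells:", len(program_design))
print("key-codes needed:", len(key_codes))
```

Only the four program recipes have to be produced; the sixteen key-codes exist purely so that responses can be split by segment afterwards.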


Resolution

Resolution is a statistical term summarizing which effects are confounded. In Resolution III designs, some 2-way interactions are confounded with main effects (like in a standard Plackett-Burman design). This means that significant interactions, which the design cannot separate from the main effects, will add some error to the main-effect estimates. Often it’s no big deal, but if there is a large interaction, or a few significant interactions, then the error can lead to some wrong decisions.
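To make the Plackett-Burman case concrete, the sketch below builds a 12-recipe design from the commonly published generator row and measures how strongly one 2-way interaction overlaps with the other main-effect columns. The 12-recipe size is chosen here only for brevity, and the helper names are hypothetical.

```python
from itertools import combinations

# Commonly published first row for the 12-recipe Plackett-Burman design.
GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Eleven cyclic shifts of the generator row plus one all-minus recipe."""
    n = len(GENERATOR)
    rows = [GENERATOR[n - s:] + GENERATOR[:n - s] for s in range(n)]
    rows.append([-1] * n)
    return rows


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


design = plackett_burman_12()
cols = [[row[j] for row in design] for j in range(11)]

# Main effects are mutually orthogonal (every pair of columns has dot product 0).
main_effects_orthogonal = all(
    dot(cols[i], cols[j]) == 0 for i, j in combinations(range(11), 2)
)

# Correlation of the interaction between the first two elements with every
# other main-effect column: non-zero means partial confounding.
interaction = [a * b for a, b in zip(cols[0], cols[1])]
partial_aliasing = {j: dot(interaction, cols[j]) / 12 for j in range(2, 11)}

print("main effects orthogonal:", main_effects_orthogonal)
for j, r in partial_aliasing.items():
    print(f"element {j + 1}: correlation with interaction = {r:+.2f}")
```

In this design the interaction is not fully mixed with any single main effect but is spread across all the others, which is why a large interaction can quietly distort several main-effect estimates at once.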

If 2-way interactions are expected to be significant, then you should use a design of Resolution IV or higher. This means some 2-way interactions may be confounded with other 2-way interactions, but main effects are only confounded with 3-way (or higher-order) interactions.

Considering potential interactions is an important step before creating the test design. If interactions are expected to be large, then you should either:

(a) Redefine, combine, or remove test elements to eliminate the potential interaction(s), or
(b) Create a test design that separates the potential interactions from other important effects

For example, if you want to test 7 elements of your newspaper ad, you can use an 8-recipe resolution III fractional-factorial design (sounds impressive doesn’t it?), or you can use a 16-recipe resolution IV fractional-factorial design that places all 2-way interactions into the additional columns in the test matrix, away from the seven main effects. In this case, the statistical benefits of a 16-recipe test would probably outweigh the additional effort required to create eight more ads.
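The resolution of a regular fractional-factorial design equals the length of the shortest "word" in its defining relation. The sketch below checks this for one possible 8-recipe and one possible 16-recipe design for 7 elements. The generators used (D=AB, E=AC, F=BC, G=ABC and E=ABC, F=BCD, G=ACD) are standard textbook choices assumed here, not necessarily the ones used in any LucidView test.

```python
from itertools import combinations


def resolution(generator_words):
    """Length of the shortest word in the defining relation.

    Each generator such as D = AB is written as the word "ABD". Multiplying
    words cancels repeated letters, which is a set symmetric difference.
    """
    words = [frozenset(w) for w in generator_words]
    shortest = None
    for r in range(1, len(words) + 1):
        for combo in combinations(words, r):
            word = frozenset()
            for w in combo:
                word = word ^ w  # letters appearing twice cancel out
            if shortest is None or len(word) < shortest:
                shortest = len(word)
    return shortest


eight_recipe = ["ABD", "ACE", "BCF", "ABCG"]   # 7 elements in 8 recipes
sixteen_recipe = ["ABCE", "BCDF", "ACDG"]      # 7 elements in 16 recipes

print(resolution(eight_recipe), resolution(sixteen_recipe))  # prints "3 4"
```

A shortest word of length 3 means a main effect shares a column with a 2-way interaction; a shortest word of length 4 means 2-way interactions only share columns with each other.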

Confounding Scheme

The confounding scheme shows exactly which interactions are mixed together with which main effects. Though your marketing team can ignore the statistical details, your test expert—whether an outside consultant or in-house statistician—needs to understand the confounding scheme of every test you run. No one should just assume that all interactions will be zero (ignoring them doesn’t make them go away).

Your test expert should look over the complete confounding scheme and consider “what-if” scenarios:

  • Looking at all the test elements, what interactions could be important?
  • How could potential interactions affect our results?
  • Should I change the elements or test design to guard against possible confounding error?

Confounding is the price you pay for more variables in fewer test cells. It adds some uncertainty, but can be managed effectively with careful up-front planning and a good understanding of the marketing-mix elements and the technical issues of test design.
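As an illustration of what a confounding scheme looks like, the sketch below lists which 2-way interactions share a column with each main effect in an 8-recipe design for 7 elements. The design uses assumed textbook generators (D=AB, E=AC, F=BC, G=ABC), and the function names are hypothetical.

```python
from itertools import combinations, product

NAMES = "ABCDEFG"


def eight_recipe_design():
    """Seven 2-level elements in 8 recipes, built from a full factorial in A, B, C."""
    return [
        (a, b, c, a * b, a * c, b * c, a * b * c)
        for a, b, c in product([-1, 1], repeat=3)
    ]


def confounding_scheme(design):
    """Map each main effect to the 2-way interactions with an identical column."""
    cols = {name: tuple(row[i] for row in design) for i, name in enumerate(NAMES)}
    scheme = {name: [] for name in NAMES}
    for x, y in combinations(NAMES, 2):
        interaction = tuple(p * q for p, q in zip(cols[x], cols[y]))
        for name in NAMES:
            if interaction == cols[name]:
                scheme[name].append(x + y)
    return scheme


scheme = confounding_scheme(eight_recipe_design())
for name, aliases in scheme.items():
    print(name, "=", " = ".join(aliases))
```

Reading one line of the output, such as A = BD = CE = FG, tells the test expert exactly which interactions to think about before trusting the measured effect of A.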

Reflection

Reflection, also called full foldover, is a simple way to increase resolution and reduce confounding. No need to explain the details here, except to say that reflection doubles the number of test recipes to increase resolution.

  • In programs with a high per-recipe cost—like catalog and direct mail—reflection is good to consider, but may add too much additional cost
  • Almost every retail test should use reflection, or another method for increasing resolution. Generally, retail tests require the same effort no matter how many test recipes there are, so a higher-resolution design is almost always better.
  • Internet tests can often benefit from reflection (see the E-mail case study)
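For readers who do want the details, reflection can be sketched as appending a mirror image of the design with every + and – swapped. The example below applies it to an assumed 8-recipe design for 7 elements (textbook generators D=AB, E=AC, F=BC, G=ABC) and counts how many main-effect columns still overlap with a 2-way interaction. Function names are hypothetical.

```python
from itertools import combinations, product


def eight_recipe_design():
    """Seven 2-level elements in 8 recipes (resolution III)."""
    return [
        (a, b, c, a * b, a * c, b * c, a * b * c)
        for a, b, c in product([-1, 1], repeat=3)
    ]


def fold_over(design):
    """Reflection: add a second set of recipes with every sign reversed."""
    return design + [tuple(-x for x in row) for row in design]


def confounded_pairs(design):
    """Count (main effect, 2-way interaction) pairs whose columns are not orthogonal."""
    k = len(design[0])
    cols = [[row[i] for row in design] for i in range(k)]
    count = 0
    for x, y in combinations(range(k), 2):
        interaction = [p * q for p, q in zip(cols[x], cols[y])]
        for m in range(k):
            if sum(p * q for p, q in zip(interaction, cols[m])) != 0:
                count += 1
    return count


base = eight_recipe_design()
reflected = fold_over(base)
print(len(base), "recipes:", confounded_pairs(base), "confounded pairs")
print(len(reflected), "recipes:", confounded_pairs(reflected), "confounded pairs")
```

Reversing every sign flips the main-effect columns but leaves the 2-way interaction columns unchanged, so across the doubled design the two cancel and main effects are freed from 2-way interactions.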

Replication and Randomization

Two other requirements for every test are:

  • Replication = multiple measurements of each test recipe
  • Randomization = randomly assigning “test units” (people, stores, phone reps, etc.) to prevent unknown differences from affecting results

Replication is no problem for most direct marketing tests. You divide the list among test recipes and thousands of people receive each version. For retail tests where the “test unit” is a store rather than a person, replication requires that you test every recipe in at least two stores. So if you have 20 recipes, you need at least 40 stores and often have many more to achieve the required sample size. With a few stores in each recipe, you can analyze store-to-store differences and eliminate stores that are outliers because of outside causes (for example, low sales in Chicago due to a snowstorm, higher sales in a new store with growing awareness, or bad data from a store where the test recipe was executed incorrectly).

Randomization is also easier for Internet, e-mail, direct mail, and catalog tests—just randomly assign names to each recipe or run each webpage at random when someone clicks on a banner.

In advertising and retail tests, randomization requires more careful attention. The key is to avoid linking a test element with some variable outside of the test. For example, do not give all the largest stores the A+ recipes and all the smallest stores the A- recipes (unless “store size” is the test element). If you do, the effect of store size will be mixed in with the effect of A, and you will not be able to tell the two apart.

Overall:

  • Test elements should be executed consistently (every + and – done the same way every time)
  • All non-test variables should be kept constant
  • Randomize all test units and the assignment of test units to recipes in order to minimize unknown or uncontrollable sources of error
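A minimal sketch of randomized, replicated assignment for a retail test follows, assuming 40 stores and 20 recipes as in the example above. The store names, the seed, and the function `assign_stores` are hypothetical.

```python
import random


def assign_stores(stores, n_recipes, seed=None):
    """Shuffle the stores, then deal them out to recipes in turn.

    With at least 2 * n_recipes stores, every recipe is replicated in
    two or more stores.
    """
    rng = random.Random(seed)
    shuffled = list(stores)
    rng.shuffle(shuffled)
    assignment = {recipe: [] for recipe in range(1, n_recipes + 1)}
    for i, store in enumerate(shuffled):
        assignment[(i % n_recipes) + 1].append(store)
    return assignment


stores = [f"store_{i:02d}" for i in range(1, 41)]
assignment = assign_stores(stores, n_recipes=20, seed=2007)

for recipe, assigned in assignment.items():
    print(f"recipe {recipe:2d}: {', '.join(assigned)}")
```

Fixing the seed makes the assignment reproducible for audit purposes; the shuffle itself is what keeps store size, region, and other outside variables from lining up with any test element.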

A note about “Taguchi Methods”

Just over the last few years, Taguchi Methods (named after the famous Japanese manufacturing expert, Genichi Taguchi) have become a popular term for certain scientific testing techniques. These methods are not separated out in our discussions because:

  • Some Taguchi designs are exactly the same as standard fractional-factorial designs (so whatever you call them is OK).
  • Taguchi does diverge from standard practice in his selection process and in some multi-level test designs. For these, marketers must be very careful, since Taguchi designs can use more complex confounding schemes and often assume all interactions can be ignored.
  • The term “Taguchi methods” commonly covers a number of statistical concepts and techniques for manufacturing applications unrelated to testing.
  • Overall, numerous statisticians have advanced the field of scientific testing throughout the twentieth century (and beyond) and each provides a valuable addition to the vast realm of testing theory and techniques. Taguchi is but one of many.

Next, you can learn more about the details of managing a test or look over some interesting real-world examples in the case studies & articles.


 

© LucidView 2007. All rights reserved. Contact: 888-LucidView (888-582-4384), info@lucidview.com