Testing many variables at once - how it works
 


Basic marketing testing can seem so simple...

There is no “test design” when each variable is simply another test cell run against the control. But this simple approach has some major drawbacks:

  • Few variables can be tested

  • Sample size can be enormous for a few statistically-valid tests

  • Results are difficult to compare and interactions are impossible to see

  • Testing is slow and inefficient

Scientific testing increases your speed, power, accuracy, and insights by combining variables into one statistical test design. Where common techniques result in a random collection of unrelated tests, LucidView’s scientific tests create a unified strategy for more effective testing.

What is a “designed” test?

Scientific test designs, also called matrix tests, let you change many variables at once and still isolate the impact of each. With a structured test design, you still have a number of test cells, or “recipes,” but each recipe contains a unique combination of all your test elements. Instead of one test cell providing one data point on one variable, each test cell provides unique information on every variable in the test. When the recipes are combined together, you get a greater depth and variety of data about the impact of each variable on its own and in combination with others. Plus, since each recipe is part of the larger test design, tests are far more efficient, so sample size for each recipe can be very small.

How can you test many variables at once?

The “scientific method” has been drilled into us since elementary school: for a valid test, you hold everything constant and change just one variable at a time.

The key to the scientific method is that you must be able to isolate the effect of the variables under study in order to prove a cause-and-effect relationship. If a lot of things change together or inconsistently, then the effect of each variable cannot be isolated. Back-end statistical analyses can show a correlation between variables, but cause-and-effect proof comes only from a careful test design.

However, changing one variable at a time is not the only way to test…

With the right techniques, you can change many variables at once and still isolate the impact of each.

How? Let’s take it step-by-step…

Say you want to test two elements of your e-mail:
    (A) Price: the current $24.99 price versus $29.99
    (B) Offer: the current 10% discount versus a free gift

You can test these two elements easily enough as two separate split-run tests with three different e-mails:
    1.  Control ($24.99 + 10% off)
    2.  Higher price ($29.99 + 10% off)
    3.  New offer ($24.99 + free gift)

If your conversion rate averages about 2% and you have a total list of 400,000 names, you might randomly select 200,000 people to receive the control e-mail and 100,000 to receive each of the test cells. As two separate split-run tests, you can calculate the impact of each variable on its own, the “main effect” of (A) Price and (B) Offer. With these sample sizes, you have a 50-50 chance of seeing a statistically significant effect if either variable increases conversion rate by a relative 6.1% or more (from 2% to 2.12%; see sample size for an explanation of the equation).

For a more efficient and powerful test, you can combine both variables in the same scientific test design with just one more test cell, where both A and B are at the test level:

    4. Higher price and new offer ($29.99 + free gift)

Statistically, this one additional test cell instantly gives you much more information at a lower cost. Now you have a balanced test design where every test cell gives you an additional piece of data about every variable.

Let’s rewrite the new test design as a matrix of the two variables, or “test elements,” and the four test cells, or “recipes,” using a “–” (or –1) to represent the control and a “+” (or +1) for the new level:

    Recipe    A (Price)    B (Offer)
    1         –            –
    2         +            –
    3         –            +
    4         +            +

The first three recipes are just like the control and two splits. The 4th recipe provides unique information on the combination of both new levels together. In addition, now two recipes, 2 and 4, give information about the higher price (A+) and both recipes 3 and 4 give information about the free gift offer (B+), so all four recipes can be used to compare both the price and offer change.
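The same matrix can be generated rather than written out by hand. The sketch below is only an illustration: the `ELEMENTS` dictionary and the `full_factorial` helper are hypothetical names, and the levels are the ones from the e-mail example.

```python
from itertools import product

# Element names and levels from the e-mail example; the dictionary layout
# is an illustrative assumption, not something the original page defines.
ELEMENTS = {
    "A (Price)": {-1: "$24.99", +1: "$29.99"},
    "B (Offer)": {-1: "10% off", +1: "free gift"},
}


def full_factorial(n_elements):
    """Return every -1/+1 combination, with the first element varying fastest."""
    # product() varies the last position fastest, so each row is reversed to
    # match the recipe order in the text: control, A+, B+, then A+ and B+.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=n_elements)]


design = full_factorial(len(ELEMENTS))

for number, recipe in enumerate(design, start=1):
    labels = [ELEMENTS[name][level] for name, level in zip(ELEMENTS, recipe)]
    signs = " ".join("+" if level > 0 else "-" for level in recipe)
    print(f"Recipe {number}: {signs}   {' + '.join(labels)}")
```

Adding a third element to `ELEMENTS` would double the list to eight recipes, which is the growth that larger designs try to avoid.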


In a scientific test, every recipe provides additional information about every test element, since half the recipes have each element at the test level and half have each element at the control level (but a different half and half for each variable).

The perfect balance of the test matrix lets you separate out the main effect of each test element. For the two A+ recipes (2 and 4), you have one B+ and one B-. Statistically, when you average recipes 2 and 4, the effect of B “averages out,” so you end up with A+, the high price, independent of the impact of changing the offer. The same is true when averaging both A- recipes (1 and 3).
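The "averages out" argument can be checked mechanically. This is a minimal sketch, assuming the four recipes are coded as (A, B) pairs of -1 and +1; the variable names are illustrative.

```python
# Recipes 1-4 coded as (A, B), where -1 is the control level and +1 the test level.
design = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]

a = [recipe[0] for recipe in design]
b = [recipe[1] for recipe in design]

# Balance: each element sits at "+" in half the recipes and at "-" in the
# other half, so each column of signs sums to zero.
balanced = sum(a) == 0 and sum(b) == 0

# Orthogonality: the sign-by-sign product of the two columns also sums to
# zero, which is what lets B average out when estimating A (and vice versa).
orthogonal = sum(x * y for x, y in zip(a, b)) == 0

# The B levels found inside the two A+ recipes (2 and 4) and the two A- recipes (1 and 3).
b_within_a_plus = [recipe[1] for recipe in design if recipe[0] == +1]
b_within_a_minus = [recipe[1] for recipe in design if recipe[0] == -1]

print(balanced, orthogonal, b_within_a_plus, b_within_a_minus)
```

Both groups contain one B+ and one B- recipe, so the average of each group carries the same share of the offer effect and the comparison between them reflects price alone.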

Similar to a split-run test, the effect of A is simply the test average versus the control average, or, in this case, the average of recipes 2 and 4 versus the average of recipes 1 and 3. The effect of B is also calculated as the average of all + recipes (3, 4) minus the average of all – recipes (1, 2). This means that, statistically, …

You can use the same sample size no matter how many elements and recipes are in the test.

Whatever sample size you need can be split evenly among all recipes. Whether you have 2 or 22 elements, 4 or 24 recipes, half of all recipes are “+” and half are “–” so all recipes are used to calculate every effect.

Scientific tests also provide information about interactions between test elements.

Unlike split-run tests, a scientific test includes recipe #4, which shows the impact of changing both elements together. You can compare this combined effect with each main effect to see whether the impact of each element changes depending on how the other elements are set. That dependence is an interaction, for example:

  • The interaction between short-term and long-term interest rates in a credit card test showed the marketing team how they could increase profitability without hurting response.

  • A direct mail test showed that certain variables had a much greater impact if they could be seen through the envelope window.

  • An Internet test showed how two graphics had a greater impact if both were moved to new locations; moving just one or the other had little effect.

 

Analyzing interactions between test elements

Let’s expand this example to show the interaction and conversion data:

Each row (recipes 1 through 4) shows one of the four versions of the e-mail, each sent to 100,000 randomly selected names. Each version is tagged so every purchase can be tracked to the correct e-mail.

The AB column represents the interaction: the incremental change in conversion rate when both elements are changed together. In other words, it shows how much each main effect may change depending on how the other element is set. The AB column is used only for the calculation of the interaction effect. Each plus and minus in this column is calculated by multiplying the signs in columns A and B. For example, (-1)x(-1)=(+1) in recipe #1 and (+1)x(-1)=(-1) in recipe #2.

The effects show the main effect of Price (A), the main effect of Offer (B), and the interaction effect from changing both elements together. All effects are calculated as the average of all “+” levels minus the average of all “–” levels. The negative main effect of A means that the higher price (+ level) lowers conversion rate by 0.7 percentage points on average compared with the control price of $24.99. The positive main effect of B means that the new offer (free gift) has a 0.9 percentage point higher conversion rate than the 10% discount offer.
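The effect calculation can be sketched in a few lines. One loud assumption: the per-recipe conversion rates below are not copied from the original data table. They are back-calculated from the figures quoted in the text (the 3.45% overall average, the three effects, and the split-run differences for recipes 2 and 3), which together determine the four rates. The `effect` helper is a hypothetical name.

```python
# (A, B) signs for recipes 1-4.
design = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]

# Conversion rate (%) per recipe. ASSUMPTION: back-calculated from the
# effects quoted in the text, not taken from the original table.
conversion = [3.6, 2.4, 4.0, 3.8]


def effect(signs, response):
    """Average response at the + level minus average response at the - level."""
    plus = [y for s, y in zip(signs, response) if s > 0]
    minus = [y for s, y in zip(signs, response) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


a = [recipe[0] for recipe in design]
b = [recipe[1] for recipe in design]
ab = [x * y for x, y in zip(a, b)]  # interaction column: product of the A and B signs

effects = {
    "A": effect(a, conversion),
    "B": effect(b, conversion),
    "AB": effect(ab, conversion),
}
overall_average = sum(conversion) / len(conversion)

for name, value in effects.items():
    print(f"{name}: {value:+.2f} percentage points")
print(f"Overall average: {overall_average:.2f}%")
```

Every effect uses all four recipes, two on the "+" side and two on the "–" side, which is why no recipe is wasted on a single variable.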

These main effects are comparative: the change in offer has a greater impact than the change in price (0.9 versus 0.7 percentage points).

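The interaction effect is more difficult to interpret. The AB interaction effect of +0.5 percentage points means that each main effect can be 0.5 percentage points higher or lower, depending on how the other element is set. For example, the price effect averages –0.7 points, but it is –0.2 points with the free gift and –1.2 points with the 10% discount. A simple graph can be helpful.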

In this chart, the top line shows both recipes with the free gift (B+) and the bottom line is recipes with the discount (B-). The left side shows both recipes with the lower price (A-) and the right points are recipes at the higher price (A+). An interaction is present when the lines are not parallel.

Looking at this plot, the lower price with the free gift pulls best (upper left), but the higher price leads to a negligible drop in response when the free gift is offered (upper right). Therefore, the team should ask, “Is the $29.99 price more profitable if we offer a free gift?”
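The slopes of the two lines in an interaction plot like this one are the price effect at each offer level. The sketch below computes them directly. As an assumption, the rates are back-calculated from the effects quoted in the text rather than read from the original chart, and `price_effect_at` is a hypothetical helper.

```python
# Conversion rate (%) keyed by (A, B) level. ASSUMPTION: back-calculated
# from the effects quoted in the text, not read off the original chart.
rates = {(-1, -1): 3.6, (+1, -1): 2.4, (-1, +1): 4.0, (+1, +1): 3.8}


def price_effect_at(offer_level):
    """Change in conversion rate from the low to the high price at one offer level."""
    return rates[(+1, offer_level)] - rates[(-1, offer_level)]


with_discount = price_effect_at(-1)  # slope of the bottom line (B-)
with_gift = price_effect_at(+1)      # slope of the top line (B+)

# Parallel lines would have equal slopes. Half the difference between the
# slopes equals the AB interaction effect.
interaction = (with_gift - with_discount) / 2

print(f"Price effect with 10% discount: {with_discount:+.1f} points")
print(f"Price effect with free gift:    {with_gift:+.1f} points")
print(f"AB interaction effect:          {interaction:+.1f} points")
```

The two slopes differ by a full percentage point, which is why the lines are visibly not parallel.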


Calculating the optimal combination of elements

In this case, where all combinations of the elements are tested, the optimal combination is simply the recipe with the largest conversion rate. However, most scientific tests do not test all combinations, so the optimal combination is often one that was never run as a recipe, and it can be much better than the best of the test recipes!

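The optimal combination of elements can also be calculated with a prediction equation:

    y = overall average + ½ [(effect of A)(A) + (effect of B)(B) + (effect of AB)(A)(B)]

In other words, the predicted average conversion rate equals the overall average for the test plus one-half of each significant effect multiplied by the selected level (–1 or +1) of that element. For an interaction, the level is the product of the levels of the elements involved. The one-half appears because each effect measures the full change from the “–” level to the “+” level, while the overall average sits midway between the two.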

So the equation becomes:

y = 3.45% + ½ [(-0.7%)(-1)+(+0.9%)(+1)+(+0.5%)(-1)(+1)] = 4.0%

This says that we choose A- (the low $24.99 price) and B+ (the free gift), but the interaction detracts from the sum of the main effects. With A at –1 and B at +1, the positive AB interaction effect enters the equation with a negative sign, so the bracketed total is 0.5 percentage points lower than the two main effects alone and the predicted conversion rate is 0.25 points lower (4.0% instead of 4.25%). As the graph shows:

  • With the free gift (top line), the impact of price is not too large, or…
  • With the lower price (points on left), the free gift does not increase conversion rate as much
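The prediction equation is easy to evaluate for every combination of levels. This is a minimal sketch using the overall average and effects quoted above; the `predict` function and the constant names are illustrative, not part of the original method description.

```python
from itertools import product

# Figures quoted in the text (percent and percentage points).
OVERALL_AVERAGE = 3.45
EFFECT_A, EFFECT_B, EFFECT_AB = -0.7, +0.9, +0.5


def predict(a, b, include_interaction=True):
    """Predicted conversion rate (%) for levels a and b, each -1 or +1."""
    total = EFFECT_A * a + EFFECT_B * b
    if include_interaction:
        total += EFFECT_AB * a * b
    return OVERALL_AVERAGE + 0.5 * total


predictions = {(a, b): predict(a, b) for a, b in product((-1, +1), repeat=2)}
best = max(predictions, key=predictions.get)

print(f"Best levels (A, B): {best}, predicted {predictions[best]:.2f}%")
print(f"Main effects only:  {predict(*best, include_interaction=False):.2f}%")
```

With only two elements this reproduces the four tested recipes. The same function becomes more useful in larger designs, where most combinations are never run and have to be predicted.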

Even this simple test shows the advantages of a “designed” multivariable scientific test. Compared to common split-run techniques, this two-variable test required one more test cell (recipe 4), but results were more powerful, in-depth, and accurate:

#1: Sample size is reduced by 50%

  • With both variables in the same statistical test design, all recipes are used to calculate all effects, so the per-recipe sample size can be very small
  • Effects in a split-run test would have to be about 50% larger to be statistically significant

#2: Interactions show how the effect of variables can change

  • It’s impossible to analyze interactions with split-run tests
  • The interaction clearly shows the relationship between variables—valuable information on the fluid impact of marketing-mix elements
  • Interactions often present new opportunities, like in this case, where the price increase may be more profitable with the free gift offer

#3: Main effects and interactions provide more accurate real-world data

  • Split-run tests of price and offer (using just the first three recipes here) would have shown that the higher price reduced conversion by 1.2 percentage points (recipe 2 versus 1) and the free gift increased conversion by just 0.4 percentage points (recipe 3 versus 1). These split-run conclusions are nearly the opposite of what we see when the interaction is analyzed as well: the free gift is much more beneficial, and the higher price may have minimal impact if the free gift is offered.
  • Main effects are more robust, averaged across changes in the other elements (often split-run results are contingent upon variables not tested).
  • Main effects are comparative, so you can see which changes are most important and how all the effects compare.
  • Interaction effects add greater accuracy to the calculation of expected results.
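The gap between the split-run view and the designed-test view can be put side by side. As an assumption, the per-recipe rates here are back-calculated from the effects and split-run differences quoted in the text; the variable names are illustrative.

```python
# Conversion rate (%) by recipe number. ASSUMPTION: back-calculated from
# the figures quoted in the text, not copied from the original table.
rate = {1: 3.6, 2: 2.4, 3: 4.0, 4: 3.8}

# Split-run view: each test cell is compared with the control only,
# so recipe 4 is never used.
split_price = rate[2] - rate[1]
split_offer = rate[3] - rate[1]

# Designed-test view: each effect is averaged over both levels of the
# other element, so all four recipes contribute.
main_price = (rate[2] + rate[4]) / 2 - (rate[1] + rate[3]) / 2
main_offer = (rate[3] + rate[4]) / 2 - (rate[1] + rate[2]) / 2

print(f"Price: split-run {split_price:+.1f}, main effect {main_price:+.1f}")
print(f"Offer: split-run {split_offer:+.1f}, main effect {main_offer:+.1f}")
```

The split-run numbers describe price and offer only at the control setting of the other element, which is why they rank the two changes differently from the main effects.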

With such significant benefits with a simple two-variable test, just imagine what you can gain from testing 5, 10, or 20 variables all at once in a scientific test. For example, a scientific test of 20 elements requires just 8% of the sample size required for split-run tests!

Almost every scientific test is based on the same principles as the two-element test above. One big difference is that larger tests use only a small fraction of all possible combinations of test elements. For example, there are 32,768 possible combinations of 15 elements at two levels each. You can test all possible combinations if you would like, or you can test all 15 with as few as 16 recipes.
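One standard way to fit 15 two-level elements into 16 recipes is a saturated fractional-factorial construction: start with the full factorial of four base columns and define each element's column as the product of a different non-empty subset of them. The text does not say which design LucidView would use, so treat the sketch below as an illustration of the idea only. In a design this small, each main effect is mixed with some interactions, which is the price of the savings.

```python
from itertools import combinations, product
from math import prod

BASE_FACTORS = 4  # 2**4 = 16 recipes

# Every non-empty subset of the four base columns defines one element's
# column: 4 singles + 6 pairs + 4 triples + 1 quadruple = 15 elements.
subsets = [
    subset
    for size in range(1, BASE_FACTORS + 1)
    for subset in combinations(range(BASE_FACTORS), size)
]

base_runs = list(product((-1, +1), repeat=BASE_FACTORS))

# In each recipe, an element's level is the product of the base signs in its subset.
design = [
    tuple(prod(run[i] for i in subset) for subset in subsets)
    for run in base_runs
]

n_recipes, n_elements = len(design), len(subsets)
columns = list(zip(*design))

# Same two properties as the four-recipe matrix: every column is half "+"
# and half "-", and every pair of columns is orthogonal.
all_balanced = all(sum(col) == 0 for col in columns)
all_orthogonal = all(
    sum(x * y for x, y in zip(c1, c2)) == 0
    for c1, c2 in combinations(columns, 2)
)

print(f"All combinations of 15 elements: {2 ** 15}")
print(f"Recipes: {n_recipes}, elements: {n_elements}")
print(f"Balanced: {all_balanced}, orthogonal: {all_orthogonal}")
```

Because balance and orthogonality still hold, each main effect is again the average of the eight "+" recipes minus the average of the eight "–" recipes.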

The two-variable test above is called a “full-factorial” design because it tests all combinations of the selected variables. The drawback of this approach is that every additional variable doubles the number of recipes required for the test. Fortunately, advanced statistical techniques offer other test designs with the same advantages, yet far fewer recipes than full-factorial tests.
 

If you want to learn more about the statistics behind large test designs, read over creating large test designs. You can also look at the variety of test designs and techniques to learn more about the wide array of options for scientific testing. If you don’t care for all the theory and statistics, read over some case studies & articles to see how marketers have successfully used scientific testing in fast-moving, real-world marketing programs.


 

© LucidView 2007. All rights reserved. Contact: 888-LucidView (888-582-4384), info@lucidview.com