Although basic marketing testing is simple, it does not allow for multiple variables to be tested; comparisons and interactions are difficult to see and testing is often slow and inefficient. Conversely, with scientific marketing testing, variables are combined into a single statistical design. Multivariable testing creates a unified, fast and accurate strategy for effective testing and results that help you meet your business goals.
Multivariable testing increases your speed, power, accuracy, and insights by combining variables into one statistical test design. With multivariable testing, or matrix tests, it is possible to change many variables at once while isolating the impact of each.
To learn the scientific steps behind the multivariable approach, keep reading. Otherwise, refer to the case studies and articles to see the real-world application and results of multivariable testing.
Say you want to test two elements of your e-mail:
(A) Price: the current $24.99 price versus $29.99
(B) Offer: the current 10% discount versus a free gift
You can test these two elements easily enough as two separate split-run tests with three different emails:
- Control ($24.99 + 10% off)
- Higher price ($29.99 + 10% off)
- New offer ($24.99 + free gift)
If conversion rates average about 2% and there is a total list of 400,000 names, 200,000 people might be randomly selected to receive the control e-mail and 100,000 to receive each of the test cells. As two separate split-run tests, it is possible to calculate the impact of each variable on its own, the “main effect” of (A) Price and (B) Offer. With these sample sizes, there is a 50-50 chance of seeing a statistically significant effect if either variable increases conversion rate by 6.1% or more (from 2% to 2.12%; see sample size for an explanation of the equation).
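The detectable-lift arithmetic above can be sketched with the usual normal approximation for comparing two conversion rates. This is a rough sketch only: the z = 1.96 value (two-sided 95% confidence) is an assumption, and the exact 6.1% figure quoted in the text depends on the significance and power settings used there.

```python
import math

# Control cell: 200,000 names; each test cell: 100,000; baseline ~2%.
p, n_control, n_test = 0.02, 200_000, 100_000

# Standard error of the difference between the two conversion rates.
se = math.sqrt(p * (1 - p) * (1 / n_control + 1 / n_test))

# At ~50% power, the smallest detectable lift sits right at the
# critical value. z = 1.96 assumes two-sided 95% confidence.
z = 1.96
min_detectable = z * se            # absolute lift in conversion rate
relative_lift = min_detectable / p # lift as a fraction of the 2% base
```

With these inputs the minimum detectable relative lift comes out on the order of 5%, the same order of magnitude as the figure quoted above.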
For a more efficient and powerful test, both variables can be combined in the same scientific test design with just one more test cell, where both A and B are at the test level:
4. Higher price and new offer ($29.99 + free gift)
Statistically, this additional test cell instantly gives much more information at a lower cost. Now there is a balanced test design where every test cell provides an additional piece of data about every variable.
Let’s rewrite the new test design as a matrix of the two variables, or “test elements,” and the four test cells, or “recipes,” using a “–” (or –1) to represent the control and a “+” (or +1) for the new level, as shown below:

Recipe                            A (Price)   B (Offer)
1. Control                        –           –
2. Higher price                   +           –
3. New offer                      –           +
4. Higher price and new offer     +           +
The first three recipes are just like the control and two splits. The fourth recipe provides unique information on the combination of both new levels together. In addition, two recipes (2 and 4) now give information about the higher price (A+) and two (3 and 4) about the free gift offer (B+), so all four recipes can be used to compare both the price and the offer change.
In a scientific test, every recipe provides additional information about every test element, since half the recipes have each element at the test level and half have each element at the control level (but a different half and half for each variable).
The perfect balance of the test matrix lets you separate out the main effect of each test element. For the two A+ recipes (2 and 4), you have one B+ and one B-. Statistically, when you average recipes 2 and 4, the effect of B “averages out,” so you end up with A+, the high price, independent of the impact of changing the offer. The same is true when averaging both A- recipes (1 and 3).
Similar to a split-run test, the effect of A is simply the test average versus the control average, or, in this case, the average of recipes 2 and 4 versus the average of recipes 1 and 3. The effect of B is calculated the same way: the average of all “+” recipes (3, 4) minus the average of all “–” recipes (1, 2). This means that, statistically, you can use the same sample size regardless of the number of elements and recipes in the test.
Whatever sample size is needed can be split evenly among all recipes. Whether there are 2 or 22 elements, 4 or 24 recipes, half of all recipes are “+” and half are “–” so all recipes are used to calculate every effect.
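The half-and-half balance described above can be checked directly: in a full factorial on k two-level elements, every element column splits the recipes evenly, whatever k is. A minimal sketch:

```python
from itertools import product

def design_matrix(k):
    """All 2**k recipes for k two-level elements, coded -1/+1."""
    return list(product([-1, 1], repeat=k))

# Every column is half "+" and half "-", so every recipe feeds the
# estimate of every effect, whether there are 2 elements or 5.
for k in (2, 5):
    recipes = design_matrix(k)
    for col in range(k):
        plus = sum(1 for r in recipes if r[col] == 1)
        assert plus == len(recipes) // 2
```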
Scientific tests also provide information about interactions between test elements.
Unlike split-run tests, the scientific design includes recipe 4, which shows the impact of changing both elements together. You can compare this combined effect with each main effect to see if the impact of one element changes depending on how the other is set, creating an interaction. For example:
- The interaction between short-term and long-term interest rates in a credit card test showed the marketing team how they could increase profitability without hurting response.
- A direct mail test showed that certain variables had a much greater impact if they could be seen through the envelope window.
- An Internet test showed how two graphics had a greater impact if both were moved to new locations; moving just one or the other had little effect.
Analyzing interactions between test elements
Let’s expand this example to show the interaction column and the conversion data:

Recipe                            A (Price)   B (Offer)   AB    Conversion rate
1. Control                        –           –           +     3.6%
2. Higher price                   +           –           –     2.4%
3. New offer                      –           +           –     4.0%
4. Higher price and new offer     +           +           +     3.8%
Each row—recipes 1 through 4—shows the four versions of the e-mail that were sent to 100,000 randomly selected names. Each is tagged so every purchase can be tracked to the correct version of the e-mail.
The AB column represents the interaction: the incremental change in conversion rate when both elements are changed together. In other words, how much each main effect may change, depending on how the other element is set. The AB column is used only for the calculation of the interaction effect. Each plus and minus in this column is calculated by multiplying the signs in columns A and B. For example, (-1) x (-1) = (+1) in recipe #1 and (+1) x (-1) = (-1) for recipe #2.
This shows the main effect of Price (A) and Offer (B) and the interaction effect from changing both elements together. All effects are calculated as the average of all “+” levels minus the average of all “–” levels. The negative main effect of A means that the control price of $24.99 results in a 0.7 percentage point average increase in conversion rate over the higher price (+ level). With a positive main effect of B, the new offer (free gift) has a 0.9 percentage point higher conversion rate than the 10% discount offer.
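Those averages are easy to verify. Using recipe conversion rates consistent with the numbers reported here (3.6%, 2.4%, 4.0%, and 3.8% for recipes 1 through 4, which follow from the stated overall average of 3.45% and the three effects), a sketch of the effect calculation:

```python
# Design matrix for recipes 1-4: levels of (A, B), with AB = A * B.
recipes = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
# Conversion rates (%) consistent with the effects reported in the text.
rates = [3.6, 2.4, 4.0, 3.8]

def effect(signs):
    """Average of all '+' recipes minus average of all '-' recipes."""
    plus = [r for s, r in zip(signs, rates) if s == 1]
    minus = [r for s, r in zip(signs, rates) if s == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

a_col = [a for a, b in recipes]
b_col = [b for a, b in recipes]
ab_col = [a * b for a, b in recipes]   # sign product gives the AB column

effect_a = effect(a_col)    # -0.7: the lower price converts better
effect_b = effect(b_col)    # +0.9: the free gift converts better
effect_ab = effect(ab_col)  # +0.5: price/offer interaction
```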
These main effects are comparative: the change in offer has a greater impact than the change in price (0.9 versus 0.7 percentage points).
The interaction effect is more difficult to interpret. The AB interaction effect of +0.5% means that conversion rate can be 0.5 percentage points higher or lower, depending on how A and B are set. A simple graph can be helpful.
In this chart, the top line shows both recipes with the free gift (B+) and the bottom line is recipes with the discount (B-). The left side shows both recipes with the lower price (A-) and the right points are recipes at the higher price (A+). An interaction is present when the lines are not parallel.
Looking at this plot, the lower price and free gift pulls best (upper left), but the higher price leads to a negligible drop in response when the free gift is offered (upper right). Therefore, the team should ask, “Is the $29.99 price more profitable if we offer a free gift?”
Calculating the optimal combination of elements
In this case, where all combinations of the elements are tested, the optimal combination is simply the recipe with the largest conversion rate. However, most scientific tests do not test all combinations, so the optimal combination is often much better than the best of all the test recipes!
The optimal combination of elements can also be calculated with the prediction equation:

y = (overall test average) + ½ [ (effect of A)(level of A) + (effect of B)(level of B) + (effect of AB)(level of A)(level of B) ]

This equation means that the predicted average conversion rate = the overall average for the test + one-half of each significant effect x the selected level (–1 or +1) for each.
So the equation becomes:
y = 3.45% + ½ [(-0.7%)(-1)+(+0.9%)(+1)+(+0.5%)(-1)(+1)] = 4.0%
This basically says that we choose A– (the low $24.99 price) and B+ (the free gift), but the interaction detracts from the sum of the main effects. With A– and B+, the positive AB interaction enters the bracket as (+0.5%)(–1)(+1) = –0.5%, so after halving, the prediction is 0.25 percentage points lower than the main effects alone would give. As the graph shows:
- With the free gift (top line), the impact of price is not too large, or…
- With the lower price (points on left), the free gift does not increase conversion rate as much
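The prediction equation can be evaluated for every combination of levels to find the optimum. A minimal sketch using the effects above:

```python
from itertools import product

mean = 3.45                                       # overall test average (%)
effect_a, effect_b, effect_ab = -0.7, 0.9, 0.5    # effects from the test

def predict(a, b):
    """Predicted conversion rate (%) for levels a, b in {-1, +1}."""
    return mean + 0.5 * (effect_a * a + effect_b * b + effect_ab * a * b)

predictions = {(a, b): predict(a, b) for a, b in product([-1, 1], repeat=2)}
best = max(predictions, key=predictions.get)
# best is (-1, +1): the $24.99 price with the free gift, predicted 4.0%
```

Because all four combinations were tested here, this just recovers the best recipe; in larger fractional designs, the same equation predicts untested combinations as well.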
Even this simple test shows the advantages of a “designed” multivariable scientific test. Compared to common split-run techniques, this two-variable test required one more test cell (recipe 4), but results were more powerful, in-depth, and accurate:
#1: Sample size is reduced by 50%
- With both variables in the same statistical test design, all recipes are used to calculate all effects, so the per-recipe sample size can be very small
- Effects in a split-run test would have to be about 50% larger to be statistically significant
#2: Interactions show how the effect of variables can change
- It’s impossible to analyze interactions with split-run tests
- The interaction clearly shows the relationship between variables—valuable information on the fluid impact of marketing-mix elements
- Interactions often present new opportunities, like in this case, where the price increase may be more profitable with the free gift offer
#3: Main effects and interactions provide more accurate real-world data
- Split-run tests of price and offer (using just the first three recipes here) would have shown that the higher price reduced conversion 1.2% (recipe 2 versus 1) and the free gift increased conversion just 0.4% (recipe 3 versus 1). The split-run conclusions are nearly opposite what we see when the interaction is analyzed as well: the free gift is much more beneficial and the higher price may have minimal impact if the free gift is offered.
- Main effects are more robust, averaged across changes in the other elements (often split-run results are contingent upon variables not tested).
- Main effects are comparative, so you can see which changes are most important and how all the effects compare.
- Interaction effects add greater accuracy to the calculation of expected results.
With such significant benefits with a simple two-variable test, just imagine what you can gain from testing 5, 10, or 20 variables all at once in a scientific test. For example, a scientific test of 20 elements requires just 8% of the sample size required for split-run tests!
Almost every scientific test is based on the same principles as the two-element test above. One big difference is that larger tests use only a small fraction of all possible combinations of test elements. For example, there are 32,768 possible combinations of 15 elements at two levels each. You can test all possible combinations if you would like, or test all 15 elements with as few as 16 recipes.
The two-variable test above is called a “full-factorial” design; it tests all combinations of the selected variables. The drawback of this approach is that every additional variable doubles the number of recipes required for the test. Fortunately, advanced statistical techniques offer other test designs with the same advantages, yet far fewer recipes than full-factorial tests.
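The 15-elements-in-16-recipes idea can be illustrated with one standard construction (a sketch, not the only way to build such a design): start with a full factorial on 4 base elements, then assign each of the remaining 11 elements to a product of base columns. The result is 16 recipes whose 15 columns are all balanced and mutually orthogonal, so every effect is still calculated from all recipes.

```python
from itertools import combinations, product

# Full factorial on 4 base elements: 16 recipes.
base_runs = list(product([-1, 1], repeat=4))

# The other 11 element columns are products of 2, 3, or 4 base columns
# (11 = number of subsets of the 4 base columns with at least two members).
extra_cols = [c for r in (2, 3, 4) for c in combinations(range(4), r)]

design = []
for run in base_runs:
    row = list(run)
    for combo in extra_cols:
        col = 1
        for i in combo:
            col *= run[i]        # sign product defines the extra column
        row.append(col)
    design.append(row)

# 16 recipes x 15 elements; every column half "+" and half "-".
n_recipes, n_elements = len(design), len(design[0])
```

The trade-off, as the text notes, is that some interactions become indistinguishable from main effects in so small a design; that is the price of testing 15 elements in 16 recipes.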