If you can’t measure it, you can’t improve it
If you want to increase advertising
effectiveness or improve your marketing programs, you need to
find a way to measure exactly what you’re trying to improve.
This may sound easy enough, but the devil is in the details…
- A leading magazine publisher found little correlation between retail scanner data and sales numbers from wholesalers. Yet wholesaler data was used to measure company performance, while scanner data was used to analyze test results.
- An Internet retailer
found nearly opposite results measuring click-through rate
versus conversion rate. In a test of online advertising,
detailed, yet fairly bland ads drove conversion, while vague
messages, exciting graphics, and a nebulous offer drove
clicks. The team joked that a blank box was the best way to
increase click-through rate while ensuring zero new
customers.
- An experienced marketing team had tested direct mail programs for years, using “keycode” numbers on each mail package to track sales. Unfortunately, the default keycode was the same as the control's. When processing mailed-in reply cards, company employees commonly entered the default keycode to lighten their workload. Because test-cell responses were underreported (and credited to the control instead), tests seldom beat the control.
For accurate, reliable data, you need to:
(a) Choose the right metrics
(b) Analyze data stability and reliability
(c) Plan the process for data collection
(d) Refine the scope of the testing project, if
necessary
Choose the right metrics
The best metrics are usually direct measures
of response, sales, and profitability. Sometimes direct-response
metrics are not possible, so substitute metrics are required.
Catalog requests, phone calls for more information, website
visits, and reply cards for “free information” may accurately
measure intent, but usually have less than one-to-one
correlation with actual purchase behavior.
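Before relying on a substitute metric, one way to gauge how closely it tracks purchases is to correlate it with actual sales across past campaigns or test units. The sketch below is a minimal illustration under stated assumptions: the `pearson` helper and the five-campaign history are hypothetical, and a strong correlation across campaigns still does not guarantee that the proxy will rank test recipes the same way sales would.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical history: a substitute metric (catalog requests) and actual
# sales for the same five past campaigns. Replace with your own data.
catalog_requests = [120, 95, 140, 80, 110]
sales = [30, 28, 33, 15, 31]

r = pearson(catalog_requests, sales)
# r^2 is the share of sales variance explained by a straight-line fit on the proxy.
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

An r well below 1 is a signal to treat the proxy's test results with caution, or to validate them against sales on a subset of responders.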
One key benefit of in-market testing is the ability to measure purchase behavior in the real marketplace (where customers talk with their wallets and don’t know they’re being tested). In this respect, testing remains the only way to prove which changes directly impact sales. Any departure from a true measure of sales adds error and may lead to incorrect conclusions.
Direct marketers, take note: if you want to analyze dollar sales (e.g., average order size) as a key metric, you need to calculate the standard deviation of each test cell along with the average. Without the standard deviation (σ, or “sigma,” a measure of data variability) of all individual orders in each test cell, you cannot calculate statistical significance. You should also remove any outliers before the analysis. The reason lies in how the two kinds of data differ:
- For response data, all you need is the number mailed and the number of orders. For response (yes/no) data, the standard deviation follows directly from the average: with a response rate p, σ = √(p(1 − p)).
- For sales data (and other “continuous” numbers), there is no such relationship, so the standard deviation must be calculated for every group of data. That means you need to capture the individual order sizes somewhere in your database.
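The contrast between response data and sales data can be sketched in Python. This is a minimal illustration, not a prescribed method: `response_z` and `order_size_z` are hypothetical helpers, the cell sizes and order amounts are made up, and both use a large-sample normal (z) approximation. Note that `response_z` needs only counts, while `order_size_z` needs every individual order.

```python
import math
from statistics import mean, stdev

def response_z(mailed_a, orders_a, mailed_b, orders_b):
    """Two-proportion z statistic. For yes/no data, sigma follows from the
    rate itself, sqrt(p * (1 - p)), so counts alone are enough."""
    p_a, p_b = orders_a / mailed_a, orders_b / mailed_b
    pooled = (orders_a + orders_b) / (mailed_a + mailed_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / mailed_a + 1 / mailed_b))
    return (p_a - p_b) / se

def order_size_z(orders_a, orders_b):
    """z statistic for the difference in average order size. For dollar
    amounts, sigma must be computed from the individual orders in each cell."""
    se = math.sqrt(stdev(orders_a) ** 2 / len(orders_a)
                   + stdev(orders_b) ** 2 / len(orders_b))
    return (mean(orders_a) - mean(orders_b)) / se

# Hypothetical cells: 10,000 mailed each, 250 vs. 200 orders.
z_resp = response_z(10_000, 250, 10_000, 200)

# Hypothetical individual order sizes (real cells hold far more orders).
cell_a = [42.0, 55.5, 38.0, 61.0, 47.5]
cell_b = [40.0, 44.0, 39.5, 52.0, 41.0]
z_sales = order_size_z(cell_a, cell_b)
```

With these inputs, `z_resp` comes out at roughly 2.38, beyond the usual ±1.96 cutoff for 95% confidence. Real test cells hold far more orders than the five per cell shown here; with samples that small, a t-test would be the more appropriate choice.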
Analyze data stability and reliability
In direct mail—when every test cell is sent to
a random selection of names in the same drop—data stability is
of little concern. But tests running over a period of time or
within different “test units” (e.g. stores, magazines, or
regions) require the analysis of historical data to select
stable and comparable test units.
Stability is a statistical term relating to how predictable performance is over time. If the data are stable, you can be confident that any change beyond a normal range (as defined statistically) is due to your test elements. You can assess stability by checking whether historical data fall within statistical “control limits” or instead show numerous outliers, trends, or other unusual (non-random) variation.
Measurement studies and stability analyses can
determine how trustworthy your data are.
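One common way to compute “control limits” for historical data is an individuals (XmR) control chart, whose limits sit at the average plus or minus 2.66 times the average moving range between consecutive points. The sketch below assumes that chart type and uses made-up weekly sales for a single store; the helper names are hypothetical, and it only flags points beyond the limits, without the run rules that detect trends.

```python
from statistics import mean

def xmr_limits(values):
    """Individuals (XmR) chart: centre line +/- 2.66 * average moving range."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread

def points_outside(values):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical weekly sales for one candidate test store.
weekly_sales = [102, 98, 105, 101, 97, 103, 99, 100, 140, 102]
flagged = points_outside(weekly_sales)
print(xmr_limits(weekly_sales), flagged)
```

Here the 140 at index 8 falls above the upper limit (about 136.6), so this store's history would need a closer look before it serves as a test or control unit.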
Plan the process for data collection
Running a test or measuring advertising
effectiveness requires a sharper focus on the campaigns or test
units being tracked. If you are testing a print ad in one
magazine, you need to be able to separate responses to that one
ad from all other market inquiries. Usually, you need a unique
tracking number for each campaign or test cell you want to
analyze. Examples include:
- Catalog and direct mail: a unique source code (keycode) on catalogs and reply cards, a different 800 number, and a unique URL for tracking response (otherwise, you can try to match all responses back to the correct test recipe, though this approach introduces some error)
- E-mail and Internet: a unique landing page and URL tag on each version, to follow the visitor throughout the purchase
- Print ads: a unique contact name, phone
number, or web address
- Mass media: run tests in certain regions
and compare regional sales to a historical baseline
- Retail: sales data collected individually for each store
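A minimal sketch of per-cell tracking, assuming made-up keycodes, a placeholder URL, and a hypothetical `src` query parameter. One design choice echoes the keycode anecdote above: a blank or unrecognized code is counted as `UNKNOWN` rather than defaulting to the control's code, so missing data shows up instead of silently inflating the control.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical keycodes, one per test cell. The control has its own code,
# and the fallback for a missing code is UNKNOWN, never the control's code.
KEYCODES = {"C100": "control", "T101": "recipe 1", "T102": "recipe 2"}
UNKNOWN = "UNKNOWN"

def landing_url(base, keycode):
    """Build a tagged landing-page URL for one test cell ('src' is illustrative)."""
    if keycode not in KEYCODES:
        raise ValueError(f"unknown keycode {keycode!r}")
    return f"{base}?{urlencode({'src': keycode})}"

def tally(responses):
    """Count responses per keycode; blank or unrecognized codes go to UNKNOWN."""
    counts = Counter()
    for code in responses:
        code = (code or "").strip().upper()
        counts[code if code in KEYCODES else UNKNOWN] += 1
    return counts

url = landing_url("https://www.example.com/offer", "T101")
counts = tally(["C100", "t101", "", None, "T102", "X9"])
print(url, dict(counts))
```

A rising `UNKNOWN` count is an early warning that responses are not being attributed to their test cells.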
Refine the scope of the testing project, if necessary
After assessing the availability and reliability of your data, you may need to change the scope of your test. For example, a retail test may have to focus on in-store product and promotional elements if local/regional advertising cannot be measured accurately for each test recipe. An advertising test may have to be run in a number of metropolitan areas across the U.S., so that the chance of overlap in exposure is minimal. And if you want to increase long-term customer profitability, you may need a much larger sample size than you would need if response rate were the key metric.
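Why a profitability metric can demand a larger sample comes down to variability. The usual normal-approximation formula for comparing two averages gives a per-cell sample size of n = 2 (z_(1−α/2) + z_(1−β))² σ² / δ², where σ is the per-customer standard deviation and δ the smallest difference worth detecting. The sketch below applies it with hypothetical inputs (a 2% response rate, a $150 standard deviation in profit per customer, a $2 target difference); `n_per_cell` is an illustrative helper and the numbers only show the mechanics.

```python
import math
from statistics import NormalDist

def n_per_cell(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per cell to detect a difference `delta`
    between two cell averages, given per-unit standard deviation `sigma`."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (k * sigma / delta) ** 2)

# Response rate: yes/no data, so sigma comes from the rate itself.
p = 0.02  # hypothetical control response rate
response_n = n_per_cell(math.sqrt(p * (1 - p)), delta=0.005)  # 2.0% -> 2.5%

# Customer profitability: sigma must be estimated from historical data.
profit_n = n_per_cell(sigma=150.0, delta=2.0)  # hypothetical $150 spread, $2 change
print(response_n, profit_n)
```

With these inputs, `response_n` is roughly 12,300 and `profit_n` roughly 88,300 per cell. Different assumptions could narrow or reverse the gap, so estimate σ from your own historical data.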
A large investment firm
wanted to test changes to their newspaper
advertisements. (a) Since the ads did not produce sales
directly, they selected phone responses and website
visits as a measure of advertising effectiveness. (b)
The company already had a computerized system to track
different phone extensions and webpages. (c) Every ad
would have a unique phone number and website address
printed on the ad, so responses could be tracked back to
the correct ad. The team set aside 16 phone extensions
for the test and created different landing
pages for each test recipe. (d) With just six editions
of the national newspaper, the test would run over a few
weeks with a “control” version plus five test recipes
running every week. With these restrictions, the team
decided to limit the test to, at most, about a dozen
test elements.
The next step is to leverage statistical power in the test design.