Enhance Lists with Overlay Data, Part 2
By Jim Wheaton
Principal, Wheaton Group
Original version of an article that appeared in the
September 8, 2003 issue of "DM News"
Part 1 of this article (DM News, April 7, 2003) showed how demographic
overlay data can improve your company's top and bottom lines.
However, it is more difficult than one might think to interpret
the profile reports that are generated by the overlay process.
There are several analytical traps that frequently snare the untrained.
This month's article will focus on how to properly evaluate profile reports.
It will also provide insight on how to determine the quality of
the ones that have been supplied by your vendor. Unfortunately,
many of the profile products on the market today have significant shortcomings.
This article is based on a real analysis that
was run for a specific automobile model. Because the analysis
was done over a decade ago, we are able to share one of the profile
tables (see Table 1). In this table, the buyers of Coupes
(2-doors) were profiled on Age of Head of Household, and then compared
with buyers of Sedans (4-doors) as well as a sample of the entire
United States. (Shading has been added to the profile table.)
The first three columns are straightforward. For each of fourteen Age
ranges, the quantity of Coupe buyers is provided, followed by each
range's percentage of the 62,492 Coupe buyers for which the Age
variable was available. A critical component is the breakout
of Coupe buyers for which Age was unavailable. This group
numbers 50,230, or fully 44.6% of the 112,722 overall Coupe buyers.
Unfortunately, a number of commercially available profile products
do not include this breakout.
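The arithmetic behind this breakout can be checked with a minimal sketch. It uses only the counts quoted above; the variable names are illustrative and are not taken from any profile product.

```python
# Counts quoted in the article for Coupe buyers.
coded = 62_492     # buyers for which the Age variable was available
uncoded = 50_230   # buyers for which Age was unavailable

total = coded + uncoded
uncoded_share = uncoded / total

print(total)                    # 112722
print(f"{uncoded_share:.1%}")   # 44.6%
```

The "actual" percentages in column three use the 62,492 coded buyers as their base, not the 112,722 total, which is why the size of the uncoded group matters so much.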
As was explained in the April 7 article, individuals who cannot be coded with
a given data element are almost always demographically different
from those who can. This is because representation on major
overlay databases is skewed towards older individuals who reside
in more geographically stable households. Non-coded individuals
tend to be younger and more mobile. If this skew is not accounted
for, then the resulting profiles will be misleading. The next
two "Coupe Estimate" columns illustrate why this is true:
Generating Estimates for the Missing Component
In the first
column, an approximation algorithm was used to allocate, across
the fourteen Age ranges, the 44.6% of buyers for which no Age information
exists. This is a sophisticated process whose mechanics are
beyond the scope of this article. The resulting allocations
can be remarkably accurate, and are applicable to a number of overlay
data elements. Unfortunately, many commercially available
profile products do not include such estimates.
Compare the estimated Percent of Total column with the "actuals" displayed
in column three. For example, 18-24 year-olds account for
an estimated 24.6% of the 112,722 overall Coupe buyers. This
is dramatically higher than the "actual" 7.3% of the 62,492 Age-coded
buyers. Clearly, focusing on the 7.3% would be very misleading
from a marketing perspective, and perhaps disastrously so!
The "actuals" for the 18-24 and 25-29 Age ranges reflect significant
under-representation. All other ranges reflect over-representation.
For example, 45-49 year olds have an "actual" of 16.4% versus an
estimate of 10.9%.
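The mechanics of the approximation algorithm are not given here, so the sketch below does not attempt to reproduce it. It only back-solves from the published percentages to show what the estimates imply about the 50,230 buyers with no Age information. The function name is hypothetical, and because the inputs are rounded percentages, the results are approximate.

```python
# Counts quoted in the article for Coupe buyers.
coded = 62_492
uncoded = 50_230
total = coded + uncoded

def implied_uncoded_share(actual_pct, estimated_pct):
    """Percent of uncoded buyers that must fall in an Age range
    for the estimated Percent of Total to hold (back-solved, approximate)."""
    coded_in_range = actual_pct / 100 * coded
    total_in_range = estimated_pct / 100 * total
    return (total_in_range - coded_in_range) / uncoded * 100

young = implied_uncoded_share(7.3, 24.6)     # 18-24 year-olds
middle = implied_uncoded_share(16.4, 10.9)   # 45-49 year-olds

print(f"{young:.1f}")    # 46.1
print(f"{middle:.1f}")   # 4.1
```

Under these figures, roughly 46% of the uncoded buyers would be 18-24, versus about 4% who would be 45-49. This is consistent with the point that non-coded individuals tend to be younger.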
Coupe Versus Sedan Comparison
The next two columns compare
Coupe versus Sedan buyers. The Ratio of 205, for example,
means that the estimated 24.6% of Coupe buyers who are 18-24 is
over twice that of Sedan buyers. Coupe buyers display higher
concentrations among 18-29 and 45-54 year olds, as is apparent from
the fact that the corresponding Ratios are over 100. Clearly,
the Coupe appeals to two very different life-stages.
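As a rough illustration, the sketch below assumes that the Ratio is an index computed as 100 times the Coupe Percent of Total divided by the Sedan Percent of Total. That formula is an assumption that fits the description above; the article does not spell it out. The Sedan percentage is backed out from the published Ratio rather than quoted from the table, and the function name is hypothetical.

```python
def ratio_index(target_pct, base_pct):
    """Index of one group's concentration against another's (100 = identical)."""
    return round(100 * target_pct / base_pct)

coupe_pct = 24.6   # estimated Percent of Total, Coupe buyers aged 18-24
ratio = 205        # published Coupe-to-Sedan Ratio for that range

# Back out the Sedan percentage that the published Ratio implies.
implied_sedan_pct = coupe_pct * 100 / ratio
print(f"{implied_sedan_pct:.1f}")             # 12.0

# Round trip: the implied Sedan figure reproduces the published Ratio.
print(ratio_index(coupe_pct, implied_sedan_pct))   # 205
```

A Ratio above 100 marks an Age range in which Coupe buyers are more concentrated than Sedan buyers, and a Ratio below 100 marks the reverse.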
Interest in Coupes versus Sedans is highest at 18-24, with a Ratio
of 205, and then consistently declines to a Ratio of 77 at 35-39.
Then, there is increasing interest to 45-49, where a secondary "peak"
occurs at 123. Next, there is a long decline to 75-79, where
the primary "valley" occurs at 50. Finally, there
appears to be a small up-tick beginning with 80-84 year-olds, although
the corresponding Confidence statistics suggest that this is not
clear-cut. (More on this later.)
From a life-style perspective, all of this makes good sense.
The young, who often are single and have no children, find 2-door
vehicles to be more appealing than 4-doors. So, too, do 45-54
year olds, who purchase the Coupes as secondary vehicles for themselves
or their teenage children. There is lower interest in Coupes
during the prime child-rearing years, when a premium is placed on
4-door practicality. Interest also declines during the senior
years, because the 2-door design can be difficult to enter and exit.
Ratios are particularly important in the absence of an approximation
algorithm to adjust for missing data and generate estimated Percent
of Totals. This is because unadjusted "actuals,"
although suspect from an absolute perspective, can be compared with
each other to create valid relative metrics. For example,
consider again that it is misleading to conclude that 7.3% of overall
Coupe buyers are 18-24. However, even in the absence of the
adjustment to 24.6%, we can be certain that the concentration of
Coupe buyers in the 18-24 Age range is twice that of Sedan buyers.
The second column in this "comparison against Sedan" section is a Confidence
statistic. This is based on something called a Z-Score, which
has been omitted from the table for the sake of simplicity.
The Confidence statistic is frequently misunderstood. Consider,
for example, the 99% Confidence that is associated with the 18-24
Age range. It does not translate to 99% Confidence that
the Coupe-to-Sedan Ratio is exactly 205. Instead, it quantifies
the likelihood that the Percent of Total for Coupe buyers is statistically
different from that for Sedan buyers.
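The exact formula behind the Confidence statistic is not given in this article. One common way to derive such a figure from a Z-Score is a pooled two-proportion test, with Confidence taken as 100% minus the two-sided p-value. The sketch below assumes that approach. The function name, the percentages, and the sample sizes are hypothetical and are not drawn from Table 1.

```python
import math

def confidence(p1, n1, p2, n2):
    """Two-sided confidence (0-100) that two proportions differ,
    using a pooled two-proportion Z-Score. An assumed method."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # erf(|z| / sqrt(2)) is the probability mass within |z| standard
    # deviations of a normal mean, i.e. 1 minus the two-sided p-value.
    return 100 * math.erf(abs(z) / math.sqrt(2))

# Identical percentages give a Z-Score of zero and a Confidence of zero.
same = confidence(0.10, 5_000, 0.10, 5_000)
print(same)   # 0.0

# The same gap in percentages, measured on small and on large samples.
small = confidence(0.012, 500, 0.010, 500)
large = confidence(0.012, 50_000, 0.010, 50_000)
print(small < large)   # True
```

Under this assumed method, two things raise the Confidence: a wider gap between the two percentages, and larger samples behind them.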
Confidence is a function of the similarity between the Percent of
Total Coupe and Percent of Total Sedan buyers, and of their corresponding
sample sizes. By definition, a Ratio of 100 will have a 0%
Confidence, regardless of the sample size. Notice that the
40-44 Age range has a Confidence of only 45%. The primary
reason is that its Ratio of 98 is very close to 100. Conversely,
the primary driver for the low Confidences associated with the 80-84
and 85+ ranges is small sample size. Therefore, the apparent
up-tick in their Ratios might be nothing more than a statistical artifact.
Coupe Versus U.S. Comparison
The final column compares the
Age distributions of Coupe buyers versus a sample of the U.S. population.
Coupe buyers display higher Ratios among 18-29 and 40-59 year olds.
The peaks and valleys for the U.S.-derived ratios are often more
extreme than for the Sedan-derived ratios. (The Confidence column
has been deleted for simplicity. All Age ranges in this example
displayed Confidences at or close to 99% because a very large sample
of the U.S. population was employed.)
Demographic overlay data can improve your company's
top and bottom lines. However, without robust profile reports,
and their knowledgeable interpretation, it is easy to draw inaccurate
marketing conclusions. Be particularly mindful of the effects
of missing data, and the corresponding need to accurately adjust
profile reports. (For additional reading on this topic, see
"Individual/Household Demographics & Psychographics: Applications
in Descriptive & Predictive Research," The Direct Marketing
Association's 1997 Research Council Journal, www.wheatongroup.com.)
Jim Wheaton is a Principal at Wheaton Group, which specializes in direct
marketing consulting and data mining, data quality assessment and
assurance, and the delivery of cost-effective data warehouses and
marts. He is also co-founder of Data University. Jim
can be reached at 919-969-8859, or firstname.lastname@example.org.