Multiple Modeling Tools:
Key to Sophisticated Database Marketing
By Jim Wheaton
Principal, Wheaton Group
Original Version of an article that appeared in the
September 23, 1996 issue of "DM News"
Last month's "Superiority of Tree Analysis Over RFM: How It Enhances
Regression" discussed some important ways that tree analysis (CHAID, CART, and
the like) can be used to segment prospect and customer files. We also
showed how tree analysis can be valuable for companies that employ regression
as the primary tool for predictive modeling, because of its ability to identify
what are known as interaction predictor variables - the synergy in explanatory
power that often results when multiple variables are combined.
We'll concentrate this month on another way that tree analysis can enhance
regression models - as a vehicle for providing insight on how to tailor the
promotional message to the characteristics of a given prospect or
customer. In this approach, the regression model determines whom to
promote. Then, the tree analysis provides insight into what the
promotional message should be.
This two-step segmentation strategy is central to many state-of-the-art
database marketing programs. To see why, we first need to
understand a basic difference between regression and tree analysis. We'll
use one of last month's hypothetical examples to illustrate this difference:
Assume that we want to segment a customer file by response to a previous
mailing. We'll start with a regression model to determine whom to
promote. To keep matters simple, assume that the model contains the
following six variable types: Months Since Last Order, Number of Orders,
Average Order Size, Merchandise Categories, Age and Gender.
The output of our regression model is a set of deciles - ten groups of equal size.
Decile 1 consists of the top ten percent of customers in terms of predicted
future purchase probability, and Decile 10 the bottom ten percent. (A scored file can
be divided into any number of equal groups, depending on the
circumstances.) Every customer who is scored by the model is evaluated on
each and every one of the six variable types - unlike tree analysis, where
multiple customer (or prospect) groups are defined by different subsets of
variables. The model predicts future purchase probability as follows:
The Gender variable might apply points for women and none for men.
Likewise, the Age variable might apply more points for older customers and
fewer for younger ones. Every customer is interrogated for all six of the
variable types. The higher the sum of the points across all of the
variables, the greater the predicted probability of responding in the future.
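The additive "points" logic above can be sketched in a few lines. This is a minimal illustration, not the author's model: the field names and every point value are invented for the example. In a fitted regression the points would be the estimated coefficients, and for a response model (for example, logistic regression) the summed score would then be converted to a probability.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    months_since_last_order: int
    number_of_orders: int
    average_order_size: float
    merchandise_category: str
    age: int
    gender: str

# Hypothetical point rules, one per variable type. Real values would come
# from fitting the regression; these are for illustration only.
def recency_points(c):   return max(0, 12 - c.months_since_last_order)
def frequency_points(c): return 2 * c.number_of_orders
def order_size_points(c): return c.average_order_size / 100
def category_points(c):  return {"jewelry": 4, "electronics": 3}.get(c.merchandise_category, 1)
def age_points(c):       return c.age // 10              # older customers earn more
def gender_points(c):    return 5 if c.gender == "F" else 0  # points for women, none for men

POINT_RULES = [recency_points, frequency_points, order_size_points,
               category_points, age_points, gender_points]

def score(c):
    """Every customer is evaluated on all six variable types; the score is the sum."""
    return sum(rule(c) for rule in POINT_RULES)

young_man = Customer("A", 1, 10, 300.0, "electronics", 25, "M")
older_woman = Customer("B", 10, 1, 100.0, "jewelry", 60, "F")
print(score(young_man), score(older_woman))  # 39.0 20.0
```

The young man earns no gender points and few age points, yet his recency and frequency points more than make up the difference. This is the same offsetting effect that, as discussed next, makes regression deciles heterogeneous.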
Because of these dynamics, regression deciles are heterogeneous;
that is, a given decile contains customers with a mix of different characteristics.
Decile 1, for example, will likely contain men and women, as well
as the old and the young. This is because an extremely high
"rating" on a given variable can more than counteract a relatively
low "rating" on another variable. In fact, the inhabitants
of a given regression decile have only one guaranteed similarity:
their predicted future purchase probability.
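A small simulation makes the heterogeneity concrete. The sketch below assumes a hypothetical, deterministically generated file of 100 customers and an invented additive score (a bonus for women, more points for age and for order count). It ranks the file, cuts it into ten equal deciles, and inspects who lands in Decile 1.

```python
def assign_deciles(scores):
    """Rank IDs by score, highest first, and cut them into ten equal-sized
    groups. Returns {customer_id: decile}; decile 1 is the top ten percent."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {cid: rank * 10 // n + 1 for rank, cid in enumerate(ranked)}

# Hypothetical file of 100 customers. The modular arithmetic only spreads
# gender, age, and order counts around in a repeatable way.
customers = {}
for i in range(100):
    gender = "F" if i % 2 == 0 else "M"
    age = 25 + (i * 7) % 50
    orders = (i * 13) % 20
    customers[i] = (gender, age, orders)

# Illustrative additive points: women get a bonus, older and more frequent
# buyers get more points. These weights are assumptions, not a fitted model.
scores = {
    cid: (5 if g == "F" else 0) + age // 10 + 2 * orders
    for cid, (g, age, orders) in customers.items()
}

deciles = assign_deciles(scores)
top = [customers[cid] for cid, d in deciles.items() if d == 1]
print(sorted({g for g, _, _ in top}))                    # ['F', 'M']
print(min(a for _, a, _ in top), max(a for _, a, _ in top))  # 27 69
```

Even though women receive extra points, Decile 1 contains both men and women, ranging from 27 to 69 years old: a man with many orders can outscore a woman with fewer.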
What all of this means - again - is that regression is an excellent technique
for determining whom to promote. But, because there is no guaranteed
consistency of customer characteristics within a given decile, regression
provides little insight on how to tailor the promotional message to the
interests of each individual.
Tree analysis provides excellent insight into the appropriate promotional
message for a given customer or prospect because it creates homogeneous groups;
that is, every individual in a given group shares the same defining
characteristics. The following are two of the groups described in last
month's hypothetical example, along with their higher-than-average response
rates:
40-50 year old female jewelry buyers, with four or more purchases, averaging
$500+, and at least one purchase within the past six months - 8% response rate.
30-35 year old male electronics buyers, with three or more purchases, and at
least one within the past twelve months - 4% response rate.
The beauty of these two tree analysis groups is that - unlike regression
deciles - they provide us with marketing insight. Feed these descriptions
to a good creative staff and the result will be a brainstorming session on how
to tailor the promotional piece to the demographic characteristics and product
interests of each customer group.
There are many ways to tailor the promotional piece. Perhaps the simplest
are ink-jet messaging and "blow-ins." More involved techniques are
over-wraps, customized covers, and "glue-ins." The ones that require the
most up-front investment are selective binding and specialty promotions.
Sophisticated database marketers employ at least some of these techniques for
just about every promotion.
Exactly how should tree analysis be combined with regression? Part of the
answer is straightforward: build a regression model to determine whom to
contact. The rest of the answer, however, requires some creativity.
One possibility is to build a tree analysis in which the potential predictor
variables are limited to demographics and merchandise categories. These
are the variables that are most likely to offer clues on how to match the
appropriate promotional message to a given customer or prospect. Other
variables - recency, frequency, average order size, and the like - are helpful
for identifying those individuals who are most likely to respond. But
they provide little insight into lifestyle or interests.
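Restricting the candidate predictors can be shown with a single tree split. The sketch below is a simplified stand-in, not full CHAID or CART. CHAID, for example, also merges categories, allows multiway splits, and adjusts its significance tests. Here each candidate binary split ("field == value" versus everything else) is scored with a Pearson chi-square on a 2x2 response table, and only demographic and merchandise fields are allowed. The data are hypothetical and built so that recency is the strongest predictor overall.

```python
# Only demographic and merchandise fields may drive the "what to say" tree.
CANDIDATES = ["gender", "age_band", "category"]

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def best_split(records, fields):
    """Try every 'field == value' vs. rest split and return
    (chi_square, field, value) for the strongest one."""
    best = (0.0, None, None)
    for field in fields:
        for value in sorted({r[field] for r in records}):  # sorted for repeatability
            inside = [r["responded"] for r in records if r[field] == value]
            outside = [r["responded"] for r in records if r[field] != value]
            a, b = sum(inside), len(inside) - sum(inside)
            c, d = sum(outside), len(outside) - sum(outside)
            stat = chi_square_2x2(a, b, c, d)
            if stat > best[0]:
                best = (stat, field, value)
    return best

# Hypothetical file: 2 of every 5 jewelry buyers respond, no electronics buyers
# do, and response is balanced across gender and age band. All responders
# (plus two electronics buyers) ordered recently.
records = []
for category in ("jewelry", "electronics"):
    for gender in ("F", "M"):
        for age_band in ("30s", "40s"):
            for k in range(5):
                responded = int(category == "jewelry" and k < 2)
                recent = responded or (category == "electronics" and gender == "F"
                                       and age_band == "30s" and k < 2)
                records.append({"category": category, "gender": gender,
                                "age_band": age_band, "responded": responded,
                                "recency_band": "0-6" if recent else "7-24"})

print(best_split(records, CANDIDATES))                     # (10.0, 'category', 'electronics')
print(best_split(records, CANDIDATES + ["recency_band"]))  # (30.0, 'recency_band', '0-6')
```

With recency allowed, the split goes to recency, which helps decide whom to mail but says nothing about the message. With the restricted list, the split falls on merchandise category, which creative staff can act on. With only two categories, "electronics versus the rest" is the same partition as "jewelry versus the rest".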
In order to implement this selection strategy within a production environment,
a customer or prospect file is first scored by the regression model to
determine whom to promote. Then, a series of "and" statements -
corresponding to the homogeneous tree analysis groups - is applied to those
individuals who have been selected for promotion. Finally, an appropriate
promotional message is targeted to each group.
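The three production steps can be sketched as one function. Everything named here is hypothetical: the scores stand in for regression output, and the two rules are simplified versions of the example groups. The message labels borrow the article's tailoring techniques purely as placeholders. The regression alone decides who is selected, and the "and" rules decide only which piece each selected person receives.

```python
# Hypothetical segment rules ("and" statements), checked in order.
SEGMENTS = [
    ("jewelry_women", lambda p: p["gender"] == "F" and 40 <= p["age"] <= 50
                                and p["category"] == "jewelry"),
    ("electronics_men", lambda p: p["gender"] == "M" and 30 <= p["age"] <= 35
                                  and p["category"] == "electronics"),
]
# Placeholder message per segment; None means no segment matched.
MESSAGES = {"jewelry_women": "jewelry over-wrap",
            "electronics_men": "electronics blow-in",
            None: "standard piece"}

def build_mail_plan(scores, profiles, max_decile=3):
    """Step 1 (whom): keep only customers in the top `max_decile` score deciles.
    Step 2 (what): give each selected customer the message of the first
    segment rule they satisfy, or the standard piece if none applies."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = ranked[: len(ranked) * max_decile // 10]
    plan = {}
    for cid in selected:
        segment = next((name for name, rule in SEGMENTS if rule(profiles[cid])), None)
        plan[cid] = MESSAGES[segment]
    return plan

profiles = {
    "c1": {"gender": "F", "age": 45, "category": "jewelry"},
    "c2": {"gender": "M", "age": 32, "category": "electronics"},
    "c3": {"gender": "F", "age": 25, "category": "apparel"},
}
# c4..c10 fit the jewelry rule but have low model scores.
for i in range(4, 11):
    profiles[f"c{i}"] = {"gender": "F", "age": 45, "category": "jewelry"}
scores = {"c1": 0.9, "c2": 0.8, "c3": 0.7}
scores.update({f"c{i}": (10 - i) / 10 for i in range(4, 11)})

print(build_mail_plan(scores, profiles))
# {'c1': 'jewelry over-wrap', 'c2': 'electronics blow-in', 'c3': 'standard piece'}
```

Customers c4 through c10 match the jewelry rule but are never mailed, because the regression score, not the segment, controls selection. With only ten customers each decile holds one person, which keeps the example small.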
To summarize, a regression model is just the first step in a state-of-the-art
database marketing program. Other analytical techniques, such as tree
analysis, must be included to determine how to tailor the promotional message
to the needs of the individual.
Jim Wheaton is a Principal at Wheaton Group, and can be reached at 919-969-8859
or email@example.com. The firm specializes in direct marketing
consulting and data mining, data quality assessment and assurance, and the
delivery of cost-effective data warehouses and marts. Jim is also a
Co-Founder of Data University www.datauniversity.org.