The Hype and the Reality of Database Marketing Software

By Boris Gendelev
Principal, Wheaton Group

The original version of this article appeared in the September 12, 1994 issue of "DM News."

[Note:  Despite dramatic increases in raw computing power and a proliferation of end-user software tools since the publication of this article, virtually all of the content remains highly relevant.]

You've heard the pitch:  for you, the seeker of new database marketing heights, there is a software package that will take you there.  With a click of a mouse, you will count and profile your customers, select your names and, when you are done mailing, analyze your results.  Wow!

But after the initial thrill is over, you might be disappointed for one or more reasons: 

  • Data manipulation capability is not powerful enough.
  • Data-driven modeling is supported only superficially.
  • No training is provided for navigating in the dangerous waters of data analysis.
  • Nobody addressed the issue of data integrity.
  • Simple queries may be simple, but complicated ones are nearly impossible.
  • Even if you manage to express your complicated question in the language of the package, getting an answer in a reasonable amount of time takes expensive and exotic hardware.

Of course, nothing can be perfect.  But to minimize disappointment, ask the fundamental questions.

Is Important Functionality Missing?
To a direct marketer, automated counts and selects are very important.  For a database marketer, that alone would not do.

Database marketing is first and foremost the process of using customer history to predict, under different scenarios, future productivity.  So defined, the practice of database marketing centers on understanding the relationship between what was known about a customer at one point of time and subsequent results.   

Because database marketing is about data-driven models of customer behavior, counting and profiling qualify only as the first stage of the modeling process:  getting familiar with a business and its data.

Marketers new to database marketing are likely to stop at counts and profiles.  Does this mean they do not use models of customer behavior?  Of course they do — their models are judgmental, not data-driven.

Judgmental models reflect one's intuitions, the subconscious sum of one's experiences.  There is nothing wrong with that.  Yet, when there is hard data to give or deny credence to a hunch, it makes business sense to use it.  Software that doesn't go much beyond counts and profiles doesn't unlock the full potential of database marketing.

Can a Software Package Really Build Models?
Many packages claim to perform modeling:  feed them your mailing results and, after some cranking away with logistic regression, neural networks, fractals or some other statistical wizardry, out pops the result.

But all they are offering you is model calibration, the calculation of parameters.  Who decides what variables to toss into the magician's hat?  You do!  But how?  By intuition alone?  Is that data-driven?  No!  Is that true database marketing?  No!

The process of modeling is foremost the process of deciding, by business and data analysis, what data to use and how to transform it to illuminate patterns: 

  • Modeling subset:  What time frames and business segments are relevant?  Last fall?  This spring?  All spring seasons?  Last five years?  General media?  Specialty media?
  • Dependent variables:  What should you try to predict, and how should it be measured?  Response rate?  Average order?  Demand per media?  Per marketing dollar?  Return rate?  Net sales?  Long-term value?
  • Independent variables:  What variables should be tried as predictor variables?  Here, the list of possibilities is endless, considering combinations of variables (differences, sums, ratios, percentages).  The creating and testing of predictors is where interesting data analysis takes place.
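
As a minimal sketch of what creating combined predictors can look like, the Python below derives a difference, a ratio and a percentage from a few raw customer fields.  The field names (orders_12m, orders_24m, dollars_24m, returns_24m) are hypothetical and are not taken from any particular package or file layout.

```python
def derive_predictors(cust):
    """Build candidate predictors from raw customer history.

    All field names are hypothetical; a real customer file has its own layout.
    """
    orders_12m = cust["orders_12m"]      # orders in the most recent 12 months
    orders_24m = cust["orders_24m"]      # orders in the most recent 24 months
    dollars_24m = cust["dollars_24m"]    # gross demand in the most recent 24 months
    returns_24m = cust["returns_24m"]    # returned dollars in the same 24 months

    prior_orders = orders_24m - orders_12m   # difference: orders in months 13-24
    return {
        # difference: is the customer buying more or less often than a year ago?
        "order_trend": orders_12m - prior_orders,
        # ratio: average order size, guarded against customers with no orders
        "avg_order": dollars_24m / orders_24m if orders_24m else 0.0,
        # percentage: share of demand that came back as returns
        "pct_returned": 100.0 * returns_24m / dollars_24m if dollars_24m else 0.0,
    }


example = {"orders_12m": 3, "orders_24m": 4, "dollars_24m": 200.0, "returns_24m": 50.0}
features = derive_predictors(example)
```

Each derived field is only a candidate.  Whether it earns a place in a model is a question for the analysis described above, not for the code.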

Once past the first round of these questions, you will be ready to calibrate a model.  Then you have to validate its performance.  The results might suggest fine-tuning and send you back to data analysis in search of new ideas.
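
One common way to validate a calibrated model, offered here as an illustrative sketch rather than as the method of any particular package, is to score a holdout sample that played no part in calibration, rank it by score, and compare actual response rates across the ranked groups.  The function name and the sample data below are made up for illustration.

```python
def gains_table(scored, n_groups=10):
    """Response rate per score-ranked group, best-scored group first.

    `scored` is a list of (score, responded) pairs from a holdout sample,
    where responded is 1 or 0.
    """
    if n_groups < 1 or len(scored) < n_groups:
        raise ValueError("need at least one record per group")
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // n_groups
    rates = []
    for g in range(n_groups):
        start = g * size
        # the last group absorbs any remainder records
        end = (g + 1) * size if g < n_groups - 1 else len(ranked)
        group = ranked[start:end]
        rates.append(sum(responded for _, responded in group) / len(group))
    return rates


# Made-up holdout sample: (model score, did the customer respond?)
holdout = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
           (0.4, 0), (0.3, 0), (0.2, 1), (0.1, 0)]
rates = gains_table(holdout, n_groups=2)   # top half vs. bottom half
```

If the top-scored groups do not clearly outperform the bottom ones on data the model has never seen, that is a signal to return to data analysis.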

Model calibration does not develop new concepts.  You have to.  Software that restricts you to variables you were foresighted enough to record during your mailing select is a package that hinders the practice of database marketing.

Will a "101" Standard Report Be Enough?
It might well be that the bulk of marketers' daily needs can be satisfied with a stack of standard reports and "fill-in-the-blanks" queries.  Yet, just as surely, the remainder (reports produced ad hoc in search of an understanding of changes from business as usual) is what provides your company with a competitive edge.

The standard reports help you monitor the business through the lenses of your existing models (intuitive and statistical) as well as monitor the robustness of the models.  They serve to trigger new questions.  The ad hoc, never-anticipated queries help answer the questions and move both the models and your business forward.

Therefore, while prepackaged report templates are often useful, good database marketing software should excel in ad hoc reporting of any depth and complexity.  If the answer can be found in the data, the tool should give you power to formulate the question.

How Do You Avoid Discovering Useless Things?
The barrier to building robust actionable customer behavior models cannot be overcome by software alone.  Data analysis expertise is equally essential.

The data, not the software, interacting with an analyst's logical faculties and imagination, drive the course of analysis.  The decisions based on the analysis, once set in motion, may have a profound, irreversible and long-lasting impact on your business.  Training and experience in data analysis and interpretation are what stand between you and disaster.

The modeling process is full of potential pitfalls.  Two examples are:

Example #1:  In the exploratory phase of modeling, there is a danger of selecting predictor variables that are contaminated by the dependent variable. 

Suppose you have a hunch that customers with children are your better buyers.  You decide to add a question to your order entry script and a new field, "presence of children," to your database.  The field is initialized to "no."  After a while, you start analyzing whether those who answered "yes" bought more frequently.  And sure enough, "yes" customers are more frequent buyers than "no" customers.

Did you just find an important key to your business?  Before you start paying for demographic overlays and over-circulating households with children, consider this:  those who buy more frequently were more likely to be asked and give an answer.  Therefore, being a frequent buyer makes a "yes" more likely, and not necessarily the other way around.

The specific lesson:  a single code should not mean two different things.  In this example, "unknown" should be a separate code.  Moreover, a new segmentation variable must be evaluated while holding constant other variables already known to be good segmenters; for example, RFM or your current scoring model.
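
The effect can be sketched with a toy example.  The records below are fabricated purely for illustration (they are not real results), and the field names are hypothetical.  The data are constructed so that, within each frequency segment, the repurchase rate is identical for "yes," "no" and "unknown."  Lumping "unknown" into "no" nevertheless makes "yes" customers look better.

```python
from collections import defaultdict


def buy_rate_by(customers, *fields):
    """Share of customers who bought again, grouped by the given fields."""
    totals = defaultdict(lambda: [0, 0])          # key -> [buyers, customers]
    for c in customers:
        key = tuple(c[f] for f in fields)
        totals[key][0] += c["bought_again"]
        totals[key][1] += 1
    return {key: buyers / n for key, (buyers, n) in totals.items()}


def make(segment, children, n, buyers):
    """Fabricate n illustrative records; the first `buyers` of them bought again."""
    return [{"segment": segment, "children": children, "bought_again": int(i < buyers)}
            for i in range(n)]


# Fabricated toy data: frequent buyers were asked the question more often.
customers = (
    make("frequent", "yes", 8, 4) + make("frequent", "no", 2, 1)
    + make("frequent", "unknown", 2, 1)
    + make("infrequent", "yes", 4, 1) + make("infrequent", "no", 4, 1)
    + make("infrequent", "unknown", 12, 3)
)

# Flawed coding: the field was initialized to "no", so "unknown" hides inside "no".
lumped = [dict(c, children="no" if c["children"] == "unknown" else c["children"])
          for c in customers]

naive = buy_rate_by(lumped, "children")                     # one code, two meanings
stratified = buy_rate_by(customers, "segment", "children")  # hold frequency constant
```

In this toy data the naive comparison shows "yes" ahead of "no," while the stratified comparison shows no difference at all within either segment.  Frequency, not the presence of children, is doing the work.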

A more general lesson:  without being keenly aware of how your business is reflected in the imperfect mirror of your data, and how to evaluate the incremental value of a new idea, it is easy to "discover" useless things.

Example #2:  In the calibration phase of modeling, exposing the statistical procedure to the "best" cross section of data is complicated. 

Techniques that produce black boxes are particularly troublesome.  A black box might fit the data used to build it, but as your business evolves and produces previously rare combinations of variable values the black box may start to spew out nonsense.

Suppose your new software presides over a newly built database.  While your analyst knows to be careful due to having access to only eighteen months of data, did anyone tell the software?  How do you tell the software that some customers are three, four, or ten years old and their history is partial?  Will there be problems a year from now when the maximum recency is thirty months?

Neural net techniques are particularly vulnerable, because in addition to producing inscrutable black boxes, they demand a trade-off between computing power and sample sizes.  Without specialized add-ons, several thousand cases may be the sample limit.  For a good-size business, this might be only several percent of the buyer file.  Such samples are unlikely to contain the full range of variable values and their combinations.

True, better packages use sophisticated mathematics in order to avoid overfitting.  For example, they could ensure that a higher frequency of buying consumables always leads to a higher score.  Without such a safeguard, a neural network can easily produce a result that contradicts this common-sense reality and risks losing you sales.
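
Even when a model is a black box, a common-sense rule of this kind can be checked from the outside.  The sketch below, which assumes only that the model can be called as a function of a customer record, varies one field while holding the others fixed and confirms that the score never drops.  The two scoring functions and the field names are invented stand-ins, not the output of any real package.

```python
def is_monotone_in(score, base, field, values):
    """True if `score` never decreases as `field` steps through `values`.

    `values` must be in ascending order; all other fields stay as in `base`.
    """
    scores = [score(dict(base, **{field: v})) for v in values]
    return all(a <= b for a, b in zip(scores, scores[1:]))


def linear_score(c):
    """Stand-in for a well-behaved model: more purchases, higher score."""
    return 0.02 * c["frequency"] - 0.001 * c["recency_months"]


def erratic_score(c):
    """Stand-in for a black box that misbehaves on one combination of values."""
    if c["frequency"] == 4:
        return 0.01
    return 0.02 * c["frequency"]


base = {"frequency": 1, "recency_months": 6}      # hypothetical customer
ok = is_monotone_in(linear_score, base, "frequency", range(1, 9))
bad = is_monotone_in(erratic_score, base, "frequency", range(1, 9))
```

A check like this covers only the base records you try, so it can reveal a violation but cannot prove that none exists.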

A capable package without expert data analysts to train you and to help you use it is at best an incomplete, and at worst a dangerous, solution.

Who is Verifying What and How?
As previously discussed, it is easy to be misled by poorly conceived and carelessly executed data analysis.  But even full awareness of the business, data and analytical issues does not protect you from simple mistakes.  Because you will have to translate your thoughts into organized instructions a computer can execute, there is plenty of room for error.

The art of result verification is a specialized branch of data analysis.  Multi-million dollar mistakes await those who believe that computers print nothing but gospel.  Certain software features may prove helpful to minimize the occurrence of error, but again, training is the real answer.

Who Minds Data Integrity?
A database marketing package without data is an empty shell.  With data that is inaccurate, incomplete or inconsistent, it is a ticking bomb. 

In checking data integrity, there is no avoiding a human expert.  But beware!  Often, people who can make sense of system and file structure do not have a clue how to analyze and interpret data.  Assuring data integrity is a data analysis task, not a programming job.

Should you believe MIS when they say the data is clean and all you need to do is load the files?  If they are not familiar with the tools and techniques needed to answer your business questions, how were they able to query the integrity of the data?  How much has been learned from an audit of the data?  If many basic facts about the business await the running of the "standard" reports, then on what basis did MIS reach its clean bill of health?
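
To make "an audit of the data" concrete, here is a minimal sketch of the kind of first-pass checks an analyst might run before trusting a load.  The record layout (order_id, customer_id, order_date, amount) and the sample records are hypothetical.

```python
from datetime import date


def audit_orders(orders, customer_ids, today):
    """Count a few basic integrity problems in a list of order records."""
    problems = {"orphan_customer": 0, "future_date": 0,
                "nonpositive_amount": 0, "duplicate_order_id": 0}
    seen_ids = set()
    for o in orders:
        if o["customer_id"] not in customer_ids:   # order with no matching customer
            problems["orphan_customer"] += 1
        if o["order_date"] > today:                # dated after the audit date
            problems["future_date"] += 1
        if o["amount"] <= 0:                       # zero or negative demand
            problems["nonpositive_amount"] += 1
        if o["order_id"] in seen_ids:              # same order loaded twice?
            problems["duplicate_order_id"] += 1
        seen_ids.add(o["order_id"])
    return problems


# Made-up records for illustration only.
orders = [
    {"order_id": 1, "customer_id": "A", "order_date": date(1994, 3, 1), "amount": 40.0},
    {"order_id": 2, "customer_id": "Z", "order_date": date(1994, 4, 1), "amount": 25.0},
    {"order_id": 2, "customer_id": "B", "order_date": date(1995, 1, 1), "amount": -5.0},
]
report = audit_orders(orders, customer_ids={"A", "B"}, today=date(1994, 9, 12))
```

Counts like these only raise questions.  Deciding whether a negative amount is an error or a legitimate return, for instance, still takes someone who understands the business.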

When it comes to construction of a database with accurate, complete and consistent data and, just as importantly, development of a process that maintains that integrity, it is hard to imagine a cookie cutter solution.  Nor is it advisable to entrust anybody other than business-aware, computer-literate, seasoned data analysts with the task.

How Does the Interface Deal with Query Complexity?
GUIs (graphical user interfaces) can be wonderful.  They allow you to construct queries by pointing and clicking instead of typing.  And for a two-finger typist, that is a relief.  The problem is that most GUIs make it easier for you to construct 4GL (4th generation language) statements, but still require you to know the technical concepts of their system.  They offer step automation by demanding less effort for mindless steps.  You would rather the interface achieve a conceptual shift:  the elimination of whole groups of steps that, were it not for the need to spoon-feed the computer, would not even be part of the way you think.

Electronic ignition instead of a crank saves muscle energy.  However, you still have to start the engine; you cannot just jump in and step on the gas.  Entering a query by dragging and dropping saves keystrokes, but does not eliminate the responsibility for learning the effect of what you are dragging and where you are dropping.

A conceptual shift is vastly more powerful.  It is the automation, or even the complete hiding, of a whole set of steps that together are conceptualized as a single task.  It saves not only physical but also mental energy.  Manual versus automatic transmission is a good example.  With an automatic transmission, you make choices from a limited, high-level set of options.  You do not need to know that there is such a thing as a transmission, let alone what gear is appropriate for what speed.  A box that decides, without your involvement, when and how to shift leaves more of your neurons focused on the business of avoiding other automobiles and getting to your destination.

Marketing data analysis, arguably one of the more complex computer-assisted endeavors, is challenging enough in its own right.  The more the software allows you to "show" or "lead" it to a desired final result, in a manner natural to you, the better.  Approaches that have you tell the software how to apply its own (foreign to you) operators lengthen the learning curve, increase the likelihood of mistakes and rework, and reduce the time you have to apply your brain to more creative activities.

Are You Spending Too Much on Hardware?
The real question should be:  "Is the software optimized for the task to get the most out of the hardware?"  Only if the answer is "yes" can you make intelligent trade-offs in cost vs. performance.

Unless you like expensive big iron, steer clear of the "state-of-the-art" in database technology:  relational database management systems (RDBMSs).  On many levels the RDBMS model is a poor fit for database marketing.  Inherently, RDBMSs are extremely inefficient for large numbers of complex calculations, processing of many linkages between records (joins, in RDBMS terminology), and massive aggregations (projections).

True, indexing can speed up certain kinds of joins and projections — those that are done on indexed fields.  But indexing all of the fields is contingent on having lots of disk space.  More importantly, the fields you may want to use in a join or a projection are probably calculated on the fly.  They represent new concepts you want to try, or expressions that evolved along with your business.  How could you index the file on such variables?  No one should expect you to!

Indexes are useful for quick access to predefined aspects of the database, much as you would use a library index.  But if your goal is to describe how form and content affect a book's popularity, a library index of title, author and subject would not be helpful.  You would need to start reading books and their reviews and summarize your findings in ways that might be useful for teasing out the patterns.  As you generate new hypotheses, you go back and reread the whole pile.

If a software vendor touts indexing capabilities, remember that you are in the business of reading all the books and related materials.  The books had better be arranged in a way that makes it easy to pick up one as well as its related material, study them, add their gist to all of your different summaries, and move on to the next.  Be prepared to go through this process over and over again.  The number of times you have to get up and search for something on the shelf (disk access, particularly random access) should be cut to the minimum, and your work should all be in one quick-access working area (RAM).
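
The "read every book once, update all of your summaries" pattern can be sketched as a single sequential pass that groups on a value calculated on the fly, which is something no prebuilt index could anticipate.  This is an illustration of the idea under assumed field names (orders, dollars, recency_months) and an arbitrary $50 cutoff, not a description of how any particular product works.

```python
from collections import defaultdict


def summarize_in_one_pass(customers):
    """Read each record once and update several in-memory summaries."""
    by_order_size = defaultdict(lambda: [0, 0.0])    # band -> [customers, dollars]
    by_recency_year = defaultdict(lambda: [0, 0.0])  # year -> [customers, dollars]
    for c in customers:                              # one sequential read
        # Calculated on the fly: a new concept that was never stored or indexed.
        avg_order = c["dollars"] / c["orders"] if c["orders"] else 0.0
        band = "high" if avg_order >= 50 else "low"  # arbitrary illustrative cutoff
        year = c["recency_months"] // 12
        for table, key in ((by_order_size, band), (by_recency_year, year)):
            table[key][0] += 1
            table[key][1] += c["dollars"]
    return dict(by_order_size), dict(by_recency_year)


# Made-up records for illustration only.
customers = [
    {"orders": 2, "dollars": 120.0, "recency_months": 3},
    {"orders": 4, "dollars": 80.0, "recency_months": 14},
    {"orders": 0, "dollars": 0.0, "recency_months": 30},
]
by_size, by_year = summarize_in_one_pass(customers)
```

Adding another summary means adding another small table in memory, not another trip through the file.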

With properly optimized software, a Pentium PC can produce performance on par with speeds claimed by some software vendors to be "incredibly fast" on hardware that costs 20 times more.  When possible, spend your money on data analysis and modeling, not on hardware.

Was the Testing Ground the Same as Your Battle Ground?
Look at the past and present clientele of the software vendor.  Ask which businesses were used as test beds for prototyping and stress testing the software.  If the industries or companies mentioned are not known for their database marketing expertise and use of accountable marketing media, what makes you believe the software is appropriate for you?

Are These Questions in Order of Priority?
The questions above are applicable to most potential users of database marketing software.  But the specific circumstances of your business will dictate just how important each question is and will generate other specific questions.  The important thing is not to be overtaken by hype.

Boris Gendelev is a Principal at Wheaton Group, and can be reached at 847-205-0916 or boris.gendelev@wheatongroup.com.  The firm specializes in direct marketing consulting and data mining, data quality assessment and assurance, and the delivery of cost-effective data warehouses and marts.

Copyright © 2004 Wheaton Group LLC. All rights reserved.