Evaluating Merge/Purge Systems:
By Jim Wheaton and Cynthia Baughan Wheaton
Principals, Wheaton Group
The original version of this article appeared in the December 1987 issue
of "Direct Magazine"
[Note: Despite dramatic increases in raw computing power
and a proliferation of end-user software tools since the
publication of this series of six articles, virtually all of the
content remains highly relevant. The occasional obsolete
point is highlighted.]
In a series of six articles, we explain a number of the key concepts
that mailers should understand about merge/purge, as well as reviewing
(in the first article) a methodology that could be helpful in evaluating
the effectiveness of either present or prospective merge/purge systems.
While our comments are primarily addressed to mailers, merge/purge
vendors can benefit by measuring themselves against the criteria
that we have identified as important.
Our objective is to describe new and specific tools that can be
used to evaluate and improve the performance of the merge/purge
process. Through commentary and examples, we will attempt
to translate into layman's terms the technical jargon that
baffles many mailers. In the process, practical applications
should become apparent.
This month's article, Part Six, focuses on the four methods
of handling outside list duplicates, as well as suggested ways of
monitoring the performance of duplicates that are mailed multiple times.
Methods of Handling Outside List Duplicates
The merge/purge process divides the gross names input into three primary groups:
- Duplicates to house or other suppression lists.
- Rented list duplicates, either between multiple lists (inter-list
duplicates) or within a single list (intra-list duplicates).
- Unique net names available to mail.
The disposition of suppression
list hits is easy. They should not be mailed. Rented
list duplicates are not so straightforward. They can be handled
in one of four basic ways:
- Random allocation.
- Hierarchical allocation.
- Random allocation, within a hierarchy.
- Separation of duplicates.
Although all four will be discussed, the third method, random/hierarchy,
is recommended for most established direct mail companies.
Random Allocation
Here, duplicates are allocated on a random
basis among the specific lists that contributed them.
- If, for example, List A had a 1,000-name overlap with List B, each
list would be credited with 500 of these duplicates.
- If List C, D and E had 300 names in common, each would receive credit
for 100 names.
Random allocation is particularly appropriate in a
start-up mode, where all lists are untested and there is no existing
reason to assign priority.
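The even split described above can be sketched in a few lines of Python. This is a minimal illustration, not a production merge/purge routine: the function name `allocate_randomly`, the name keys, and the input layout (a mapping from each duplicated name to the lists it appeared on) are assumptions made for the example. Names are shuffled within each overlap group and then dealt out in turn, so each contributing list receives an equal share.

```python
import random
from collections import Counter, defaultdict

def allocate_randomly(duplicates, seed=None):
    """Credit each duplicated name to one of the lists that supplied it.

    duplicates: {name_key: set of list names the name appeared on}
    Returns {name_key: credited list}.
    """
    rng = random.Random(seed)
    # Group the names by the exact combination of lists they appeared on.
    groups = defaultdict(list)
    for name, lists in duplicates.items():
        groups[tuple(sorted(lists))].append(name)
    credit = {}
    for lists, names in groups.items():
        rng.shuffle(names)                      # random order within the overlap
        for i, name in enumerate(names):
            credit[name] = lists[i % len(lists)]  # deal out in turn
    return credit

# The two examples from the text: a 1,000-name A/B overlap
# and 300 names common to C, D and E.
duplicates = {f"ab{i}": {"A", "B"} for i in range(1000)}
duplicates.update({f"cde{i}": {"C", "D", "E"} for i in range(300)})
totals = Counter(allocate_randomly(duplicates, seed=1).values())
print(totals)
```

With these inputs, Lists A and B are each credited with 500 names, and Lists C, D and E with 100 each.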
Exhibit 1 illustrates the impact on response reports of random allocation.
For purposes of simplification, names are assumed to appear on
no other lists but the ones cited in the examples and attachments.
Hierarchical Allocation
In this case, lists are prioritized
for duplicate allocation. Whenever a name overlap appears
between two or more lists, the highest-ranked list is credited with
the duplicate. Assume, for example, a hierarchy of A, B, C, D and E:
- List A would receive credit for all names that appear on, say,
Lists A, B and C.
- List C would be credited with all names that appear on, say,
Lists C, D and E.
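As a rough sketch, the hierarchical rule amounts to picking the best-ranked list among those that supplied a name. The function name `allocate_by_hierarchy` and the five-list ranking A through E are assumptions made for illustration.

```python
def allocate_by_hierarchy(lists_on, hierarchy):
    """Return the highest-ranked list among those a name appeared on.

    lists_on:  set of list names that supplied the duplicated name
    hierarchy: list names ordered from highest to lowest priority
    """
    rank = {name: position for position, name in enumerate(hierarchy)}
    return min(lists_on, key=lambda name: rank[name])

hierarchy = ["A", "B", "C", "D", "E"]  # assumed ranking for the example
print(allocate_by_hierarchy({"A", "B", "C"}, hierarchy))  # credited to A
print(allocate_by_hierarchy({"C", "D", "E"}, hierarchy))  # credited to C
```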
The frequent motivation behind the hierarchical allocation
of duplicates to outside lists is the minimization of list rental charges:
- Where, for example, List A has a rental charge of $50 per thousand
and List B a fee of $100 per thousand, and each is rented on a net
basis, it is economical for List A to receive the entire overlap.
- Many list owners charge on a gross name basis for first-time tests
of small quantities (i.e., 5,000 to 10,000). Where, for example,
List C is rented on a gross name basis and List D on a net name
basis, and the quoted cost per thousand is identical, total charges
are minimized when all duplicates are allocated to List C.
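The arithmetic behind both examples can be worked through with a small sketch. The overlap of 1,000 names, the list quantities, and the $80 rate in the second case are hypothetical figures chosen only to make the calculation concrete, and the billing model is simplified: it ignores net-name minimums and any other contract terms.

```python
def rental_charge(gross_names, credited_names, cost_per_thousand, basis):
    """Simplified charge for one list.

    A 'gross' rental bills every name supplied; a 'net' rental bills
    only the names credited to the list after merge/purge.
    """
    billable = gross_names if basis == "gross" else credited_names
    return billable / 1000 * cost_per_thousand

# Case 1: Lists A ($50/M) and B ($100/M), both net, hypothetical 1,000-name overlap.
overlap = 1000
cost_if_credited_to_a = rental_charge(overlap, overlap, 50, "net")
cost_if_credited_to_b = rental_charge(overlap, overlap, 100, "net")

# Case 2: List C (gross, 5,000 names) and List D (net, 20,000 names),
# both at a hypothetical $80/M, with the same 1,000-name overlap.
def total_c_and_d(credited_to):
    c_net = 5000 - (0 if credited_to == "C" else overlap)
    d_net = 20000 - (0 if credited_to == "D" else overlap)
    return (rental_charge(5000, c_net, 80, "gross")
            + rental_charge(20000, d_net, 80, "net"))

print(cost_if_credited_to_a, cost_if_credited_to_b)
print(total_c_and_d("C"), total_c_and_d("D"))
```

In the first case the overlap costs $50 when credited to List A and $100 when credited to List B. In the second, List C is billed in full either way, so crediting the overlap to C removes those names from List D's net charge.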
The problem with this approach is that there is no reason intrinsic to the nature
of the lists for a hierarchical setup. The by-product is distorted
response reports and incorrect rollout decisions, which can more
than offset the savings in list rental charges.
Exhibit 1 illustrates the impact on response reports of hierarchical allocation.
Random Allocation Within A Hierarchy
This approach, a permutation
of the first two, combines multiple lists into hierarchical groups.
Intra-group duplicates, however, are allocated randomly. Assume,
for example, a list hierarchy of A/B, C/D/E:
- List A would be credited with all names appearing on Lists A, C,
D and E.
- Names common to Lists A, B, C, D and E, however, would be allocated
equally between A and B.
- Likewise, names appearing only on Lists C, D and E would be allocated
equally among the three.
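The three cases above can be expressed as one rule: find the highest-ranked group that contains any of the lists a name appeared on, then choose randomly among those lists. The sketch below assumes that reading; the function name `allocate_within_hierarchy` is illustrative. Because it makes an independent random draw for each name, the shares come out approximately equal over a large file rather than exactly equal.

```python
import random

def allocate_within_hierarchy(lists_on, groups, rng):
    """Credit a duplicated name under random allocation within a hierarchy.

    lists_on: set of list names that supplied the name
    groups:   list groups ordered from highest to lowest priority
    rng:      a random.Random instance used for the intra-group draw
    """
    for group in groups:
        candidates = [name for name in group if name in lists_on]
        if candidates:
            return rng.choice(candidates)  # random among lists in the winning group
    raise ValueError("name does not appear on any list in the hierarchy")

groups = [["A", "B"], ["C", "D", "E"]]  # the A/B, C/D/E hierarchy from the text
rng = random.Random(0)
print(allocate_within_hierarchy({"A", "C", "D", "E"}, groups, rng))       # always A
print(allocate_within_hierarchy({"A", "B", "C", "D", "E"}, groups, rng))  # A or B
print(allocate_within_hierarchy({"C", "D", "E"}, groups, rng))            # C, D or E
```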
This is a rational approach for established
firms, where outside lists can be divided into two groups:
proven lists and untested/unproven lists.
- The proven lists are combined into a primary hierarchy group, and
the untested lists into a second, lower-priority group.
- For both, intra-group duplicates are allocated randomly.
- All overlap between the proven and the untested groups is allocated
to the proven group. The untested group is credited only with
incremental names not appearing in the proven list group.
The underlying philosophy is that the proven lists would be mailed
anyway. An unproven list is considered successful only to
the extent that it makes a net contribution to the universe of economically
This approach has the advantage of response reports
that reflect the incremental value of new lists, a base from which
rational rollout decisions can be made. Although it does not
minimize rental charges, it is recommended for most established
direct mail companies:
- We believe that accurate rollout decisions far outweigh the associated
incremental rental charges.
- These charges are small relative to total rental costs. Total
rental costs, in turn, are a minor portion of total in-the-mail costs.
Exhibit 1 illustrates the impact on response reports of random allocation
within a hierarchy.
Separation of Duplicates
Some mailers separate duplicates
entirely from the underlying lists, treating them as if they were
a discrete rental list of multi-buyers.
- The disadvantage is that individual list performance reflects only
single-buyer responses, and is therefore understated.
- The level of understatement is positively related to the percentage
of duplicates. Understatement is most pronounced in lists
with a high percentage of duplicates, which are likely to be your
- The percentage of duplicates almost always varies, which means the
level of understatement will be different across lists.
- With different levels of understatement, comparison of results becomes
Exhibit 1 illustrates the impact on response reports of separation of duplicates.
Comparison of Various Duplicate Allocation Techniques on Reported
The List Matrix Report
Regardless of the allocation method
employed, merge/purge results should be summarized in a List Matrix
Report, as follows:
- Gross input quantities, by list.
- The absolute and percent overlap of each list with all other lists.
- Total duplicates "credited" to each list.
- The resulting net quantities per list.
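A minimal sketch of how the four report elements could be computed follows. The data layout is an assumption made for the example: each list is a set of matched name keys (so intra-list duplicates and suppression hits are taken as already removed), and `credit` records which list each multi-list name was allocated to by whichever method is in use.

```python
def list_matrix(inputs, credit):
    """Summarize merge/purge results by list.

    inputs: {list name: set of name keys supplied by that list}
    credit: {name key: list credited}, one entry per name on two or more lists
    """
    report = {}
    for name, keys in inputs.items():
        # Absolute and percent overlap with every other list.
        overlap = {}
        for other, other_keys in inputs.items():
            if other != name:
                shared = len(keys & other_keys)
                overlap[other] = (shared, 100 * shared / len(keys))
        credited = sum(1 for key in keys if credit.get(key) == name)
        unique = sum(1 for key in keys if key not in credit)
        report[name] = {
            "gross": len(keys),
            "overlap": overlap,
            "credited": credited,
            "net": unique + credited,  # names mailed under this list's code
        }
    return report

# Tiny hypothetical input: names 5 and 6 appear on both lists.
inputs = {"A": {1, 2, 3, 4, 5, 6}, "B": {5, 6, 7, 8}}
credit = {5: "A", 6: "B"}
report = list_matrix(inputs, credit)
for name, row in report.items():
    print(name, row)
```

The net quantities across all lists should sum to the number of unique names available to mail, which is a useful check on the allocation.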
Multiple Mailings of Duplicates
The following outlines how
an established direct mail company could monitor the performance
of duplicates that are mailed multiple times:
- Allocate according to the rules outlined for random allocation within a hierarchy.
- Append duplicate and mail drop suffixes to the main source code.
Two-time duplicates in the first mail drop, for example, would receive
a "-21," three-time duplicates in the second drop
a "-32," etc.
- For each mail drop, track first by main source code and second by
- Response reports for each drop should be broken down in an identical
manner, with main source code results separated from duplicates
results in order to prevent double-counting.
- Ideally, duplicates response for each drop would be tracked by
list (A-21, A-31; B-21, B-31; etc.), but in reality this may prove
unwieldy. At a minimum, results should be aggregated by
overall suffix (2X, 3X, etc.).
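The suffix scheme and the minimum level of aggregation can be sketched as follows. The function names and the response counts are hypothetical, and the sketch assumes single-digit duplicate counts and drop numbers, with main source codes that contain no hyphen of their own.

```python
from collections import defaultdict

def duplicate_source_code(main_code, times_on_lists, drop):
    """Build a code such as A-21: a two-time duplicate in the first drop."""
    return f"{main_code}-{times_on_lists}{drop}"

def aggregate_by_suffix(responses):
    """Total duplicate responses by overall suffix (2X, 3X, ...).

    responses: {source code: response count}. Codes without a suffix are
    main source codes and are left out, so nothing is double-counted.
    """
    totals = defaultdict(int)
    for code, count in responses.items():
        _, separator, suffix = code.rpartition("-")
        if separator:
            totals[f"{suffix[0]}X"] += count
    return dict(totals)

print(duplicate_source_code("A", 2, 1))  # A-21
print(duplicate_source_code("A", 3, 2))  # A-32

# Hypothetical response counts for one drop.
responses = {"A": 120, "A-21": 30, "B-21": 20, "A-31": 9}
print(aggregate_by_suffix(responses))
```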
Ethically, a duplicate should be mailed multiple times only when
each list owner has given prior consent, agrees to subsequent mail
dates for multi-buyers, and has been compensated in full for the
use of the name.
- Regardless, duplicates should never be mailed more times than they
appear on outside lists (twice for 2X duplicates, three times for
3X, etc.) without the list owner receiving additional compensation.
- Net name rebates should not be expected when duplicates are mailed
the same number of times they appear on outside lists.
illustrates how to track and report multiple mailings to duplicates.
Format Outline for Tracking Multiple Mailings to Duplicates
What to Look for In Evaluating Merge/Purge Systems
In the past six articles, we have given you a great deal of specific
information about the merge/purge process. Here, however,
is the primary thing to remember:
Many direct marketing professionals have limited technical knowledge
of the merge/purge process. This lack of knowledge is often
compounded by an unwillingness or inability to communicate needs
on anything but the vaguest of levels. This is known as the
"black box" mentality. Others are quite involved
in the initial selection of a vendor and then move on to other
matters, ignoring future merge/purge issues or needs.
These same professionals, however, devote much time and effort to
the other aspects of a mailing. We hear much discussion of
ZIP overlays, file segmentation, and list testing. We do not
want to imply that these are not important, because they are.
But it is also important to realize that similar attention to the
merge/purge process can result in significant improvement in the
quality of names mailed and the cost of doing so.
Jim Wheaton and Cynthia Baughan Wheaton are Principals at Wheaton
Group, and can be reached at 919-969-8859 or email@example.com.
The firm specializes in direct marketing consulting and data mining,
data quality assessment and assurance, and the delivery of cost-effective
data warehouses and marts. Jim is also a Co-Founder of Data