Sins of the Fathers
by Jerry W. Thomas
The Fathers of Marketing Research invented a number of extremely powerful and valuable tools, methods, questions, and concepts that we all use and benefit from every single day. We are indebted to their originality, inventiveness, and pioneering genius that founded and shaped our industry and its culture. Much of this founding work took place during the 1920s through the 1960s, and some of the research inventions occurred during the 1970s through the 1990s. But no one is perfect, and our industry Fathers committed sins that blight our industry to this day.
The first great sin of the Fathers is the Top-2-Box percentage. Somewhere along the way, a founding Father developed the Top-2-Box concept for questions with multiple positive responses.
A good example is the 5-point Purchase Intent scale: Definitely Buy; Probably Buy; Might or Might Not Buy; Probably Not Buy; Definitely Not Buy. If only the Definitely Buy answers are counted, the Fathers reasoned, information is lost. What about the Probably Buy answers—shouldn’t they be counted too? Hence, the Top-2-Box solution came into being, and the custom is to present the Definitely Buy percentage, followed by the Top-2-Box percentage (Definitely Buy plus Probably Buy). Sounds perfectly reasonable, so where is the sin and shame?
The Top-2-Box percentage counts a Definitely Buy the same (i.e., gives it the same weight) as a Probably Buy, when it's blatantly obvious to everyone that a Probably Buy is not nearly as good as a Definitely Buy answer. For the 5-point purchase scale above, the sin of counting a Definitely and Probably as equals is, no doubt, a cardinal sin. If we were working with a 9-point, 10-point, or 11-point scale, the Top-2-Box percentage might only be a minor transgression. That is, on a longer scale, the difference in meaning between a Top Box and the Second Box is relatively small, so no great harm in adding the two together. On shorter scales, however, the distortion (and the sin) is usually much greater.
Back to the 5-point Purchase Intent scale. A better solution is to count all of the Definitely Buys and then discount the Probably Buys by 40%, or 50%, or 60%, and add the Definitely Buys to the discounted (or down-weighted) Probably Buys, creating a weighted average that provides a more accurate measure of the results. For example, if the Definitely Buy answers equaled 32% of respondents, and the Probably Buy answers equaled 20% of respondents, a best practice is to count all of the Top Box (the 32% who said Definitely Buy) and let’s say 50% of the Second Box (the 20% who said Probably Buy). That yields a Purchase Intent Score of 42 (32% plus half of 20%). The result is called a Score (not a percent) since we have created a hybrid number.
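The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the weighting idea, not a standard industry formula; the function names `top_two_box` and `purchase_intent_score` and the parameter `probably_weight` are made up for this example, and the 0.5 default is simply the 50% discount used in the example above.

```python
def top_two_box(definitely_pct, probably_pct):
    """Traditional Top-2-Box: both answers count equally."""
    return definitely_pct + probably_pct


def purchase_intent_score(definitely_pct, probably_pct, probably_weight=0.5):
    """Count every Definitely Buy, but down-weight the Probably Buys.

    probably_weight is an assumption chosen by the analyst. Discounting
    the Probably Buys by 40%, 50%, or 60% corresponds to weights of
    0.6, 0.5, or 0.4.
    """
    return definitely_pct + probably_weight * probably_pct


# Figures from the example: 32% Definitely Buy, 20% Probably Buy.
definitely, probably = 32.0, 20.0
print(top_two_box(definitely, probably))            # 52.0
print(purchase_intent_score(definitely, probably))  # 42.0
```

The Top-2-Box figure of 52 treats the two answers as equals, while the weighted Score of 42 gives the Probably Buys only half credit.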
The second great sin of our industry Fathers involves significance testing. There is little doubt that significance testing of critical "decision" statistics is valuable. For example, determining whether Product Blue is better than Product Red is a good application of significance testing. In the beginning, significance tests had to be calculated by hand, so only the most important results were subjected to significance testing. But, the growing power of computers and the expanding availability of statistical software led to the automation of significance testing in cross-tabulation tables. Thus, with a few programming scripts, thousands of significance tests could be automatically run on a set of cross-tabs. You could easily test rows of percentages against the adjoining rows, or test column A against column B. You could even determine if the differences between statistics in rows and/or columns were significant at the 90%, 95%, or 99% level—with some type of code letters, symbols, or colors. The resulting significance assertions could then be incorporated easily into charts, graphs, and written reports.
Some might hail the exhaustive use of significance testing as a great advance in our craft. However, I would argue that willy-nilly significance testing is a great waste of time and effort. Overuse of significance testing adds costs to the preparation of written reports, adds extra time in quality-assurance verification, and actually increases the risks of errors in interpreting the survey results. If every number in a set of tables or a report is significance tested, the analyst might avoid looking at the non-statistically significant results and thus overlook important findings and patterns in the data. If the analyst is overly focused on statistical significance, he or she often overlooks other types of significance or other signals in the data.
Mass use of significance testing adds a hodgepodge of confusing symbols and potential bias into survey results. Also, many of the “significantly different” indications will be false, based purely on chance variation. I have personally watched analysts overlook almost everything of importance in survey results, because they were so focused on statistical significance that they were blind to everything else. A best practice is to use significance testing only on the one or two most important questions in the survey data.
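To see why many "significantly different" flags will be false, consider a rough back-of-the-envelope calculation in Python. It is a sketch under two simplifying assumptions that rarely hold exactly in real cross-tabs: that none of the tested differences is real, and that the tests are independent of one another. The function names are hypothetical.

```python
def expected_false_positives(n_tests, confidence_level):
    """Average number of tests flagged 'significant' purely by chance,
    assuming no real differences exist (an assumption of this sketch)."""
    alpha = 1.0 - confidence_level
    return n_tests * alpha


def chance_of_any_false_positive(n_tests, confidence_level):
    """Probability that at least one test is flagged by chance alone,
    assuming the tests are independent (an assumption of this sketch)."""
    return 1.0 - confidence_level ** n_tests


# 1,000 automated tests at the 95% level.
print(f"{expected_false_positives(1000, 0.95):.1f}")    # 50.0

# Just 20 tests at the 95% level.
print(f"{chance_of_any_false_positive(20, 0.95):.0%}")  # 64%
```

Under these assumptions, a set of tables with a thousand automated tests would carry about fifty spurious flags, and even twenty tests are more likely than not to produce at least one.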
The third great sin of our Fathers comes from Type I and Type II error in hypothesis testing. You can easily argue that the Fathers of the research industry stole Type I error and Type II error from the statistics or the academic world (and should, therefore, be blameless), but why on earth would our industry Fathers steal something as confusing as Type I error and Type II error? Couldn't they steal something that's more useful? Can anyone remember which is which (false positive versus false negative) and exactly what the heck Type I and Type II mean? Maybe I'm just old and over-the-hill, but I have to do a Google search and study Type I and Type II error before ever attempting to actually use these concepts. And, why are we only focusing on errors, and not on truths? If there are two types of error (false positive and false negative), then there must be two types of truth (true positive and true negative), or are there more than two types of error and more than two types of truth? My head hurts. Let's move on.
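For anyone who would rather skip the Google search, the four possible outcomes can be written down as a small Python lookup. In standard statistical usage, a Type I error is a false positive and a Type II error is a false negative. The function name `classify_outcome` and its parameters are hypothetical, and the sketch assumes we somehow know whether the difference is real, which in practice we never do.

```python
def classify_outcome(difference_is_real, test_says_significant):
    """Name the outcome of one significance test.

    difference_is_real: the (normally unknowable) truth.
    test_says_significant: what the significance test reported.
    """
    if test_says_significant and not difference_is_real:
        return "Type I error (false positive)"
    if not test_says_significant and difference_is_real:
        return "Type II error (false negative)"
    if test_says_significant and difference_is_real:
        return "true positive"
    return "true negative"


# Print all four combinations: two errors and two truths.
for real in (True, False):
    for significant in (True, False):
        print(real, significant, classify_outcome(real, significant))
```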
The fourth great sin is the so-called Semantic Differential Scale. It was no doubt stolen from the psychology or sociology world, but again, why did our Fathers not have better judgment? Now, I'm not against stealing if you can do it in the dark of night and if it's profitable, but I am firmly opposed to dumb stealing. Semantic Differentials are usually some type of numeric scale (5 points, 7 points, 9 points, 10 or 11 points) with the endpoints anchored by two words with opposite meanings, such as Love/Hate, Fast/Slow, Modern/Old-Fashioned, and so on. The two words with opposite meanings are okay. It's the long number scale in between that bothers me. What the heck does a 7.3 mean on a 10-point scale, or what does a 3.8 mean? A better practice is short scales (true-false, yes-no, excellent-good-fair-poor, and so forth), where each answer on the scale means something. It is much easier to explain true-false or yes-no answers to high-level executives than to explain 7.2 on an 11-point scale. In general, the higher the executive’s level, the shorter and simpler the research results must be; and that is where the simple, short answer scales are at their very best. The older I get, the shorter my answer choices become.
Our research Fathers did not stop with the aforementioned sins, but I do not wish to punish their collective reputations any further—since I’m one of them. They were a well-intentioned, studious lot, and the useful tools they handed down to the current generation surely counterbalance some of their sins.
About the Author
Jerry W. Thomas is President/CEO of Decision Analyst, and he welcomes feedback and comments. He may be reached by email or by phone at 1-817-640-6166.
Copyright © 2021 by Decision Analyst, Inc.
This posting may not be copied, published, or used in any way without written permission of Decision Analyst.