3 Avoidable Statistical Mistakes
by Audrey Guinn, Ph.D.
Marketing research is grounded in the scientific method: answering questions by generating a priori hypotheses, collecting data to test hypotheses, and analyzing data to draw conclusions. Adhering to the rules of the scientific method is important to ensure that results are valid and unbiased.
Sometimes marketing researchers are tempted to use undesirable methods, like conducting many single significance tests, performing statistical tests without hypotheses, and rerunning statistical tests until desired results are discovered. Unfortunately, engaging in these methods has unintended, detrimental consequences: namely, an increase in Type I Error.
An a priori hypothesis is a hypothesis that is generated prior to the research study taking place.
What is Type I Error? Type I Error is equivalent to a false positive. It occurs when we believe we have found a significant difference when there isn’t one. Typically, researchers set Alpha (α) to .05, limiting the chance of making a Type I Error on any single test to 5%. For a full discussion of Type I and Type II Errors, please see the blog posts written by Beth Horn.
Often, researchers do not realize that Type I Error is influenced by the number of statistical tests conducted. Specifically, the probability of making at least one Type I Error increases with every additional statistical test (correlation, regression, t-test, etc.) that is performed. This cumulative probability across a set of tests is known as the Familywise Error Rate. For example, if we conduct one significance test, we have a 5% chance of finding a significant difference when there isn’t one, because we have set α = .05. If we conduct three significance tests with an alpha equal to .05, the probability that we will make at least one Type I Error increases to 14.3% (see Footnote 1). If we examine ten significance tests, the probability that we will find at least one significant difference when there isn’t one rises to about 40%. If you are interested in reading more on this topic, please see the book Discovering Statistics Using IBM SPSS Statistics, 4th Edition (pg. 68), written by Andy Field.
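The figures above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original analysis: the function name familywise_error_rate is hypothetical, and the formula 1 - (1 - α)^k assumes the k tests are independent of one another.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I Error across num_tests tests.

    Assumes the tests are independent, so the chance of making no
    Type I Error at all is (1 - alpha) ** num_tests.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


# Reproduce the one-, three-, and ten-test examples from the text.
for k in (1, 3, 10):
    print(f"{k:>2} tests: {familywise_error_rate(k):.1%}")
# Prints:
#  1 tests: 5.0%
#  3 tests: 14.3%
# 10 tests: 40.1%
```

When tests share data or are otherwise correlated, the true rate will generally differ from this value, so the output is best read as an illustration of how quickly the risk grows.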
Discussed below are three types of statistical mistakes that lead to an increase in Type I Error.
Mistake 1: Performing statistical tests without having a priori hypotheses. Creating hypotheses before survey construction is imperative to narrow down the scope of the project and to guide the survey construction and data collection phases. Having well-thought-out hypotheses helps researchers avoid going down the rabbit hole of performing many meaningless significance tests. This limits the number of tests conducted to only those that will answer the hypotheses, thus lowering the risk of claiming false effects that are due to increased Type I Error.
Mistake 2: Examining many singular statistical tests instead of one statistical model. Often, businesses need to examine many attributes, like claims, messaging, or product features, and test their influence on another variable such as loyalty, share, or consideration. When there are many attributes to be tested, it is best to factor analyze them. Factor analysis allows the researcher to reduce the number of variables to be tested and groups the attributes into smaller, cohesive subsets that are represented by unobserved, latent variables (factors). This newly structured data is then entered into one model which can analyze all relationships simultaneously (a structural equation model). This method is advantageous due to the reduction of the total number of tests conducted, the ability to model all relationships simultaneously, the minimization of multicollinearity among observed variables, and the correction for error which results in more accurate findings.
Mistake 3: Rerunning analyses until expectations or desires are met. Finding results that are contrary to expectations, or “not what the client wanted,” does not give the researcher permission to dredge the data in hopes of picking out a statistical “golden nugget.” This practice also results in an unnecessary increase in Type I Error, which can lead to reporting effects that are not true.
The consequences of reporting effects produced by these types of statistical mistakes can be severe. Some of the results presented to the client will be invalid due to the increased error rate. Depending on what the analyses intended to measure, the results could be detrimental: failure of a newly introduced product or package, using inappropriate messaging, selecting the wrong strategy, and/or a loss in revenue and market position.
How can researchers avoid these consequences? Researchers should:
- Adhere to the scientific method. Create hypotheses before any statistical testing, and stick to those hypotheses during analysis.
- Refrain from running large numbers of single significance tests. This includes correlations, regressions, t-tests, chi-square tests, and ANOVAs, just to mention a few.
- Use factor analysis to reduce the number of variables to be tested. Use structural equation modeling to model all relationships simultaneously.
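- If structural equation modeling is not possible, Bonferroni corrections or the Benjamini-Hochberg Procedure can be used. These corrections adjust the criteria for significance to account for the number of tests examined: the Bonferroni correction controls the Familywise Error Rate, while the Benjamini-Hochberg Procedure controls the false discovery rate (the expected share of false positives among the significant results).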
- Refrain from data dredging.
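To make the two corrections named in the list above concrete, here is a minimal Python sketch of each. The function names and the example p-values are hypothetical and chosen only for illustration. The code follows the standard textbook definitions: Bonferroni compares every p-value to α divided by the number of tests, and Benjamini-Hochberg finds the largest rank k whose sorted p-value is at most (k / m) × α and rejects everything up to that rank.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag each test as significant only if p <= alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Flag tests as significant using the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values from ten separate significance tests.
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                    0.060, 0.074, 0.205, 0.212, 0.216]

uncorrected = sum(p <= 0.05 for p in example_p_values)
print("Uncorrected:       ", uncorrected)                                       # 5
print("Bonferroni:        ", sum(bonferroni_reject(example_p_values)))          # 1
print("Benjamini-Hochberg:", sum(benjamini_hochberg_reject(example_p_values)))  # 2
```

In this made-up example, five of the ten tests look significant at α = .05 before correction, but only one survives Bonferroni and two survive Benjamini-Hochberg. Bonferroni is the stricter of the two, which is why the less conservative Benjamini-Hochberg Procedure is often considered when many tests are unavoidable.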
Zora Neale Hurston stated, “Research is formalized curiosity. It is poking and prying with a purpose.” Likewise, Ken Norris has explained that, “The Scientific Method is nothing more than a system of rules to keep us from lying to each other.”
These quotes help us understand that how we arrive at our results is just as important as, if not more important than, the results themselves. Adhering to the scientific method, and thus, scientific principles, helps prevent researchers from performing many unnecessary statistical tests that increase Type I Error. This means that researchers can avoid giving erroneous advice, which saves them and their clients time and money.
Footnote 1: The calculation is as follows: 1 - (1 - .05)^3 = .143 (rounded).
About the Author
Audrey Guinn, Ph.D. (email@example.com) is a Statistical Analyst in the Advanced Analytics Group at Decision Analyst. She may be reached at 1-800-262-5974 or 1-817-640-6166.
Copyright © 2020 by Decision Analyst, Inc.
This posting may not be copied, published, or used in any way without written permission of Decision Analyst.