Fancy Statistics Do Not Equal Causation
by Audrey Guinn, Ph.D.


    In research, understanding cause is often the goal. What is causing a product to sell? What is causing a decrease in subscriptions? What is the path to purchase or recommendation? What is the decision-making pathway? Frequently, though, data has been collected using typical surveying methods that will not render answers about causation no matter which robust and fancy statistics are used.

 
Why can’t causation be determined from survey data alone?

Conducting statistical tests on data from typical survey methods returns correlational results. This is because survey data collected without an experimental design (a research plan that controls for other potential causes) is correlational data. Correlation shows that two things are related: in a positive correlation, as one variable increases or decreases, the other variable tends to move in the same direction. Correlation, however, does not imply causation. Causation occurs when a change in one variable generates the change in another variable.

This website contains many examples that illustrate the idea that correlation does not equal causation. One such example is that margarine consumption is positively correlated with the divorce rate in Maine. Over time, as the consumption of margarine decreased, so too did the divorce rate. This does not mean, though, that forgoing margarine will save your marriage. Instead, there is usually another variable at play that is causing both things to happen, which creates the correlation between the two. That is why it is so important to control for other variables using an experimental design to determine cause.
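To make that idea concrete, here is a minimal simulation sketch (in Python, with entirely made-up variables) in which a hypothetical lurking variable drives two other measures. The two measures end up strongly correlated even though neither one causes the other.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Hypothetical lurking variable that drives both measures.
lurking = rng.normal(size=n)

# Neither x nor y influences the other; both simply track the lurking variable.
x = 2.0 * lurking + rng.normal(scale=0.5, size=n)
y = -1.5 * lurking + rng.normal(scale=0.5, size=n)

# A strong (negative) correlation appears even though x does not cause y.
print(np.corrcoef(x, y)[0, 1])
```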

What about other fancy statistical techniques such as regression, structural equation modeling, or serial mediation modeling (domino effect)? Won’t those show causation? These techniques are also correlation based unless there is an experimental design involved.

Regression is similar to correlation. In fact, when a single independent variable predicts a single dependent variable, the standardized coefficient (beta) in the regression output is identical to the Pearson correlation coefficient in the correlation output.
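A quick way to see this equivalence is to standardize both variables and fit a simple regression; the slope matches the Pearson coefficient. The sketch below uses simulated data and common Python libraries purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(size=500)

# Pearson correlation coefficient.
r, _ = stats.pearsonr(x, y)

# Simple regression on standardized (z-scored) variables:
# the slope is the standardized coefficient (beta).
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
beta = np.polyfit(zx, zy, 1)[0]

print(round(r, 4), round(beta, 4))  # the two values match
```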

Structural equation modeling combines regression with factor analysis. Although structural equation modeling is an advanced statistical technique, it is still correlation based unless an experimental design was performed to rule out other possible explanations for the results.

Likewise, serial mediation modeling is an advanced regression technique that examines domino-like effects (where one thing influences another, which influences something else, which then impacts the outcome). Although this type of modeling is often described as causal, unless the data were collected using an experimental design that rules out other possible explanations, it still produces correlational results.
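For illustration, here is a bare-bones sketch of how a serial (two-mediator) indirect effect is typically estimated: a chain of ordinary regressions whose path coefficients are multiplied together. The data and the x -> m1 -> m2 -> y chain are simulated, and the point is that every number in the chain is still a regression (correlational) estimate.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 800

# Simulated domino chain: x -> m1 -> m2 -> y (all hypothetical measures).
x = rng.normal(size=n)
m1 = 0.5 * x + rng.normal(size=n)
m2 = 0.6 * m1 + rng.normal(size=n)
y = 0.7 * m2 + rng.normal(size=n)

def ols_coefs(outcome, *predictors):
    """Ordinary least squares coefficients (intercept dropped)."""
    X = sm.add_constant(np.column_stack(predictors))
    return sm.OLS(outcome, X).fit().params[1:]

a1 = ols_coefs(m1, x)[0]            # x -> m1
d21 = ols_coefs(m2, x, m1)[1]       # m1 -> m2, controlling for x
b2 = ols_coefs(y, x, m1, m2)[2]     # m2 -> y, controlling for x and m1

# The serial indirect effect is the product of the path coefficients.
print("serial indirect effect:", a1 * d21 * b2)
```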

Collecting survey data and then running massive numbers of statistical tests on pathways to and from every combination of variables will not reveal cause and effect (because there is no experimental design or a priori hypothesis); it will also return spurious (false-positive) results. This is due to the inflated Type I error rate that comes from conducting many statistical tests. Please see this blog to learn more about spurious results. The only way to truly determine cause and effect is to control for all extraneous variables by utilizing an experimental design. That way, all other possible explanations for the outcome have been ruled out.
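The inflation of Type I error is easy to demonstrate with a small simulation. In the sketch below (simulated, pure-noise data), none of the variables are truly related, yet testing every pair at the usual .05 threshold still turns up a handful of "significant" correlations by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_respondents, n_variables = 200, 20

# Pure noise: no variable is truly related to any other.
data = rng.normal(size=(n_respondents, n_variables))

# Test every pair of variables at the usual alpha = .05 threshold.
false_positives = 0
n_tests = 0
for i in range(n_variables):
    for j in range(i + 1, n_variables):
        _, p = stats.pearsonr(data[:, i], data[:, j])
        n_tests += 1
        false_positives += p < 0.05

print(f"{false_positives} 'significant' correlations out of {n_tests} tests")
# Roughly 5% of the 190 tests come back significant purely by chance.
```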

How can cause be determined?

Method 1: True Experimental Design

To truly determine that one thing causes another, an experimental design must be implemented. True experimental designs require control over every aspect of the experiment. This is very important as these controls rule out any other possible explanations for the outcome. To design an experiment, first, a hypothesis is formed by reading previous research on the topic and coming up with testable theories. Next, independent variables (the variables that will be manipulated) and dependent variables (the variables that will be measured) are defined. Participants are randomly assigned to conditions. Then the experiment is performed.

One type of experimental design used in marketing research is AB testing. For example, if we wanted to understand what packaging would cause an increase in a beverage’s sales, we might hypothesize that packaging A (the new package design) would sell more than packaging B (the original packaging, which serves as the control condition). The independent variable is packaging type (A or B). Individuals would be randomly assigned to see either the new packaging A or the original packaging B when they visited the website. Sales for each packaging type are the dependent variable. Price, amount of liquid, flavor, and name all remain the same.

Randomly assigning participants to package A or package B and keeping all other aspects of the beverage the same rules out all other possible explanations for any effect on sales. Therefore, we know with certainty that the packaging type (the independent variable) is causing any increase we see in sales (the dependent variable).
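To show what the analysis of such a test might look like, here is a sketch in Python with simulated visitors, made-up purchase rates, and a standard two-proportion z-test. The group labels, rates, and sample size are all hypothetical.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(3)
n_visitors = 10_000

# Randomly assign each visitor to packaging A (new) or B (original).
group = rng.choice(["A", "B"], size=n_visitors)

# Hypothetical underlying purchase rates, used only to generate example data.
purchase_rate = np.where(group == "A", 0.06, 0.05)
purchased = rng.random(n_visitors) < purchase_rate

counts = [purchased[group == "A"].sum(), purchased[group == "B"].sum()]
nobs = [(group == "A").sum(), (group == "B").sum()]

# Two-proportion z-test comparing conversion for A vs. B.
z, p = proportions_ztest(counts, nobs)
print(f"A: {counts[0] / nobs[0]:.3%}   B: {counts[1] / nobs[1]:.3%}   p = {p:.4f}")
```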

AB testing can be used to answer many questions in marketing research, such as: Which webpage is better at increasing traffic? Which product claim results in the greatest sales? Which price results in the most sales? Which button design receives the most clicks? Which webpage is most user-friendly? Which offer results in the most subscriptions?

While AB testing is great at assessing a few concepts at a time, it does not perform well when many aspects of a product are included in the design. If a researcher wants to understand, for example, which combination of price, brand, packaging, claims, and offers is best for increasing sales, choice modeling is the gold standard.

In choice modeling, an experimental design is used to create product profiles utilizing all possible combinations of the attributes (price, brand, packaging, claims, and offers). Respondents are then shown several different product profiles in sets and are asked to select the one from each set they would be most likely to purchase. This is repeated with new and differing product profiles in each set, until all combinations in the model have been seen. The combination of the attributes is the independent variable and which product profile the respondent selects to purchase is the dependent variable. Because the respondents are randomly assigned to see different sets of products and because all other variables are being controlled, we know which specific combination of product attributes yields the greatest demand and revenue. Choice modeling also reveals price sensitivity and price optimization.
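As a small illustration of the design step, the sketch below enumerates every product profile implied by a set of hypothetical attribute levels (the levels are invented for this example; a real study would choose them carefully and then split the profiles into randomized choice sets).

```python
from itertools import product
from pprint import pprint

# Hypothetical attribute levels for the beverage example.
attributes = {
    "price":     ["$1.99", "$2.49", "$2.99"],
    "brand":     ["Brand X", "Brand Y"],
    "packaging": ["Original", "New"],
    "claim":     ["No claim", "Low sugar"],
    "offer":     ["None", "Buy 2, get 1 free"],
}

# Every possible product profile is one level from each attribute.
profiles = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

print(len(profiles), "product profiles")  # 3 * 2 * 2 * 2 * 2 = 48
pprint(profiles[0])                       # one example profile
```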

These experimental designs can be used together as well. Choice modeling can be used to figure out the optimal product configuration and then AB testing can be used to test the new configuration against the old configuration to determine unequivocally whether the new configuration is driving sales.

In some cases, however, with marketing research and other fields such as health science or even some areas in psychology, a true experiment cannot be performed. This is because assigning people to some conditions can be unethical (e.g., infecting someone with a disease they don’t already have to test a new drug’s efficacy). And there are some conditions participants can’t be assigned to, such as gender identity. In these situations, the next best option is a quasi-experimental design.

Method 2: Quasi-Experimental Design

Quasi-experimental designs have less control than true experimental designs but more control than regular survey methods.

When it’s not possible to assign respondents randomly to conditions, one option is to use a pretest-posttest method. This requires only one group of people who are measured before partaking in the experiment and then again after the experiment. Suppose we are a drink manufacturer who has created a new sports drink that we believe reduces dehydration. To utilize this type of experimental design, we would ask a group of people to perform a specific one-hour workout at a certain time in a particular setting. We would then measure their level of dehydration (pretest). After a period of recovery, we would have the same group perform the same workout in the same setting. Immediately after the second workout, they would be instructed to consume one bottle of the new sports drink. Then we would measure their dehydration level again (posttest). In this pretest-posttest design, consuming the drink (or not) is the independent variable and dehydration level is the dependent variable. Comparing pretest dehydration level to posttest dehydration level, we can ascertain whether the new sports drink reduced dehydration.
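Analytically, a pretest-posttest design like this is often evaluated with a paired-samples test, since each person serves as their own control. The sketch below uses simulated dehydration scores purely to show the mechanics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 40

# Simulated dehydration scores (higher = more dehydrated) for one group,
# measured after workout 1 (no drink) and after workout 2 (new sports drink).
pretest = rng.normal(loc=6.0, scale=1.0, size=n)
posttest = pretest - rng.normal(loc=0.8, scale=0.7, size=n)

# Paired-samples t-test: each person serves as their own control.
t, p = stats.ttest_rel(pretest, posttest)
print(f"mean reduction in dehydration = {(pretest - posttest).mean():.2f}, p = {p:.4f}")
```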

Another quasi-experimental design option is to match two groups of respondents on things like gender, age, ethnicity, income, and other variables that could influence the dependent variable. One group then partakes in the experiment while the other serves as the control group (it will go on about life as usual, receive the original product, or receive a placebo). For example, if we wanted to know whether our newly created sports drink hydrated people better than our original sports drink, we would use a matched-pairs design. Individuals from each group would be matched based on things like weight, height, ethnicity, gender, and general health. One group would be assigned to drink the new sports drink after workouts (the experimental group) while the other group would continue to drink the original sports drink after workouts (the control group). The sports drink type is the independent variable, and the measured rehydration is the dependent variable. The other variables (weight, height, ethnicity, etc.) are controlled via the matched-pairs design. At the end of the experiment, if those who drank the new sports drink were rehydrated more than those who drank the original sports drink, we could say that the new sports drink caused better rehydration.
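Because the two groups are matched person by person, the comparison can be analyzed pair by pair. The sketch below simulates rehydration scores for hypothetical matched pairs and compares them with a paired test; all numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n_pairs = 30

# Shared pair-level characteristics (weight, fitness, etc.) matched across groups.
pair_baseline = rng.normal(loc=70.0, scale=5.0, size=n_pairs)

# Hypothetical rehydration scores: each matched pair contributes one person per group.
new_drink = pair_baseline * 0.1 + rng.normal(loc=8.5, scale=1.0, size=n_pairs)
original_drink = pair_baseline * 0.1 + rng.normal(loc=7.8, scale=1.0, size=n_pairs)

# Because the groups are matched pair by pair, compare them with a paired test.
t, p = stats.ttest_rel(new_drink, original_drink)
print(f"mean difference = {(new_drink - original_drink).mean():.2f}, p = {p:.4f}")
```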

Another option is to measure the same set of respondents over time using a longitudinal study. This type of study can be utilized to measure things like customer satisfaction, brand awareness, brand loyalty, product health, etc. Like pretest-posttest, we can measure customer satisfaction before rolling out a new promotion and then again after the rollout. This would tell us whether the promotion is causing increased satisfaction. Additionally, we could continue to measure satisfaction at regular intervals over several months or even years. This would allow us to examine what other factors contribute to increased or decreased satisfaction.

Conclusion

Fancy statistics do not determine cause and effect without rigorous control using an experimental design. This is because data from typical surveying methods is correlational data and all analyses performed on that data render correlational output. Thankfully, many types of experimental and quasi-experimental designs exist to determine cause and effect.

About the Author

Audrey Guinn, Ph.D. (aguinn@decisionanalyst.com) is a Statistical Consultant in the Advanced Analytics Group at Decision Analyst. She may be reached at 1-800-262-5974 or 1-817-640-6166.

 

Copyright © 2023 by Decision Analyst, Inc.
This posting may not be copied, published, or used in any way without written permission of Decision Analyst.
