Reliability and Validity from a Scientific Perspective
by Elizabeth Horn, Ph.D.

Reliability and Validity

When discussing research results with a broad audience, we often default to using conversational language.

There’s nothing inherently wrong with this approach. It is important that stakeholders understand what the research found and what the findings mean for the business. Yet language is imprecise. Different words can be synonymous in a casual conversation but take on distinct meanings in a scientific context.

Both “reliable” and “valid,” for instance, are used to mean “robust” or “accurate” in everyday speech. Perhaps you have heard one stakeholder refer to research results as “reliable” and another describe the same results as “valid.” The stakeholders likely meant that the findings were both (1) based on a well-thought-out survey with a sample size large enough to detect any statistical differences and (2) believable to management. From a scientific perspective, however, the concepts of reliability and validity are not interchangeable, and understanding the difference is important when interpreting research outcomes.

Reliability

Reliability is a measure of the consistency of the results: Are the research results replicable? For example, many workplaces are conducting temperature screenings of their employees. It is important that thermometers consistently measure temperature. For the same individual, thermometers should produce similar results when the readings are taken a few seconds apart. Likewise, research studies should produce consistent results. This stability is assessed via two main types of reliability: internal and external.

Internal reliability refers to the consistency of survey results (or a qualitative interview) within itself. If respondents report that they buy large quantities of motor oil in one part of the interview, then they should also report later that they are purchasers for a trucking company. Healthy snack attitudes that are peppered throughout a survey should be statistically and positively correlated with one another. A quantitative survey or qualitative interview that is not internally consistent is not useful.
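One common way to quantify this kind of internal consistency is Cronbach’s alpha, which summarizes how strongly related items correlate with one another. The article does not prescribe a particular statistic, so the Python sketch below is only an illustration; the healthy-snack attitude columns and ratings are hypothetical.

    import pandas as pd

    # Hypothetical 5-point agreement ratings for three healthy-snack
    # attitude items from the same respondents (illustrative data only).
    items = pd.DataFrame({
        "prefer_low_sugar":      [5, 4, 2, 5, 3, 4, 1, 5],
        "read_nutrition_labels": [4, 5, 2, 4, 3, 5, 2, 4],
        "snack_on_fruit":        [5, 4, 1, 5, 2, 4, 2, 5],
    })

    # Internally consistent items should be positively correlated.
    print(items.corr().round(2))

    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
    print(f"Cronbach's alpha: {alpha:.2f}")

Values of alpha around 0.7 or higher are conventionally read as acceptable internal consistency, although the appropriate threshold depends on how the measure will be used.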

External reliability refers to the replicability of results over time. Imagine that a personality assessment is being used as part of an employment screening process. The assessment would not be a useful hiring tool if it delivered a different personality profile for the same candidate six months later. For most market research studies, establishing external reliability is not feasible due to time and budget constraints. Reliance on internal reliability is the norm.

Validity

It isn’t sufficient that research results are reliable (but it is certainly a great start!). The next step is to interpret findings to determine their validity. Validity is a measure of accuracy: Are the research results true? There are many types of validity. The main ones are construct, internal, and external. A bonus validity type is included as well.

Construct validity is all about using the right methods to measure the desired outcome. Is the research measuring what it is supposed to measure? For instance, if a company wants to identify consumers who are likely to buy a new product, the survey would include a likelihood-to-buy question. It seems like the correct metric, but is it? Some respondents might not want to tell you the product stinks, so they select the neutral rating on the scale, rather than the lowest rating possible. Others might give very high ratings to express enthusiasm for being asked their opinions. Because of these known scale-usage biases, researchers often include other measures—such as uniqueness, frequency of purchase, and believability of the product concept—to triangulate on the construct of “purchase likelihood.”
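The article does not specify how such measures should be combined, but as one hypothetical illustration of triangulation, the sketch below standardizes several concept-test ratings and averages them into a single purchase-interest index. All column names and values are made up for the example.

    import pandas as pd

    # Hypothetical concept-test ratings on 1-5 scales; the columns mirror
    # the kinds of measures mentioned above, but the data are invented.
    ratings = pd.DataFrame({
        "likelihood_to_buy":  [4, 5, 3, 2, 4],
        "uniqueness":         [3, 5, 4, 2, 3],
        "purchase_frequency": [4, 4, 3, 1, 5],
        "believability":      [5, 5, 4, 3, 4],
    })

    # Standardize each measure, then average, so that no single question
    # (with its own scale-usage biases) dominates the purchase-interest read.
    standardized = (ratings - ratings.mean()) / ratings.std(ddof=1)
    ratings["purchase_interest_index"] = standardized.mean(axis=1)
    print(ratings.round(2))

In practice, the weighting of each measure would be calibrated against actual in-market behavior; equal weights are used here only to keep the example simple.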

Internal validity is about establishing cause and effect in the research. To say with confidence that X causes Y, all other potential causes must be ruled out. This is the essence of the scientific method. In survey research, this can be difficult, as respondents are not in a laboratory setting under strict experimental conditions. Instead, market researchers do their best to control extraneous variables, such as ensuring that respondent demographics and even psychographics are comparable across treatment cells, and that stimuli are standardized and presented in a uniform manner.
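As a small, hypothetical example of such a check, the sketch below cross-tabulates a demographic variable against treatment cells and runs a chi-square test; comparable cells are one piece of evidence that observed differences come from the stimuli rather than from the samples. The variable names and data are invented for illustration.

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Hypothetical respondent-level data: which concept each respondent saw
    # (the treatment cell) and their age group (illustrative values only).
    df = pd.DataFrame({
        "cell": ["Concept A"] * 6 + ["Concept B"] * 6,
        "age_group": ["18-34", "35-54", "55+", "18-34", "35-54", "55+",
                      "18-34", "35-54", "55+", "18-34", "18-34", "35-54"],
    })

    # Cross-tabulate age group by treatment cell and test for imbalance.
    # A non-significant result is consistent with the cells being comparable
    # on this demographic.
    crosstab = pd.crosstab(df["age_group"], df["cell"])
    chi2, p_value, dof, expected = chi2_contingency(crosstab)
    print(crosstab)
    print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")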

External validity refers to the generalizability of the research outcomes. As an example, if a new product performs well among a sample of consumers, the new product should perform well in the market. Internal validity is a necessary first step (controlling for other explanations of X causes Y) in the effort to demonstrate external validity. Other steps, such as ensuring that the sample is representative of the population and testing stimuli in a realistic context (with competitive products, offers, or messaging), all bolster the applicability of the findings to the real world.
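One simple illustration of aligning a sample with the population is post-stratification weighting. The sketch below computes weights so that a sample’s age distribution matches assumed population shares; the counts and shares are hypothetical.

    import pandas as pd

    # Hypothetical sample counts by age group versus assumed population shares.
    sample_counts = pd.Series({"18-34": 120, "35-54": 200, "55+": 80})
    population_share = pd.Series({"18-34": 0.30, "35-54": 0.35, "55+": 0.35})

    # Post-stratification weights: each group's population share divided by
    # its share of the sample, so the weighted sample mirrors the population.
    sample_share = sample_counts / sample_counts.sum()
    weights = population_share / sample_share
    print(weights.round(2))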

A bonus validity type is face validity. Results that correspond with stakeholders’ expectations are considered quite plausible. “Of course,” they say, “that makes perfect sense.” Face validity can throw a wrench in developing new insights, though. Results that seem counterintuitive can be dismissed on the basis of being illogical or nonsensical. Assuming that the research was conducted using the highest standards, it is up to the researcher to advocate for the findings by illustrating how the research met other validity criteria.

Extracting consistent truth from quantitative and qualitative research can be challenging. Knowing the difference between consistency (reliability) and truth (validity), and how they work together, is the key to helping stakeholders and management meaningfully interpret research outcomes.

About the Author

Elizabeth Horn, Ph.D. (ehorn@decisionanalyst.com) is Senior Vice President, Advanced Analytics at Decision Analyst. She may be reached at 1-800-262-5974 or 1-817-640-6166.

 

Copyright © 2021 by Decision Analyst, Inc.
This posting may not be copied, published, or used in any way without written permission of Decision Analyst.
