Improving Customer Satisfaction and Loyalty with Time-Series Cross-Sectional Models
by John Colias, Ph.D., Beth Horn, Ph.D., and Ellen Wilkshire
Customer satisfaction and loyalty surveys typically track brand perceptions both overall and with respect to specific performance areas. For example, a survey might ask customers to rate brands based on overall satisfaction, likelihood to purchase again, likelihood to recommend, customer service, product performance, and brand image.
Since brand ratings are usually tracked over time, two types of data variability are available for analysis:
- Across units (e.g., stores) within a time period
- Across time periods within a unit
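To make the two sources of variability concrete, here is a minimal sketch that splits the total variance of a small panel of ratings into an across-units part and an across-time part. The store names and rating values are invented for illustration and are not from the study.

```python
from statistics import mean, pvariance

# Hypothetical average ratings: one list per unit, one value per time period.
ratings = {
    "store_A": [7.0, 7.4, 7.8],
    "store_B": [6.0, 6.1, 5.9],
    "store_C": [8.2, 8.0, 8.4],
}

def decompose(panel):
    """Split total variance into across-units and across-time components."""
    all_values = [v for series in panel.values() for v in series]
    grand_mean = mean(all_values)
    n = len(all_values)
    # Across units: distance of each unit's own mean from the grand mean.
    between = sum(
        len(series) * (mean(series) - grand_mean) ** 2
        for series in panel.values()
    ) / n
    # Across time: distance of each period's rating from its unit's mean.
    within = sum(
        (v - mean(series)) ** 2
        for series in panel.values()
        for v in series
    ) / n
    return between, within, pvariance(all_values)

between, within, total = decompose(ratings)
print(f"across units={between:.3f}  across time={within:.3f}  total={total:.3f}")
```

The two components add up to the total variance, so a model that uses only one time period has no access to the across-time component.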
In our experience, most regression modeling approaches in customer satisfaction and loyalty research use only data variation across stores, providers, or other units. That is, most lack the time-series component or variation in unit performance over time.
Predictive accuracy could be strengthened if variability due to ratings of units over time (e.g., ratings of a store from one month to the next) were included in the modeling. Since customer satisfaction research typically collects data over time, including the time-series component in models is a natural extension to current practice.
In this article, we report an application of time-series cross-sectional (TSCS) modeling, which incorporates both across-units and across-time variation in the data. The results from this application illustrate the value of adding the time-series component to the analysis.
Before presenting our application, we present the most common statistical modeling approaches. Then, we report our application with actual data. Finally, we discuss the implications of our findings for marketing research.
Common Statistical Modeling Approaches
Four techniques are commonly used to model customer satisfaction or loyalty: structural equation modeling, limited dependent variable regression, latent-class regression, and Hierarchical Bayes (HB) regression. These techniques increase reliability and accuracy, but do not necessarily use the information contained in across-time variation.
Structural equation modeling is a statistical tool that describes complex relationships by combining multivariate regression and confirmatory factor analysis. This technique delivers more reliable measurement by using multiple measures of each factor. It delivers more accurate predictions when brand attribute ratings that relate to product and service are highly collinear.
Regression with a limited dependent variable estimates the probability of each scale point. By modeling the probability mass at each scale point, we avoid assuming that the dependent variable is normally distributed. In fact, scale ratings are not typically normally distributed; they are usually skewed toward the high end of the rating scale. Basing the model on a more flexible distributional assumption can yield greater prediction accuracy.
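As one illustration of modeling probability mass at each scale point, the sketch below uses a cumulative (ordered) logit form. The choice of ordered logit, the linear predictor value, and the cutpoints are all assumptions made for this example; the article does not specify which limited dependent variable model is meant.

```python
import math

def scale_point_probabilities(eta, cutpoints):
    """Return P(rating = k) for each scale point under a cumulative logit model.

    eta       -- linear predictor (hypothetical, e.g., built from attribute ratings)
    cutpoints -- increasing thresholds; K scale points need K - 1 cutpoints
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities P(rating <= k); the top scale point closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]

    # Each scale point's probability is the difference of adjacent cumulatives.
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs

# Nine hypothetical cutpoints define a 10-point rating scale.
cutpoints = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.8, 1.8]
probs = scale_point_probabilities(eta=1.0, cutpoints=cutpoints)

for point, p in enumerate(probs, start=1):
    print(f"rating {point:2d}: {p:.3f}")
```

With these made-up values most of the probability falls on the upper scale points, which mirrors the skew toward high ratings described above without requiring a normal distribution.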
Latent-class and Hierarchical Bayes regression can also model a limited dependent variable, but take a further step toward realism. Latent-class regression allows separate regression parameters for latent, or unobserved, segments. Hierarchical Bayes allows separate regression parameters for each individual customer.
Time-series cross-sectional (TSCS) modeling can use any of the aforementioned techniques. For example, the application we report in this article uses a latent-class regression model.
What distinguishes TSCS is that it uses more than one time period to develop the model parameters. By incorporating multiple time periods of data, TSCS delivers more accurate predictions.
Application of Time-Series Cross-Sectional Vs. Cross-Sectional Only Modeling
The data used in this research were collected via the Internet. Four time periods of customer satisfaction data for six brands were obtained. The respondents were members of American Consumer Opinion® Online, an online panel of consumers who have agreed to participate in Internet surveys. For this study, all panel members were based in a large U.S. metropolitan area.
Respondents rated brands on 21 attributes. The attributes addressed product quality, product appeal, customer service, and overall satisfaction. Ratings were based on 10-point scales that ranged from 1 (Very Poor) to 10 (Excellent), with an option of “Don’t Know.” Respondents only rated brands with which they were familiar. For confidentiality reasons, we cannot reveal the category, the attributes, or the brand names.
Data were aggregated by zip code into separate market areas for each brand. The original intent of the survey data was to answer questions about specific store brands’ image and performance within the category. Thus data were not collected by store location, just by the store brand in general.
We initially wanted to define a cross-section as a specific unit location (such as store #24 in the New York area). However, sample size was limited at the store level. So, to form an approximation for store locations, the data were aggregated by brand within zip code. The 102 zip codes collected in the four studies were aggregated into 17 groups based on a geomapping technique. The six brands had adequate representation in each of the 17 zip code groups. Respondents who lived within the same cluster of zip codes were assumed to be rating stores within the same market area.
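The aggregation step described above can be sketched as follows. The zip codes, the zip-to-group mapping, and the ratings are hypothetical, and excluding "Don't Know" answers from the averages is an assumption; the article does not describe the geomapping technique or how such answers were handled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from zip code to market-area group.
ZIP_TO_GROUP = {"75001": 1, "75002": 1, "76010": 2}

# Hypothetical responses: (period, brand, zip code, rating).
# A rating of None stands for a "Don't Know" answer.
responses = [
    (1, "Brand 1", "75001", 8),
    (1, "Brand 1", "75002", 6),
    (1, "Brand 1", "76010", 9),
    (1, "Brand 2", "75001", None),
    (1, "Brand 2", "75002", 7),
]

def aggregate(records, zip_to_group):
    """Average ratings within each (period, brand, zip-code group) cell."""
    cells = defaultdict(list)
    for period, brand, zip_code, rating in records:
        if rating is None:
            continue  # assumption: "Don't Know" answers are left out
        cells[(period, brand, zip_to_group[zip_code])].append(rating)
    return {key: mean(values) for key, values in cells.items()}

cell_means = aggregate(responses, ZIP_TO_GROUP)
for key, value in sorted(cell_means.items()):
    print(key, value)
```

Each resulting cell plays the role of one observation: a brand's stores within one market area in one time period.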
Overall satisfaction varied substantially by store and zip code area. This variability is depicted in the following histogram where, out of 296 observations (an observation is a set of stores within a zip code cluster), a significant portion have an average rating less than 6.0 or greater than 8.0.
Even more important, however, is the variation of brand ratings across time, as this across-time variation is what we would like to use to better understand the drivers of customer satisfaction. The next chart shows that Brands 1 and 6 showed significant improvement over time in overall satisfaction. The TSCS model attempts to explain which store attributes caused this improvement over time.
Two models were developed:
- A cross-sectional model (from Time Period 3 only), which consisted of seven attribute factors as predictors of overall satisfaction by store.
- A time-series cross-sectional model, which consisted of the seven factors as predictors of overall satisfaction by store from Time Periods 1, 2, and 3.
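The difference between the two models lies in which observations enter the estimation. The sketch below shows that difference using ordinary least squares with a single predictor, which is a deliberate simplification and not the latent-class regression used in the study. All rows are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical rows: (period, unit, factor score, overall satisfaction).
rows = [
    (1, "A", 0.2, 7.1), (1, "B", -0.5, 6.2), (1, "C", 0.9, 8.0),
    (2, "A", 0.4, 7.4), (2, "B", -0.4, 6.3), (2, "C", 0.8, 7.9),
    (3, "A", 0.7, 7.8), (3, "B", -0.6, 6.1), (3, "C", 1.0, 8.2),
]

def estimation_sample(data, periods):
    """Keep only the rows from the requested time periods."""
    kept = [row for row in data if row[0] in periods]
    return [row[2] for row in kept], [row[3] for row in kept]

# Cross-sectional only: Time Period 3 alone.
cso_x, cso_y = estimation_sample(rows, {3})
cso_intercept, cso_slope = fit_line(cso_x, cso_y)

# Time-series cross-sectional: Time Periods 1, 2, and 3 stacked together.
tscs_x, tscs_y = estimation_sample(rows, {1, 2, 3})
tscs_intercept, tscs_slope = fit_line(tscs_x, tscs_y)

print(f"CSO : n={len(cso_x)}, slope={cso_slope:.3f}")
print(f"TSCS: n={len(tscs_x)}, slope={tscs_slope:.3f}")
```

The stacked sample contains every unit once per period, so the estimate reflects changes within a unit over time as well as differences between units.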
Latent-class regression was used to develop the two models in order to avoid the erroneous assumption that one single regression model describes all members of a population. Latent-class regression, in contrast, yields regression coefficients that are different across unobserved (latent) groups.
Each model was used to predict the change in overall satisfaction between Time Periods 3 and 4. The CSO (cross-sectional only) model predicted Time Period 4 with an MAE (mean absolute error of predicted vs. actual) of 0.22 rating points across the six brands.
The TSCS (time-series cross-sectional) model predicted more accurately with an MAE of 0.11.
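For readers unfamiliar with the measure, MAE is the average of the absolute differences between predicted and actual values. The sketch below computes it for six made-up brand-level values; these numbers are illustrative only and are not the study's data.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical Time Period 4 overall satisfaction for six brands.
predicted = [7.5, 6.9, 8.1, 7.0, 6.4, 7.7]
actual = [7.4, 7.1, 8.0, 7.3, 6.3, 7.9]

mae = mean_absolute_error(predicted, actual)
print(f"MAE = {mae:.2f} rating points")
```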
The evidence from these data suggests that predictive accuracy in customer satisfaction and loyalty research is enhanced when time-series variation is used. We believe that we might have achieved a greater increase in predictive accuracy for TSCS vs. CSO if (a) we had been able to use exact store locations instead of zip code clusters as the cross-sections within the analysis and (b) more time periods had been available to develop the model.
While the improvement in prediction accuracy is nice to have, a more important finding is that the TSCS model tells a different story about what factors drive overall satisfaction.
As indicated in the bar chart, both the CSO model (using only cross-sectional variation) and the TSCS model (using both time-series and cross-sectional variation) suggest that Factor 7 is the strongest driver of overall satisfaction. However, the CSO model points to Factor 3 as the second strongest driver, while the TSCS model points to Factor 1.
Because the two approaches delivered different answers about what drives overall satisfaction, we must decide which one to apply to business decision making. We believe that the stronger methodological approach should be chosen. The time-series cross-sectional model uses more information (across-time variation) to score and rank the drivers of overall satisfaction. For this reason, we believe that the time-series cross-sectional model provides the better result.
Because the attribute ratings that potentially predict overall satisfaction are multicollinear, the 20 predictor attributes were reduced to seven factors via factor analysis.
Implications for Marketing Research
If your business currently tracks brand ratings on overall satisfaction or customer loyalty over time, we recommend that you consider developing time-series cross-sectional models to explain customer satisfaction or customer loyalty.
If your business wants to set priorities for brand, product, and customer service improvements, time-series cross-sectional models provide an answer that (a) better predicts future market outcomes and (b) delivers a better assessment of the true drivers of customer satisfaction or customer loyalty.
About the Authors
John Colias (firstname.lastname@example.org) is Senior Vice President at Decision Analyst. Beth Horn (email@example.com) is Vice President at Decision Analyst. Ellen Wilkshire (firstname.lastname@example.org) is Senior Statistical Analyst at Decision Analyst. They may be reached at 1-800-262-5974 or 1-817-640-6166.
Copyright © 2007 by Decision Analyst, Inc.
This article may not be copied, published, or used in any way without written permission of Decision Analyst.