To Weight, or Not to Weight
(A Primer on Survey Data Weighting)
by Jerry W. Thomas
It often happens that a perfectly designed sampling plan ends up with too many women and not enough men completing the survey, or too many old people and not enough young people. In these cases, data weighting might make sense, if you want totals that accurately reflect the whole population. The term "data weighting" in most survey-related instances refers to respondent weighting (which in turn weights the data or weights the answers). That is, instead of a respondent counting as one (1) in the cross-tabulations, that respondent might count as 1.25 respondents, or .75 respondents. Here are some best practices to keep in mind when you are thinking about weighting survey data.
- If possible, always perfectly balance the sample during the sampling and screening process so that you never have to weight any data. This is almost always the best and most defensible solution.
- If you do decide to weight survey data, remember there is a price to pay. Nothing in life is free. The cost of weighting data is reduced accuracy: unequal weights make the sample behave like a smaller one, so the sampling variance and standard error of your estimates increase.
- Remember that the cost of weighting data is greater (in terms of reduced accuracy) when the sample size is smaller. If you have thousands of respondents, you can weight the data as much as you please and the cost in reduced accuracy is very small. On the other hand, if you have fewer than 100 respondents, the cost in reduced accuracy might be very great. Be especially cautious in weighting data when sample sizes are small.
- In deciding whether and how to weight survey data, it’s a good idea to review the cross-tabs to see which demographic (or other) variables appear to have the greatest impact on the answers. For example, if men and women give very similar answers, weighting the sample by gender will have little effect on the percentages in your tabulations. On the other hand, if different age groups are giving different answers, then weighting by age will change the numbers in your tabulations.
- When data must be weighted, weight by as few variables as possible. As the number of weighting variables goes up, so does the risk that the weighting of one variable will confuse or interact with the weighting of another variable.
- When data must be weighted, try to minimize the sizes of the weights. A general rule of thumb is never to weight a respondent less than .5 (a 50% weighting) nor more than 2.0 (a 200% weighting).
- Keep in mind that up-weighting data (weight > 1.0) is typically more dangerous than down-weighting data (weight < 1.0). In up-weighting, you have too few respondents and are pretending that those respondents each count for more than one person; and the greater the up-weight, the more those respondents' answers are exaggerated.
- A best practice is to create two sets of cross-tabulations: one set weighted and one set unweighted. Look at these two sets of cross-tabulations side by side, to make sure all the numbers look reasonable.
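The arithmetic behind several of the points above (respondent weights, the cost in accuracy, the .5 to 2.0 rule of thumb, and the weighted versus unweighted comparison) can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the sample of 60 women and 40 men is invented, the function names are hypothetical, and the effective sample size uses Kish's approximation, a standard formula the article itself does not name.

```python
from collections import Counter

def cell_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def weighted_share(answers, weights):
    """Weighted proportion of respondents whose answer is True."""
    return sum(w for a, w in zip(answers, weights) if a) / sum(weights)

# Invented example: 60 women and 40 men completed the survey,
# but the population is assumed to be 50% women and 50% men.
groups = ["F"] * 60 + ["M"] * 40
# Invented answers: 30 of 60 women say yes, 10 of 40 men say yes.
answers = [True] * 30 + [False] * 30 + [True] * 10 + [False] * 30

by_group = cell_weights(groups, {"F": 0.5, "M": 0.5})
weights = [by_group[g] for g in groups]

unweighted = sum(answers) / len(answers)       # 40 of 100 say yes
weighted = weighted_share(answers, weights)    # about 0.375 after weighting
n_eff = effective_sample_size(weights)         # about 96 "effective" respondents

# Flag any group whose weight falls outside the .5 to 2.0 rule of thumb.
out_of_range = [g for g, w in by_group.items() if not 0.5 <= w <= 2.0]

print(by_group)
print(round(unweighted, 3), round(weighted, 3), round(n_eff, 1), out_of_range)
```

In this invented case each woman counts as roughly .83 respondents and each man as 1.25, the "yes" share moves from 40% to 37.5%, and the 100 completed interviews carry about as much information as 96 unweighted ones. Larger weights would shrink that effective number further.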
Most widely used tabulation systems and statistical packages use Iterative Proportional Fitting (or something similar) to weight survey data, a method popularized by the statistician W. Edwards Deming about 75 years ago.
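For readers who want to see the idea in motion, here is a minimal sketch of Iterative Proportional Fitting (often called raking) in plain Python. It is a generic textbook version, not the implementation used by any particular tabulation system or statistical package. The respondent counts and population targets are invented, and the names `rake` and `weighted_margin` are hypothetical.

```python
def rake(rows, targets, max_iter=200, tol=1e-9):
    """Adjust weights one variable at a time until every margin matches.

    rows:    list of dicts, e.g. {"gender": "F", "age": "young"}
    targets: {variable: {category: population share}}
    Returns one weight per row; the weights sum to len(rows).
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            # Scale each respondent so this variable hits its target.
            for i, row in enumerate(rows):
                factor = shares[row[var]] * n / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # no adjustment was needed on this pass
            break
    return weights

def weighted_margin(rows, weights, var):
    """Weighted share of each category of one variable."""
    total = sum(weights)
    out = {}
    for row, w in zip(rows, weights):
        out[row[var]] = out.get(row[var], 0.0) + w / total
    return out

# Invented sample of 100 respondents: too many women, too many older people.
cells = {("F", "young"): 20, ("F", "old"): 40, ("M", "young"): 10, ("M", "old"): 30}
rows = [{"gender": g, "age": a} for (g, a), k in cells.items() for _ in range(k)]

# Assumed population targets (hypothetical).
targets = {"gender": {"F": 0.5, "M": 0.5}, "age": {"young": 0.4, "old": 0.6}}

raked = rake(rows, targets)
gender_margin = weighted_margin(rows, raked, "gender")
age_margin = weighted_margin(rows, raked, "age")
print(gender_margin, age_margin, min(raked), max(raked))
```

Each pass fixes one variable and slightly disturbs the other, which is why the loop repeats until the adjustments die out. Printing the smallest and largest weights at the end makes it easy to check them against the .5 to 2.0 rule of thumb.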
If you weight your survey data and the results are not what you hoped for, do not despair. There are hundreds of different weighting schemes and algorithms, and each has its own hidden assumptions and biases. So if you don't get the results you want with one weighting scheme, remember there are hundreds of other ways to weight the data: one of those might give you the answers your boss seeks. Of course, this is the reason for the first bullet point in this article.
About the Author
Jerry W. Thomas (firstname.lastname@example.org) is President/CEO of Decision Analyst. He may be reached at 1-800-262-5974 or 1-817-640-6166.
Copyright © 2017 by Decision Analyst, Inc.
This posting may not be copied, published, or used in any way without written permission of Decision Analyst.