Friday, June 15, 2018

When applying weights to your data, don’t pull a muscle

Imagine you are designing a study for a client who wants “readable” base sizes for certain key demographic groups in a survey, e.g., race and ethnicity.  To accommodate this, you set up the sample so that Caucasians, African-Americans, Hispanics, and Asians each have a base size of 100 completes, for a total sample size of n=400.

So far, so good.  But then your client wants you to test whether each group differs significantly from the total sample.  Everything would be okay if these groups were equal in size in the population.  Of course, they are not, which means you can’t simply roll up the 400 respondents into one group and make straightforward comparisons to the separate groups.  To solve this problem, you decide to weight the data using the population proportion of each group according to the latest census data available.  Each respondent’s weight is their group’s share of the population divided by that group’s share of the sample.
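To make the arithmetic concrete, here is a minimal Python sketch of that calculation for the n=400 design described above.  The population shares are made-up placeholders for illustration only, not census figures, and the function name `compute_weights` is our own invention; substitute the real proportions for your study.

```python
def compute_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share."""
    total = sum(sample_counts.values())
    # The population shares should describe the whole population.
    if abs(sum(population_shares.values()) - 1.0) > 1e-6:
        raise ValueError("population shares must sum to 1")
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


# The design from the example: 100 completes per group, n=400.
sample_counts = {
    "Caucasian": 100,
    "African-American": 100,
    "Hispanic": 100,
    "Asian": 100,
}

# ASSUMPTION: illustrative placeholder shares, NOT real census data.
population_shares = {
    "Caucasian": 0.60,
    "African-American": 0.13,
    "Hispanic": 0.18,
    "Asian": 0.09,
}

weights = compute_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(f"{group}: {weight:.2f}")
```

A handy sanity check is that the weighted base sizes add back up to the original n: multiply each group's weight by its count and the total should still be 400.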

In essence, weighting data is like pulling taffy.  For some groups, you only need to pull the taffy a little bit because their proportion in the sample is close to the population.  In other groups, you will need to stretch the taffy more as they may be under-represented in the sample, relative to the population.

However, all kinds of trouble can occur at this stage of your otherwise well-designed study.  You can apply weights that are far too large or far too small.  You can assign the wrong proportion to one of the subgroups.  And you can apply the weights correctly and then forget to read from the crosstabs labeled “Weighted Data.”  When using weights, be warned that trouble is lurking around the corner unless you are careful and check your work before publishing the results to your client.

To begin, examine the individual weight being applied to each respondent’s data.  If the weight is greater than 2.0, you may be trying to pull that taffy too far, and it may snap.  If the weight is close to 0.0, you are essentially eliminating that respondent’s data, since anything multiplied by zero is zero.  If you can stay between 0.5 and 1.5, you are in good shape, and the taffy will be just right.
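The rules of thumb above are easy to turn into an automatic check.  The following Python sketch is one way to do it; the function name, the labels, and the example weights are all hypothetical.  Because no exact cutoff is given for “close to 0.0,” this sketch simply flags anything outside the 0.5 to 1.5 comfort zone for review, and anything above 2.0 as too large.

```python
def classify_weight(weight, low=0.5, high=1.5, cap=2.0):
    """Label a weight using the rule-of-thumb thresholds in the text."""
    if weight > cap:
        return "too large"   # the taffy may snap
    if low <= weight <= high:
        return "good"        # the taffy is just right
    return "review"          # outside the comfort zone; take a closer look


# ASSUMPTION: hypothetical weights, for illustration only.
example_weights = {
    "Caucasian": 2.40,
    "African-American": 0.52,
    "Hispanic": 0.72,
    "Asian": 0.36,
}

report = {group: classify_weight(w) for group, w in example_weights.items()}
for group, label in report.items():
    print(f"{group}: weight {example_weights[group]:.2f} -> {label}")
```

Running a check like this on every weight before the crosstabs are produced is a cheap way to catch a mistyped proportion.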

Whoever is handling your data processing, whether it is a crack technician who has been running Quantum to produce crosstabs for years or you are doing it yourself, double check the work.  Believe us, these errors get made because they are so easily overlooked.

The worst error you can make is posting unweighted data to your report.  Again, it is easy to do, but extremely costly to overcome.  Your client will be reluctant to process your invoice, and will probably never call you for another study.  Check and double check your work.  Better yet, have someone else check it, as most researchers I know can tell a story about having looked at something for so long that they could no longer see errors right under their nose.

Weighting data is surely the Achilles’ heel of market research.  So, when you find yourself in a study in which applying weights is necessary, please be careful, stretch first, and don’t pull a muscle.