The Patriots have landed in Phoenix for yet another Super Bowl, but there are still those who can’t stop talking about “Deflategate.” Yes, that’s what some are calling the controversy surrounding those perfectly legal 12.5 PSI inflated footballs that lost air pressure due to changing atmospheric conditions and repeated Gronking* after touchdowns during the first half of the Pats-Colts showdown.
Here in Boston, we were shocked to turn on the TV and hear the terrible accusations. Were we watching and reading the same things as the accusers? Did those doubters not watch the press conferences (all three of them) where our completely ethical coach proclaimed his team’s innocence? Did they not understand that Belichick even conducted a SCIENCE EXPERIMENT?
Or could it be simply that the doubters live outside of New England?
The chart above makes it pretty obvious—from Bangor to Boston, we just might have been hearing the voices of a lot more Pats fans. This is, in fact, a really simple illustration of the dangers of convenience sampling—a very common type of non-probability sampling.
Sure, it’s a silly example, but as companies try to conduct research faster and cheaper, convenience sampling poses serious threats. Can you get 500 completes in a day? Yes, but there’s a very good chance they won’t be representative of the population you’re trying to reach. Posting a link to your survey on Facebook or Twitter is fast and free, but whose voice will you hear and whose will you miss?
I’ve heard it said that some information is better than none, but I’m not sure I agree. If you sample people who aren’t in your target population, they can lead you in completely the wrong direction. And if you oversample a certain group (ahem, New Englanders), you end up with a biased, non-representative sample.
Representative sampling is one of the basic tenets of survey research, but just because it’s a simple concept doesn’t mean we can afford to ignore it. Want your results to win big? Carefully review your game plan before kicking off data collection.
- Sample Frame: Is the proposed sample frame representative of the target population?
  - Unless you are targeting a niche population. . .
    - online panel “click-throughs” should be census balanced
    - customer lists must be reflective of the target customers (if the population is all customers, do not use email addresses unless addresses exist for all customers or the exceptions are randomly distributed)
    - compare the final sample to the target population just to be sure
- Selection: Does the selection process ensure that all potential respondents on the frame have an equal chance of being recruited throughout the data collection period?
  - To be sure, you should. . .
    - randomize all lists before recruiting
    - not fill quotas first
    - not focus on hard-to-reach respondents first
- Data collection: Will the proposed data collection plan adversely affect sample quality?
  - Ask yourself:
    - Are fielding dates unusual (e.g., holiday, tax returns, Super Bowl, etc.)?
    - Is the schedule long enough to cover weekdays and weekends? Will it give procrastinators sufficient time to respond?
- Structure: Will important subgroups have sufficient sample sizes if left to fall out naturally?
  - If not, set quotas. . .
    - Quota groups must be weighted back to their natural distribution before analysis or treated as an oversample and excluded from any analysis at the total level.
- Size: Is the proposed sample size sufficient?
  - We must always balance costs against sample size, but, at the same time, we must recognize that we need minimum sample sizes for certain objectives.
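
For anyone who wants to see two of those checklist items in action, here is a minimal Python sketch of randomizing a list before recruiting and weighting an oversampled group back to its natural distribution. Everything in it is an assumption made for illustration: the function names (`randomize_frame`, `group_weights`, `weighted_mean`), the 60/40 sample split, the 5% population share for New England, and the answers themselves are all made up, not real survey data.

```python
import random
from collections import Counter


def randomize_frame(frame, seed=None):
    """Return a shuffled copy of the frame so recruiting order is random."""
    rng = random.Random(seed)
    shuffled = list(frame)
    rng.shuffle(shuffled)
    return shuffled


def group_weights(sample_groups, population_shares):
    """Weight for each group = its population share / its share of the sample."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, share in population_shares.items():
        if counts[group] == 0:
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = share / (counts[group] / n)
    return weights


def weighted_mean(values, groups, weights):
    """Mean of values, with each respondent weighted by their group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Hypothetical sample: 60 New Englanders and 40 respondents from elsewhere.
groups = ["new_england"] * 60 + ["elsewhere"] * 40
# 1 = "the Pats are innocent", 0 = "not so sure" (made-up answers).
answers = [1] * 54 + [0] * 6 + [1] * 10 + [0] * 30

# Assumed population shares, for illustration only.
population_shares = {"new_england": 0.05, "elsewhere": 0.95}

weights = group_weights(groups, population_shares)
unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, groups, weights)

print(f"unweighted: {unweighted:.2f}")  # 0.64
print(f"weighted:   {weighted:.2f}")    # 0.28

# Shuffle a (hypothetical) contact list before recruiting from the top.
contact_order = randomize_frame(range(1, 11), seed=2015)
```

In this made-up sample, the unweighted result says nearly two-thirds of respondents back the Pats, while the weighted result is closer to one in four. Same data, very different story, and the only difference is whether the oversampled group was weighted back before reporting at the total level.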
Are there times you might need some quick and dirty (un-Patriot-like) results? Absolutely. But when you’re playing for big insights, you need the right team.
*spiking the football after a touchdown.
Athena Rodriguez is a Project Consultant at CMB. She’s a native Floridian, who’s looking forward to the end of the Blizzard of 2015 and the start of Sunday’s game!