Welcome to the CMB Blog!  The posts here represent the opinions of CMB employees and guests, not necessarily the company as a whole.


Voices of CMB: The Chadwick Martin Bailey Research Blog


Sig Testing Social Media Data is a Slippery Slope


During a recent social media webinar, the question was raised “How do we convince clients that social media is statistically significant?”  After an involuntary groan, this question brought two things to mind:

  • There are a lot of people working in social media research who do not understand the fundamentals of market research; and

  • Why would anyone want to apply significance testing to social media data?

Apparently, there’s much debate in online research forums about whether significance testing should be applied to social media data.  Proponents argue that online panels are convenience samples and significance testing is routinely applied to those research results – so why not social media?  Admittedly, that is true, but with panels the ability to define the sample population and work with a structured data set provides some test/retest reliability of the results; social media offers neither.  It’s not a fair comparison.

I’m all for creative analysis and see potential value in sig testing applied to any data set as a way to wade through a lot of numbers to find meaningful patterns.  The analyst should understand, though, that with very large data sets even trivial differences appear statistically significant, so it might not be a useful exercise for social media.  Even if it can be applied, I would use it as a behind-the-scenes tool and not something to report on.
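To make the large-sample point concrete, here is a minimal sketch (all numbers are invented for illustration, and `two_proportion_z` is just a hypothetical helper) of a two-proportion z-test on a fixed one-point difference in positive chatter share. The gap never changes; only the sample size does:

```python
import math

def two_proportion_z(p1, p2, n):
    """Two-proportion z-statistic, assuming equal group sizes of n."""
    pooled = (p1 + p2) / 2               # pooled proportion (equal n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (p1 - p2) / se

# Same 50% vs. 51% split of positive chatter, two very different verdicts:
for n in (1_000, 1_000_000):
    z = abs(two_proportion_z(0.50, 0.51, n))
    verdict = "significant at p < .05" if z > 1.96 else "not significant"
    print(f"n = {n:>9,}: z = {z:.2f} ({verdict})")
```

At n = 1,000 the z-statistic is well under 1.96; at n = 1,000,000 the identical one-point gap tests as wildly significant, which is exactly why “significant” stops meaning “meaningful” at social media scale.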

Anyone who has worked with social media data understands the challenging, ongoing process of disambiguation (removing irrelevant chatter).  There are numerous uncontrollable external factors, including the ever-changing set of sites the chatter is being pulled from: some are genuinely new sites where chatter is occurring, while others are existing sites newly added to the listening tool’s database.  Given the nature of social media data, how can statistical comparisons over time be valid?  Social media analysis is a messy business.  Think of it as a valuable source of qualitative information.

There is value in tracking social media chatter over time to monitor for potential red flags.  Keep in mind that there is a lot of noise in social media data, and more often than not an increase in chatter does not require any action.
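One way to monitor for red flags without invoking significance testing is a simple baseline-and-threshold rule. The sketch below (data, window size, and threshold all invented for illustration) flags a day only when volume jumps far above its recent baseline:

```python
import statistics

def flag_spikes(volumes, window=7, k=3.0):
    """Return indices of days whose volume exceeds the mean of the
    preceding `window` days by more than k standard deviations."""
    flags = []
    for i in range(window, len(volumes)):
        baseline = volumes[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev > 0 and volumes[i] > mean + k * stdev:
            flags.append(i)
    return flags

daily_mentions = [100, 104, 98, 101, 97, 103, 99, 102, 250, 101]
print(flag_spikes(daily_mentions))  # → [8], the 250-mention spike
```

A wide threshold like this deliberately ignores ordinary day-to-day noise: most upticks in chatter never trip it, which matches the point above that most increases require no action.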

Applying sig testing to social media data is a slippery slope.  It implies a precision that is not there and puts the focus on “significant” changes instead of on meaning.  Social media analysis is already challenging – why needlessly complicate things?

Cathy is CMB’s social media research maven dedicated to an “eyes wide open” approach to social media research and its practical application and integration with other data sources. Follow her on Twitter at @VirtualMR 

Cathy, nice post. Good insights.
Posted @ Wednesday, February 29, 2012 9:20 AM by scott burkhead
Social Media analysis is indeed a massive undertaking.  
I believe the MIT PhDs are better suited to evaluate the general sentiment from social media than traditional market researchers. 
Either way, why not view social media as an enormous pool of people that obviously care about certain brands and topics, and in turn, entice those people to share their opinions about the questions that you as a market researcher are looking to answer?
Posted @ Wednesday, February 29, 2012 1:15 PM by Adam Wexler
This is a thoughtful and provocative position. 
I have personally observed that many managers quickly forget all the caveats about tests of significance and the assumptions behind such tests that people might typically use. 
Every good stat professor, even in intro courses, spends a lot of time explaining the many caveats, but as students enter the real world they often forget them. 
That's when the 'slippery slope' begins.  
On the other hand, statistical analysis and modeling can be used heuristically, to provoke new ways of looking at field data, etc., but don't cite 'significance' in reports to clients or management. Just share the provocative insights and see if you can get buy-in to do a 'real' experiment, or at least a better-structured field observation. 
But, if someone is really tempted to do statistics around behavioral/observational data sets, at least brush up on 'non-parametric' statistics (which may be more appropriate, though not frequently taught in intro stat courses for undergrads or MBA candidates). 
Just a few thoughts to add to a provocative post.
Posted @ Wednesday, February 29, 2012 1:24 PM by Andy Maddocks