The CMB Research Blog | Advanced Analytics

The CMB Blog 2015: 6 of Our Favorites

Wed, Dec 30, 2015

We run this blog a little differently than other corporate blogs. Instead of relying on a few resident bloggers, each of our employees writes at least one post a year. This means you get a variety of perspectives, experiences, and opinions on all aspects of market research, analytics, and strategy consulting from insights professionals doing some pretty cool work.

Before we blast into 2016, we wanted to reflect on our blog this past year by taking a second look at some of our favorite posts:

This year, we launched a market research advice column—Dear Dr. Jay. Each month, our VP of Advanced Analytics, Jay Weiner, answers reader-submitted questions on everything from Predictive Analytics to Connected Cows. In the post that started it all, Dr. Jay discusses one of the hottest topics in consumer insights: mining big data.
Research design and techniques are two of our favorite blog topics. A member of our Advanced Analytics team, Liz White, wrote a great piece this year about conjoint analysis. In her post, she shares the 3 most common pitfalls of using this technique and ways to get around them. Read it here.
In June we launched EMPACT^SM— our emotional impact analysis tool. In our introductory blog post to this new tool, CMB’s Erica Carranza discuss the best way to understand how your brand our product makes consumers feel and the role those feelings play in shaping consumers’ choices. Bonus: Superman makes a cameo. Check it out.
Isn’t it great when you can take a topic like loyalty and apply it to your favorite television show? Heidi Hitchen did just that in her blog post this year. She broke down the 7 types of loyalty archetypes by applying each archetype to a character from popular book series A Song of Ice and Fire and hit HBO TV series Game of Thrones. Who’s a “True Loyal”? A “Captive Loyal”? Read to find out!
Our Researcher in Residence series is one of our favorite blog features. A few times a year, we sit down with a client to talk about their work and the ideas about customer insights. Earlier this year, our own Judy Melanson sat down with Avis Budget Group’s Eric Smuda to talk about the customer experience, working with suppliers, and consumer insights. Check it out.
We released a Consumer Pulse report earlier this year on mobile wallet use in the U.S. To deepen our insights, we analyzed unlinked passive mobile behavioral data alongside survey-based data. In this post, our VP of Technology and Telecom, Chris Neal, and Jay Weiner, teamed up to share some of the typical challenges you may face when working with passive mobile behavioral data, and some best practices for dealing with those challenges. Read it here.

What do you want us to cover in 2016? Tell us in the comments, and we look forward to talking with you next year!

Kirsten Clark is CMB’s Marketing Coordinator. She’ll be ringing in the New Year by winning her family’s annual game of Pictionary.

Topics: Strategic Consulting, Advanced Analytics, Consumer Insights

Dear Dr. Jay: Can One Metric Rule Them All?

Posted by Dr. Jay Weiner

Wed, Dec 16, 2015

Hi Dr. Jay –

The city of Boston is trying develop one key measure to help officials track and report how well the city is doing. We’d like to do that in house. How would we go about it?

-Olivia

Hi Olivia,

This is the perfect tie in for big data and the key performance index (KPI). Senior management doesn’t really have time to pour through tables of numbers to see how things are going. What they want is a nice barometer that can be used to summarize overall performance. So, how might one take data from each business unit and aggregate them into a composite score?

We begin the process by understanding all the measures we have. Once we have assembled all of the potential inputs to our key measure, we need to develop a weighting system to aggregate them into one measure. This is often the challenge when working with internal data. We need some key business metric to use as the dependent variable, and these data are often missing in the database.

For example, I might have sales by product by customer and maybe even total revenue. Companies often assume that the top revenue clients are the bread and butter for the company. But what if your number one account uses way more corporate resources than any other account? If you’re one of the lucky service companies, you probably charge hours to specific accounts and can easily determine the total cost of servicing each client. If you sell a tangible product, that may be more challenging. Instead of sales by product or total revenue, your business decision metric should be the total cost of doing business with the client or the net profit for each client. It’s unlikely that you capture this data, so let’s figure out how to compute it. Gross profit is easy (net sales – cost of goods sold), but what about other costs like sales calls, customer service calls, and product returns? Look at other internal databases and pull information on how many times your sales reps visited in person or called over the phone, and get an average cost for each of these activities. Then, you can subtract those costs from the gross profit number. Okay, that was an easy one.

Let’s look at the city of Boston case for a little more challenging exercise. What types of information is the city using? According to the article you referenced, the city hopes to “corral their data on issues like crime, housing for veterans and Wi-Fi availability and turn them into a single numerical score intended to reflect the city’s overall performance.” So, how do you do that? Let’s consider that some of these things have both income and expense implications. For example, as crime rates go up, the attractiveness of the city drops and it loses residents (income and property tax revenues drop). Adding to the lost revenue, the city has the added cost of providing public safety services. If you add up the net gains/losses from each measure, you would have a possible weighting matrix to aggregate all of the measures into a single score. This allows the mayor to quickly assess changes in how well the city is doing on an ongoing basis. The weights can be used by the resource planners to assess where future investments will offer the greatest pay back.

Dr. Jay is fascinated by all things data. Your data, our data, he doesn’t care what the source. The more data, the happier he is.

Topics: Advanced Analytics, Boston, Big Data, Dear Dr. Jay

You Cheated—Can Love Restore Trust?

Posted by James Kelley

Mon, Nov 02, 2015

This year has been rife with corporate scandals. For example, FIFA’s corruption case and Volkswagen’s emissions cheating admission may have irreparably damaged public trust for these organizations. These are just two of the major corporations caught this year, and if history tells us anything, we’re likely to see at least another giant fall in 2015.

What can managers learn about their brands from watching the aftermath of corporate scandal? Let’s start with the importance of trust—something we can all revisit. We take it for granted when our companies or brands are in good standing, but when trust falters, it recovers slowly and impacts all parts of the organization. To prove the latter point, we used data from our recent self-funded Consumer Pulse research to understand the relationship between Likelihood to Recommend (LTR), a Key Performance Indicator, and Trustworthiness amongst a host of other brand attributes.

Before we dive into the models, let’s talk a little bit about the data. We leveraged data we collected some months ago—not at the height of any corporate scandal. In a perfect world, we would have pre-scandal and post-scandal observations of trust to understand any erosion due to awareness of the deception. This data also doesn’t measure the auto industry or professional sports. It focuses on brands in the hotel, e-commerce, wireless, airline, and credit card industries. Given the breadth of the industries, the data should provide a good look at how trust impacts LTR across different types of organizations. Finally, we used Bayes Net (which we’ve blogged about quite a bit recently) to factor and map the relationships between LTR and brand attributes. After factoring, we used TreeNet to get a more direct measure of explanatory power for each of the factors.

First, let’s take a look at the TreeNet results. Overall, our 31 brand attributes explain about 71% of the variance in LTR—not too shabby. Below are each factors’ individual contribution to the model (summing to 71%). Each factor is labeled by the top loading attribute, although they are each comprised of 3-5 such variables. For a complete list of which attributes goes with which factor, see the Bayes Net map below. That said, this list (labeled by the top attributes) should give you an idea of what’s directly driving LTR:

tree net, cmb, advanced analytics

Looking at these factor scores in isolation, they make inherent sense—love for a brand (which factors with “I am proud to use” and “I recommend, like, or share with friends”) is the top driver of LTR. In fact, this factor is responsible for a third of the variance we can explain. Other factors, including those with trust and “I am proud to wear/display the logo of Brand X” have more modest (and not all that dissimilar) explanatory power.

You might be wondering: if Trustworthiness doesn’t register at the top of the list for TreeNet, then why is it so important? This is where Bayes Nets come in to play. TreeNet, like regression, looks to measure the direct relationships between independent and dependent variables, holding everything else constant. Bayes Nets, in contrast, looks for the relationships between all the attributes and helps map direct as well as indirect relationships.

Below is the Bayes Net map for this same data (and you can click on the map to see a larger image). You need three important pieces of information to interpret this data:

The size of the nodes (circles/orbs) represents how important a factor is to the model. The bigger the circle, the more important the factor.
Similarly, the thicker the lines, the stronger a relationship is between two factors/variables. The boldest lines have the strongest relationships.
Finally, we can’t talk about causality, but rather correlations. This means we can’t say Trustworthiness causes LTR to move in a certain direction, but rather that they’re related. And, as anyone who has sat through an introduction to statistics course knows, correlation does not equal causation.

Here, Factor 7 (“I love Brand X”) is no longer a hands-down winner in terms of explanatory power. Instead, you’ll see that Factors 3, 5, 7 and 9 each wield a great deal of influence in this map in pretty similar quantities. Factor 7, which was responsible for over a third of the explanatory power before, is well-connected in this map. Not surprising—you don’t just love a brand out of nowhere. You love a brand because they value you (Factor 5), they’re innovative (Factor 9), they’re trustworthy (Factor 3), etc. Factor 7’s explanatory power in the TreeNet model was inflated because many attributes interact to produce the feeling of love or pride around a brand.

Similarly, Factor 3 (Trustworthiness) was deflated. The TreeNet model picked up the direct relationship between Trustworthiness and LTR, but it didn’t measure its cumulative impact (a combination of direct and indirect relationships). Note how well-connected Factor 3 is. It’s strongly related (one of the strongest relationships in the map) to Factor 5, which includes “Brand X makes me feel valued,” “Brand X appreciates my business,” and “Brand X provides excellent customer service.” This means these two variables are fairly inseparable. You can’t be trustworthy/have a good reputation without the essentials like excellent customer service and making customers feel valued. Although to a lesser degree, Trustworthiness is also related to love. Business is like dating—you can’t love someone if you don’t trust them first.

The data shows that sometimes relationships aren’t as cut and dry as they appear in classic multivariate techniques. Some things that look important are inflated, while other relationships are masked by indirect pathways. The data also shows that trust can influence a host of other brand attributes and may even be a prerequisite for some.

So what does this mean for Volkswagen? Clearly, trust is damaged and will need to be repaired. True to crisis management 101, VW has jettisoned a CEO and will likely make amends to those owners who have been hurt by their indiscretions. But how long will VW feel the damage done by this scandal? For existing customers, the road might be easier. One of us, James, is a current VW owner, and he is smitten with the brand. His particular model (GTI) wasn’t impacted, and while the cheating may damage the value of his car, he’s not selling it anytime soon. For prospects, love has yet to develop and a lack of trust may eliminate the brand from their consideration set.

The takeaway for brands? Don’t take trust for granted. It’s great while you’re in good favor, but trust’s reach is long, varied, and has the potential to impact all of your KPIs. Take a look at your company through the lens of trust. How can you improve? Take steps to better your customer service and to make customers feel valued. It may pay dividends in improving trust, other KPIs, and, ultimately, love.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. He keeps buying new cars to try to make the noise on the right side go away.

James Kelley splits his time at CMB as a Project Manager for the Technology/eCommerce team and as a member of the analytics team. He is a self-described data nerd, political junkie, and board game geek. Outside of work, James works on his dissertation in political science which he hopes to complete in 2016.

Topics: Advanced Analytics, Data Collection, Dear Dr. Jay, Data Integration, Customer Experience & Loyalty

Dear Dr. Jay: Data Integration

Posted by Jay Weiner, PhD

Wed, Aug 26, 2015

Dear Dr. Jay,

How can I explain the value of data integration to my CMO and other non-research folks?

- Jeff B.

Hi Jeff,

Years ago, at a former employer that will remain unnamed, we used to entertain ourselves by playing Buzzword Bingo in meetings. We’d create Bingo cards with 30 or so words that management like to use (“actionable,” for instance). You’d be surprised how fast you could fill a card. If you have attended a conference in the past few years, you know we as market researchers have plenty of new words to play with. Think: big data, integrated data, passive data collection, etc. What do all these new buzzwords really mean to the research community? It boils down to this: we potentially have more data to analyze, and the data might come from multiple sources.

If you only collect primary survey data, then you typically only worry about sample reliability, measurement error, construct validity, and non-response bias. However, with multiple sources of data, we need to worry about all of that plus level of aggregation, impact of missing data, and the accuracy of the data. When we typically get a database of information to append to survey data, we often don’t question the contents of that file. . . but maybe we should.

A client recently sent me a file with more than 100,000 records (ding ding, “big data”). Included in the file were survey data from a number of ad hoc studies conducted over the past two years as well as customer behavioral data (ding ding, “passive data”). And, it was all in one file (ding ding, “integrated data”). BINGO!

I was excited to get this file for a couple of reasons. One, I love to play with really big data sets, and two, I was able to start playing right away. Most of the time, clients send me a bunch of files, and I have to do the integration/merging myself. Because this file was already integrated, I didn’t need to worry about having unique and matching record identifiers in each file.

Why would a client have already integrated these data? Well, if you can add variables to your database and append attitudinal measures, you can improve the value of the modeling you can do. For example, let’s say that I have a Dunkin’ Donuts (DD) rewards card, and every weekday, I stop by a DD close to my office and pick up a large coffee and an apple fritter. I’ve been doing this for quite some time, so the database modelers feel fairly confident that they can compute my lifetime value from this pattern of transactions. However, if the coffee was cold, the fritter was stale, and the server was rude during my most recent transaction, I might decide that McDonald’s coffee is a suitable substitute and stop visiting my local DD store in favor of McDonald’s. How many days without a transaction will it take the DD algorithm to decide that my lifetime value is now $0.00? If we had the ability to append customer experience survey data to the transaction database, maybe the model could be improved to more quickly adapt. Maybe even after 5 days without a purchase, it might send a coupon in an attempt to lure me back, but I digress.

Earlier, I suggested that maybe we should question the contents of the database. When the client sent me the file of 100,000 records, I’m pretty sure that was most (if not all) of the records that had both survey and behavioral measures. Considering the client has millions of account holders, that’s actually a sparse amount of data. Here’s another thing to consider: how well do the two data sources line up in time? Even if 100% of my customer records included overall satisfaction with my company, these data may not be as useful as you might think. For example, overall satisfaction in 2010 and behavior in 2015 may not produce a good model. What if some of the behavioral measures were missing values? If a customer recently signed up for an account, then his/her 90-day behavioral data elements won’t get populated for some time. This means that I would need to either remove these respondents from my file or build unique models for new customers.

The good news is that there is almost always some value to be gained in doing these sorts of analysis. As long as we’re cognizant of the quality of our data, we should be safe in applying the insights.

Got a burning market research question?

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. Jay earned his Ph.D. in Marketing/Research from the University of Texas at Arlington and regularly publishes and presents on topics, including conjoint, choice, and pricing.

Topics: Advanced Analytics, Big Data, Dear Dr. Jay, Data Integration, Passive Data

Dear Dr. Jay: Bayesian Networks

Posted by Dr. Jay Weiner

Thu, Jul 30, 2015

Hello Dr. Jay,

I enjoyed your recent post on predictive analytics that mentioned Bayesian Networks.

Could you explain Bayesian Networks in the context of survey research? I believe a Bayes Net says something about probability distribution for a given data set, but I am curious about how we can use Bayesian Networks to prioritize drivers, e.g. drivers of NPS or drivers of a customer satisfaction metric.

-Al

Dear Al,

Driver modeling is an interesting challenge. There are 2 possible reasons why folks do driver modeling. The first is to prioritize a set of attributes that a company might address to improve a key metric (like NPS). In this case, a simple importance ranking is all you need. The second reason is to determine the incremental change in your dependent variable (DV) as you improve any given independent variable by X. In this case, we’re looking for a set of coefficients that can be used to predict the dependent variable.

Why do I distinguish between these two things? Much of our customer experience and brand ratings work is confounded by multi-collinearity. What often happens in driver modeling is that 2 attributes that are highly correlated with each other might end up with 2 very different scores—one highly positive and the other 0, or worse yet, negative. In the case of getting a model to accurately predict the DV, I really don’t care about the magnitude of the coefficient or even the sign. I just need a robust equation to predict the value. In fact, this is seldom the case. Most clients would want these highly correlated attributes to yield the same importance score.

So, if we’re not interested in an equation to predict our DV, but do want importances, Bayes Nets can be a useful tool. There are a variety of useful outputs that come from Bayes Nets. Mutual information and Node Force are two such items. Mutual information is essentially the reduction in uncertainty about one variable given what we know about the value of another. We can think of Node Force as a correlation between any 2 items in the network. The more certain the relationship (higher correlation), the greater the Node Force.

The one thing that is relatively unique to Bayes Nets is the ability to see if the attributes are directly connected to your key measure or if they are moderated through another attribute. This information is often useful in understanding possible changes to other measures in the network. So, if the main goal is to help your client understand the structure in your data and what items are most important, Bayes Nets is quite useful.

Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit anonymously here.

Topics: Advanced Analytics, NPS, Dear Dr. Jay

Voices: the CMB Blog

WELCOME TO OUR BLOG!

Subscribe to Email Updates

BROWSE BY TAG

The CMB Blog 2015: 6 of Our Favorites

Dear Dr. Jay: Can One Metric Rule Them All?

You Cheated—Can Love Restore Trust?

Dear Dr. Jay: Data Integration

Dear Dr. Jay: Bayesian Networks