
The posts here represent the opinions of CMB employees and guests—not necessarily the company as a whole. 


Dear Dr. Jay: How To Predict Customer Turnover When Transactions are Anonymous

Posted by Dr. Jay Weiner

Wed, Apr 26, 2017

Dear Dr. Jay:

What's the best way to estimate customer turnover for a service business whose customer transactions are usually anonymous?

-Ian S.


Dear Ian,

You have posed an interesting question. My first response was, “you can’t.” But as I think about it some more, you might already have some data in-house that could help address the issue.

It appears you are in the mass transit industry. Most transit companies offer single-ride fares and monthly passes, while companies in college towns often offer semester-long passes. Since the passes (monthly, semester, etc.) are often sold at a discounted rate, we might conclude that all the single-fare revenues are turnover transactions.

This assumption is a small leap of faith as I’m sure some folks just pay the single fare price and ride regularly. Let’s consider my boss. He travels a fair amount and even with the discounted monthly pass, it’s often cheaper for him to pay the single ride fare. Me, I like the convenience of not having to make sure I have the correct fare in my pocket so I just pay the monthly rate, even if I don’t use it every day. We both might be candidates for weekly pass sales if we planned for those weeks when we know we’d be commuting every day versus working from home or traveling. I suspect the only way to get at that dimension would be to conduct some primary research to determine the frequency of ridership and how folks pay.

For your student passes, you probably have enough historic data in-house to compare your average semester pass sales to the population of students using them and can figure out if you see turnover in those sales. That leaves you needing to estimate the turnover on your monthly pass sales.

You also may have corporate sales that you could look at. For example, here at CMB, employees can purchase their monthly transit passes through our human resources department. Each month our cards are automatically updated so that we don’t have to worry about renewing them every few weeks. I suspect if we analyzed the monthly sales from our transit system (MBTA) to CMB, we could determine the turnover rate.

As you can see, you could already have valuable data in-house that can help shed light on customer turnover. I’m happy to look at any information you have and let you know what options you might have in trying to answer your question.

Dr. Jay is CMB’s Chief Methodologist and VP of Advanced Analytics and holds a Zone 3 monthly pass to the MBTA. If it wasn’t for the engineer, he wouldn’t make it to South Station every morning.

Keep those questions coming! Ask Dr. Jay directly at DearDrJay@cmbinfo.com or submit your question anonymously by clicking below:

Ask Dr. Jay!

Topics: advanced analytics, data collection, Dear Dr. Jay

Dear Dr. Jay: HOW can we trust predictive models after the 2016 election?

Posted by Dr. Jay Weiner

Thu, Jan 12, 2017

Dear Dr. Jay,

After the 2016 election, how will I ever be able to trust predictive models again?

Alyssa


Dear Alyssa,

Data Happens!

Whether we’re talking about political polling or market research, to build good models, we need good inputs. Or as the old saying goes: “garbage in, garbage out”. Let’s look at all the sources of error in the data itself:

  • First, we make it too easy for respondents to say “yes” and “no,” and they try to help us by guessing what answer we want to hear. For example, we ask for purchase intent for a new product idea. The respondent often overstates the true likelihood of buying the product.
  • Second, we give respondents perfect information. We create 100% awareness when we show the respondent a new product concept.  In reality, we know we will never achieve 100% awareness in the market.  There are some folks who live under a rock and of course, the client will never really spend enough money on advertising to even get close.
  • Third, the sample frame may not be truly representative of the population we hope to project to. This is one of the key issues in political polling because the population is comprised of those who actually voted (not registered voters).  For models to be correct, we need to predict which voters will actually show up to the polls and how they voted.  The good news in market research is that the population is usually not a moving target.

Now, let’s consider the sources of error in building predictive models.  The first step in building a predictive model is to specify the model.  If you’re a purist, you begin with a hypothesis, collect the data, test the hypothesis, and draw conclusions.  If we fail to reject the null hypothesis, we should formulate a new hypothesis and collect new data.  What do we actually do?  We mine the data until we get significant results.  Why?  Because data collection is expensive.  One possible outcome of continuing to mine the data in search of a better model is a model that is only good at predicting the data you have and not very accurate at predicting results from new inputs.
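Here’s a toy sketch of that trap (all data simulated): fit an over-flexible model to pure noise and it looks impressive on the data in hand, then falls apart on fresh data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Pure noise: there is no real relationship to discover.
x, y = rng.uniform(-1, 1, 40), rng.normal(0, 1, 40)
x_new, y_new = rng.uniform(-1, 1, 40), rng.normal(0, 1, 40)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# "Mining" the data: a 10th-degree polynomial chases the noise.
coefs = np.polyfit(x, y, deg=10)
r2_train = r_squared(y, np.polyval(coefs, x))
r2_new = r_squared(y_new, np.polyval(coefs, x_new))

print(f"R^2 on the data we mined: {r2_train:.2f}")
print(f"R^2 on new data:          {r2_new:.2f}")
```

The mined fit explains a healthy chunk of variance it has no right to explain; on new draws from the same process, it does worse than guessing the mean.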

It is up to the analyst to decide what is statistically meaningful versus what is managerially meaningful.  There are a number of websites where you can find “interesting” relationships in data.  Some examples of spurious correlations include:

  • Divorce rate in Maine and the per capita consumption of margarine
  • Number of people who die by becoming entangled in their bedsheets and the total revenue of US ski resorts
  • Per capita consumption of mozzarella cheese (US) and the number of civil engineering doctorates awarded (US)

In short, you can build a model that’s accurate but still wouldn’t be of any use (or make any sense) to your client. And the fact is, there’s always a certain amount of error in any model we build—we could be wrong, just by chance.  Ultimately, it’s up to the analyst to understand not only the tools and inputs they’re using but the business (or political) context.

Dr. Jay loves designing really big, complex choice models.  With over 20 years of DCM experience, he’s never met a design challenge he couldn’t solve. 

PS – Have you registered for our webinar yet!? Join Dr. Erica Carranza as she explains why, to change what consumers think of your brand, you must change their image of the people who use it.

What: The Key to Consumer-Centricity: Your Brand User Image

When: February 1, 2017 @ 1PM EST

Register Now!

Topics: methodology, data collection, Dear Dr. Jay, predictive analytics

Dear Dr. Jay: Weighting Data?

Posted by Dr. Jay Weiner

Wed, Nov 16, 2016

Dear Dr. Jay:

How do I know if my weighting matrix is good? 

Dan


Dear Dan,

I’m excited you asked me this because it’s one of my favorite questions of all time.

First we need to talk about why we weight data in the first place.  We weight data because our ending sample is not truly representative of the general population.  This misrepresentation can occur because of non-response bias, poor sample source and even bad sample design.  In my opinion, if you go into a research study knowing that you’ll end up weighting the data, there may be a better way to plan your sample frame. 

Case in point, many researchers intentionally over-quota certain segments and plan to weight these groups down in the final sample.  We do this because the incidence of some of these groups in the general population is small enough that if we rely on natural fallout we would not get a readable base without a very large sample.  Why wouldn’t you just pull a rep sample and then augment these subgroups?  The weight needed to add these augments into the rep sample is 0. 

Arguments for including these augments with a very small weight include the treatment of outliers.  For example, if we were conducting a study of investors and we wanted to include folks with more than $1,000,000 in assets, we might want to obtain insights from at least 100 of these folks.  In a rep sample of 500, we might only have 25 of them.  This means I need to augment this group by 75 respondents.  If somehow I manage to get Warren Buffett in my rep sample of 25, he might skew the results of the sample.  Weighting the full sample of 100 wealthier investors down to 25 will reduce the impact of any outlier.
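A minimal sketch of that arithmetic, using the numbers above (a rep sample of 500 with 25 wealthy investors, augmented to 100): weighting each group by its population share over its sample share restores the 5% incidence, and any single outlier’s influence shrinks by the same factor.

```python
# Rep sample: 500 respondents, of whom 25 are wealthy investors (5% incidence).
# We augment with 75 more wealthy investors to get a readable base of 100.
total = 575
n_wealthy, n_other = 100, 475
pop_share_wealthy = 25 / 500          # true incidence: 5%

# Weight = population share / sample share, within each group.
w_wealthy = pop_share_wealthy / (n_wealthy / total)
w_other = (1 - pop_share_wealthy) / (n_other / total)

weighted_wealthy = n_wealthy * w_wealthy
weighted_other = n_other * w_other

print(f"wealthy weight: {w_wealthy:.4f}, other weight: {w_other:.4f}")
print(f"weighted bases: wealthy {weighted_wealthy:.2f}, other {weighted_other:.2f}")
print(f"wealthy share after weighting: "
      f"{weighted_wealthy / (weighted_wealthy + weighted_other):.1%}")
```

Each of the 100 wealthy respondents (Warren Buffett included) now counts for a bit more than a quarter of a respondent, so no one of them can dominate the weighted results.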

A recent post by Nate Cohn in the New York Times suggested that weighting was significantly impacting analysts’ ability to predict the outcome of the 2016 presidential election.  In the article, Mr. Cohn points out, “there is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election.”  This man carried a sample weight of 30.  In a sample of 3000 respondents, he now accounts for 1% of the popular vote.  In a close race, that might just be enough to tip the scale one way or the other.  Clearly, he showed up on November 8th and cast the deciding ballot.

This real-life example suggests that we might want to consider “capping” extreme weights so that we mitigate the potential for very small groups to influence overall results. But bear in mind that when we do this, our final sample profiles won’t be nationally representative because capping the weight understates the size of the segment being capped.  It’s a trade-off between a truly balanced sample and making sure that the survey results aren’t biased.
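Here’s the trade-off in miniature, using the weight-of-30 example above (the cap of 5 is an illustrative choice): capping sharply reduces that one respondent’s influence, but the weighted total no longer reflects his segment’s full size.

```python
weights = [1.0] * 2999 + [30.0]    # one respondent carries a weight of 30
cap = 5.0

capped = [min(w, cap) for w in weights]

share_before = weights[-1] / sum(weights)
share_after = capped[-1] / sum(capped)

print(f"respondent's share of the weighted total before capping: {share_before:.2%}")
print(f"respondent's share after capping at {cap:.0f}:           {share_after:.2%}")
```

Before the cap, one interview speaks for roughly 1% of the weighted sample; after, about a sixth of that, at the cost of understating the group he represents.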

Dr. Jay loves designing really big, complex choice models.  With over 20 years of DCM experience, he’s never met a design challenge he couldn’t solve. 

Keep the market research questions comin'! Ask Dr. Jay directly at DearDrJay@cmbinfo.com or submit yours anonymously by clicking below:

 Ask Dr. Jay!

Topics: methodology, Dear Dr. Jay

Dear Dr. Jay: When To Stat Test?

Posted by Dr. Jay Weiner

Wed, Oct 26, 2016

Dear Dr. Jay,

The debate over how and when to test for statistical significance comes up nearly every engagement. Why wouldn’t we just test everything?

-M.O. in Chicago


Hi M.O.-

You’re not alone. Many clients want all sorts of things stat tested. Some things can be tested while others can’t. But for what can be tested, as market researchers we need to be mindful of two potential errors in hypothesis testing. A Type I error occurs when we reject a true null hypothesis. For example, we conclude that Coke tastes better than Pepsi when, in fact, there is no difference.

A Type II error occurs when we accept the null hypothesis when in fact it is false: we conclude the part is safe to install, and then the plane crashes. We choose the probability of committing a Type I error when we choose alpha (say .05). The probability of a Type II error (beta) is the flip side of statistical power. We seldom take this side of the equation into account, for good reason: most decisions we make in market research don’t come with a huge price tag if we’re wrong. Hardly anyone ever dies if the results of the study are wrong. The goal in any research is to minimize both types of errors. The best way to do that is to use a larger sample.
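To see how sample size drives power, here’s a rough sketch using the standard normal approximation for a two-sided, two-sample test (the effect size of 0.3 and alpha of .05 are illustrative, not from the post):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(d, n_per_group, alpha_z=1.959964):
    """Approximate power of a two-sided two-sample z-test for effect size d."""
    shift = d * sqrt(n_per_group / 2)
    return norm_cdf(shift - alpha_z) + norm_cdf(-shift - alpha_z)

for n in (50, 100, 200, 400):
    print(f"n per group = {n:4d}: power = {power_two_sample(0.3, n):.2f}")
```

Doubling the sample keeps alpha fixed while steadily shrinking beta, which is exactly the "bigger sample" remedy above.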

This conundrum perfectly illustrates my “Life is a conjoint” mantra. In testing, we’re always trading off the accuracy of the results against the cost of executing a study with a larger sample. Further, we also tend to violate the true nature of hypothesis testing. More often than not, we don’t formally state a hypothesis. Rather, we statistically test everything and then report the statistical differences.

Consider this: when we compare two scores, we accept that we might flag a statistical difference simply by chance 5% of the time (α = .05). This could be the difference in concept acceptance between males and females.

In fact, that’s not really what we do; we perform hundreds of tests in most every study. Let’s say we have five segments and we want to test them for differences in concept acceptance. That’s 10 t-tests. Now we have roughly a 40% chance (1 − .95^10) of flagging a difference simply due to chance. And that’s in every row of our tables. The better approach would be to run an analysis of variance on the table to determine whether any cell might be different, then build hypotheses and test them one at a time. But we don’t do this because it takes too much time. I realize I’m not going to change the way our industry does things (I’ve been trying for years), but maybe, just maybe, you’ll pause for a moment when looking at your tables to decide if this “statistical” significance is really worth reporting—are the results valid and are they useful?
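The arithmetic behind that inflation is simple: with k independent tests at α = .05, the chance of at least one false positive is 1 − (1 − α)^k.

```python
alpha = 0.05

for k in (1, 5, 10, 50):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:3d} independent tests: chance of at least one "
          f"false positive = {familywise:.0%}")
```

Ten pairwise t-tests push the familywise rate to roughly 40% under independence; by fifty tests, a spurious "significant" difference is a near certainty.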

Dr. Jay loves designing really big, complex choice models.  With over 20 years of DCM experience, he’s never met a design challenge he couldn’t solve. 

Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit anonymously here:

Ask Dr. Jay!

Topics: advanced analytics, Dear Dr. Jay

Dear Dr. Jay: Driver Modeling

Posted by Dr. Jay Weiner

Thu, Jun 23, 2016

Dear Dr. Jay,

We want to assess the importance of fixing some of our customer touchpoints, what would you recommend as a modeling tool?

 -Alicia


Hi Alicia,

Hi Alicia,

Once we know the primary objective, there are three key criteria we need to address. The first is the amount of multicollinearity in our data. The more independent variables we have, the bigger problem this presents. The second is the stability in the model over time. In tracking studies, we want to believe that the differences between waves are due to actual differences in the market and not artifacts of the algorithm used to compute the importance scores. Finally, we need to understand the impact of sample size on the models.

How big a sample do you need? Typically, in consumer research, we see results stabilize with n=200. Some tools will do a better job with smaller samples than others. You should also consider the number of parameters you are trying to model. A grad school rule of thumb is that you need 4 observations for each parameter in the model, so if you have 25 independent variables, you’d need at least 100 respondents in your sample.

There are several tools to consider using to estimate relative importance: Bivariate Correlations, OLS, Shapley Value Regression (or Kruskal’s Relative Importance), TreeNet, and Bayesian Networks are all options. All of these tools will let you understand the relative importance of the independent variables in predicting your key measure. One thing to note is that none of these tools specifically models causation. You would need some sort of experimental design to address that issue. Let’s break down the advantages and disadvantages of each. 

Bivariate Correlations (measures the strength of the relationship between two variables)

  • Advantages: Works with small samples. Relatively stable wave to wave. Easy to execute. Unaffected by multicollinearity.
  • Disadvantages: Only estimates the impact of one attribute at a time. Ignores any possible interactions. Doesn’t provide an “importance” score, but a “strength of relationship” value.  Assumes a linear relationship among the attributes. 

Ordinary Least Squares regression (OLS) (method for estimating the unknown parameters in a linear regression model)

  • Advantages: Easy to execute. Provides an equation to predict the change in the dependent variable based on changes in the independent variable (predictive analytics).
  • Disadvantages: Highly susceptible to multicollinearity, causing changes in key drivers in tracking studies. If the goal is a predictive model, this isn’t a serious problem. If your goal is to prioritize areas of improvement, this is a challenge. Assumes a linear relationship among the attributes. 
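A simulated illustration of that multicollinearity problem (data invented for the sketch): two nearly identical drivers split the credit differently in each "wave" of the study, even though their combined effect is stable.

```python
import numpy as np

def fit_wave(seed, n=300):
    """OLS on two nearly collinear drivers of a key measure (one simulated wave)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)   # correlation with x1 near 0.999
    y = x1 + x2 + rng.normal(size=n)           # both drivers matter equally
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]

# The same study run twice: the split between the collinear drivers can swing
# from wave to wave even though their combined effect stays put.
for wave in (1, 2):
    b1, b2 = fit_wave(wave)
    print(f"wave {wave}: beta_x1 = {b1:+.2f}, beta_x2 = {b2:+.2f}, "
          f"combined = {b1 + b2:.2f}")
```

For pure prediction the instability is harmless (the combined effect is well estimated); for prioritizing which driver to fix, it is exactly the tracking-study problem described above.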

Shapley Value Regression or Kruskal’s Relative Importance

These are a couple of approaches that consider all possible combinations of explanatory variables. Unlike traditional regression tools, these techniques are not used for forecasting. In OLS, we predict the change in overall satisfaction for any given change in the independent variables. These tools are used to determine how much better the model is if we include any specific independent variable versus models that do not include that measure. The conclusions we draw from these models refer to the usefulness of including any measure in the model and not its specific impact on improving measures like overall satisfaction. 

  • Advantages: Works with smaller samples. Does a better job of dealing with multicollinearity. Very stable in predicting the impact of attributes between waves.
  • Disadvantages: Ignores interactions. Assumes a linear relationship among the attributes.
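For the curious, here’s a bare-bones sketch of the Shapley value idea applied to R² (simulated data, not a production implementation): each driver’s importance is its average R² gain across all subsets of the other drivers, and the importances decompose the full model’s R² exactly.

```python
from itertools import combinations
from math import factorial

import numpy as np

def r2(cols, X, y):
    """R^2 from regressing y on a subset of X's columns (plus an intercept)."""
    if not cols:
        return 0.0
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def shapley_r2(X, y):
    """Each driver's Shapley-weighted average R^2 gain over all subsets."""
    p = X.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            for subset in combinations(others, size):
                w = factorial(size) * factorial(p - size - 1) / factorial(p)
                scores[j] += w * (r2(list(subset) + [j], X, y) - r2(list(subset), X, y))
    return scores

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=200)   # third driver is pure noise

imp = shapley_r2(X, y)
print("Shapley importances:", np.round(imp, 3))
print("sum:", round(imp.sum(), 3), " full-model R^2:", round(r2([0, 1, 2], X, y), 3))
```

Note the output is a share of explained variance per driver, not a forecasting equation, which is the distinction drawn above.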

TreeNet (a tree-based data mining tool)

  • Advantages: Does a better job of dealing with multicollinearity than most linear models. Very stable in predicting the impact of attributes between waves. Can identify interactions. Does not assume a linear relationship among the attributes.
  • Disadvantages: Requires a larger sample size—usually n=200 or more. 

Bayesian Networks (a graphical representation of the joint probabilities among key measures)

  • Advantages: Does a better job of dealing with multicollinearity than most linear models. Very stable in predicting the impact of attributes between waves. Can identify interactions. Does not assume a linear relationship among the attributes. Works with smaller samples. While a typical Bayes Net does not provide a system of equations, it is possible to simulate changes in the dependent variable based on changes to the independent variables.
  • Disadvantages: Can be more time-consuming and difficult to execute than the others listed here.

Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit anonymously here.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. Jay earned his Ph.D. in Marketing/Research from the University of Texas at Arlington and regularly publishes and presents on topics including conjoint, choice, and pricing.

Topics: advanced analytics, Dear Dr. Jay

Dear Dr. Jay: Discrete Choice—How Many Is Too Many Features?

Posted by Dr. Jay Weiner

Wed, Mar 23, 2016

Dear Dr. Jay,

I’m interested in testing a large number of features for inclusion in the next version of my product. My team is suggesting that we need to cull the list down to a smaller set of items to run a choice model. Are there ways to test a large set of attributes in a choice model?

-Nick


Hi Nick –

There are a number of ways to test a large set of attributes in choice modeling. Most of the time, when we test a large number of features, many are simply binary attributes (included/not included). While this makes the experimental design larger, it’s not quite as bad as having ten six-level attributes. If the description is short enough, you might go ahead and just include all of them. If you’re concerned about how much reading a respondent will need to do—or you really wouldn’t offer a respondent 12 additional perks for choosing your credit card—you could put a cap on the number of additional features any specific offer includes. For example, you could test 15 new features in a single model, but respondents would only get up to 5 at any single time. This is actually better than using a partial profile design as all respondents would see all offers. 
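A toy sketch of the capping idea (the 15-feature, 5-per-offer numbers come from the example above): each offer draws a random subset of at most five features. A fielded study would use a D-efficient experimental design rather than purely random draws; this only illustrates the constraint.

```python
import random

random.seed(42)

N_FEATURES = 15   # binary features under test
MAX_SHOWN = 5     # cap on how many extras any one offer carries

def make_offer():
    """One offer: a random subset of at most MAX_SHOWN of the features."""
    k = random.randint(0, MAX_SHOWN)
    return sorted(random.sample(range(N_FEATURES), k))

# A single choice task: three competing offers shown side by side.
task = [make_offer() for _ in range(3)]
for i, offer in enumerate(task, 1):
    print(f"offer {i}: features {offer}")
```

Every respondent still sees complete offers, just never more than five add-on features at once, which keeps the reading load manageable.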

Another option is to do some sort of bridging study where you test all of the features using a max diff task. You can include a subset of the factors in a DCM and then use the max diff utilities to compute the utility for the full list of features in the DCM. This allows you to include the full set of features in your simulation tool.
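One simple way to operationalize that bridge (all utility values below are hypothetical): fit a linear map from max diff scores to DCM utilities on the items the two exercises share, then apply it to the full feature list.

```python
import numpy as np

# Hypothetical max diff utilities for six of the features tested there
maxdiff_util = {"f1": 2.1, "f2": 1.4, "f3": 0.8, "f4": 0.2, "f5": -0.5, "f6": -1.1}

# Only f1..f3 were also included in the DCM (hypothetical choice-model utilities)
dcm_util = {"f1": 0.95, "f2": 0.61, "f3": 0.35}

# Fit a linear bridge on the overlapping items...
common = sorted(dcm_util)
x = np.array([maxdiff_util[f] for f in common])
y = np.array([dcm_util[f] for f in common])
slope, intercept = np.polyfit(x, y, 1)

# ...then project every max diff item onto the DCM utility scale.
bridged = {f: slope * u + intercept for f, u in maxdiff_util.items()}
for f, u in bridged.items():
    print(f"{f}: bridged DCM utility = {u:+.2f}")
```

The bridged utilities for f4-f6 can then be dropped into the simulation tool alongside the features that were estimated directly in the DCM.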

Dr. Jay loves designing really big, complex choice models.  With over 20 years of DCM experience, he’s never met a design challenge he couldn’t solve. 

Topics: advanced analytics, product development, Dear Dr. Jay

Dear Dr. Jay—Brands Ask: Let's Stay Together?

Posted by Dr. Jay Weiner

Thu, Feb 11, 2016

 Dear Dr. Jay,

 What’s love got to do with it?

 -Tina T. 


Hi Tina,

How timely.

The path to brand loyalty is often like the path to wedded bliss. You begin by evaluating tangible attributes to determine if the brand is the best fit for you. After repeated purchase occasions, you form an emotional bond to the brand that goes beyond those tangible attributes. As researchers, when we ask folks why they purchase a brand, they often reflect on performance attributes and mention those as drivers of purchase. But, to really understand the emotional bond, we need to ask how you feel when you interact with the brand.

We recently developed a way to measure this emotional bond (Net Positive Emotion Score - NPES). By asking folks how they felt on their most recent interaction, we’re able to determine respondents’ emotional bond with products. Typical regression tools indicate that the emotional attributes are about as predictive of future behavior as the functional benefits of the product. This leads us to believe that at some point in your pattern of consumption, you become bonded to the product and begin to act on emotion—rather than rational thoughts. Of course, that doesn’t mean you can’t rate the performance dimensions of the products you buy.

Loyalty is a behavior, and behaviors are often driven by underlying attitudinal measures. You might continue to purchase the same product over and over for a variety of reasons. In a perfect world, you not only create a behavioral commitment, but also an emotional bond with the brand and, ultimately, the company. Typically, we measure this path by looking at the various stages you go through when purchasing products. This path begins with awareness, evolves through familiarity and consideration, and ultimately ends with purchase. Once you’ve purchased a product, you begin to evaluate how well it delivers on the brand promise. At some point, the hope is that you become an advocate for the brand since advocacy is the pinnacle of the brand purchase hierarchy. 

As part of our Consumer Pulse program, we used our EMPACT℠: Emotional Impact Analysis tool to measure consumers’ emotional bond (NPES) with 30 brands across 6 categories. How well does this measure impact other key metrics? On average, Net Promoters score almost 70 points higher on the NPES scale versus Net Detractors. We see similar increases in likelihood to continue (or try), proud to use, willingness to pay more, and “I love this brand.”

[Chart: NPES lift across key brand metrics]

What does this mean? It means that measuring the emotional bond your customers have with your brand can provide key insights into the strength of that brand. Not only do you need to win on the performance attributes, but you also need to forge a deep bond with your buyers. That is a better way to brand loyalty, and it should positively influence your bottom line. You have to win their hearts—not just their minds.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. He has a strong emotional bond with his wife of 25 years and several furry critters who let him sleep in their bed.

Learn More About EMPACT℠

Topics: NPS, path to purchase, Dear Dr. Jay, EMPACT, emotional measurement, brand health and positioning

Dear Dr. Jay: Can One Metric Rule Them All?

Posted by Dr. Jay Weiner

Wed, Dec 16, 2015

Hi Dr. Jay –

The city of Boston is trying to develop one key measure to help officials track and report how well the city is doing. We’d like to do that in-house. How would we go about it?

-Olivia


Hi Olivia,

This is the perfect tie-in for big data and the key performance indicator (KPI). Senior management doesn’t really have time to pore over tables of numbers to see how things are going. What they want is a nice barometer that can be used to summarize overall performance. So, how might one take data from each business unit and aggregate them into a composite score?

We begin the process by understanding all the measures we have. Once we have assembled all of the potential inputs to our key measure, we need to develop a weighting system to aggregate them into one measure. This is often the challenge when working with internal data. We need some key business metric to use as the dependent variable, and these data are often missing in the database.

For example, I might have sales by product by customer and maybe even total revenue. Companies often assume that the top revenue clients are the bread and butter for the company. But what if your number one account uses way more corporate resources than any other account? If you’re one of the lucky service companies, you probably charge hours to specific accounts and can easily determine the total cost of servicing each client. If you sell a tangible product, that may be more challenging. Instead of sales by product or total revenue, your business decision metric should be the total cost of doing business with the client or the net profit for each client. It’s unlikely that you capture this data, so let’s figure out how to compute it. Gross profit is easy (net sales – cost of goods sold), but what about other costs like sales calls, customer service calls, and product returns? Look at other internal databases and pull information on how many times your sales reps visited in person or called over the phone, and get an average cost for each of these activities. Then, you can subtract those costs from the gross profit number. Okay, that was an easy one.
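Here’s that computation as a sketch, with invented cost and sales figures: note how the top-revenue client can end up less profitable once servicing costs are netted out.

```python
# Illustrative per-activity costs (all figures hypothetical)
AVG_COST_PER_VISIT = 250.0
AVG_COST_PER_CALL = 40.0
AVG_COST_PER_RETURN = 75.0

clients = [
    {"name": "A", "net_sales": 120_000, "cogs": 70_000, "visits": 40, "calls": 200, "returns": 12},
    {"name": "B", "net_sales": 90_000,  "cogs": 50_000, "visits": 5,  "calls": 30,  "returns": 2},
]

for c in clients:
    gross_profit = c["net_sales"] - c["cogs"]          # net sales - COGS
    service_cost = (c["visits"] * AVG_COST_PER_VISIT
                    + c["calls"] * AVG_COST_PER_CALL
                    + c["returns"] * AVG_COST_PER_RETURN)
    c["net_profit"] = gross_profit - service_cost
    print(f"client {c['name']}: gross = {gross_profit:,.0f}, "
          f"service cost = {service_cost:,.0f}, net = {c['net_profit']:,.0f}")
```

With these made-up numbers, client A brings in more revenue but client B is the more profitable account, which is exactly why net profit, not sales, should anchor the KPI.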

Let’s look at the city of Boston case for a little more challenging exercise. What types of information is the city using? According to the article you referenced, the city hopes to “corral their data on issues like crime, housing for veterans and Wi-Fi availability and turn them into a single numerical score intended to reflect the city’s overall performance.” So, how do you do that? Let’s consider that some of these things have both income and expense implications. For example, as crime rates go up, the attractiveness of the city drops and it loses residents (income and property tax revenues drop). Adding to the lost revenue, the city has the added cost of providing public safety services. If you add up the net gains/losses from each measure, you would have a possible weighting matrix to aggregate all of the measures into a single score. This allows the mayor to quickly assess changes in how well the city is doing on an ongoing basis. The weights can be used by the resource planners to assess where future investments will offer the greatest payback.
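In code, the aggregation might look like this (the indicator values and dollar impacts are invented; each indicator is scaled so that higher is better):

```python
# Hypothetical monthly indicators, each scaled to 0-1 with higher = better
indicators = {"crime": 0.62, "veteran_housing": 0.80, "wifi_coverage": 0.71}

# Hypothetical net revenue impact of each measure ($M), used to derive weights
net_impact = {"crime": 4.0, "veteran_housing": 1.5, "wifi_coverage": 0.5}

total = sum(net_impact.values())
weights = {k: v / total for k, v in net_impact.items()}   # weights sum to 1

score = sum(weights[k] * indicators[k] for k in indicators)
print(f"composite city score: {score:.3f}")
```

Because the weights come from each measure’s dollar impact, a one-point change in the composite can be traced back to the measures that are worth the most to the city.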

 Dr. Jay is fascinated by all things data. Your data, our data, he doesn’t care what the source. The more data, the happier he is.

Topics: advanced analytics, Boston, big data, Dear Dr. Jay

Dear Dr. Jay: The Internet of Things and The Connected Cow

Posted by Dr. Jay Weiner

Thu, Nov 19, 2015

Hello Dr. Jay, 

What is the internet of things, and how will it change market research?

-Hugo 



Hi Hugo,

The internet of things is all of the connected devices that exist. Traditionally, it was limited to PCs, tablets, and smartphones. Now, we’re seeing wearables, connected buildings and homes...and even connected cows. (Just when I thought I’d seen it all.) Connected cows, surfing the internet looking for the next greenest pasture. Actually, a number of companies offer connected cow solutions for farmers. Some are geared toward beef cattle, others toward dairy cows. Some devices are worn on the leg or around the neck, others are swallowed (I don’t want to know how you change the battery). You can track the location of the herd, monitor milk production, and model the best field for grass to increase milk output. The solutions offer alerts to the farmer when the cow is sick or in heat, which means that the farmer can get by with fewer hands and doesn’t need to be with each cow 24/7. Not only can the device predict when a cow is in heat, it can also bias the gender of the calf based on the window of opportunity. Early artificial insemination increases the probability of getting a female calf. So, not only can the farmer increase his number of successful inseminations, he/she can also decide if more bulls or milk cows are needed in the herd. 

How did this happen? A bunch of farmers put the devices on the herd and began collecting data. Then, the additional data is appended to the data set (e.g., the time the cow was inseminated, whether it resulted in pregnancy, and the gender of the calf). If enough farmers do this, we can begin to build a robust data set for analysis.

So, what does this mean for humans? Well, many of you already own some sort of fitness band or watch, right? What if a company began to collect all of the data generated by these devices? Think of all the things the company could do with those data! It could predict the locations of more active people. If it appended some key health measures (BMI, diabetes, stroke, death, etc.) to the dataset, the company could try to build a model that predicts a person’s probability of getting diabetes, having a stroke, or even dying. Granted, that’s probably not a message you want from your smart watch: “Good afternoon, Jay. You will be dead in 3 hours 27 minutes and 41 seconds.” Here’s another possible (and less grim) message: “Good afternoon, Jay. You can increase your time on this planet if you walk just another 1,500 steps per day.” Healthcare providers would also be interested in this information. If healthcare providers had enough fitness tracking data, they might be able to compute new lifetime age expectations and offer discounts to customers who maintain a healthy lifestyle (which is tracked on the fitness band/watch).  

Based on connected cows, the possibility of this seems all too real. The question is: will we be willing to share the personal information needed to make this happen? Remember: nobody asked the cow if it wanted to share its rumination information with the boss.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. He is completely fascinated and paranoid about the internet of things. Big brother may be watching, and that may not be a good thing.

Topics: technology research, healthcare research, data collection, Dear Dr. Jay, internet of things, data integration

You Cheated—Can Love Restore Trust?

Posted by James Kelley

Mon, Nov 02, 2015

This year has been rife with corporate scandals. For example, FIFA’s corruption case and Volkswagen’s emissions cheating admission may have irreparably damaged public trust for these organizations. These are just two of the major corporations caught this year, and if history tells us anything, we’re likely to see at least another giant fall in 2015. 

What can managers learn about their brands from watching the aftermath of corporate scandal? Let’s start with the importance of trust—something we can all revisit. We take it for granted when our companies or brands are in good standing, but when trust falters, it recovers slowly and impacts all parts of the organization. To prove the latter point, we used data from our recent self-funded Consumer Pulse research to understand the relationship between Likelihood to Recommend (LTR), a Key Performance Indicator, and Trustworthiness amongst a host of other brand attributes. 

Before we dive into the models, let’s talk a little bit about the data. We leveraged data we collected some months ago—not at the height of any corporate scandal. In a perfect world, we would have pre-scandal and post-scandal observations of trust to understand any erosion due to awareness of the deception. This data also doesn’t measure the auto industry or professional sports. It focuses on brands in the hotel, e-commerce, wireless, airline, and credit card industries. Given the breadth of the industries, the data should provide a good look at how trust impacts LTR across different types of organizations. Finally, we used Bayes Net (which we’ve blogged about quite a bit recently) to factor and map the relationships between LTR and brand attributes. After factoring, we used TreeNet to get a more direct measure of explanatory power for each of the factors.

First, let’s take a look at the TreeNet results. Overall, our 31 brand attributes explain about 71% of the variance in LTR—not too shabby. Below is each factor’s individual contribution to the model (summing to 71%). Each factor is labeled by its top-loading attribute, although each is comprised of 3-5 such variables. For a complete list of which attributes go with which factor, see the Bayes Net map below. That said, this list (labeled by the top attributes) should give you an idea of what’s directly driving LTR:

[Figure: TreeNet factor contributions to LTR]

Looking at these factor scores in isolation, they make inherent sense—love for a brand (which factors with “I am proud to use” and “I recommend, like, or share with friends”) is the top driver of LTR. In fact, this factor is responsible for a third of the variance we can explain. Other factors, including those with trust and “I am proud to wear/display the logo of Brand X” have more modest (and not all that dissimilar) explanatory power. 
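For readers who want to see the mechanics, here is a minimal sketch of this kind of analysis. TreeNet is a stochastic gradient boosting implementation, so scikit-learn's `GradientBoostingRegressor` is a reasonable stand-in; the factor names and data below are simulated for illustration, since the survey data isn't public.

```python
# Illustrative sketch only: simulated factor scores standing in for the
# post's survey factors. GradientBoostingRegressor plays the role of TreeNet.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Four simulated factor scores (hypothetical names: love, trust, valued, innovative)
factors = rng.normal(size=(n, 4))

# Simulated LTR, driven most strongly by the first ("love") factor, plus noise
ltr = (0.8 * factors[:, 0] + 0.3 * factors[:, 1]
       + 0.25 * factors[:, 2] + 0.2 * factors[:, 3]
       + rng.normal(scale=0.5, size=n))

model = GradientBoostingRegressor(random_state=0).fit(factors, ltr)

# R^2 plays the role of "variance explained" (the post's 71%);
# feature_importances_ apportions that explanatory power across factors.
print("R^2:", round(model.score(factors, ltr), 2))
for name, imp in zip(["love", "trust", "valued", "innovative"],
                     model.feature_importances_):
    print(f"{name:10s} {imp:.2f}")
```

With these simulated weights, the "love" factor dominates the importance list, mirroring the pattern in the post.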

You might be wondering: if Trustworthiness doesn’t register at the top of the list for TreeNet, then why is it so important? This is where Bayes Nets come into play. TreeNet, like regression, measures the direct relationships between independent and dependent variables, holding everything else constant. Bayes Nets, in contrast, look for the relationships between all the attributes and help map direct as well as indirect relationships.
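The direct-versus-indirect distinction is easy to see with a toy simulation (purely hypothetical numbers, not the post's data): if trust mostly drives LTR *through* love, a model that holds love constant will assign trust a small direct effect, while its total effect is much larger.

```python
# Hedged illustration with simulated data: trust -> love -> LTR,
# plus a small direct trust -> LTR path.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
trust = rng.normal(size=n)
love = 0.7 * trust + rng.normal(scale=0.5, size=n)
ltr = 0.9 * love + 0.1 * trust + rng.normal(scale=0.5, size=n)

# "Direct" effect (what a regression/TreeNet-style model sees, holding love
# constant): trust's partial coefficient from a joint regression.
design = np.column_stack([love, trust, np.ones(n)])
direct = np.linalg.lstsq(design, ltr, rcond=None)[0][1]

# "Total" effect (direct + indirect via love): regress LTR on trust alone.
total = np.polyfit(trust, ltr, 1)[0]

print(f"direct effect of trust: {direct:.2f}")  # near 0.1
print(f"total effect of trust:  {total:.2f}")   # near 0.1 + 0.9*0.7 = 0.73
```

The direct coefficient understates trust's influence because most of it flows through love—exactly the kind of indirect pathway a Bayes Net map surfaces.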

Below is the Bayes Net map for this same data (and you can click on the map to see a larger image). You need three important pieces of information to interpret this data:

  1. The size of the nodes (circles/orbs) represents how important a factor is to the model. The bigger the circle, the more important the factor.
  2. Similarly, the thicker the lines, the stronger a relationship is between two factors/variables. The boldest lines have the strongest relationships.
  3. Finally, we can’t speak to causality here, only correlation. This means we can’t say Trustworthiness causes LTR to move in a certain direction, but rather that they’re related. And, as anyone who has sat through an introduction to statistics course knows, correlation does not equal causation.

[Figure: Bayes Net map of factors driving LTR]

Here, Factor 7 (“I love Brand X”) is no longer a hands-down winner in terms of explanatory power. Instead, you’ll see that Factors 3, 5, 7 and 9 each wield a great deal of influence in this map in pretty similar quantities. Factor 7, which was responsible for over a third of the explanatory power before, is well-connected in this map. Not surprising—you don’t just love a brand out of nowhere. You love a brand because they value you (Factor 5), they’re innovative (Factor 9), they’re trustworthy (Factor 3), etc. Factor 7’s explanatory power in the TreeNet model was inflated because many attributes interact to produce the feeling of love or pride around a brand.

Similarly, Factor 3 (Trustworthiness) was deflated. The TreeNet model picked up the direct relationship between Trustworthiness and LTR, but it didn’t measure its cumulative impact (a combination of direct and indirect relationships). Note how well-connected Factor 3 is. It’s strongly related (one of the strongest relationships in the map) to Factor 5, which includes “Brand X makes me feel valued,” “Brand X appreciates my business,” and “Brand X provides excellent customer service.” This means these two variables are fairly inseparable. You can’t be trustworthy/have a good reputation without the essentials like excellent customer service and making customers feel valued. Although to a lesser degree, Trustworthiness is also related to love. Business is like dating—you can’t love someone if you don’t trust them first.

The data shows that sometimes relationships aren’t as cut and dried as they appear in classic multivariate techniques. Some things that look important are inflated, while other relationships are masked by indirect pathways. The data also shows that trust can influence a host of other brand attributes and may even be a prerequisite for some.

So what does this mean for Volkswagen? Clearly, trust is damaged and will need to be repaired. True to crisis management 101, VW has jettisoned its CEO and will likely make amends to those owners who have been hurt by its indiscretions. But how long will VW feel the damage done by this scandal? For existing customers, the road might be easier. One of us, James, is a current VW owner, and he is smitten with the brand. His particular model (GTI) wasn’t impacted, and while the cheating may damage the value of his car, he’s not selling it anytime soon. For prospects, love has yet to develop, and a lack of trust may eliminate the brand from their consideration set.

The takeaway for brands? Don’t take trust for granted. It’s great while you’re in good favor, but trust’s reach is long, varied, and has the potential to impact all of your KPIs. Take a look at your company through the lens of trust. How can you improve? Take steps to better your customer service and to make customers feel valued. It may pay dividends in improving trust, other KPIs, and, ultimately, love.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. He keeps buying new cars to try to make the noise on the right side go away.

James Kelley splits his time at CMB as a Project Manager for the Technology/eCommerce team and as a member of the analytics team. He is a self-described data nerd, political junkie, and board game geek. Outside of work, James works on his dissertation in political science which he hopes to complete in 2016.

Topics: advanced analytics, data collection, Dear Dr. Jay, data integration, customer experience and loyalty