The posts here represent the opinions of CMB employees and guests—not necessarily the company as a whole. 

Dear Dr. Jay: Data Integration

Posted by Jay Weiner, PhD

Wed, Aug 26, 2015

Dear Dr. Jay,

How can I explain the value of data integration to my CMO and other non-research folks?

- Jeff B. 


 

Hi Jeff,

Years ago, at a former employer that will remain unnamed, we used to entertain ourselves by playing Buzzword Bingo in meetings. We’d create Bingo cards with 30 or so words that management likes to use (“actionable,” for instance). You’d be surprised how fast you could fill a card. If you have attended a conference in the past few years, you know we as market researchers have plenty of new words to play with. Think: big data, integrated data, passive data collection, etc. What do all these new buzzwords really mean to the research community? It boils down to this: we potentially have more data to analyze, and the data might come from multiple sources.

If you only collect primary survey data, then you typically only worry about sample reliability, measurement error, construct validity, and non-response bias. However, with multiple sources of data, we need to worry about all of that plus level of aggregation, impact of missing data, and the accuracy of the data. When we typically get a database of information to append to survey data, we often don’t question the contents of that file. . . but maybe we should.

A client recently sent me a file with more than 100,000 records (ding ding, “big data”). Included in the file were survey data from a number of ad hoc studies conducted over the past two years as well as customer behavioral data (ding ding, “passive data”). And, it was all in one file (ding ding, “integrated data”). BINGO!

I was excited to get this file for a couple of reasons. One, I love to play with really big data sets, and two, I was able to start playing right away. Most of the time, clients send me a bunch of files, and I have to do the integration/merging myself. Because this file was already integrated, I didn’t need to worry about having unique and matching record identifiers in each file.

Why would a client have already integrated these data? Well, if you can add variables to your database and append attitudinal measures, you can improve the value of the modeling you can do. For example, let’s say that I have a Dunkin’ Donuts (DD) rewards card, and every weekday, I stop by a DD close to my office and pick up a large coffee and an apple fritter. I’ve been doing this for quite some time, so the database modelers feel fairly confident that they can compute my lifetime value from this pattern of transactions. However, if the coffee was cold, the fritter was stale, and the server was rude during my most recent transaction, I might decide that McDonald’s coffee is a suitable substitute and stop visiting my local DD store in favor of McDonald’s. How many days without a transaction will it take the DD algorithm to decide that my lifetime value is now $0.00? If we had the ability to append customer experience survey data to the transaction database, maybe the model could be improved to more quickly adapt. Maybe even after 5 days without a purchase, it might send a coupon in an attempt to lure me back, but I digress.
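
A minimal Python sketch of that coupon idea follows. The function names, the 5-day and 2-day thresholds, and the 1-to-5 rating scale are all assumptions made up for illustration; this is not how any actual rewards program works.

```python
from datetime import date

def days_since_last_purchase(purchase_dates, today):
    """Return the number of days between the most recent purchase and today."""
    return (today - max(purchase_dates)).days

def should_send_coupon(purchase_dates, today, last_visit_rating=None,
                       lapse_days=5, fast_lapse_days=2, low_rating=2):
    """Flag a customer for a win-back coupon.

    Hypothetical rule: act after `lapse_days` without a purchase, or after
    only `fast_lapse_days` if the appended survey rating was poor.
    """
    gap = days_since_last_purchase(purchase_dates, today)
    if last_visit_rating is not None and last_visit_rating <= low_rating:
        return gap >= fast_lapse_days
    return gap >= lapse_days

purchases = [date(2015, 8, 17), date(2015, 8, 18), date(2015, 8, 19)]
today = date(2015, 8, 22)  # three days since the last purchase

transactions_only = should_send_coupon(purchases, today)
with_survey = should_send_coupon(purchases, today, last_visit_rating=1)
print(transactions_only)  # False: the gap is still under 5 days
print(with_survey)        # True: a poor rating shortens the wait
```

The transaction-only rule is still waiting at day three, while the rule that also sees the survey rating has already reacted.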

Earlier, I suggested that maybe we should question the contents of the database. When the client sent me the file of 100,000 records, I’m pretty sure that was most (if not all) of the records that had both survey and behavioral measures. Considering the client has millions of account holders, that’s actually a sparse amount of data. Here’s another thing to consider: how well do the two data sources line up in time? Even if 100% of my customer records included overall satisfaction with my company, these data may not be as useful as you might think. For example, overall satisfaction in 2010 and behavior in 2015 may not produce a good model. What if some of the behavioral measures were missing values? If a customer recently signed up for an account, then his/her 90-day behavioral data elements won’t get populated for some time. This means that I would need to either remove these respondents from my file or build unique models for new customers.
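
Those checks can be sketched in a few lines of Python. The field names (`survey_year`, `behavior_year`, `spend_90d`) and the one-year tolerance are hypothetical, chosen only to illustrate the idea of auditing an integrated file before modeling.

```python
def audit_record(record, max_gap_years=1):
    """Classify one integrated record before modeling.

    `record` is a dict with hypothetical keys: 'survey_year',
    'behavior_year', and 'spend_90d' (None when not yet populated).
    """
    if record["spend_90d"] is None:
        return "new_customer"   # remove, or build a separate model
    if abs(record["behavior_year"] - record["survey_year"]) > max_gap_years:
        return "stale_survey"   # attitudes and behavior too far apart in time
    return "usable"

# Made-up records for illustration.
records = [
    {"id": 1, "survey_year": 2015, "behavior_year": 2015, "spend_90d": 212.50},
    {"id": 2, "survey_year": 2010, "behavior_year": 2015, "spend_90d": 98.00},
    {"id": 3, "survey_year": 2015, "behavior_year": 2015, "spend_90d": None},
]

groups = {}
for rec in records:
    groups.setdefault(audit_record(rec), []).append(rec["id"])
print(groups)
```

Counting how many records land in each group is a quick way to see how much of the file is really available for a single model.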

The good news is that there is almost always some value to be gained in doing these sorts of analysis. As long as we’re cognizant of the quality of our data, we should be safe in applying the insights.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. Jay earned his Ph.D. in Marketing/Research from the University of Texas at Arlington and regularly publishes and presents on topics, including conjoint, choice, and pricing.

Topics: advanced analytics, big data, Dear Dr. Jay, data integration, passive data

Dear Dr. Jay: Bayesian Networks

Posted by Dr. Jay Weiner

Thu, Jul 30, 2015

Hello Dr. Jay,

I enjoyed your recent post on predictive analytics that mentioned Bayesian Networks.

Could you explain Bayesian Networks in the context of survey research? I believe a Bayes Net says something about probability distribution for a given data set, but I am curious about how we can use Bayesian Networks to prioritize drivers, e.g. drivers of NPS or drivers of a customer satisfaction metric.

-Al

Dear Al,

Driver modeling is an interesting challenge. There are 2 possible reasons why folks do driver modeling. The first is to prioritize a set of attributes that a company might address to improve a key metric (like NPS). In this case, a simple importance ranking is all you need. The second reason is to determine the incremental change in your dependent variable (DV) as you improve any given independent variable by X. In this case, we’re looking for a set of coefficients that can be used to predict the dependent variable.

Why do I distinguish between these two things? Much of our customer experience and brand ratings work is confounded by multicollinearity. What often happens in driver modeling is that 2 attributes that are highly correlated with each other might end up with 2 very different scores: one highly positive and the other 0, or worse yet, negative. If the goal is a model that accurately predicts the DV, I really don’t care about the magnitude of the coefficient or even the sign. I just need a robust equation to predict the value. In practice, though, prediction alone is seldom the goal. Most clients want these highly correlated attributes to yield the same importance score.

So, if we’re not interested in an equation to predict our DV, but do want importances, Bayes Nets can be a useful tool. There are a variety of useful outputs that come from Bayes Nets. Mutual information and Node Force are two such items. Mutual information is essentially the reduction in uncertainty about one variable given what we know about the value of another. We can think of Node Force as a correlation between any 2 items in the network. The more certain the relationship (higher correlation), the greater the Node Force.
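
For readers who want to see mutual information computed, here is a small Python sketch using the standard textbook formula for two discrete variables. It is not the output of any particular Bayes Net package, and the responses are invented for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete variables."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Made-up responses: 1 = top-box rating, 0 = otherwise.
recommend = [1, 1, 1, 1, 0, 0, 0, 0]
service   = [1, 1, 1, 1, 0, 0, 0, 0]  # moves in lockstep with recommend
packaging = [1, 0, 1, 0, 1, 0, 1, 0]  # unrelated to recommend

mi_service = mutual_information(recommend, service)
mi_packaging = mutual_information(recommend, packaging)
print(mi_service)    # 1.0 bit: knowing service removes all uncertainty
print(mi_packaging)  # 0.0 bits: knowing packaging tells us nothing
```

Ranking attributes by their mutual information with the key measure gives an importance ordering without fitting a predictive equation.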

The one thing that is relatively unique to Bayes Nets is the ability to see if the attributes are directly connected to your key measure or if they are moderated through another attribute. This information is often useful in understanding possible changes to other measures in the network. So, if the main goal is to help your client understand the structure in your data and what items are most important, Bayes Nets is quite useful.

Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit anonymously here.

Topics: advanced analytics, NPS, Dear Dr. Jay

Mobile Passive Behavioral Data: Opportunities and Pitfalls

Posted by Chris Neal

Tue, Jul 21, 2015

By Chris Neal and Dr. Jay Weiner

As I wrote in last week’s post, we recently conducted an analysis of mobile wallet use in the U.S. To make it interesting, we used unlinked passive mobile behavioral data alongside survey-based data. In this post, I’ve teamed up with Jay Weiner, our VP of Analytics who helped me analyze the mobile passive behavioral data for this Mobile Wallet study, to share some of the typical challenges you may face when working with passive mobile behavioral data (or any type of passive behavioral data for that matter) along with some best practices for dealing with these challenges:

  1. Not being able to link mobile usage to individuals. There’s a lot of online passive data out there (mobile app usage ratings, web usage ratings by device type, social media monitoring, etc.) that is at the aggregate level and cannot be reliably attributed to individuals. These data have value, to be sure, but aggregate traffic data can sometimes be very misleading. This is why, for the Mobile Wallet project CMB did, we sourced mobile app and mobile web usage from the Research Now mobile panel, where it is possible to attribute mobile usage data to individuals (and have additional profiling information on these individuals).

    When you’re faced with aggregate-level data that isn’t linked to individuals, we recommend getting some sample from a mobile usage panel to understand and calibrate your results better, running a parallel survey sample so you can make more informed assumptions, or both (this holds true for aggregate search trend data, website clickstream data, and social media listening tools).
  1. Unstacking the passive mobile behavioral data. Mobile behavioral data that is linked to individuals typically comes in “stacked” form, i.e., every consumer tracked has many different records: one for each active mobile app or mobile website session. Analyzing this data in its raw form is very useful for understanding overall mobile usage trends. What these stacked behavioral data files do not tell you, however, is the reach or incidence (e.g., how many people or the percentage of an addressable market) of any given mobile app/website. It also doesn’t tell you the mobile session frequency or duration characteristics of different consumer types nor does it allow you to profile types of people with different mobile behaviors. 

    Unstacking a mobile behavioral data file can sometimes end up being a pretty big programming task, so we recommend deciding upfront exactly which apps/websites you want to “unstack.” A typical behavioral data file that tracks all smartphone usage during a given period of time can involve thousands of different apps and websites. . .and the resulting unstacked data file covering all of these could quickly become unwieldy.
  1. Beware the outlier! Unstacking a mobile behavioral data file will reveal some pretty extreme outliers. We all know about outliers, right? In survey research, we scrub (or impute) open-ended quant responses that are three standard deviations higher than the mean response, we take out some records altogether if they claim to be planning to spend $6 billion on their next smartphone purchase, and so on. But outliers in passive data can be quite extreme. In reviewing the passive data for this particular project, I couldn’t help but recall that delightful Adobe Marketing ad in which a baby playing with his parents’ tablet repeatedly clicks the “buy” button for an encyclopedia company’s e-commerce site, setting off a global stock bubble. 

    Here is a real-world example from our mobile wallet study that illustrates just how wide the range is of mobile behaviors across even a limited group of consumers: the overall “average” time spent using a mobile wallet app was 162 minutes, but the median time was only 23 minutes. A very small (<1% of total) portion of high-usage individuals created an average that grossly inflated the true usage snapshot of the majority of users. One individual spent over 3,000 minutes using a mobile wallet app.
  1. Understand what is (and what is not) captured by a tracking platform. Different tracking tools do different things and produce different data to analyze. In general, it’s very difficult to capture detailed on-device usage for iOS devices. . .most platforms set up a proxy that instead captures and categorizes the IP addresses that the device transmits data to/from. In our Mobile Wallet study, as one example, our mobile behavioral data did not pick up any Apple Pay usage because it leverages NFC to conduct the transaction between the smartphone and the NFC terminal at the cash register (without any signal ever being transmitted out to the mobile web or to any external mobile app, which is how the platform captured mobile usage). There are a variety of tricks of the trade to account for these phenomena and to adjust your analysis so you can get close to a real comparison, but you need to understand what things aren’t picked up by passive metering in order to apply them correctly.
  1. Categorize apps and websites. Needless to say, there are many different mobile apps and websites that people use, and many of these do a variety of different things and are used for a variety of different purposes. Additionally, the distribution of usage across many niche apps and websites is often not useful for any meaningful insights work unless these are bundled up into broader categories. 

    Some panel sources—including Research Now’s mobile panel—have existing mobile website and app categories, which are quite useful. For many custom projects, however, you’ll need to do the background research ahead of time in order to have meaningful categories to work with. Fishing expeditions are typically not a great analysis plan in any scenario, but they are out of the question if you’re going to dive into a big mobile usage data file.

    As you work to create meaningful categories for analysis, be open to adjusting and iterating. A certain group of specific apps might not yield the insight you were looking for. . .learn from the data you see during this process then try new groupings of apps and websites accordingly.
  1. Consider complementary survey sampling in parallel with behavioral analysis. During our iterative process of attempting to categorize mobile apps from reviewing passive mobile behavioral data, we were relieved to have a complementary survey sampling data set that helped us make some very educated guesses about how or why people were using different apps. For example, PayPal has a very successful mobile app that is widely used for a variety of reasons—peer-to-peer payments, ecommerce payments, and, increasingly, for “mobile wallet” payments at a physical point of sale. The passive behavioral data we had could not tell us what proportion of different users’ PayPal mobile app usage was for which purpose. That’s a problem because if we were relying on passive data alone to tell our clients what percent of smartphone users have used a mobile wallet to pay at a physical point of sale, we could come up with grossly inflated numbers. As an increasing number of mobile platforms add competing functionality (e.g., Facebook now has mobile payments functionality), this will remain a challenge.

    Passive tracking platforms will no doubt crack some of these challenges accurately, but some well-designed complementary survey sampling can go a long way towards helping you read the behavioral tea leaves with greater confidence. It can also reveal differences between actual vs. self-reported behavior that are valuable for businesses (e.g., a lot of people may say they really want a particular mobile functionality when asked directly, but if virtually no one is actually using existing apps that provide this functionality then perhaps your product roadmap can live without it for the next launch).
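
The unstacking and outlier points above can be sketched in Python as follows. The session rows, panelist ids, and app names are invented for illustration, and the record layout is an assumption rather than the format of any specific panel provider.

```python
from collections import defaultdict
from statistics import mean, median

# Stacked file: one row per session (panelist id, app, minutes). Made-up data.
sessions = [
    ("p1", "wallet_app", 5), ("p1", "wallet_app", 10), ("p1", "news_app", 30),
    ("p2", "wallet_app", 20),
    ("p3", "news_app", 12),
    ("p4", "wallet_app", 400), ("p4", "wallet_app", 365),
]
panel_size = 5  # p5 was tracked but logged no sessions at all

def unstack(sessions, apps):
    """Collapse session rows into one row per panelist with per-app totals."""
    rows = defaultdict(lambda: {app: {"sessions": 0, "minutes": 0} for app in apps})
    for person, app, minutes in sessions:
        if app in apps:  # decide upfront which apps to unstack
            rows[person][app]["sessions"] += 1
            rows[person][app]["minutes"] += minutes
    return dict(rows)

wide = unstack(sessions, apps={"wallet_app"})
users = {p: r["wallet_app"] for p, r in wide.items() if r["wallet_app"]["sessions"] > 0}

reach = len(users) / panel_size            # share of the panel using the app
minutes = [u["minutes"] for u in users.values()]

print(f"reach: {reach:.0%}")
print(f"mean minutes: {mean(minutes):.1f}, median minutes: {median(minutes)}")
```

With one heavy user in the file, the mean lands far above the median, which is the same pattern as the 162-minute average versus 23-minute median described in the outlier item.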

Want to learn more about the future of Mobile Wallet? Join us for a webinar on August 19, and we’ll share our insights with you!

Chris Neal leads CMB’s Tech Practice. He judges every survey he takes and every website he visits by how it looks on his 4” smartphone screen, and has sworn off buying a larger “phablet” screen size because it wouldn’t fit well in his Hipster-compliant skinny jeans.

Dr. Jay heads up the analytics group at CMB. He opted for the 6 inch “phablet” and baggy jeans.  He does look stupid talking to a brick. He’s busy trying to compute which event has the higher probability: his kids texting him back or his kids completing an online questionnaire. Every month, he answers your burning market research questions in his column: Dear Dr. Jay. Got a question? Ask it here!

Want to learn more about combining survey data with passive mobile behavioral data? Watch our recent webinar with Research Now that discusses these findings in depth.

Topics: advanced analytics, methodology, data collection, mobile, Dear Dr. Jay, webinar, passive data

Dear Dr. Jay: The 3 Rules for Creating Truly Useful KPI

Posted by Dr. Jay Weiner

Thu, Jun 04, 2015

Dear Dr. Jay,

How can my organization create a Key Performance Indicator (KPI) that’s really useful?

-Meeta R., Seattle

Dear Meeta,

A key performance indicator (KPI) is often used to communicate to senior management how well the company is doing, with a single metric. It could be based on a single attribute in the questionnaire, e.g., the top two boxes of intent to continue using the brand. Another popular KPI is the Net Promoter Score (NPS), based on likelihood to recommend, where we take the percentage of customers who are promoters and subtract the percentage who are detractors.
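
Both calculations are simple enough to sketch in Python. The NPS cutoffs below are the conventional ones (promoters rate 9 or 10 and detractors rate 0 through 6 on a 0-to-10 scale); the 5-point intent scale and all of the responses are assumptions made up for illustration.

```python
def net_promoter_score(ratings):
    """NPS from 0-10 likelihood-to-recommend ratings.

    Uses the conventional cutoffs: promoters rate 9-10, detractors rate 0-6.
    """
    n = len(ratings)
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / n

def top_two_box(ratings, scale_max=5):
    """Percentage of ratings in the top two points of the scale."""
    return 100 * sum(r >= scale_max - 1 for r in ratings) / len(ratings)

# Made-up responses from ten customers.
recommend = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
continue_intent = [5, 4, 4, 3, 5, 2, 4, 1, 5, 3]

nps = net_promoter_score(recommend)
t2b = top_two_box(continue_intent)
print(nps)  # 4 promoters and 3 detractors out of 10 gives 10.0
print(t2b)  # 6 of 10 in the top two boxes gives 60.0
```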

Over the years, likelihood to continue, overall satisfaction, and likelihood to recommend have all been candidates for inclusion in creating a KPI. We find these measures are often highly correlated with each other.  This suggests that while any one measure might be a decent KPI, there is a unique piece of each that is not captured by the others. Likelihood to continue and likelihood to recommend both have a behavioral dimension to them, while overall satisfaction is most likely purely attitudinal. 

There are a few key things to consider in selecting (or creating) a KPI: 

  1. The number should be easy to explain and compute. 

  2. It must be tied to some key business outcome, such as increased revenue.

  3. Finally, it should be fairly responsive to future changes.

In the third consideration, a balance of behavioral and attitudinal measures comes into play. If you’re trying to predict future purchases, past purchases are a good measure to use. For example, if my past 10 credit card transactions were with my Visa card, there’s a very good probability that my next transaction will be made with that same card. Even if I have a bad experience on the 11th purchase with my Visa card, the prediction for the 12th purchase would still be Visa. However, if I include some attitudinal component in my KPI, I can change the prediction of the model much faster.

So what is the best attitudinal measure? Most likely, it’s something that measures the emotional bond one feels for the product, something that asks: is this a brand you prefer above all others? When this bond breaks, future behavior is likely to change.

A final word of caution—you don’t need to include everything that moves. As your mentor used to say, keep it simple, stupid (KISS). Or better yet, keep it stupid simple—senior management will get that.

Watch our recent webinar to learn about the decision-focused emotional measurement approach we call EMPACT℠: Emotional Impact Analysis. Put away the brain scans and learn how we use emotion to inform a range of business challenges, including marketing, customer experience, customer loyalty, and product development.


Topics: advanced analytics, NPS, Dear Dr. Jay

Dear Dr. Jay: Predictive Analytics

Posted by Dr. Jay Weiner

Mon, Apr 27, 2015

Dear Dr. Jay, 

What’s hot in market research?

-Steve W., Chicago

 

Dear Steve, 

We’re two months into my column, and you’ve already asked one of my least favorite questions. But, I will give you some credit: you’re not the only one asking such questions. In a recent discussion on LinkedIn, Ray Poynter asked folks to anticipate the key MR buzzwords for 2015. Top picks included “wearables” and “passive data.” While these are certainly topics worthy of conversation, I was surprised Predictive Analytics (and Big Data) didn’t get more hits from the MR community. My theory: even though the MR community has been modeling data for years, we often don’t have the luxury of getting all the data that might prove useful to the analysis. It’s often clients, not researchers, who are drowning in a sea of information.

On another trending LinkedIn post, Edward Appleton asked whether “80% Insights Understanding” is increasingly "good enough.” Here’s another place where Predictive Analytics may provide answers. Simply put, Predictive Analytics lets us predict the future based on a set of known conditions. For example, if we were able to improve our order processing time from 48 hours to 24 hours, Predictive Analytics could tell us the impact that would have on our customer satisfaction ratings and repeat purchases. Another example using non-survey data is predicting concept success using GRP buying data.


What do you need to perform this task?

  • We need a dependent variable we would like to predict. This could be loyalty, likelihood to recommend, likelihood to redeem an offer, etc.
  • We need a set of variables that we believe influences this measure (independent variables). These might be factors that are controlled by the company, market factors, and other environmental conditions.
  • Next, we need a data set that has all of this information. This could be data you already have in house, secondary data, data we help you collect, or some combination of these sources of data.
  • Once we have an idea of the data we have and the data we need, the challenge becomes aggregating the information into a single database for analysis. One key challenge in integrating information across disparate sources of data is figuring out how to create unique rows of data for use in model building. We may need a database wizard to help merge multiple data sources that we deem useful to modeling.  This is probably the step in the process that requires the most time and effort. For example, we might have 20 years’ worth of concept measures and the GRP buys for each product launched. We can’t assign the GRPs for each concept to each respondent in the concept test. If we did, there wouldn’t be much variation in the data for a model. The observation level becomes a concept. We then aggregate the individual level responses across each concept and then append the GRP data. Now the challenge becomes one of the number of observations in the data set we’re analyzing.
  • Lastly, we need a smart analyst armed with the right statistical tools. Two tools we find useful for predictive analytics are Bayesian networks and TreeNet. Both tools are useful for different types of attributes. More often than not, we find the data sets contain a mix of scale data, ordinal data, and categorical data. It’s important to choose a tool that is capable of working with this type of information.
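
The aggregation step in the concept and GRP example above can be sketched in Python. The concept ids, intent ratings, and GRP figures are invented, and the 5-point purchase intent scale is an assumption.

```python
from collections import defaultdict

# Respondent-level concept test rows: (concept id, purchase intent on 1-5).
responses = [
    ("A", 5), ("A", 4), ("A", 2), ("A", 4),
    ("B", 3), ("B", 2), ("B", 5), ("B", 1),
]
# One GRP figure per launched concept (made-up numbers).
grps = {"A": 1200, "B": 450}

def build_concept_rows(responses, grps):
    """Aggregate to one row per concept, then append that concept's GRPs."""
    by_concept = defaultdict(list)
    for concept, intent in responses:
        by_concept[concept].append(intent)
    rows = []
    for concept, intents in sorted(by_concept.items()):
        rows.append({
            "concept": concept,
            "n": len(intents),
            "top2_pct": 100 * sum(i >= 4 for i in intents) / len(intents),
            "grps": grps.get(concept),  # None if no media data is on file
        })
    return rows

concept_rows = build_concept_rows(responses, grps)
for row in concept_rows:
    print(row)
```

Eight respondent rows become two modeling rows, which is why the number of observations becomes the next concern.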

The truth is, we’re always looking for the best (fastest, most accurate, useful, etc.) way to solve client challenges—whether they’re “new” or not. 

Topics: advanced analytics, big data, Dear Dr. Jay, passive data

Dear Dr. Jay: Mining Big Data

Posted by Dr. Jay Weiner

Tue, Mar 17, 2015

Dear Dr. Jay,

We’ve been testing new concepts for years. The magic score to move forward in the new product development process is a 40% top 2 box score to purchase intent on a 5 point scale. How do I know if 40% is still a good benchmark? Are there any other measures that might be useful in predicting success?

-Normatively Challenged

 

Dear Norm,

I have some good news—you may have a big data mining challenge. Situations like yours are why I always ask our clients two questions: (1) what do you already know about this problem, and (2) what information do you have in-house that might shed some light on a solution? You say you’ve been testing concepts for years.  Do you have a database of concepts already set up? If not, can you easily get access to your concept scores?

Look back on all of the concepts you have ever tested, and try to understand what makes for a successful idea. In addition to all the traditional concept test measures like purchase intent, believability, and uniqueness, you can also append marketing spend, distribution measures, and perhaps even social media trend data. You might even want to include economic condition information like the rate of inflation, the prime rate of interest, and the level of the Dow Jones Industrial Average. While many of these appended variables might be outside of your control, they may serve to help you understand what might happen if you launch a new product under various market conditions.
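
As a first pass at the benchmark question, a short Python sketch can compare launch outcomes for past concepts on either side of the 40% line. The scores and outcomes below are invented for illustration, and "success" is whatever definition your organization already uses.

```python
# Historical concepts: (top-2-box purchase intent %, launched successfully?).
# All values are invented for illustration.
history = [
    (52, True), (47, True), (44, False), (41, True),
    (39, True), (36, False), (33, False), (28, False),
]

def share_successful(outcomes):
    """Fraction of True values, or None when there is nothing to compare."""
    return sum(outcomes) / len(outcomes) if outcomes else None

def success_rate_by_benchmark(history, benchmark=40):
    """Compare launch success at or above the benchmark vs. below it."""
    above = [ok for score, ok in history if score >= benchmark]
    below = [ok for score, ok in history if score < benchmark]
    return share_successful(above), share_successful(below)

above_rate, below_rate = success_rate_by_benchmark(history)
print(f"at or above 40%: {above_rate:.0%} succeeded")
print(f"below 40%: {below_rate:.0%} succeeded")
```

Rerunning the comparison with other cutoffs, or splitting by the appended market variables, is one way to see whether 40% still separates winners from losers.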

Take heart, Norm, you are most definitely not alone. In fact, I recently attended a presentation on Big Data hosted by the Association of Management Consulting Firms. There, Steve Sashihara, CEO of Princeton Consultants, suggested there are four key stages for integrating big data into practice. The first stage is to monitor the market. At CMB, we typically rely on dashboards to show what is happening. The second stage is to analyze the data. Are you improving, getting worse, or just holding your own? However, only going this far with the data doesn’t really provide any insight into what to do. To take it to the next level, you need to enter the third stage: building predictive models that forecast what might happen if you make changes to any of the factors that impact the results. The true value to your organization is really in the fourth stage of the process: recommending action. The tools that build models have become increasingly powerful in the past few years. The computing power now permits you to model millions of combinations to determine the optimal outcomes from all possible executions.

In my experience, there are usually many attributes that can be improved to optimize your key performance measure. In modeling, you’re looking for the attributes with the largest impact and the cost associated with implementing those changes to your offer. It’s possible that the second best improvement plan might only cost a small percentage of the best option. If you’re in the business of providing cellular device coverage, why build more towers if fixing your customer service would improve your retention almost as much?
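
The trade-off between impact and cost can be sketched in Python. The option names, lift values, and costs are invented; in practice the lift would come from your predictive model and the costs from your finance team.

```python
# Candidate improvements: modeled retention lift (points) and cost ($M).
# All numbers are invented for illustration.
options = [
    {"name": "build more towers",    "lift": 4.0, "cost": 50.0},
    {"name": "fix customer service", "lift": 3.5, "cost": 5.0},
    {"name": "redesign the bill",    "lift": 0.5, "cost": 1.0},
]

def rank_by_lift_per_dollar(options):
    """Sort options by modeled lift per unit of cost, best first."""
    return sorted(options, key=lambda o: o["lift"] / o["cost"], reverse=True)

ranked = rank_by_lift_per_dollar(options)
for o in ranked:
    print(f'{o["name"]}: {o["lift"] / o["cost"]:.2f} points per $M')
```

In this made-up example the towers deliver the largest lift, but fixing customer service delivers almost as much at a tenth of the cost.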

Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit anonymously here.

Dr. Jay Weiner is CMB’s senior methodologist and VP of Advanced Analytics. Jay earned his Ph.D. in Marketing/Research from the University of Texas at Arlington and regularly publishes and presents on topics, including conjoint, choice, and pricing.

Topics: advanced analytics, product development, big data, Dear Dr. Jay