A look at the emotions, or lack thereof, that the market research industry is generating on social media.
By Michalis A. Michael
Whenever something unexpected happens in life, some of us pause, look up and try to figure it out. Especially if you are curious – as good researchers are supposed to be – discovering a paradox can lead to a hypothesis, and trying to prove it can be real fun. When we came across the Coca-Cola Super Bowl ad paradox, we investigated and analysed it to death. We then started thinking about similar cases that might support our hypothesis, i.e. that negativity voiced by a relative minority may, under certain circumstances, have a positive effect by awakening the majority.
On February 2nd 2014 (the day of the Super Bowl), Coca-Cola aired an ad in which people of various ethnicities sang ‘America the Beautiful’ in their own languages. Immediately after the ad aired, all hell broke loose on Twitter and other social media platforms, and this continued for the following days and weeks.
According to DigitalMR’s findings, during the 8 days prior to the Super Bowl there were 139,997 posts about Coca-Cola in the English language: 22% negative, 7% positive, and 71% neutral. During the 8 days following the airing of the ad, the number increased by 169% to 376,382 posts. The interesting finding here is that although the volume of posts grew by 169% after the campaign aired, negative posts still accounted for 22% of the total, while positive posts jumped from 7% to 51%.
The number of neutral comments was roughly the same before and after, at ~100,000; the only difference is that their share dropped to about a third of what it was, from 71% to 27%.
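The before/after shift can be checked with simple arithmetic. A minimal sketch using the figures quoted above:

```python
# Post volumes quoted in the text (8 days before vs. 8 days after the ad).
before_total = 139_997
after_total = 376_382

# Overall volume growth: roughly +169%.
growth = (after_total - before_total) / before_total * 100
print(f"Volume growth: {growth:.0f}%")  # → Volume growth: 169%

# Neutral posts: their share fell from 71% to 27%, yet the absolute
# count barely moved, because the total nearly tripled.
neutral_before = before_total * 0.71
neutral_after = after_total * 0.27
print(f"Neutral before: {neutral_before:,.0f}, after: {neutral_after:,.0f}")
```

This is why the neutral share collapsing while the neutral count holds steady is consistent: the extra volume came almost entirely from opinionated posts.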
What we discovered had actually happened was almost unbelievable, at least for it to have occurred organically: the initial negative reaction from some consumers had the opposite effect to the one they intended. Instead of discouraging others from buying Coca-Cola products and turning them against the company, they managed to “wake up” many passive consumers who had watched the ad and now came to its defense.
Reading through a large number of posts, we saw that although many were assigned a negative sentiment by the listening247 algorithms, they were not negative towards Coca-Cola or the ad, but rather towards the posts attacking the ad and, by extension, the brand.
These people not only defended the message behind the brand’s multilingual ad celebrating diversity, they went as far as to directly attack the negative and often racist comments and the people behind them. Through this process of manually going through posts, we identified a new sentiment: the double negative, or indirectly positive.
You see, just as in mathematics, in our view the double negative can be considered positive. Once we identified the posts that were directly positive about the campaign, as well as those that were negative towards the negative comments – i.e. the comments attacking the racists and thus defending Coca-Cola (indirectly positive) – the results were totally different. The “neg-neg” posts plus the positive ones accounted for the largest share of the total.
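The reclassification logic can be sketched in a few lines. This is a hypothetical illustration, not DigitalMR's actual implementation: it assumes each post has already been annotated with a sentiment and a target (the brand itself, or its critics):

```python
from collections import Counter

def effective_sentiment(sentiment: str, target: str) -> str:
    """Map a post's raw sentiment to its effective sentiment for the brand.

    A negative post aimed at the brand's critics (the "neg-neg")
    counts as indirectly positive for the brand.
    """
    if sentiment == "negative" and target == "critics":
        return "indirectly positive"
    return sentiment

# Hypothetical annotated posts.
posts = [
    {"sentiment": "negative", "target": "brand"},    # attacks the ad
    {"sentiment": "negative", "target": "critics"},  # attacks the attackers
    {"sentiment": "positive", "target": "brand"},    # praises the ad
]

tally = Counter(effective_sentiment(p["sentiment"], p["target"]) for p in posts)
print(tally)  # → Counter({'negative': 1, 'indirectly positive': 1, 'positive': 1})
```

The key design point is that sentiment alone is not enough; the target of the negativity has to be identified before the "double negative" can be counted on the brand's side.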
As you can see, the directly and indirectly positive posts overtake the negative and neutral ones, especially on blogs.
Good social media listening tools can help a brand investigate the market and understand its needs and beliefs in order to predict the negative and double negative reactions, and avoid unwanted social media situations.
Now is not the time for organisations to be asking themselves whether they should be monitoring and analysing social media. The right questions are: who should own social analytics? Do we currently have someone with the right skill set? What tools should we be using? And how can we elevate the function to senior executive level in order to ensure success?
Michalis A. Michael & Sophia Papagregoriou (2014) The Positive Effect of Negativity [Online] Available at: http://cdn2.hubspot.net/hub/186045/file-1371807469-pdf/docs/DigitalMR_The_Positive_Effect_of_Negativity_v0.pdf
In the latest installment of our series on social listening, Michalis Michael looks at the nuances of measuring social sentiment.
By Michalis Michael
Before we start sharing real data and case studies from social listening, we thought it prudent to explain how this is properly done for market research purposes. The most important notion to understand and accept is that, unlike the way DIY social media monitoring tools work, achieving the data accuracies that are of paramount importance for market research requires a 3-4 week human-led set-up phase (see Figure 1). Once the set-up of a product category – or any other subject – in a specific language is done, it is possible from then on to take a real-time DIY approach, as with any other social media monitoring tool.
Figure 1: Social Media Listening & Analytics Process for MR (Source: DigitalMR Presentation at LT-Accelerate Conference 2015)
In the first article of this series, we mentioned that it is important for insights experts to be able to connect the dots between listening, asking questions, and tracking behaviour. In order to do that, an insights expert needs to trust that the thousands of posts analysed are actually about the brands and product category of interest. This brings us to the first of three issues to pay attention to when using social listening for market research purposes.
- Noise Elimination
The set of keywords used to collect posts from social media and other public websites is called a “harvest query”. A harvest query can be as simple as one word or as complex as multiple pages of Boolean logic. The problem with harvesting only the relevant posts is that we also need to know all of the irrelevant homonyms and ambiguous uses of our keywords – which we never do. Thus, an iterative process is required, involving humans who improve the harvest query as they find new irrelevant words they had not thought of during the previous iteration. The most common example we use to make this clear is this: when we want to harvest posts about Apple computers, we know from the beginning that there will be posts about apple the fruit, so we create a harvest query that excludes posts about the fruit; but what about the actress Gwyneth Paltrow’s daughter, named Apple, whom everybody talks about on Twitter? I’m sure you see my point…
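The iterative exclusion idea can be illustrated with a toy harvest query. This is a simplified sketch (real harvest queries are pages of Boolean logic, not two regular expressions); the keyword lists are hypothetical:

```python
import re

# Inclusion term for the brand of interest.
INCLUDE = re.compile(r"\bapple\b", re.IGNORECASE)

# Exclusion terms grow with each human review iteration: the fruit was
# an obvious exclusion from the start; "gwyneth"/"paltrow" would only be
# added after reviewers spotted posts about her daughter named Apple.
EXCLUDE = re.compile(r"\b(fruit|pie|orchard|gwyneth|paltrow)\b", re.IGNORECASE)

def matches_harvest_query(post: str) -> bool:
    """A post is harvested if it mentions the keyword and no exclusion term."""
    return bool(INCLUDE.search(post)) and not EXCLUDE.search(post)

posts = [
    "My new Apple laptop arrived today",
    "Baked an apple pie this weekend",
    "Gwyneth Paltrow's daughter Apple is adorable",
]
relevant = [p for p in posts if matches_harvest_query(p)]
print(relevant)  # → ['My new Apple laptop arrived today']
```

Each pass of human review surfaces new false positives, which become new exclusion terms for the next harvest; that loop is what the set-up phase pays for.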
- Sentiment Accuracy
There are quite a few ways to annotate posts with sentiment, ranging from fully manual annotation to linguistic or statistical methods of NLP (Natural Language Processing). Each method has pros and cons, especially when we are looking at a data set of 10,000 posts or fewer that will be used for a one-off report. However, for any continuous reporting, or even a one-off report with over 20,000 posts, using humans instead of machines is both expensive and slow. In the previous article of this series we discussed the proper metrics for accuracy: precision and recall. Most social media monitoring tools can barely achieve a sentiment precision of 60%; in fact, in all cases where we were asked to check, their accuracy ranged between 44% and 53%. Anything over 70% sentiment precision may be acceptable at the start of a tracking project for market research, but it should climb above 80% within a short period of time.
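For readers less familiar with the two metrics: precision is the share of posts the machine labelled with a sentiment that actually carry that sentiment, while recall is the share of posts carrying that sentiment that the machine found. A minimal sketch, with made-up labels:

```python
# Gold-standard (human) labels vs. machine labels for a tiny sample.
actual    = ["pos", "neg", "neg", "pos", "neu", "neg"]
predicted = ["pos", "neg", "pos", "pos", "neu", "neu"]

def precision_recall(predicted, actual, label):
    """Per-label precision and recall from paired label lists."""
    tp = sum(p == label and a == label for p, a in zip(predicted, actual))
    fp = sum(p == label and a != label for p, a in zip(predicted, actual))
    fn = sum(p != label and a == label for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall(predicted, actual, "neg")
print(f"negative: precision={p:.2f}, recall={r:.2f}")
# → negative: precision=1.00, recall=0.33
```

Note how the two can diverge sharply: here every post the machine called negative really was negative (high precision), yet it missed two of the three negative posts (low recall). Quoting a single "accuracy" number hides this trade-off.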
- Semantic Accuracy
When we say semantic analysis we mean analysing the topics of online conversation around the product category of interest. As with sentiment accuracy, precision and recall are the appropriate metrics. If a hierarchical taxonomy – a dictionary with multiple layers/hierarchies that describes a product category using the words people actually use in their social media posts – is used to report on topics for market research purposes, over 85% semantic precision for hierarchy 1 topics is achievable. You will have noticed that even though we mention recall as one of the accuracy measures, in this big data analytics space we have not used it to describe what is appropriate for market research purposes. Recall for semantic analysis measures how many of the posts in a data set that actually discuss a certain topic were identified as such. In the world of big data, where we deal with millions of posts, cost efficiency dictates that we only look at keywords that are mentioned multiple times. If a keyword appears only a couple of times in a data set of millions, we enter “diminishing returns” territory if we attempt to annotate posts with it. It is, however, possible to maximise recall, and it should be the end client’s decision whether they want to spend their money and time this way.
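The hierarchical taxonomy idea can be sketched as a nested keyword dictionary. This is a hypothetical soft-drinks example, not DigitalMR's taxonomy: hierarchy 1 topics map to the hierarchy 2 keywords people actually type:

```python
# Hypothetical two-level taxonomy: hierarchy 1 topic -> hierarchy 2 keywords.
TAXONOMY = {
    "taste": {"sweet", "flavour", "aftertaste"},
    "packaging": {"bottle", "can", "label"},
    "health": {"sugar", "calories", "diet"},
}

def annotate_topics(post: str) -> set:
    """Tag a post with every hierarchy 1 topic whose keywords it mentions."""
    words = set(post.lower().split())
    return {topic for topic, keywords in TAXONOMY.items() if words & keywords}

topics = annotate_topics("love the new bottle but too much sugar")
print(topics)  # → {'packaging', 'health'}
```

A post that uses none of the taxonomy's keywords gets no topic at all, which is exactly where semantic recall is lost: rare phrasings fall outside the keyword sets, and chasing every rare phrasing is the "diminishing returns" territory described above.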
Let me know if you have questions; in the meantime, stay tuned for the next article which will be about some real data from social listening.