Text is No Longer Enough for Text Analytics

By Michalis Michael

According to Twitter, 77% of all tweets about soft drinks do not have a textual reference to a soft drink brand or the product category. More and more people use images and video to express their feelings, share their point of view or give feedback, sometimes combined with text. In these cases, popular social media monitoring tools analyse only part of the post: the text.

Nowadays, text analytics alone is rarely enough to get the full story around brands or products on social media and elsewhere online. More often than not, you have to take into account the image or video that comes with the text. Sometimes there is no text at all and the post consists of one or more images; this happens a lot on Instagram, for example. Now if only there were a way to turn those images and videos into text that could be analysed as normal…

…actually, there is!

A deep learning approach can be used to detect the theme and context of an image and describe it in text, i.e. create a caption. For videos, speech-to-text technology can also turn the audio into text.

Deep learning is a form of machine learning that uses neural networks with several layers (more than four to qualify as “deep”); for images, these are typically convolutional neural networks. To conduct image theme analytics accurately, you would need at least 20 neural network layers, a training data set of at least 100,000 images, and a very powerful computer with lots of RAM and multiple graphics processing units (GPUs). Accuracy improves considerably when the deep learning model is custom-built for the subject of the research rather than being a generic model for all kinds of visuals.

Back to the application and usefulness of automated image theme analysis. This approach enables a brand to discover useful information it would otherwise miss, even when the brand or product name is not explicitly mentioned. The brand shown in this picture would have no way of acting on this post, as the text refers to the product only as “one of these”, unless of course it analyses the image.

Being able to essentially turn pixels into text makes it possible to know what an image is about, to cluster it under its respective discussion-driver “bucket”, to understand how consumers feel about the brand or product category depicted, to identify usage occasions, and to discover exclusive insights on the competition.
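To show what the “bucket” step might look like once an image has been turned into a caption, here is a minimal sketch. It assigns each caption to a discussion-driver bucket by keyword overlap. The bucket names, keyword lists and the `assign_bucket` helper are hypothetical and serve only as illustration. A real system would more likely use a trained classifier or a clustering model.

```python
import re

# Hypothetical discussion-driver buckets and their keywords (illustrative only).
BUCKETS = {
    "usage_occasion": {"party", "beach", "picnic", "dinner", "gym", "office"},
    "product_appearance": {"can", "bottle", "glass", "label", "logo"},
    "people": {"man", "woman", "friends", "child", "group"},
}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_bucket(caption: str) -> str:
    """Return the bucket whose keywords overlap most with the caption, or 'other'.

    Ties go to the bucket listed first in BUCKETS.
    """
    words = tokenize(caption)
    best_bucket, best_score = "other", 0
    for bucket, keywords in BUCKETS.items():
        score = len(words & keywords)
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket


if __name__ == "__main__":
    # Captions assumed to come from an upstream image-captioning model.
    captions = [
        "friends drinking soft drinks at a beach party",
        "a bottle with a red label on a table",
        "a sunset over the mountains",
    ]
    for caption in captions:
        print(f"{assign_bucket(caption):20} <- {caption}")
```

Keyword overlap is easy to inspect, but it misses synonyms and plurals ("cans" does not match "can"). Those gaps are one reason to prefer a learned model in practice.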

This technology can be used beyond marketing: imagine a deep learning model that can identify diseases in ultrasound, MRI or X-ray scans, or one that can scan through millions of satellite images to detect certain structures or activity on our planet.

Consider putting all of these together:

  • image analysis through text extraction, brand recognition, facial recognition and theme detection
  • automatic speech-to-text transcription to analyse audio
  • breaking videos down into images and text to be analysed as described above
  • analysing all of the above in text format for sentiment, emotions and topics with high precision

Combined, they offer a whole new world of possibilities for any brand, organisation or person out there!
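The sketch below shows, under stated assumptions, how the outputs of each modality could be merged into a single text document per post so that ordinary text analytics can run on it. It is not DigitalMR's implementation. It assumes the detectors (captioning, text extraction, brand recognition, speech-to-text) have already run upstream, and it only models their text outputs. `Post`, `to_document`, `sentiment` and the word lists are hypothetical names. The tiny lexicon is a toy stand-in for a real high-precision sentiment and emotion model.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A social media post reduced to the text produced by each modality."""
    text: str = ""                                            # the post's own text
    image_captions: list[str] = field(default_factory=list)   # from theme detection
    image_ocr: list[str] = field(default_factory=list)        # from text extraction
    brands_detected: list[str] = field(default_factory=list)  # from brand recognition
    transcript: str = ""                                      # from speech-to-text


# Toy sentiment lexicon: a placeholder for a real sentiment/emotion model.
POSITIVE = {"love", "great", "refreshing", "happy", "best"}
NEGATIVE = {"hate", "awful", "flat", "sad", "worst"}


def to_document(post: Post) -> str:
    """Merge all modality-specific text into one document, skipping empty parts."""
    parts = [post.text, *post.image_captions, *post.image_ocr, post.transcript]
    if post.brands_detected:
        parts.append("brands: " + ", ".join(post.brands_detected))
    return " ".join(p for p in parts if p)


def sentiment(document: str) -> int:
    """Positive minus negative lexicon hits: >0 positive, <0 negative, 0 neutral."""
    words = [w.strip(".,!?\"'").lower() for w in document.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


if __name__ == "__main__":
    # A post whose own text never names the product ("one of these").
    post = Post(
        text="one of these please",
        image_captions=["a person holding a cold can of cola at the beach"],
        brands_detected=["ExampleCola"],  # hypothetical brand name
        transcript="so refreshing, best drink ever",
    )
    doc = to_document(post)
    print(doc)
    print("sentiment score:", sentiment(doc))
```

The post's own text carries no sentiment and no brand. The merged document, however, contains the brand from the image and two positive words from the audio, so text-only analytics can now score it. This reduction to text is the design choice the article describes.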

By Michalis Michael, CEO, DigitalMR