Debunking the Argument of “Accuracy”

Size of Big Data

With the advent of Big Data, it is estimated that 70%-80% of all data collected and stored by an enterprise is in an unstructured form. There are various approaches, technologies and methods to automate the analysis of unstructured data such as text.

However, regardless of advances in technology, some Customer Experience Management, Marketing and Customer Service professionals continue to use the accuracy argument to deny their employers access to significant operational and financial benefits. They argue that the results produced by text analysis software are substantially less accurate than results produced by humans, and that it is therefore best to ignore these vast repositories of human knowledge, and to disregard the immense cost of storing them, until the technologies mature.

It is humorous that people with such attachment to “accuracy” usually have difficulty clearly defining what it means to them in this context or how to measure it.

Accuracy and Precision

“In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision of a measurement system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged conditions show the same results”
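The distinction in the definition above can be made concrete with a small sketch. The function names and the sample measurements below are hypothetical and chosen only for illustration: accuracy is expressed as the distance between the average measurement and the true value, and precision as the spread of repeated measurements.

```python
from statistics import mean, stdev

def accuracy_error(measurements, true_value):
    """Accuracy: distance between the average measurement and the true value."""
    return abs(mean(measurements) - true_value)

def precision_spread(measurements):
    """Precision: spread of repeated measurements (sample standard deviation)."""
    return stdev(measurements)

# Hypothetical data: the true value and both series are made up for illustration.
true_value = 10.0
tight_but_biased = [12.0, 12.1, 11.9, 12.0]      # repeatable, but off target
scattered_but_centered = [8.0, 12.0, 9.0, 11.0]  # on target on average, but noisy

for name, series in [("tight_but_biased", tight_but_biased),
                     ("scattered_but_centered", scattered_but_centered)]:
    print(name, accuracy_error(series, true_value), precision_spread(series))
```

The first series is precise but not accurate; the second is accurate on average but not precise. A measurement system can be judged on either property, which is why the two terms should not be used interchangeably.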

Accuracy Standards

Given the ambiguous nature of unstructured data, the difficulty of a formal definition is easy to understand. At its core, we are dealing with an interpretation, by a human or by a machine, of what was said or written by another human. A single individual will interpret the same text differently depending on a multitude of conditions, such as the time of day, the context in which the text is framed, or the interpreter’s state of mind at that moment. In addition, no single individual can possibly handle the volumes of data available, and with each additional interpreter joining the task, the reproducibility of the interpretation results declines sharply.
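One simple way to put a number on reproducibility across interpreters is pairwise agreement: the share of comparisons in which two interpreters gave the same text the same label. The sketch below is only an illustration under assumed data; the labels and the function name are hypothetical, and it measures agreement rather than proving how fast it declines.

```python
from itertools import combinations

def pairwise_agreement(labels_by_interpreter):
    """Fraction of (interpreter pair, text) comparisons with identical labels."""
    agree = 0
    total = 0
    for first, second in combinations(labels_by_interpreter, 2):
        for a, b in zip(first, second):
            agree += (a == b)
            total += 1
    # With fewer than two interpreters there is nothing to disagree about.
    return agree / total if total else 1.0

# Hypothetical labels: three interpreters reading the same four texts.
interpreter_a = ["pos", "neg", "neg", "pos"]
interpreter_b = ["pos", "neg", "pos", "pos"]
interpreter_c = ["neg", "neg", "neg", "pos"]

print(pairwise_agreement([interpreter_a, interpreter_b]))
print(pairwise_agreement([interpreter_a, interpreter_b, interpreter_c]))
```

In this made-up sample, adding the third interpreter lowers the overall agreement, which is the effect described above.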

Speed and cost are obvious arguments for automated processing, but a machine also offers a better solution to the problem of “accuracy” in the analysis of big, unstructured data. The interpretation of a single piece of text may not agree with a detractor’s interpretation at a given moment, but the aggregate result of analyzing a large data set will consistently fall within 10% of a human tester’s results*.
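The difference between item-level and aggregate-level comparison can be sketched as follows. This is not our internal test procedure. The labels are made up, the function names are hypothetical, and the sketch assumes that "within 10%" means a relative difference, which is one possible reading.

```python
def share(labels, target):
    """Fraction of items that carry the target label."""
    return sum(1 for label in labels if label == target) / len(labels)

def within_tolerance(machine_value, human_value, tolerance=0.10):
    """True if the machine aggregate is within a relative tolerance of the human one.

    Assumption: the tolerance is relative to the human result.
    """
    if human_value == 0:
        return machine_value == 0
    return abs(machine_value - human_value) / abs(human_value) <= tolerance

# Hypothetical labels for the same ten texts.
human = ["neg", "neg", "pos", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]
machine = ["neg", "pos", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]

# Item level: the two interpretations differ on individual texts.
item_agreement = sum(h == m for h, m in zip(human, machine)) / len(human)

# Aggregate level: the share of negative texts is compared instead.
aggregate_ok = within_tolerance(share(machine, "neg"), share(human, "neg"))

print(item_agreement, aggregate_ok)
```

Here the machine and the human disagree on two individual texts, yet they report the same share of negative texts. A business decision based on that share would be the same either way.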

The debate isn’t whether or not automated analysis of unstructured data is “accurate” enough. The debate is whether an enterprise can afford to ignore its vast data reserves in the Age of the Social Consumer.


* This number is based on our internal tests that we conduct at least 3 times per year.

Comments & Thoughts

  1. This same logic applies across the board. If you’re going to build a propensity model to predict customer behavior, you’re going to have to transform your unstructured data into structured, numeric extracts. That’s what the vast majority of analytic algorithms require. An argument can be made that extracting structured information from an unstructured source is a form of analysis itself. However, my point is simply that the final analysis, which is what started the process of acquiring the unstructured data to begin with, does not use the unstructured data. It uses the structured information that has been extracted from it. This is an important nuance.

  2. Cheyserr says:

    Research on customer engagement is skyrocketing, providing the industry with a great volume of data about customer behavior. Unfortunately, no matter how big the data is, if industries and brands don’t know how to interpret it and put it into action, it will still be useless.
