Text Analysis: making computers understand human language



The explosive growth of textual data in the business world means that companies have to deal with an ever-increasing amount of data: just think of all the emails, web pages, social media posts, documents, comments and customer reviews generated every day.

It is therefore increasingly necessary to summarize, analyze, shape and interpret such data in order to extract valuable information that supports decision-making.

What is meant by "Text Analysis"?

“Text Analysis” is the practice of extracting reports, diagrams and information from unstructured or semi-structured textual sources using ML (Machine Learning, the branch of AI that uses analytical and statistical techniques to learn from large amounts of data without explicit human instruction) and NLP (Natural Language Processing). The latter is the branch of Artificial Intelligence that aims to bridge the communication gap between humans and machines by making computers capable of understanding, manipulating and interpreting human language.

Adopting this type of analysis helps companies collect data from multiple sources (social media, surveys, websites, emails, feedback tools…). The data is then processed with ML and NLP techniques to extract valuable insights which, when combined with suitable visualization tools, can be understood at a glance and can support many business activities (user opinion analysis, organisational changes, faster decision-making, anticipation and/or identification of possible problems…).


Benefits of Text Analytics

Adopting Text Analytics within an enterprise makes it possible to:

  • Improve the customer experience: knowing users’ preferences, the quality of services or the performance of specific products helps to identify weaknesses to address, strengths to build on and recurring problems. Improving customer satisfaction results in lower churn and greater loyalty and profit.
  • Speed up and facilitate decision-making: manually analyzing textual information is typically time-consuming. Automated analysis cuts that time significantly, which in turn leads to much faster decision-making.
  • Offer more targeted products and/or services: text analytics helps to obtain essential information for market analysis, brand positioning and marketing. It makes it easier to understand what influences and motivates customers’ buying decisions, and it provides deeper knowledge of the competition and the sector.

Text Analytics challenges

Depending on the extraction source, the input data can be categorized into:

  • Structured data: data formatted according to a well-defined schema, for example data organized in clearly defined rows and columns. Because its architecture is well established, this is the easiest type of data to analyze.
  • Unstructured data: data that is not organized in a defined way and has no easily identifiable structure. Social media content, chats, documents and surveys fall into this category. The absence of a clear structure makes this data rather complex to handle.
  • Semi-structured data: data without a rigid structure but with descriptive elements (such as tags or metadata) that allow it to be catalogued and analyzed more easily than unstructured data. This type includes JSON, CSV, or XML files.
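
As a rough illustration of the difference between these categories, the sketch below parses a single feedback record in JSON. The record and its field names (`source`, `rating`, `tags`, `text`) are invented for the example: the named fields play the role of the descriptive metadata, while the `text` field is free, unstructured content that still has to be analyzed.

```python
import json

# Hypothetical customer feedback record (made-up data for illustration).
raw = (
    '{"source": "survey", "rating": 2, "tags": ["delivery"], '
    '"text": "The parcel arrived late and the box was damaged."}'
)

record = json.loads(raw)

# Semi-structured part: named fields that can be filtered or grouped directly.
metadata = {key: value for key, value in record.items() if key != "text"}

# Unstructured part: free text with no schema, which needs text analysis.
free_text = record["text"]

print(metadata)
print(free_text)
```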

Techniques and tools of Text Analysis

AI (Artificial Intelligence), understood as the ability of computers to perform tasks commonly attributed to human intelligence, plays a fundamental role in textual analysis. In the field of Text Analysis, it takes the form of NLP (Natural Language Processing) and ML (Machine Learning).

Used separately or together, these two branches provide the means to carry out the main phases of textual analysis, which can be summarized as follows:

Data extraction

Data relevant to a particular company, product, brand or service can be extracted from internal sources (company information acquired via email, surveys…) or external sources (information acquired from social media, news articles, online reviews, forums…).

Data pre-processing

This step is fundamental for the analysis: once the unstructured or semi-structured data has been collected, it must be prepared so that it is suitable for a Machine Learning model.

The texts are cleaned (for example, by removing punctuation and words of negligible value, known as stop words) and then transformed into lists of numbers (vectors), which provide the models with the numerical inputs they need.
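
The cleaning and vectorization steps can be sketched as follows. This is a minimal example under stated assumptions, not a production pipeline: the stop-word list is a tiny made-up sample, the helper names `clean` and `vectorize` are hypothetical, and the vectors are simple word counts (a "bag of words"), which is only one of several possible representations.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list; real projects use much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "it", "this", "very"}


def clean(text):
    """Lowercase the text, strip punctuation and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [token for token in tokens if token not in STOP_WORDS]


def vectorize(docs):
    """Turn documents into word-count vectors over a shared vocabulary."""
    tokenized = [clean(doc) for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["The delivery was fast, very fast!", "The delivery was slow."]
vocabulary, vectors = vectorize(docs)
print(vocabulary)  # ['delivery', 'fast', 'slow']
print(vectors)     # [[1, 2, 0], [1, 0, 1]]
```

Each position in a vector corresponds to one word of the vocabulary, so both documents end up with the same length regardless of how long the original text was.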

Model deployment

Once processed, data can be “fed” to Machine Learning models.

Depending on the objective, these models may be of the following type:

  • Supervised: the input data must be labelled, meaning each item is annotated with the characteristic the model has to predict. For example, in a Sentiment Analysis model that classifies reviews as “positive”, “negative” or “neutral”, every review in the dataset must carry one of these three labels. Part of the labelled data is used as the training set (the data the model learns from) and the rest as the test set (the data on which the trained model’s performance is measured).
  • Unsupervised: unlike the supervised case, the input data is not labelled. With no information about which class each item belongs to, the model must recognize patterns, relationships and similarities in the data itself and group the items accordingly.

This means the unsupervised model is not trained on labelled examples.

A supervised model tends to be more accurate, at the expense of generally higher complexity, since labelled training data is required.

Either way, once the models are ready, they can predict the desired output for new, unlabelled data.
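
To make the supervised case concrete, here is a minimal sketch of a sentiment classifier with a training set and a test set. The choice of a multinomial Naive Bayes model is an assumption made for the example (the text above does not prescribe any specific algorithm), the reviews are invented, and the helper names `fit` and `predict` are hypothetical. Only two labels are used to keep the example short.

```python
import math
from collections import Counter, defaultdict

# Made-up labelled reviews, for illustration only.
LABELLED = [
    ("great product fast delivery", "positive"),
    ("excellent quality great support", "positive"),
    ("love it excellent value", "positive"),
    ("terrible quality slow delivery", "negative"),
    ("awful support terrible experience", "negative"),
    ("slow awful waste of money", "negative"),
    ("great value fast support", "positive"),
    ("terrible slow awful product", "negative"),
]

# Train on the first six reviews; hold out the last two as the test set.
train, test = LABELLED[:6], LABELLED[6:]


def fit(examples):
    """Count words per label: the training step of a Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = fit(train)
correct = sum(predict(model, text) == label for text, label in test)
accuracy = correct / len(test)
print(f"Test accuracy: {accuracy:.2f}")
print(predict(model, "fast delivery great quality"))
```

With a dataset this small the accuracy figure says very little. The point is the workflow: labelled data is split, the model learns from one part, and its performance is measured on the part it has not seen.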

Data visualization

Once unstructured data has been processed with Text Analytics techniques and Machine Learning models, the resulting information can be presented as graphs, dashboards, word clouds or diagrams, providing visual cues that help companies identify trends in the data efficiently and make decisions.
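
As a very small stand-in for a real dashboard or charting tool, the sketch below turns a list of predicted sentiment labels into a plain-text bar chart. The labels are invented and the `bar_chart` helper is hypothetical. In practice the same counts would typically be passed to a visualization tool.

```python
from collections import Counter

# Made-up analysis output: the sentiment label predicted for each review.
predicted = ["positive", "negative", "positive", "neutral", "positive", "negative"]


def bar_chart(labels, width=20):
    """Render label counts as a plain-text horizontal bar chart."""
    counts = Counter(labels)
    largest = max(counts.values())
    lines = []
    for label, count in counts.most_common():
        bar = "#" * round(width * count / largest)
        lines.append(f"{label:<10}{bar} {count}")
    return lines


chart = bar_chart(predicted)
print("\n".join(chart))
```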

Text Analysis: business benefits and use cases

Text Analysis has many business applications aimed at extracting meaningful information from text. Among them are:

  • Sentiment Analysis: aimed at extracting subjective information such as emotions or feelings. By collecting data from social media posts (e.g., comments on posts that promote certain services), surveys, product reviews or other sources, it identifies the predominant negative, positive or neutral feelings triggered by specific phenomena or circumstances. This analysis helps to understand consumers’ attitudes towards particular brands, the overall response to products or services and how it changes over time (possibly in response to events such as advertising campaigns or promotions), and to identify new trends among users.
  • Topic Modelling: an unsupervised learning technique that identifies the topics covered in a collection of texts and assigns each text to a topic. By extracting common keywords and concepts, it groups texts or documents that have not been labelled beforehand, clustering their contents and providing an informative summary. The benefit is the ability to identify, scan, analyze and visualize plausible topics across a large volume of text that would be very hard to examine manually.
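
The idea of grouping unlabelled texts by shared keywords can be sketched as follows. This is a deliberately simple stand-in, not a real topic model: it greedily groups texts whose keyword sets overlap (Jaccard similarity), whereas actual topic modelling relies on dedicated algorithms such as LDA. The texts, the stop-word list, the threshold and the function names are all assumptions made for the example.

```python
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "is", "my", "was", "and", "to", "for", "i", "of"}


def keywords(text):
    """Return the set of non-stop-words in a text."""
    return {word for word in text.lower().split() if word not in STOP_WORDS}


def jaccard(a, b):
    """Share of keywords two sets have in common (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_similarity(texts, threshold=0.2):
    """Put each text in the first group whose keywords overlap enough."""
    groups = []  # each group holds its accumulated keywords and its texts
    for text in texts:
        words = keywords(text)
        for group in groups:
            if jaccard(words, group["keywords"]) >= threshold:
                group["texts"].append(text)
                group["keywords"] |= words
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append({"keywords": set(words), "texts": [text]})
    return groups


texts = [
    "delivery was late",
    "refund for my order",
    "late delivery of the parcel",
    "i want a refund",
]
groups = group_by_similarity(texts)
for number, group in enumerate(groups, start=1):
    print(number, sorted(group["keywords"]), group["texts"])
```

No labels are supplied at any point: the two groups (delivery problems and refund requests) emerge only from the words the texts share.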

Blue BI, which has always believed in the value of data, works with Text Analysis solutions to help its customers take full advantage of the huge amount of information they hold. Analyzing data to surface relevant information makes it possible to identify new opportunities and accelerate growth through informed decisions.

If you want to know how we can help your company generate value from the data you have, contact us!

We build Business Intelligence & Advanced Analytics solutions that transform simple data into information of great strategic value.

