
Unlocking Business Insights with Generative AI Text Analysis

1. The Role of Text Analysis in Modern Enterprises

Companies deal with large volumes of data every day, both numerical and text-based. Numerical data is structured and therefore relatively easy to analyze. Unstructured text data, by contrast, flows in from sources such as emails, sales reports, call center logs, and internal documentation, and is far harder to analyze. Yet it is also a goldmine of hidden, meaningful insights that can drive business improvements and strategic decision-making, given the right tools.

Natural language processing (NLP) text mining tools have long been the go-to for analyzing text data. However, these tools, which rely primarily on the statistical properties of words and sentences, have their limitations: they struggle to understand deeper context and meaning. This is where generative AI steps in. Generative AI has revolutionized text analytics, enabling more sophisticated interpretation and contextual understanding through advanced language models such as BERT and GPT.

Generative AI has not only unlocked new possibilities for businesses but also transformed the landscape of text mining. It can extract, organize, and interpret complex information in ways traditional NLP could only dream of. This shift has transformed text mining from a purely statistical exercise into a powerful method for deriving deep business insights.

In this post, we explore the power of text analysis, provide real-world examples, and compare traditional NLP techniques to generative AI to highlight their respective benefits and challenges.

2. Business Applications of Text Analysis

Text analysis is driving improvements across multiple business functions. Below are a few critical applications where modern organizations can leverage this technology.

2.1 Analyzing Sales Reports to Identify Success and Failure Factors

Sales teams generate daily reports documenting customer feedback, interactions, negotiations, and deal progress. By analyzing these reports, businesses can identify the factors that contribute to the success or failure of deals. For instance:

  • Detecting patterns in successful negotiations by discovering key phrases such as “pricing discussions,” “competitor comparisons,” and “implementation challenges.”
  • Measuring closing rates for deals involving specific terms to refine sales strategies.
  • Identifying winning sales tactics based on industry, company size, or decision-making process.

2.2 Predicting Customer Churn Through Call Center Feedback

Call centers collect vast amounts of customer feedback through complaints, inquiries, and service requests. Text analysis tools can help detect early warning signs of churn in several ways:

  • Extracting common phrases from customers who later canceled their subscriptions (e.g., “difficult to use,” “expensive fees,” “slow support”).
  • Applying sentiment and co-occurrence analyses to assess the severity of negative feedback, enabling proactive interventions that improve customer retention.
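A toy sketch of the phrase-extraction idea above, using only Python's standard library. The feedback snippets and the stopword list are illustrative assumptions, not real customer data:

```python
from collections import Counter

# Hypothetical feedback paired with a churn flag (True = later canceled)
feedback = [
    ("difficult to use and expensive fees", True),
    ("support is slow and app difficult to navigate", True),
    ("great product easy setup", False),
]

stopwords = {"to", "and", "is", "the", "a"}
churn_words = Counter()
for text, churned in feedback:
    if churned:
        # Count only words from customers who later churned
        churn_words.update(w for w in text.split() if w not in stopwords)

print(churn_words.most_common(1))  # [('difficult', 2)]
```

A production version would count multi-word phrases and compare frequencies between churned and retained customers rather than counting churned feedback alone.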

2.3 Enhancing Employee Engagement Through HR Data Analysis

Understanding employee sentiment through performance reviews and survey responses is crucial for improving workplace culture. Traditionally, companies relied on numerical rating systems due to the complexity of text analysis. Modern data analysis tools allow for deeper insights:

  • Categorizing qualitative feedback into themes such as “manager relationships,” “career growth,” and “work-life balance.”
  • Quantifying the recurrence of employee concerns to pinpoint areas needing improvement.
  • Identifying departments with high levels of dissatisfaction and developing targeted HR strategies to address them.

2.4 Improving Investor Communications Through IR Document Analysis

Investor relations (IR) materials contain crucial insights influencing market perception and investment decisions. Companies can enhance investor communication strategies through the combination of quantitative and text analytics:

  • Analyzing past IR materials alongside stock price fluctuations to determine the impact of positive and negative language.
  • Comparing competitor disclosures to refine corporate messaging.
  • Structuring financial reports in a way that aligns with investor expectations.

These use cases demonstrate the significant impact of unstructured data analysis, and they are only a sample. Text analytics has numerous other applications, including quality control in manufacturing, diagnostic support through the analysis of medical records, and legal risk assessment through the analysis of judicial precedents. The opportunities are nearly endless.

3. Legacy NLP-Based Text Analysis Methods

Traditional approaches to text analytics can be broadly classified into three categories:

N-Gram Analysis

N-Grams break text into sequences of N consecutive words or characters, helping identify frequent patterns. However, this method lacks contextual understanding, which limits its effectiveness for analyzing long-form text.
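A minimal n-gram sketch makes the idea concrete; the sample report text is made up for illustration:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

report = "visited client about product a and discussed pricing with client about product a"
tokens = report.split()

# Count bigrams (n=2) to surface frequently repeated phrases
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
```

Note that the counts capture surface patterns only: the method has no notion of negation, intent, or sentence meaning, which is the limitation described above.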

Topic Modeling

Topic modeling, such as Latent Dirichlet Allocation (LDA), automatically identifies themes within large text datasets. While useful for categorization (e.g., FAQs based on historical customer inquiries), topic interpretation can be ambiguous, and due to its unsupervised nature, it often requires careful parameter tuning and preprocessing to improve accuracy.

Word & Document Embeddings

Techniques like Word2Vec and BERT transform words and documents into numerical vectors, allowing for better semantic similarity analysis. However, these models depend on training data and may not generalize well to domain-specific applications.
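The core operation behind embedding-based analysis is vector similarity. The sketch below uses tiny hand-made 3-dimensional vectors as stand-ins for real embeddings, which a model like Word2Vec would produce with hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors; real embeddings come from a trained model
vectors = {
    "invoice":  np.array([0.9, 0.1, 0.0]),
    "billing":  np.array([0.8, 0.2, 0.1]),
    "vacation": np.array([0.0, 0.2, 0.9]),
}

print(cosine_similarity(vectors["invoice"], vectors["billing"]))   # high
print(cosine_similarity(vectors["invoice"], vectors["vacation"]))  # low
```

The generalization caveat above follows directly from this setup: if the training corpus never used "invoice" and "billing" in similar contexts, their vectors will not end up close, regardless of their domain meaning.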

4. Generative AI-powered Text Analysis Tool

Traditional NLP approaches primarily rely on statistical properties, often failing to grasp deeper meaning. For example, an n-gram analysis might treat “I visited a client about Product A” and “I intended to visit a client about Product A but couldn’t” as identical, despite their different meanings.

Generative AI for Semantic Feature Extraction

At dotData, we enhance text analysis by leveraging AI to interpret meaning beyond simple textual features. For instance, when analyzing sales reports, instead of merely detecting words, our AI answers context-aware questions like:

  • “Did the salesperson meet with the customer?”
  • “What product was introduced to the client?”

This approach, which leverages generative AI for semantic feature extraction, significantly improves accuracy and interpretability compared to traditional text mining techniques.
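The pattern can be sketched as prompt-based question answering over each document. This is an illustrative sketch only, not dotData's implementation: `call_llm` is a hypothetical stand-in for any chat-completion API, stubbed here with keyword rules so the example runs offline:

```python
import json

def build_prompt(report: str) -> str:
    return (
        "Answer the questions about this sales report as JSON with keys "
        '"met_customer" (yes/no) and "product" (string or null).\n'
        f"Report: {report}"
    )

def call_llm(prompt: str) -> str:
    # Stub: a real deployment would send `prompt` to an LLM endpoint
    # and return its completion. Keyword rules mimic that behavior here.
    report = prompt.split("Report: ", 1)[1]
    met = "no" if "couldn't" in report or "could not" in report else "yes"
    product = "Product A" if "Product A" in report else None
    return json.dumps({"met_customer": met, "product": product})

def extract_features(report: str) -> dict:
    """Turn one free-text report into structured semantic features."""
    return json.loads(call_llm(build_prompt(report)))

print(extract_features("I visited a client about Product A"))
print(extract_features("I intended to visit a client about Product A but couldn't"))
```

Unlike the n-gram example earlier, this setup distinguishes the visit that happened from the visit that did not, because the model (here, the stub) answers a question about meaning rather than counting surface patterns.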

Case Study: University Complaint Classification

We tested dotData’s generative AI on university complaint data, categorizing feedback into predefined themes:

  • Dissatisfaction with meals and cafeterias
  • Dissatisfaction with online classes
  • Dissatisfaction with the Student Affairs Division
  • Dissatisfaction with job opportunities
  • Dissatisfaction with activities and events
  • Dissatisfaction about health problems and welfare
  • Other complaints

Generative AI classified each complaint into the predefined categories above, assigning a semantic label to represent its estimate. To evaluate accuracy, dotData’s generative AI results were compared against human classifications for a randomly selected set of 215 texts, using the human-labeled data as the benchmark.
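The evaluation reduces to a simple agreement rate between AI labels and the human benchmark. The labels below are hypothetical stand-ins for five complaints (the actual evaluation used 215 texts):

```python
def accuracy(predicted, human):
    """Fraction of AI labels that match the human benchmark labels."""
    assert len(predicted) == len(human)
    return sum(p == h for p, h in zip(predicted, human)) / len(human)

# Hypothetical category labels for five complaints
human = ["meals", "online_classes", "other", "jobs", "meals"]
ai    = ["meals", "online_classes", "other", "jobs", "other"]

print(accuracy(ai, human))  # 0.8
```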

The table below shows the results of the analysis using four of the most recent generative AI models connected to dotData’s text semantic discovery function. Although results vary between models, the classifications are nearly identical to those produced by humans. (By contrast, the earlier Claude 3 model reached only about 70% accuracy, illustrating how quickly the latest LLMs have evolved.)

Misclassification Examples:

Document 1: “Campus vending machines are often out of order and inconvenient.”

  • Human: “Cafeteria Complaints”
  • AI: “Other Complaints”

Document 2: “I appreciate the diversity on campus, but sometimes I feel there is a lack of understanding between different cultural groups. I wish we had more opportunities to interact and learn from each other.”

  • Human: “Other Complaints”
  • AI: “Activity/Event Complaints”

While AI classification is highly effective, certain subjective cases remain challenging even for human reviewers.

Cost Considerations in Generative AI Implementation

Applying generative AI at scale incurs computational costs. For instance, the cost of extracting semantic labels for 100,000 documents (each ~1,000 characters) varies significantly depending on the AI model used. While generative AI offers deep insights, businesses must balance accuracy against cost efficiency when implementing it.
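A back-of-the-envelope estimate makes the trade-off concrete. All the numbers below are illustrative assumptions: real per-token prices vary by model and change frequently, and the characters-per-token ratio depends on the language and tokenizer:

```python
docs = 100_000
chars_per_doc = 1_000
chars_per_token = 4            # rough rule of thumb for English text
price_per_1k_tokens = 0.01     # hypothetical input price in USD

tokens = docs * chars_per_doc / chars_per_token
cost = tokens / 1_000 * price_per_1k_tokens

print(f"~{tokens:,.0f} tokens, ~${cost:,.2f}")
```

Running the same estimate across candidate models (and adding output tokens, which are typically priced higher) gives a quick first-pass comparison before committing to a full-dataset run.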

dotData’s Advanced Features for Enterprise AI

The semantic feature extraction tool by dotData, powered by generative AI, goes beyond simply assigning semantic labels. It offers advanced capabilities designed for enterprise use, including:

AI-Based Semantic Label Recommendation:

Since the meaning extracted from text depends on context, this feature enables users to input a dataset, and the AI engine automatically analyzes it to recommend relevant semantic labels. This helps users quickly identify key themes and extract valuable insights, even if they are not experienced in text analysis or unfamiliar with the dataset.

Fast and Cost-Effective Evaluation of Meaning Extraction Accuracy:

The accuracy of extracted meanings depends on how well users define them in prompts. Given that generative AI can be resource-intensive, dotData’s AI text analysis tool enables rapid and cost-efficient semantic feature extraction before applying prompts to the entire dataset. It allows users to fine-tune prompts while evaluating label accuracy, ensuring efficient large-scale implementation.

Integration with dotData’s Automated Feature Engineering:

Extracted semantic features can be further processed to generate deeper insights. For instance, identifying customers who have contacted support three or more times regarding service quality issues within a month can be a strong predictor of churn. By incorporating this information into aggregated customer profiles, businesses can enhance predictive analytics and improve the decision-making process.
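The churn example above can be sketched as a simple aggregation over extracted labels. The rows below are hypothetical outputs of semantic feature extraction, and the 3-contacts-per-month threshold is the one named in the text:

```python
from collections import defaultdict

# (customer_id, month, topic) rows produced by semantic feature extraction
contacts = [
    ("c1", "2024-05", "service_quality"),
    ("c1", "2024-05", "service_quality"),
    ("c1", "2024-05", "service_quality"),
    ("c2", "2024-05", "billing"),
    ("c2", "2024-05", "service_quality"),
]

# Count service-quality contacts per (customer, month)
counts = defaultdict(int)
for customer, month, topic in contacts:
    if topic == "service_quality":
        counts[(customer, month)] += 1

# Flag customers with 3+ service-quality contacts in a single month
at_risk = {cust for (cust, _), n in counts.items() if n >= 3}
print(at_risk)  # {'c1'}
```

The resulting flag becomes one engineered feature in a customer profile table, alongside whatever numerical features the business already tracks.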

This combination of AI-powered label recommendations, efficient evaluation, and automated feature engineering makes dotData a powerful tool for enterprises looking to maximize the value of their text data.

5. Conclusion

Building on the examples of text analysis that combine traditional NLP and generative AI discussed in this post, we anticipate continued advancements in large-scale language models, leading to improved performance and lower inference costs. These developments will make it easier for companies to conduct in-depth analysis of their vast text datasets.

As a result, new methods will emerge for extracting advanced and actionable insights that incorporate both context and meaning, complementing conventional NLP techniques. This evolution will unlock innovative ways for businesses to leverage text data effectively.

Looking ahead, companies will be able to maximize the value of their text data, enhance decision-making accuracy, and drive innovation by strategically implementing generative AI. By aligning AI capabilities with business objectives and refining expertise in evaluating meaning extraction accuracy, organizations can unlock new opportunities for data-driven growth.

Yukitaka Kusumura, Ph.D.

Yukitaka is the principal research engineer and a co-founder of dotData, where he leads the R&D of AI-powered feature engineering technology. He has over ten years of experience in research related to data science, including machine learning, natural language processing, and big data engineering. Prior to joining dotData, Yukitaka was a principal researcher at NEC Corporation. He led the invention of cutting-edge technologies related to automated feature engineering from various data sources and worked with clients as a data science practitioner. Yukitaka received his Ph.D. degree in Engineering from Osaka University.
