Kelp Employee Review
An algorithm to extract critical information about a company from Employee Reviews
Natural Language Processing
Generative AI
About the Company
Founded in 2022 with offices in London, Singapore, and Mumbai, Kelp is a FinTech solution that uniquely combines predictive technology with comprehensive private company data to support alternative investments from inception to exit. The platform offers an end-to-end product with three core modules: Deal Identification, Deal Management, and Value Creation. Fully modular, Kelp enhances organizational intelligence and automates routine tasks for efficient decision-making. Its prescriptive decision engine empowers investment professionals with transformed data, provides actionable insights, and streamlines workflows for effectiveness and continuous learning.
Problem Statement
The customer had a database of 2M+ employee reviews scraped from websites such as Glassdoor and Indeed. Of these, only the reviews containing data-backed allegations against company policies, management or leadership behavior, or other critical issues added value for a prospective investor evaluating the company. Manually identifying and correctly classifying each critical review would have taken 22,000+ man-hours.

Key Challenges
Employees mention 1,000+ distinct topics in their reviews. Achieving high accuracy with a conventional classification algorithm across that many classes is close to impossible. LLMs, on the other hand, offer high accuracy if topics are grouped to bring the class count down to ~20, but they are extremely expensive to run at scale. Data therefore cannot be fed into an LLM blindly; it must be processed and compressed to achieve high accuracy in a cost-effective manner.
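To make the cost concern concrete, here is a back-of-envelope sketch. The review volume comes from the problem statement, but the token counts and the per-token price are illustrative assumptions, not actual figures from this project.

```python
# Back-of-envelope cost estimate for classifying every review with an LLM.
# All per-review and pricing numbers below are illustrative assumptions.
NUM_REVIEWS = 2_000_000          # size of the scraped review database
AVG_TOKENS_PER_REVIEW = 120      # assumed average review length in tokens
PROMPT_OVERHEAD_TOKENS = 300     # assumed instructions + class list per call
PRICE_PER_1K_TOKENS = 0.03       # assumed large-model input price (USD)

total_tokens = NUM_REVIEWS * (AVG_TOKENS_PER_REVIEW + PROMPT_OVERHEAD_TOKENS)
cost_usd = total_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"~{total_tokens:,} tokens -> ~${cost_usd:,.0f}")  # ~840,000,000 tokens -> ~$25,200
```

Even under these rough assumptions, naive per-review classification runs to tens of thousands of dollars, which is why token reduction matters.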
Approach
We explored several approaches to identify and classify reviews containing critical information, or critical reviews.
[1] Initially, we used KNN with BERT embeddings to cluster reviews, but the results were unsatisfactory, with outliers often being long, misspelled, or written in Romanized Hindi.
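The outlier behavior described in [1] can be sketched as a nearest-neighbour distance score over embeddings. This is a minimal numpy sketch: the toy 2-D vectors stand in for real BERT sentence embeddings, and `knn_outlier_scores` is a hypothetical helper, not code from the project.

```python
import numpy as np

def knn_outlier_scores(embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Score each review by its mean distance to its k nearest neighbours.

    High scores flag outliers; in our data these were often long,
    misspelled, or Romanized-Hindi reviews.
    """
    # Pairwise Euclidean distances between all embeddings.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    # Mean of the k smallest distances per row.
    knn = np.sort(dists, axis=1)[:, :k]
    return knn.mean(axis=1)

# Toy vectors standing in for BERT sentence embeddings (assumption:
# real embeddings came from a BERT encoder, not shown here).
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
scores = knn_outlier_scores(emb, k=2)
print(scores.argmax())  # index 3 is the obvious outlier
```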
[2] We then experimented with automatic class generation using BERTopic and topicGPT. BERTopic produced non-interpretable topics, while topicGPT generated over 100 interpretable but numerous topics. Manual classification confirmed the large number of distinct topics, complicating the creation of a manageable classification system.
[3] Shifting focus to critical classes, we collaborated with Surya and Abhishek to define a list of critical categories. GPT-4 demonstrated high accuracy in extracting reviews pertaining to these classes, but its cost was a concern. Attempts to use cheaper models like GPT-3.5 and LLaMA resulted in lower accuracy.
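A minimal sketch of how such a classification prompt might be assembled. The category names here are illustrative placeholders; the actual critical-category list defined with Surya and Abhishek is not reproduced, and `build_classification_prompt` is a hypothetical helper.

```python
# Illustrative class names only -- the real critical-category list is not shown.
CRITICAL_CLASSES = [
    "harassment or discrimination",
    "financial irregularities",
    "unethical leadership behavior",
    "regulatory or legal violations",
    "not critical",
]

def build_classification_prompt(review: str) -> str:
    """Build a single-review classification prompt for an LLM such as GPT-4."""
    classes = "\n".join(f"- {c}" for c in CRITICAL_CLASSES)
    return (
        "Classify the employee review below into exactly one category.\n"
        f"Categories:\n{classes}\n\n"
        f"Review: {review}\n"
        "Answer with the category name only."
    )

prompt = build_classification_prompt("Managers routinely falsified expense reports.")
```

The same prompt works with cheaper models; in our tests, swapping GPT-4 for GPT-3.5 or LLaMA lowered accuracy rather than changing the prompt structure.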
[4] Finally, we tried using sentence embeddings with OpenAI to identify critical reviews based on representative samples. However, this approach underperformed compared to GPT-3.5.
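The representative-sample approach in [4] amounts to similarity matching in embedding space. A minimal sketch, with toy vectors standing in for OpenAI sentence embeddings and an assumed similarity threshold:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_critical(review_vec, representative_vecs, threshold=0.8):
    """Flag a review as critical if it is close to any representative sample.

    The 0.8 threshold is an illustrative assumption, not a tuned value.
    """
    return any(cosine_sim(review_vec, r) >= threshold for r in representative_vecs)

# Toy vectors standing in for real sentence embeddings (assumption).
reps = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(is_critical(np.array([0.9, 0.1]), reps))  # close to the first sample -> True
```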
Overall, GPT-4 showed promise in classifying critical reviews, but cost and the challenge of managing a large number of topics remained significant hurdles.
To address the cost of processing such a large number of reviews, we focused on reducing the number of tokens passed to GPT-4 for analysis. We removed duplicate reviews and those with fewer than six words, as shorter reviews tended to be less insightful. We also experimented with POS tagging using NLTK and the Stanford Parser to filter reviews based on parts of speech, but this approach had low accuracy and a capitalization bias.
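The deduplication and length filter can be sketched as a single pass over the corpus. The six-word cutoff comes from the text above; the helper name and sample reviews are illustrative.

```python
def compress_reviews(reviews, min_words=6):
    """Drop exact duplicates and reviews shorter than min_words words.

    This mirrors the token-reduction step: short reviews tended to be
    less insightful, and duplicates add cost without adding signal.
    """
    seen, kept = set(), []
    for r in reviews:
        key = r.strip().lower()
        if key in seen or len(key.split()) < min_words:
            continue
        seen.add(key)
        kept.append(r)
    return kept

reviews = [
    "Good pay.",
    "Management pressured staff to misreport quarterly sales numbers.",
    "Management pressured staff to misreport quarterly sales numbers.",
    "Nice office",
]
print(compress_reviews(reviews))  # only the substantive review survives, once
```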
Ultimately, we found that the most effective method was to use all negative reviews for identifying critical insights and a separate prompt for extracting the top five topics from both positive and negative reviews, along with summaries, counts, and representative reviews.
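A sketch of the second prompt described above, which asks for the top five topics with summaries, counts, and representative reviews. The wording and sample reviews are illustrative, not the production prompt.

```python
def build_topic_prompt(reviews, sentiment):
    """Assemble a batch topic-extraction prompt for one sentiment bucket.

    Illustrative wording only -- not the production prompt.
    """
    body = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return (
        f"From the {sentiment} employee reviews below, extract the top five "
        "topics. For each topic give a one-line summary, the number of "
        "reviews mentioning it, and one representative review.\n\n" + body
    )

# Negative reviews (illustrative) feed both the critical-insight pass
# and this topic-extraction pass.
negative = [
    "Leadership ignores harassment complaints.",
    "Salaries are delayed every quarter.",
]
prompt = build_topic_prompt(negative, "negative")
```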
