Extracting Investor-Critical Information from Employee Reviews
Natural Language Processing
Generative AI
About the Company
Founded in 2022 with offices in London, Singapore, and Mumbai, Kelp is a FinTech company that uniquely combines predictive technology with comprehensive private-company data to support alternative investments from inception to exit. The platform is an end-to-end, fully modular product with three core modules: Deal Identification, Deal Management, and Value Creation. Kelp enhances organizational intelligence and automates routine tasks for efficient decision-making. Its prescriptive decision engine empowers investment professionals with transformed data, provides actionable insights, and streamlines workflows for effectiveness and continuous learning.
Problem Statement
The customer had a proprietary database of 2M+ employee reviews. Of these, only a small fraction contain investor-critical information such as misconduct or bias within the company. Manually identifying and correctly classifying every critical review would have taken an estimated 22,000+ man-hours.
Key Challenges
Employees can discuss a virtually unlimited range of workplace topics in a review, so applying a conventional classification algorithm across all of them with high accuracy is close to impossible. LLMs, on the other hand, offer high accuracy once topics are grouped and the number of classes is brought down to roughly 20, but they are extremely costly to run at scale. Data therefore cannot be fed into an LLM blindly; it must be processed and compressed to achieve high accuracy in a cost-effective manner (see the sketch below).
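As a rough illustration of what "grouping topics" can look like in practice, the sketch below collapses open-ended review topics into a small set of broad classes. The class names, keywords, and example review are hypothetical and are not Kelp's actual taxonomy.

```python
# Illustrative only: the groupings below are hypothetical, not Kelp's taxonomy.
# The idea is to collapse an open-ended set of workplace topics into ~20 broad
# classes before any review is sent to an LLM.

TOPIC_GROUPS = {
    "compensation": ["salary", "bonus", "pay cut", "equity"],
    "misconduct": ["fraud", "harassment", "discrimination", "retaliation"],
    "leadership": ["management", "ceo", "strategy", "restructuring"],
    # ... up to ~20 broad classes in total
}

def coarse_label(fine_topic: str) -> str:
    """Map a fine-grained topic string onto one of the broad classes."""
    fine_topic = fine_topic.lower()
    for broad, keywords in TOPIC_GROUPS.items():
        if any(keyword in fine_topic for keyword in keywords):
            return broad
    return "other"

print(coarse_label("Pay cut after the merger"))  # -> "compensation"
```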
Solution Approach
We explored several algorithms to identify data-backed critical reviews, including KNN over BERT embeddings, automated topic generation with TopicGPT, and OpenAI sentence embeddings. However, none of these could handle the large number of classes and classify reviews with high accuracy. A minimal version of the KNN baseline is sketched below.
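The following is a minimal sketch of a KNN-over-BERT-embeddings baseline, under stated assumptions: the encoder model, labels, and example texts are illustrative and may differ from what was used in the original experiments.

```python
# Minimal sketch of the KNN-over-BERT-embeddings baseline (illustrative only).
from sentence_transformers import SentenceTransformer  # BERT-family encoder
from sklearn.neighbors import KNeighborsClassifier

# Small labelled sample: (review text, class). In practice this would be a
# hand-labelled subset of the 2M+ review database.
train_texts = [
    "Senior management pressured us to misreport quarterly numbers",
    "Great canteen and flexible working hours",
    "Promotions consistently favour one group of employees",
]
train_labels = ["misconduct", "not_critical", "bias"]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed compact model
X_train = encoder.encode(train_texts)

knn = KNeighborsClassifier(n_neighbors=1)  # k=1 because the sample is tiny
knn.fit(X_train, train_labels)

new_review = ["Auditors were told to ignore missing invoices"]
print(knn.predict(encoder.encode(new_review)))  # expected: "misconduct"
```

With hundreds of possible classes and subtle, context-dependent language, this kind of nearest-neighbour approach struggles, which is what motivated the move to an LLM-based classifier.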
We then explored GPT-4 with prompt engineering and were satisfied with the accuracy, but cost and latency remained a concern. We tackled this with automated data cleaning and by discarding data with a low likelihood of containing critical reviews, which brought costs down to a manageable level. Costs should continue to fall as newer, cheaper, and faster LLMs become available: the model can simply be swapped after evaluating results on the test data set. A sketch of this filter-then-classify pipeline follows.
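The sketch below shows one way such a filter-then-classify pipeline could be wired up. The keyword pre-filter, prompt wording, and class list are assumptions for illustration, not Kelp's production filter or prompt.

```python
# Filter-then-classify sketch (assumptions: trigger terms, classes, prompt).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Cheap pre-filter: skip reviews unlikely to contain investor-critical content,
# so only a fraction of the 2M+ reviews ever reach the (expensive) LLM.
TRIGGER_TERMS = {"fraud", "harass", "discriminat", "lawsuit", "bribe", "retaliat"}

def worth_sending(review: str) -> bool:
    text = review.lower()
    return any(term in text for term in TRIGGER_TERMS)

CLASSES = ["misconduct", "bias", "financial_irregularity", "not_critical"]  # ~20 in practice

def classify(review: str) -> str:
    if not worth_sending(review):
        return "not_critical"
    response = client.chat.completions.create(
        model="gpt-4",  # swappable for newer, cheaper models after re-evaluation
        messages=[
            {"role": "system",
             "content": f"Classify the employee review into one of: {', '.join(CLASSES)}. "
                        "Answer with the class name only."},
            {"role": "user", "content": review},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify("HR ignored repeated harassment complaints from our team"))
```

Because the model name is a single parameter, re-running the evaluation on the held-out test set and swapping in a newer model is a low-effort way to keep reducing cost and latency over time.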