Docu Builder
An OCR and Document Building Service for a Chemicals Exporter
Computer Vision
About The Company
Elchemy, founded during the COVID-19 pandemic, connects global chemical buyers with Indian manufacturers. The company addresses supply chain disruptions, partners with numerous Indian chemical manufacturers, and offers economies of scale on material pricing. With access to over 25 contract manufacturing sites and labs, Elchemy ensures reliability in complex chemical reactions. Its team of experienced chemists and engineers, with founders from IIT-Bombay, IIT-Delhi, and IIM-Ahmedabad, positions Elchemy as a trusted sourcing partner for the chemical industry.
Problem Statement
The customer was diverting 2 hours of staff time from its business development function to convert and format documents manually. Beyond the time cost, manual conversion had an error rate of around 10%, against an acceptable threshold of 1%.
Key Challenges
Each manufacturer creates its documents with different tools, so there is no standard document format. A significant percentage of the documents are photographs or scans, often blurry. A major challenge was retaining the structure of each document (tables, images, forms, text). Using LLMs enables us to fill in missing information and reason about ambiguous table and form structures.
Approach
We designed a Django web app with upload, edit, and preview pages. On the upload page, the user chooses the document type. On the edit page, a side-by-side view shows the uploaded file on the left for reference and the extracted content, which the user can edit, on the right. The user can verify the formatting and proceed to generate the final document with the company's letterhead attached. Header and footer removal is a crucial step and is done automatically for most documents. As a fallback, the user can mark the header and footer positions by clicking. Both accuracy and standardization of format were crucial for the project. For accuracy, we used Amazon Textract as the OCR service; for standardization, its output is passed to OpenAI GPT-4, which fits the content into predefined templates.