Headline Impact
99%
Classification Accuracy on 70K+ Documents in 3 Days
MSME SaaS
OCR
Document Classification
99%
Classification accuracy across 15+ document types
70K+
Untagged documents classified within a 3-day deadline
0.01s
Per-document processing — down from >1 second
The Client
Khatabook — India's Leading MSME Business Platform
Khatabook is India's leading business management platform for MSMEs. It needed to classify 70,000+ untagged documents across 15+ types for a government compliance submission, and had only 3 days to do it.
The Challenge
70,000+ Documents, 3 Days, Zero Room for Error
70,000+ documents needed classification across 15+ types within a 3-day deadline. Manual tagging would have required 250+ human hours and a team of 15+ workers — an impossibility given the timeline. Standard PDF processing tools took over 1 second per document, far too slow for the volume. The documents varied wildly in format, quality, and structure.
What We Built
High-Speed Document Classification Engine
1. Optimized PDF Pipeline
Replaced the OpenCV/pdftoimage rendering step (>1s per document) with mutool, the MuPDF command-line tool, cutting processing to 0.01 seconds per PDF, a speedup of at least 100x.
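As a rough illustration of this step, the sketch below shells out to `mutool draw` to render the first page of a PDF to a small grayscale PNG. The function names, the 72 dpi default, and the choice to render only page 1 are assumptions made for the example, not details of the delivered pipeline.

```python
import shutil
import subprocess
from pathlib import Path


def build_mutool_command(pdf_path, out_path, dpi=72, page="1"):
    """Build a `mutool draw` command line (hypothetical helper).

    -o sets the output file, -r the resolution in dpi,
    -c the colorspace and -F the output format.
    """
    return [
        "mutool", "draw",
        "-o", str(out_path),
        "-r", str(dpi),
        "-c", "gray",
        "-F", "png",
        str(pdf_path),
        page,
    ]


def render_first_page(pdf_path, out_dir, dpi=72):
    """Render page 1 of a PDF to PNG; assumes mutool is installed and on PATH."""
    if shutil.which("mutool") is None:
        raise RuntimeError("mutool not found on PATH")
    out_path = Path(out_dir) / (Path(pdf_path).stem + ".png")
    subprocess.run(build_mutool_command(pdf_path, out_path, dpi), check=True)
    return out_path
```

Rendering at a low resolution and in grayscale keeps each output image small, which is one plausible way to keep per-document cost low when only coarse visual features are needed.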
2. Feature-Based Classification
Extracted distinguishing characteristics — logos, whitespace, barcodes, QR codes, colors, headers, footers — and applied heuristic-based classification logic.
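A minimal sketch of what heuristic, feature-based classification can look like: an ordered list of rules is checked against the extracted features, and the first match wins. The feature fields, document labels, and thresholds here are all hypothetical and chosen only to show the shape of the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFeatures:
    """Hypothetical feature record extracted from one document."""
    has_logo: bool
    has_qr_code: bool
    has_barcode: bool
    whitespace_ratio: float  # fraction of near-white pixels, 0.0 to 1.0
    header_text: str


# Ordered (label, predicate) pairs; labels and thresholds are illustrative only.
RULES = [
    ("id_card", lambda f: f.has_qr_code and f.has_logo and f.whitespace_ratio < 0.4),
    ("tax_certificate", lambda f: "certificate" in f.header_text.lower() and f.has_qr_code),
    ("invoice", lambda f: f.has_barcode or "invoice" in f.header_text.lower()),
    ("bank_statement", lambda f: "statement" in f.header_text.lower()),
]


def classify(features):
    """Return the label of the first matching rule, or 'needs_review' if none match."""
    for label, predicate in RULES:
        if predicate(features):
            return label
    return "needs_review"
```

Putting the most specific rules first keeps a broad rule (such as the barcode check) from claiming documents that a narrower rule would label more precisely.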
3. Image Optimization
Reduced resolution for native image files to meet the 0.05-second-per-image processing target without sacrificing classification accuracy.
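One simple way to reduce resolution is to cap the longest side of each image while preserving its aspect ratio. The helper below only computes the target dimensions; the 512-pixel cap is an assumed value, not a figure from the project.

```python
def downscaled_size(width, height, max_side=512):
    """Return (width, height) scaled so the longest side is at most max_side.

    Images already within the cap are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

The resulting size could then be passed to an image library's resize call (for example Pillow's `Image.resize`) before feature extraction.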
4. QC Annotation Tool
Built a manual validation interface for flagging ambiguous cases, ensuring accuracy on edge cases.
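The routing behind such a tool can be sketched as follows, assuming that a document counts as ambiguous when zero rules or more than one distinct rule matched it. That definition, and the function and variable names, are assumptions for illustration.

```python
def needs_manual_review(matched_labels):
    """A document is ambiguous unless exactly one distinct label matched."""
    return len(set(matched_labels)) != 1


def split_for_qc(results):
    """Split {doc_id: [matched labels]} into auto-accepted labels and a review queue."""
    auto_labeled = {}
    review_queue = []
    for doc_id, labels in results.items():
        if needs_manual_review(labels):
            review_queue.append(doc_id)
        else:
            auto_labeled[doc_id] = labels[0]
    return auto_labeled, review_queue
```

Only the documents in the review queue would need a human decision; everything else keeps its automatic label.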
Technology
Powered By
Mutool PDF Processing
Feature Extraction
Heuristic Classification
Image Optimization
QC Annotation Interface
The Results
99% Accuracy in 3 Days — 100x Faster Processing
Achieved 99% accuracy on 70,000+ documents within the 3-day deadline. Eliminated the need for 250+ manual hours. Processing speed improved 100x — from over 1 second to 0.01 seconds per document.
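To put the per-document figures in context, a back-of-the-envelope calculation, assuming single-threaded processing and exactly 70,000 documents: at 1 second per document a full pass takes about 19.4 hours, while at 0.01 seconds it takes about 11.7 minutes. The 250 manual hours work out to roughly 12.9 seconds per document.

```python
DOCS = 70_000  # lower bound; the source says "70,000+"


def total_seconds(n_docs, seconds_per_doc):
    """Total wall-clock time for one sequential pass over all documents."""
    return n_docs * seconds_per_doc


old_pass_hours = total_seconds(DOCS, 1.0) / 3600    # "over 1 second" per document
new_pass_minutes = total_seconds(DOCS, 0.01) / 60   # 0.01 seconds per document
manual_seconds_per_doc = 250 * 3600 / DOCS          # 250 human hours spread over all documents

print(f"old pipeline: at least {old_pass_hours:.1f} hours per pass")
print(f"new pipeline: about {new_pass_minutes:.1f} minutes per pass")
print(f"manual tagging: about {manual_seconds_per_doc:.1f} seconds per document")
```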
"We had 3 days and 70,000+ documents. Manual tagging would have taken 250+ hours with 15+ people. The AI did it at 99% accuracy."
— Khatabook