SmolDocling-OCR: Lightweight Document OCR Pipeline
Machine Learning (completed)

Nov 2023 (3 months)

Overview

SmolDocling-OCR converts scanned documents and images into machine-readable text. It uses OpenCV for image pre-processing, Tesseract for OCR, and a simple web interface for uploading images and viewing results. Designed for researchers and students, it emphasizes accuracy, speed, and ease of deployment.

Tech Stack

Frontend: React
Backend: Python, Flask
Other: Tesseract OCR, OpenCV, Pillow

Challenges

  • Handling diverse document qualities and layouts (skew, noise, handwriting).
  • Optimizing pre-processing to improve OCR accuracy across languages.
  • Building a simple, user-friendly web interface for non-technical users.
  • Ensuring lightweight deployment suitable for resource-constrained environments.

Solution

The pipeline integrates image enhancement with OpenCV and Pillow, followed by text extraction using Tesseract OCR. A Flask backend processes uploads and OCR tasks, while a React frontend allows users to submit documents and view extracted text. The system is containerized for easy deployment.
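
The pre-processing and extraction steps could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the function names, the list of accepted file types, and the specific enhancement steps (grayscale, median blur, Otsu thresholding) are hypothetical choices, and the sketch assumes the `opencv-python` and `pytesseract` packages plus a local Tesseract install. Deskewing and Pillow-based steps are left out for brevity.

```python
"""Hypothetical sketch of the pre-process -> OCR flow (not the project's code)."""
from pathlib import Path

# Assumed set of accepted upload types; the project page does not list them.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def is_supported_image(filename: str) -> bool:
    """Return True if the filename has one of the accepted image extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def preprocess(image_path: str):
    """Grayscale, denoise, and binarize a scan; returns a NumPy array."""
    import cv2  # imported lazily so the module loads without OpenCV installed

    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # A small median filter removes salt-and-pepper noise common in scans.
    denoised = cv2.medianBlur(gray, 3)
    # With THRESH_OTSU the threshold is chosen automatically; the 0 is ignored.
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return binary


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on the pre-processed image and return the recognized text."""
    if not is_supported_image(image_path):
        raise ValueError(f"unsupported file type: {image_path}")

    import pytesseract  # requires the Tesseract binary to be installed

    return pytesseract.image_to_string(preprocess(image_path), lang=lang)
```

In a setup like the one described, a Flask upload handler would save the submitted file and call something like `extract_text(saved_path)`, returning the resulting string to the React frontend. Checking the file extension before any image work keeps obviously invalid uploads from reaching the OCR step.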

Outcome

SmolDocling-OCR enables rapid and reliable text extraction from various educational and research documents. Its lightweight design allows deployment on low-resource hardware, and its modular structure supports easy customization for different research needs.

Built with

Python, Tesseract OCR, OpenCV, Pillow, Flask, React