PDF OCR - Extract Text from Scanned PDF
Use OCR to recognize and extract text from scanned PDF documents in your browser.
Maximum file size: 100 MB
Recognize text in scanned PDF documents using optical character recognition powered by Tesseract.js. Choose your document language, then export extracted text or generate a searchable PDF with an invisible text layer.
Last reviewed: June 2026
How to use this tool
1. Upload a scanned PDF document.
2. Select the document language from the dropdown.
3. Choose the output format: plain text or searchable PDF.
4. Click Process to start OCR. Progress is shown per page.
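The per-page progress in the last step can be illustrated with a small sketch. It assumes the OCR engine reports a progress value between 0 and 1 for the page it is working on, as the Tesseract.js logger does. The helper names `overallProgress` and `progressLabel` are hypothetical and are not taken from this tool's source.

```typescript
// Hypothetical helper: combines the index of the page being processed with the
// engine's progress on that page (0 to 1) into one overall fraction (0 to 1).
function overallProgress(pageIndex: number, pageCount: number, pageProgress: number): number {
  if (pageCount <= 0) return 0;
  // Clamp the engine's value in case it reports slightly out-of-range numbers.
  const clamped = Math.min(Math.max(pageProgress, 0), 1);
  return Math.min((pageIndex + clamped) / pageCount, 1);
}

// Hypothetical helper: builds a label such as "Page 2 of 4 (25%)".
// pageIndex is zero-based; the label shows a one-based page number.
function progressLabel(pageIndex: number, pageCount: number, pageProgress: number): string {
  const percent = Math.round(overallProgress(pageIndex, pageCount, pageProgress) * 100);
  return `Page ${pageIndex + 1} of ${pageCount} (${percent}%)`;
}
```

For example, `progressLabel(1, 4, 0)` describes the moment the second page of a four-page document starts processing.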
Common use cases
- Make scanned contracts searchable for specific clauses.
- Digitize paper archives into text for indexing and search.
- Extract text from image-based PDFs that have no text layer.
Technical notes
- Uses the Tesseract.js WebAssembly (WASM) engine, which runs entirely in your browser.
- Language data is downloaded on first use (roughly 4 to 50 MB, depending on the language) and then cached by your browser.
- Pages are rendered at 2x scale for better recognition accuracy.
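To make the 2x rendering scale concrete, here is a minimal sketch of the arithmetic. It assumes the scale is applied to the page size in PDF points, where one point is 1/72 inch, so a 2x scale corresponds to a 144 DPI raster. The `renderSize` function and `PixelSize` type are hypothetical names for illustration, not this tool's code.

```typescript
// PDF default user space uses 72 points per inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;  // canvas width in pixels
  height: number; // canvas height in pixels
  dpi: number;    // effective render resolution
}

// Hypothetical helper: pixel dimensions of a page rendered at a given scale.
// The default scale of 2 mirrors the technical note above.
function renderSize(widthPt: number, heightPt: number, scale: number = 2): PixelSize {
  return {
    width: Math.ceil(widthPt * scale),
    height: Math.ceil(heightPt * scale),
    dpi: POINTS_PER_INCH * scale,
  };
}
```

Under these assumptions, a US Letter page (612 by 792 points) is rendered to a 1224 by 1584 pixel image. Rendering at a higher scale cannot add detail that the embedded scan does not contain.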
Private by design
This tool runs in your browser. Your file is not uploaded to our server while you use the tool.
Limitations
- Recognition accuracy for handwritten text is significantly lower than for printed text.
- Complex layouts with multiple columns or tables may produce disordered text.
- First-time language pack download requires an internet connection.
Frequently Asked Questions
Which languages are supported?
Over 100 languages are available, including English, Chinese (Simplified and Traditional), Japanese, Spanish, French, German, and Korean. Select one from the dropdown before processing.
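As an illustration of how a dropdown choice might map to an OCR language pack, the sketch below uses the standard Tesseract language codes for the languages named above. The `LANGUAGE_CODES` table and `languageCode` function are hypothetical; the tool's actual dropdown values are not documented here.

```typescript
// Standard Tesseract language codes for the languages listed above.
// The labels are illustrative and may differ from the tool's dropdown.
const LANGUAGE_CODES: Record<string, string> = {
  "English": "eng",
  "Chinese (Simplified)": "chi_sim",
  "Chinese (Traditional)": "chi_tra",
  "Japanese": "jpn",
  "Spanish": "spa",
  "French": "fra",
  "German": "deu",
  "Korean": "kor",
};

// Hypothetical helper: looks up the code for a label, failing loudly on
// unknown labels instead of silently falling back to English.
function languageCode(label: string): string {
  const code = LANGUAGE_CODES[label];
  if (code === undefined) {
    throw new Error(`Unsupported language: ${label}`);
  }
  return code;
}
```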
Why is the first run slower?
The language recognition data must be downloaded on first use (roughly 4 to 50 MB, depending on the language). After that, your browser caches it, so subsequent runs start faster.
How can I improve OCR accuracy?
Use high-resolution scans (300 DPI or higher), ensure the document is not skewed, and choose the correct language.
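One way to check a scan against the 300 DPI guideline is to compare the scanned image's pixel width with the physical page width. The sketch below assumes the page width is known in PDF points (72 per inch) and that the scanned image fills the page. `scanDpi` and `meetsRecommendedDpi` are hypothetical helpers, not part of this tool.

```typescript
// Hypothetical helper: effective DPI of a scanned image that fills a page.
// pageWidthPt is the page width in PDF points (72 points per inch).
function scanDpi(imageWidthPx: number, pageWidthPt: number): number {
  if (pageWidthPt <= 0) {
    throw new Error("Page width must be positive");
  }
  const pageWidthInches = pageWidthPt / 72;
  return imageWidthPx / pageWidthInches;
}

// Hypothetical helper: true when the scan meets the recommended resolution.
function meetsRecommendedDpi(imageWidthPx: number, pageWidthPt: number, minDpi: number = 300): boolean {
  return scanDpi(imageWidthPx, pageWidthPt) >= minDpi;
}
```

For instance, a US Letter page (612 points wide, or 8.5 inches) scanned to an image 2550 pixels wide works out to 300 DPI, while a 1275 pixel wide scan of the same page is only 150 DPI.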
What is a 'searchable PDF'?
A searchable PDF contains an invisible text layer on top of the original scanned image. You can use Ctrl+F (Cmd+F on macOS) to find text while the visual appearance stays the same.
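Placing the invisible text so that it lines up with the scan requires converting OCR word boxes into page coordinates. The sketch below shows one possible conversion, assuming word boxes are given in pixels with a top-left origin (the `x0`, `y0`, `x1`, `y1` shape that Tesseract.js uses), the page was rendered at 2x scale, and the PDF page uses the default bottom-left origin with no rotation. `toPdfRect` and the two interfaces are hypothetical names, and this is not necessarily how this tool builds its text layer.

```typescript
// OCR word box in rendered-image pixels, origin at the top-left corner.
interface PixelBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Rectangle in PDF points, origin at the bottom-left corner of the page.
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: maps a pixel box to PDF coordinates.
// Dividing by the render scale converts pixels back to points, and the
// y axis is flipped because PDF measures upward from the bottom edge.
function toPdfRect(box: PixelBox, pageHeightPt: number, scale: number = 2): PdfRect {
  return {
    x: box.x0 / scale,
    y: pageHeightPt - box.y1 / scale,
    width: (box.x1 - box.x0) / scale,
    height: (box.y1 - box.y0) / scale,
  };
}
```

Text drawn at the resulting position can then be made invisible, for example with PDF text rendering mode 3, so that it is searchable and selectable without changing how the page looks.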
Is my scanned document uploaded anywhere?
No. OCR processing runs entirely in your browser using WebAssembly. Your document never leaves your device.