deepseek ocr2

Document intelligence engine

OCR for real-world documents

deepseek ocr2: OCR that stays accurate in production

Keep accuracy high across messy scans, mixed formats, and multi-language documents.

Output structured fields and layout so your workflow can rely on the result.

Try deepseek ocr2 · View OCR2 API

Multi-language OCR · Table extraction · Layout-aware output

99.2%

Character accuracy

180ms

Avg. latency

50+

Document types

OCR2 console

Live

Recognized text

INVOICE 2471 · TOTAL: $4,820.00 · DUE: 2025-05-24

Confidence scores and bounding boxes are auto-generated.

Layout reconstruction

12 columns · 4 sections

Headers, columns, and tables stay aligned.

Export formats

JSON / CSV / searchable PDF

Core capabilities

What deepseek ocr2 delivers

Six OCR pillars tuned for speed, accuracy, and production reliability.

Robust text capture

Handles blur, rotation, and low-contrast scans with steady output.

Layout-aware parsing

Preserves reading order and position across complex layouts.

Tables & forms

Extracts cells and key-value pairs without templates.

Quality signals

Confidence scores, bounding boxes, and checksum validation.

Safety & audit

Review-ready logs and redaction hooks for compliance.

Elastic deployment

Scale on cloud, hybrid, or private environments.

Workflow

From scan to structured data in minutes

Ingest

Upload PDFs, images, or scans via API or batch sync.

Detect & parse

OCR2 reads text, tables, and layout in one pass.

Validate

Check confidence scores and apply domain rules.

Export

Ship JSON, CSV, or searchable PDF to your pipeline.
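The Validate and Export steps above can be sketched in a few lines of Python. This is a minimal illustration, not the OCR2 API: the field dictionaries, the `confidence` key, the `MIN_CONFIDENCE` threshold, and the helper names are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical OCR result: the field names and the "confidence" key are
# illustrative assumptions, not the documented OCR2 response format.
fields = [
    {"name": "invoice_number", "value": "2471", "confidence": 0.99},
    {"name": "total", "value": "$4,820.00", "confidence": 0.97},
    {"name": "due_date", "value": "2025-05-24", "confidence": 0.62},
]

MIN_CONFIDENCE = 0.90  # assumed review threshold; tune per workload


def parse_amount(text):
    """Domain rule: a total must parse as a non-negative amount."""
    amount = float(text.replace("$", "").replace(",", ""))
    if amount < 0:
        raise ValueError("negative total")
    return amount


def validate(fields, min_confidence=MIN_CONFIDENCE):
    """Split fields into accepted ones and ones that need human review."""
    accepted, review = [], []
    for field in fields:
        ok = field["confidence"] >= min_confidence
        if ok and field["name"] == "total":
            try:
                parse_amount(field["value"])
            except ValueError:
                ok = False
        (accepted if ok else review).append(field)
    return accepted, review


def to_csv(fields):
    """Serialize fields to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "value", "confidence"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(fields)
    return buffer.getvalue()


accepted, review = validate(fields)
print(json.dumps(accepted, indent=2))  # JSON export
print(to_csv(accepted))                # CSV export
print("needs review:", [f["name"] for f in review])
```

Routing low-confidence fields to a review list instead of dropping them keeps the export clean while preserving everything a human may need to check.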

Performance

Accuracy without slowing you down

Fast inference with deterministic layout even under heavy loads.

97.4%

Table accuracy

Benchmarked on complex invoices and reports.

30+

Multi-language

Consistent OCR for mixed scripts.

12k pages/hr

Batch throughput

Measured on standard A4 scans.

Research notes

What the paper says about OCR2

According to the paper, OCR2 centers on causal visual flow, token reordering, and document-first evaluation.

DeepEncoder V2 swaps CLIP for an LLM-style encoder and adds causal flow queries.

Visual tokens keep bidirectional attention, while the flow queries use causal attention; only the query outputs go to the decoder.

The paper reports a 3.73% overall gain on OmniDocBench v1.5 over DeepSeek-OCR.

The visual token budget is constrained to between 256 and 1120 tokens to balance cost against fidelity.

Training uses roughly 80% OCR data and runs in three stages: pretraining, query enhancement, and decoder specialization.

Dynamic resolution inference supports mixed-size documents and scans.
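The attention pattern described in these notes can be pictured as a boolean mask. The sketch below is one plausible reading, not the paper's implementation: it assumes the visual tokens come first in the sequence, that each flow query can see every visual token plus itself and earlier queries, and that visual tokens do not attend to queries. The function names are hypothetical.

```python
def build_attention_mask(num_visual, num_queries):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed sequence layout: visual tokens first, then causal flow queries.
    """
    total = num_visual + num_queries
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_visual:
                # Visual token: bidirectional attention over all visual tokens.
                mask[i][j] = j < num_visual
            else:
                # Query: every visual token, plus itself and earlier queries.
                mask[i][j] = j <= i
    return mask


def select_decoder_inputs(encoder_outputs, num_visual):
    """Keep only the query outputs, which are what the decoder receives."""
    return encoder_outputs[num_visual:]


mask = build_attention_mask(num_visual=3, num_queries=2)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

With three visual tokens and two queries, the top-left block of the printed mask is fully filled (bidirectional), and the query rows form a lower-triangular tail (causal).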

OmniDocBench signals

Text Edit

Edit distance for text correctness.

Formula CDM

Formula consistency metric.

Table TEDS

Table structure similarity score.

R-order Edit

Reading order edit distance.
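The two edit-distance signals above rest on the same idea, shown here as a small sketch. It uses plain Levenshtein distance normalized by the longer sequence length; the exact normalization OmniDocBench applies is an assumption here, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Minimum insertions, deletions, and substitutions turning a into b.

    Works on any sequences: strings for text, lists of block ids for
    reading order.
    """
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction, reference):
    """Edit distance scaled to [0, 1]; 0 means a perfect match."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, reference) / longest


# Text correctness: one misread character in a 12-character string.
print(normalized_edit_distance("INV0ICE 2471", "INVOICE 2471"))

# Reading order: compare predicted block order against the reference order.
print(normalized_edit_distance([1, 3, 2, 4], [1, 2, 3, 4]))
```

Lower is better for both signals, which is the opposite direction from similarity scores such as Table TEDS.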

Prompt modes

Layout-preserving

<image>\n<|grounding|> Convert the document to markdown.

Plain OCR

<image>\nFree OCR.
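For teams wiring these prompts into a pipeline, a small lookup keeps the two modes in one place. This sketch only builds the prompt strings; it assumes the `\n` shown above stands for a newline character, and the mode keys and `build_prompt` helper are hypothetical names, not part of any published interface.

```python
# Prompt strings copied from the two modes above. The "\n" is assumed to be
# a newline between the image placeholder and the instruction.
PROMPTS = {
    "layout": "<image>\n<|grounding|> Convert the document to markdown.",
    "plain": "<image>\nFree OCR.",
}


def build_prompt(mode):
    """Return the prompt for a mode, failing loudly on an unknown mode."""
    try:
        return PROMPTS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PROMPTS)}"
        ) from None


print(build_prompt("layout"))
print(build_prompt("plain"))
```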

Paper PDF · Hugging Face Model Card · GitHub Repo

Use cases

Where deepseek ocr2 works best

Built for finance, logistics, customer support, and knowledge teams.

Invoice automation

Extract totals, line items, and tax fields reliably.

Compliance archives

Search and audit long-tail documents with confidence.

Shipping & logistics

Normalize manifests, customs docs, and labels.

Knowledge digitization

Turn scanned manuals into searchable references.

FAQ

Common questions

Can deepseek ocr2 handle mixed languages?

Yes. OCR2 detects and normalizes mixed scripts automatically.

Do you support private deployment?

Private and hybrid options are available with audit controls.

How do we evaluate accuracy?

Start with real documents and compare field-level precision.

What file types are supported?

Images, PDFs, scans, and multi-page documents.
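On the accuracy question above, field-level precision is simple to compute once you have hand-labeled ground truth for a sample of your own documents. The sketch below assumes exact string matching and flat name-to-value dictionaries; both are simplifying assumptions, and the function names are illustrative.

```python
def field_precision(predicted, expected):
    """Fraction of predicted fields whose value exactly matches ground truth."""
    if not predicted:
        return 0.0
    correct = sum(
        1
        for name, value in predicted.items()
        if name in expected and expected[name] == value
    )
    return correct / len(predicted)


def corpus_precision(pairs):
    """Micro-averaged precision over (predicted, expected) document pairs."""
    total = sum(len(predicted) for predicted, _ in pairs)
    if total == 0:
        return 0.0
    correct = sum(
        field_precision(predicted, expected) * len(predicted)
        for predicted, expected in pairs
    )
    return correct / total


predicted = {"invoice_number": "2471", "total": "$4,820.00", "due_date": "2025-05-42"}
expected = {"invoice_number": "2471", "total": "$4,820.00", "due_date": "2025-05-24"}
print(field_precision(predicted, expected))  # two of three fields match
```

In practice you may want to normalize values (whitespace, currency symbols, date formats) before comparing, so that formatting differences are not counted as OCR errors.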

Ready to ship

Start with deepseek ocr2 today

Move from scans to structured data with confidence.

Request access · See integration guide