opendatalab/MinerU

📖 README Summary

🚀 Access MinerU now: ✅ zero-install web version ✅ full-featured desktop client ✅ instant API access. Skip the deployment headaches and get every product format in one click. Developers, dive in!

👋 Join us on Discord and WeChat

 MinerU — High-accuracy document parsing engine for LLM · RAG · Agent workflows 
Converts PDF · DOCX · PPTX · XLSX · Images · Web pages into structured Markdown / JSON · VLM+OCR dual engine · 109 languages  
MCP Server · LangChain / Dify / FastGPT native integration · 10+ domestic AI chip support

**🔍 Core Parsing Capabilities**

- Native support for `DOCX`, `PPTX`, and `XLSX` parsing
- Formulas → LaTeX · Tables → HTML, accurate layout reconstruction
- Supports scanned docs, handwriting, multi-column layouts, cross-page table merging
- Output follows human reading order with automatic header/footer removal
- VLM + OCR dual engine, 109-language OCR recognition

**🔌 Integration**

| Use Case | Solution |
|----------|----------|
| AI Coding Tools | MCP Server — Cursor · Claude Desktop · Windsurf |
| RAG Frameworks | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |
| Development | Python / Go / TypeScript SDK · CLI · REST API · Docker |
| No-Code | mineru.net online · Gradio WebUI · Desktop client |

**🖥️ Deployment (Private · Fully Offline)**

| Inference Backend | Best For |
|------------------|---------|
| pipeline         | Fast & stable, no hallucination, runs on CPU or GPU |
| vlm-engine       | High accuracy, supports vLLM / LMDeploy / mlx ecosystem |
| hybrid-engine    | High accuracy, native text extraction, low hallucination |

Domestic AI chips: Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · Iluvatar · Hygon · Biren · T-Head
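
The backend table above can be read as a small decision rule. The following is a minimal sketch of that rule, not MinerU code: the `Requirements` class and `choose_backend` function are hypothetical names introduced for illustration, the returned strings are the labels from the table (the identifiers accepted by the actual CLI or SDK may differ), and the rule that a machine without an accelerator should use `pipeline` is an assumption drawn from the table's note that `pipeline` runs on CPU or GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of a deployment; not part of MinerU."""
    has_accelerator: bool      # a GPU or other accelerator is available
    zero_hallucination: bool   # output must not contain model-invented text
    native_text_pdfs: bool     # inputs are mostly PDFs with an embedded text layer


def choose_backend(req: Requirements) -> str:
    """Return a backend label from the table for the given requirements."""
    # Assumption: CPU-only hosts and strict no-hallucination workloads use
    # the backend the table describes as "no hallucination, CPU or GPU".
    if not req.has_accelerator or req.zero_hallucination:
        return "pipeline"
    # The table lists native text extraction under the hybrid backend.
    if req.native_text_pdfs:
        return "hybrid-engine"
    # Otherwise prefer the backend described only as high accuracy.
    return "vlm-engine"


if __name__ == "__main__":
    cpu_only = Requirements(False, False, True)
    gpu_digital = Requirements(True, False, True)
    gpu_scans = Requirements(True, False, False)
    for req in (cpu_only, gpu_digital, gpu_scans):
        print(req, "->", choose_backend(req))
```

The ordering of the checks encodes a priority: hard constraints (hardware, hallucination tolerance) are decided before accuracy preferences. A real deployment would likely also weigh throughput and memory, which the table does not quantify.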
 

# Changelog

- 2026/06/18 3.4 Released

  This release focuses on **OCR capability upgrades for the pipeline backend**, **OCR processing pipeline optimization**, and **model download experience improvements**. The main updates include:

  - OCR model upgrade and processing acceleration
    - The OCR model for the `pipeline` backend has been upgraded to `PP-OCRv6`, improving OCR accuracy by about `11%` on OmniDocBench v1.6.
    - Removed Japanese, Traditional Chinese, English, and Latin options from OCR language selection. These scenarios are now routed to the `ch` OCR model, simplifying model configuration and language selection.
    - Optimized the OCR inference and processing pipeline, increasing OCR processing speed by about `100%` and significantly improving parsing efficiency for batch documents and OCR-intensive documents.

  - Model download logic optimization
    - Added automatic model source selection, allowing first-time installations to choose a better model source based on the current network environment.
    - Before downloading models, MinerU now checks the local model cache first. Cache hits are reused directly, reducing repeated downloads and unnecessary remote requests.
    - For more details about model source configuration, automatic source selection, and local model usage, see the Model Source Documentation.

  With the 3.4 release, MinerU further improves the parsing accuracy and processing efficiency of the `pipeline` backend in OCR scenarios. It also optimizes model downloads, cache reuse, and local configuration write-back, making first-time installation, model updates, and multi-environment deployment more stable and automated.
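
Speed figures in this changelog are given as percentage increases, which are easy to misread as reductions in processing time. The sketch below converts an "X% faster" figure into a throughput multiplier and a new processing time. It assumes the percentages describe throughput (documents or pages per unit time), which the changelog does not state explicitly; the function names and the 60-second baseline are hypothetical.

```python
def speedup_factor(percent_faster: float) -> float:
    """Throughput multiplier implied by an 'X% faster' figure.

    Assumption: the percentage is an increase in throughput, so
    100% faster means 2x throughput, not zero processing time.
    """
    return 1.0 + percent_faster / 100.0


def time_after(baseline_seconds: float, percent_faster: float) -> float:
    """Processing time after the speedup, for a fixed amount of work."""
    return baseline_seconds / speedup_factor(percent_faster)


if __name__ == "__main__":
    baseline = 60.0  # hypothetical time for one batch before the upgrade
    for pct in (35, 100, 220):
        factor = speedup_factor(pct)
        seconds = time_after(baseline, pct)
        print(f"{pct}% faster -> {factor:.2f}x throughput, "
              f"{baseline:.0f}s becomes {seconds:.2f}s")
```

Under this reading, the roughly 100% OCR speed increase reported for 3.4 halves processing time for the same workload, and larger percentages give diminishing reductions in time: 220% faster corresponds to a little under one third of the original time.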

- 2026/06/11 3.3 Released

  This release focuses on **Hybrid parsing performance optimization** and **VLM model capability upgrades**. The main updates include:

  - New `effort` parsing-strength parameter for the Hybrid backend
    - Added two parsing-strength levels, `medium` and `high`, allowing users to balance parsing speed, parsing accuracy, and feature requirements.
    - On OmniDocBench v1.6, `medium` scores only `0.13` points lower in overall accuracy than `high`, while parsing `35%` to `220%` faster depending on device and scenario:
      - Linux: about `80%` faster for text PDF scenarios and about `35%` faster for OCR scenarios
      - Windows: about `90%` faster for text PDF scenarios and about `45%` faster for OCR scenarios
      - macOS: about `220%` faster for text PDF scenarios and about `50%` faster for OCR scenarios
    - The default Hybrid backend now uses `effort=medium`, significantly improving overall parsing efficiency while maintaining high parsing accuracy.
    - The `medium` level does not support `image analysis`. For maximum parsing accuracy or `image analysis` support, switch to `effort=high`, which may reduce parsing speed.

  - VLM model upgraded to `MinerU2.5-Pro-2605-1.2B`
    - Fixed multiple model issues found in the `2604` version, further improving parsing stability on complex documents.
    - Added native multilingual OCR support, reducing the need for extra language-parameter configuration and improving out-of-the-box usability for multilingual documents.

  With the 3.3 release, MinerU further improves Hybrid backend efficiency across platforms and scenarios while maintaining high-accuracy parsing. The …
2026-06-26
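Daily rank #12 · Python · +524 ★ today · ★ 69.4k

📌 Document parsing engine that converts PDF/Office files into LLM-ready structured formats
Converts complex documents such as PDFs and Office files into LLM-ready Markdown/JSON for agentic workflows.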
#document #llm #parsing