Get your documents ready for gen AI

# docling-project/docling


# Docling

Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

## Features

- 🗂️ Parsing of multiple document formats incl. PDF, DOCX, XLSX, HTML, images, and more
- 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
- 🧬 Unified, expressive DoclingDocument representation format
- ↪️ Various export formats and options, including Markdown, HTML, and lossless JSON
- 🔒 Local execution capabilities for sensitive data and air-gapped environments
- 🤖 Plug-and-play integrations incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
- 🔍 Extensive OCR support for scanned PDFs and images
- 🥚 Support of Visual Language Models (SmolDocling) 🆕
- 💻 Simple and convenient CLI

### Coming soon

- 📝 Metadata extraction, including title, authors, references & language
- 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
- 📝 Complex chemistry understanding (Molecular structures)

## Installation

To use Docling, simply install docling from your package manager, e.g. pip:

```
pip install docling
```

Docling works on macOS, Linux, and Windows, on both x86\_64 and arm64 architectures.

More detailed installation instructions are available in the docs.
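
To confirm that the install above succeeded, the Python standard library can report the installed package version together with the current platform. This is a minimal sketch, not part of Docling; `installed_version` and `describe_environment` are helper names introduced here for illustration.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "docling"):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe_environment(package: str = "docling") -> str:
    """Summarize the package status plus the OS and CPU architecture."""
    found = installed_version(package)
    status = f"{package} {found}" if found else f"{package} is not installed"
    return f"{status} ({platform.system()} / {platform.machine()})"


print(describe_environment())
```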

## Getting started

To convert individual documents with Python, use `convert()`, for example:

```
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"
```
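
The same converted document can be serialized to the other export formats listed under Features. The sketch below picks a format at run time; `export_document` is a hypothetical helper, and it assumes the document object exposes `export_to_html()` and `export_to_dict()` alongside the `export_to_markdown()` call shown above, so check the docs for the exact method names in your installed version.

```python
import json


def export_document(document, fmt: str) -> str:
    """Serialize a converted document to Markdown, HTML, or JSON text.

    `document` is expected to be `result.document` from a conversion.
    The HTML and dict export method names are assumptions (see lead-in).
    """
    if fmt == "markdown":
        return document.export_to_markdown()
    if fmt == "html":
        return document.export_to_html()
    if fmt == "json":
        # The dict export is dumped as-is to keep the output lossless.
        return json.dumps(document.export_to_dict(), indent=2)
    raise ValueError(f"unsupported format: {fmt!r}")
```

With the `result` from the example above, `export_document(result.document, "json")` would return the JSON text.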

More advanced usage options are available in
the docs.
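
To process several documents, one option is to loop over the sources, call `convert()` on each, and write the Markdown to disk. This is only a sketch under stated assumptions: `output_name`, `convert_to_markdown_files`, and `make_converter` are hypothetical helpers, and the output naming scheme is an arbitrary choice made here.

```python
import re
from pathlib import Path


def output_name(source: str) -> str:
    """Derive a Markdown file name from the last segment of a path or URL."""
    last = re.split(r"[\\/]", source.rstrip("/\\"))[-1]
    return f"{last}.md"


def convert_to_markdown_files(sources, out_dir, converter):
    """Convert each source and write its Markdown export into `out_dir`.

    `converter` is any object with a `convert(source)` method whose result
    has `.document.export_to_markdown()`, such as a DocumentConverter.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sources:
        result = converter.convert(source)
        target = out_dir / output_name(source)
        target.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(target)
    return written


def make_converter():
    """Create the real converter; imported lazily so the helpers above stay testable."""
    from docling.document_converter import DocumentConverter

    return DocumentConverter()
```

For example, `convert_to_markdown_files(["https://arxiv.org/pdf/2408.09869"], "out", make_converter())` would write `out/2408.09869.md`.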

## CLI

Docling has a built-in CLI to run conversions.

```
docling https://arxiv.org/pdf/2206.01062
```

You can also use 🥚SmolDocling and other VLMs via the Docling CLI:

```
docling --pipeline vlm --vlm-model smoldocling https://arxiv.org/pdf/2206.01062
```

This will use MLX acceleration on supported Apple Silicon hardware.

Read more here
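
The CLI can also be driven from a script. The sketch below assembles the same command lines shown above and runs them with `subprocess`; `build_docling_command` and `run_docling` are hypothetical helpers, and only the `--pipeline` and `--vlm-model` flags from the examples above are used.

```python
import subprocess


def build_docling_command(source, pipeline=None, vlm_model=None):
    """Build the argument list for the `docling` CLI.

    Only the flags shown in this README are supported by this helper.
    """
    cmd = ["docling"]
    if pipeline is not None:
        cmd += ["--pipeline", pipeline]
    if vlm_model is not None:
        cmd += ["--vlm-model", vlm_model]
    cmd.append(source)
    return cmd


def run_docling(source, pipeline=None, vlm_model=None):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    cmd = build_docling_command(source, pipeline, vlm_model)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with URLs and paths that contain spaces.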

## Documentation

Check out Docling's documentation for details on
installation, usage, concepts, recipes, extensions, and more.

## Examples

Go hands-on with our examples,
demonstrating how to address different application use cases with Docling.

## Integrations

To further accelerate your AI application development, check out Docling's native
integrations with popular frameworks
and tools.

## Get help and support

Please feel free to connect with us in the Discussions section.

## Technical report

For more details on Docling's inner workings, check out the Docling Technical Report.

## Contributing

Please read Contributing to Docling for details.

## References

If you use Docling in your projects, please consider citing the following:

```
@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {Docling Technical Report},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}
```

## License

The Docling codebase is released under the MIT license.
For individual model usage, please refer to the model licenses found in the original packages.

## LF AI & Data

Docling is hosted as a project in the LF AI & Data Foundation.

### IBM ❤️ Open Source AI

The project was started by the AI for knowledge team at IBM Research Zurich.
