Getting Started with Natural PDF
Let's get Natural PDF installed and run your first extraction.
Installation
The base installation includes the core library which will allow you to select, extract, and use spatial navigation.
pip install natural-pdf
But! If you want to recognize text, do page layout analysis, document q-and-a or other things, you can install optional dependencies.
Natural PDF has modular dependencies for different features. Install them based on your needs:
# Full ML / QA / semantic-search stack
pip install natural-pdf[ai]
# Deskewing
pip install natural-pdf[deskew]
# Semantic search
pip install natural-pdf[search]
Other OCR and layout analysis engines like surya
, easyocr
, paddle
, doctr
, and docling
can be installed via pip
as needed. The library will provide you with an error message and installation command if you try to use an engine that isn't installed.
After the core install you have two ways to add optional engines:
1 – Helper CLI (recommended)
# list optional groups and their install-status
npdf list
# everything for classification, QA, semantic search, etc.
npdf install ai
# install PaddleOCR stack
npdf install paddle
# install Surya OCR + YOLO layout detector
npdf install surya yolo
The CLI runs each wheel in its own resolver pass, so it avoids strict
version pins like paddleocr → paddlex==3.0.1
while still upgrading to
paddlex 3.0.2
.
2 – Classic extras (for the light stuff)
# Full AI/ML stack
pip install "natural-pdf[ai]"
# Deskewing
pip install "natural-pdf[deskew]"
# Semantic search service
pip install "natural-pdf[search]"
If you attempt to use an engine that is missing, the library will raise an
error that tells you which npdf install …
command to run.
Your First PDF Extraction
Here's a quick example to make sure everything is working:
from natural_pdf import PDF
# Open a PDF
pdf = PDF("https://github.com/jsoma/natural-pdf/raw/refs/heads/main/pdfs/01-practice.pdf")
# Get the first page
page = pdf.pages[0]
# Extract all text
text = page.extract_text()
print(text)
# Find something specific
title = page.find('text:bold')
print(f"Found title: {title.text}")
What's Next?
Now that you have Natural PDF installed, you can:
- Learn to navigate PDFs
- Explore how to select elements
- See how to extract text