Choose Your Path
Different backgrounds and goals need different starting points. Find your path below.
By Your Goal
"I need to extract data from PDFs quickly"
Best path: Start with the Quickstart, then go straight to a Cookbook recipe.
- Quickstart (5 min)
- Idea Gallery - Find your document type
- Jump to the matching Cookbook recipe
You'll learn: Basic extraction, how to find the right pattern for your document.
"I need to extract tables"
Best path: Focus on table extraction techniques.
- Quickstart (5 min)
- Tables Tutorial
- Messy Tables - When default extraction fails
You'll learn: Table detection, extraction, handling complex tables.
"I need to process scanned/image PDFs"
Best path: Learn OCR integration.
- Quickstart (5 min)
- OCR Tutorial
- OCR Then Navigate
You'll learn: Applying OCR, choosing engines, extracting from scanned documents.
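As a rough sketch of the scanned-document case: try normal text extraction first and fall back to OCR only when the page has no text layer. This assumes a natural-pdf page exposing extract_text() and apply_ocr(engine=...); the helper name text_with_ocr_fallback and the default engine are illustrative choices, not library recommendations.

```python
def text_with_ocr_fallback(page, engine="easyocr"):
    """Return page text, running OCR first if the page has no usable text.

    `page` is assumed to behave like a natural-pdf page. The `engine`
    default is an assumption; use whichever OCR extra you installed.
    """
    text = page.extract_text()
    if text and text.strip():
        return text  # a native text layer exists, so OCR is unnecessary
    page.apply_ocr(engine=engine)  # adds OCR text elements to the page
    return page.extract_text()
```

Skipping OCR when a text layer already exists keeps the slow step off pages that do not need it.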
"I need to build a repeatable pipeline"
Best path: Learn the patterns, then batch processing.
- Quickstart (5 min)
- Core Concepts
- Batch Processing
- Idea Gallery - Find your document pattern
You'll learn: Batch patterns, error handling, processing multiple files.
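A minimal sketch of the batch pattern with error handling, under stated assumptions: one bad file should not stop the run, and every opened PDF is closed before the next one starts. The function name extract_many and its open_pdf parameter are hypothetical; with natural-pdf installed, open_pdf would typically be the PDF class.

```python
def extract_many(paths, open_pdf):
    """Extract per-page text from each path, collecting failures instead of raising.

    `open_pdf` is any callable returning an object with `.pages` and
    `.close()` (assumed here to match natural-pdf's PDF class).
    """
    results, errors = {}, {}
    for path in paths:
        key = str(path)
        try:
            pdf = open_pdf(key)
        except Exception as exc:  # unreadable or corrupt file
            errors[key] = repr(exc)
            continue
        try:
            results[key] = [page.extract_text() for page in pdf.pages]
        except Exception as exc:  # extraction failed part-way through
            errors[key] = repr(exc)
        finally:
            pdf.close()  # release the file before moving to the next one
    return results, errors
```

A call might look like `extract_many(Path("inbox").glob("*.pdf"), PDF)` after `from natural_pdf import PDF`; the returned errors dict gives you a list of files to inspect by hand.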
"I need to extract specific sections"
Best path: Master spatial navigation.
You'll learn: .below(), .above(), extracting content between markers.
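Extracting content between markers might look like the sketch below. It assumes natural-pdf's page.find() with the text:contains("...") selector form and an until= argument on .below(); the helper name text_between and the marker strings are hypothetical.

```python
def text_between(page, start_text, end_text):
    """Return the text below `start_text`, stopping at `end_text`.

    `page` is assumed to behave like a natural-pdf page. Returns None
    when the start marker is not found.
    """
    start = page.find(f'text:contains("{start_text}")')
    if start is None:
        return None
    # Assumption: .below() accepts until=<selector> to bound the region.
    region = start.below(until=f'text:contains("{end_text}")')
    return region.extract_text()

# Possible usage, assuming natural-pdf is installed:
#   from natural_pdf import PDF
#   pdf = PDF("report.pdf")
#   print(text_between(pdf.pages[0], "Summary", "Details"))
```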
"I need to use AI for extraction"
Best path: Learn the AI features.
You'll learn: Document Q&A, structured extraction, layout detection.
By Your Background
Python Beginner
You've done a Python tutorial but aren't comfortable with the language yet.
Recommended path:
- Quickstart - Copy-paste working code
- Selectors 101 - Understand the selector syntax
- Idea Gallery - Find a recipe for your document type
- Pick one Cookbook recipe and follow it step by step
Tips:
- Don't try to understand everything at once
- Copy the examples, then modify small pieces
- Use element.show() to see what you're finding
- Always check if find() returned None before using the result
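The None check from the tips can be wrapped in a small helper, sketched here. The name find_text and its default argument are made up for illustration; the sketch assumes page.find() returns None when nothing matches and that matched elements offer extract_text().

```python
def find_text(page, selector, default=""):
    """Return the text of the first element matching `selector`, or `default`."""
    element = page.find(selector)
    if element is None:  # nothing matched; calling a method on None would crash
        return default
    return element.extract_text()
```

Without the check, a selector that matches nothing fails with `AttributeError: 'NoneType' object has no attribute ...`, which is a common first stumbling block.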
Data Analyst (Pandas Expert)
You're comfortable with Python and pandas, but new to PDFs.
Recommended path:
- Quickstart - See the API style
- Core Concepts - Understand the object model
- Tables Tutorial - Get data into DataFrames
- Batch Processing - Build pipelines
Tips:
- Tables export directly to DataFrames with .to_df()
- ElementCollections work like pandas with .filter() and .apply()
- Use layout=True in extract_text() for readable output
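One way to collect a DataFrame per page is sketched below. It assumes page.extract_table() returns a table object with .to_df(), as the tip describes, and that an empty result is falsy; the helper name tables_to_frames is hypothetical.

```python
def tables_to_frames(pdf):
    """Return one DataFrame per page that yields a table.

    Assumes each page's extract_table() result has .to_df() and is
    falsy (None or empty) when no table was found.
    """
    frames = []
    for page in pdf.pages:
        table = page.extract_table()
        if not table:
            continue  # no table on this page
        frames.append(table.to_df())
    return frames
```

From there the usual pandas tools apply, for example `pd.concat(frames, ignore_index=True)` when every page shares the same columns.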
Software Developer
You build production systems and need to evaluate or integrate the library.
Recommended path:
- Installation - See dependency options
- Core Concepts - Understand the architecture
- Patterns & Pitfalls - API reference
- Batch Processing - Error handling patterns
- Troubleshooting - Common issues
Tips:
- Install only what you need: pip install natural-pdf for core, add extras like [paddle] or [export] as needed
- Use context managers or pdf.close() in loops
- OCR and layout analysis are the slow operations - profile before optimizing
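For the "profile before optimizing" tip, a standard-library timer is often enough to see which stage dominates. The sketch below is not part of natural-pdf; the name timed and the stage labels are illustrative.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed

# Possible usage inside a processing loop (page is a natural-pdf page):
#   timings = {}
#   with timed("ocr", timings):
#       page.apply_ocr()
#   with timed("text", timings):
#       page.extract_text()
#   print(timings)
```

Because times accumulate per label, running this over a whole batch shows the total cost of each stage rather than a single page's.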
Researcher / Data Scientist
You work in Jupyter and need to extract structured data from documents.
Recommended path:
- Quickstart - See the basics
- Finding Elements Tutorial - Master selectors
- Layout Analysis - Auto-detect structure
- Document QA - AI extraction
- Regions & Flows - Complex extractions
Tips:
- Use .show() constantly to visualize what you're finding
- analyze_layout() can detect tables, figures, and sections automatically
- For academic papers, start with Finding Sections
Quick Start by Document Type
| Document Type | Start Here | Then Read |
|---|---|---|
| Invoices | Label-Value Extraction | Messy Tables |
| Forms | Label-Value Extraction | OCR Then Navigate |
| Reports | Finding Sections | Multipage Content |
| Scanned docs | OCR Tutorial | OCR Then Navigate |
| Tables | Tables Tutorial | Messy Tables |
| FOIA responses | Finding Sections | Batch Processing |
| Contracts | Finding Sections | Multipage Content |
See the Idea Gallery for 30+ document types mapped to patterns.
Still Not Sure?
If you're not sure where to start:
- Run the Quickstart - it's 5 minutes
- Load your own PDF and try page.extract_text()
- Use .show() to visualize what's in your document
- Browse the Idea Gallery for similar documents
The best way to learn is to experiment with your actual documents.