Choose Your Path

Readers with different backgrounds and goals need different starting points. Find the path that matches yours below.

By Your Goal

"I need to extract data from PDFs quickly"

Best path: Start with the Quickstart, then go straight to a Cookbook recipe.

Quickstart (5 min)
Idea Gallery - Find your document type
Jump to the matching Cookbook recipe

You'll learn: Basic extraction, how to find the right pattern for your document.

"I need to extract tables"

Best path: Focus on table extraction techniques.

Quickstart (5 min)
Tables Tutorial
Messy Tables - When default extraction fails

You'll learn: Table detection, extraction, handling complex tables.

"I need to process scanned/image PDFs"

Best path: Learn OCR integration.

You'll learn: Applying OCR, choosing engines, extracting from scanned documents.

"I need to build a repeatable pipeline"

Best path: Learn the patterns, then batch processing.

Quickstart (5 min)
Core Concepts
Batch Processing
Idea Gallery - Find your document pattern

You'll learn: Batch patterns, error handling, processing multiple files.

"I need to extract specific sections"

Best path: Master spatial navigation.

You'll learn: .below(), .above(), extracting content between markers.
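
As a rough illustration of extracting content between markers, the sketch below finds a start marker and reads everything beneath it up to an end marker. It uses find(), .below(), and extract_text() as named in this guide. The until= keyword, the text:contains(...) selector form, and the helper name text_between are assumptions made for illustration, so check them against the spatial navigation docs before relying on them.

```python
def text_between(page, start_selector, end_selector):
    """Return the text below the start marker and above the end marker.

    `page` is expected to behave like a natural-pdf page. Returns None
    when the start marker is not found.
    """
    start = page.find(start_selector)
    if start is None:
        return None
    # Assumption: .below() accepts until=<selector> to stop at the end marker.
    region = start.below(until=end_selector)
    return region.extract_text()


# Hypothetical usage with a real document (file name and labels are made up):
# from natural_pdf import PDF
# page = PDF("report.pdf").pages[0]
# print(text_between(page, 'text:contains("Summary")', 'text:contains("Appendix")'))
```

Returning None for a missing start marker keeps the "marker not found" case distinct from "section found but empty".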

"I need to use AI for extraction"

Best path: Learn the AI features.

You'll learn: Document Q&A, structured extraction, layout detection.

By Your Background

Python Beginner

You've done a Python tutorial but aren't comfortable with the language yet.

Recommended path:

Quickstart - Copy-paste working code
Selectors 101 - Understand the selector syntax
Idea Gallery - Find a recipe for your document type
Pick one Cookbook recipe and follow it step by step

Tips:

Don't try to understand everything at once
Copy the examples, then modify small pieces
Use element.show() to see what you're finding
Always check if find() returned None before using the result
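
The last tip can be sketched as a small guard function. This is a minimal example, not part of the library: the name safe_text and its default parameter are hypothetical, and it only assumes that find() returns None when nothing matches (as the tip says) and that a found element offers extract_text().

```python
def safe_text(page, selector, default=""):
    """Return the text of the first match for `selector`, or `default` if none."""
    element = page.find(selector)
    if element is None:
        # No match: calling element.extract_text() here would raise AttributeError.
        return default
    return element.extract_text()


# Hypothetical usage (the selector is only an example):
# total = safe_text(page, 'text:contains("Total")', default="not found")
```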

Data Analyst (Pandas Expert)

You're comfortable with Python and pandas, but new to PDFs.

Recommended path:

Quickstart - See the API style
Core Concepts - Understand the object model
Tables Tutorial - Get data into DataFrames
Batch Processing - Build pipelines

Tips:

Tables export directly to DataFrames with .to_df()
ElementCollections offer pandas-style .filter() and .apply() methods
Use layout=True in extract_text() for readable output
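
To make the first and last tips concrete, here is a minimal sketch that pulls one table into a DataFrame and grabs layout-preserving text from the same page. The .to_df() call and layout=True come from the tips above. The extract_table() method name and the helper name table_and_text are assumptions, so confirm them in the Tables Tutorial.

```python
def table_and_text(page):
    """Return (DataFrame, layout text) for a natural-pdf-like page."""
    # Assumption: extract_table() returns a table object that has .to_df().
    table = page.extract_table()
    df = table.to_df()
    # layout=True asks for text that keeps the page's visual spacing.
    text = page.extract_text(layout=True)
    return df, text


# Hypothetical usage in a notebook:
# df, text = table_and_text(pdf.pages[0])
# df.head()
```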

Software Developer

You build production systems and need to evaluate or integrate the library.

Recommended path:

Installation - See dependency options
Core Concepts - Understand the architecture
Patterns & Pitfalls - API reference
Batch Processing - Error handling patterns
Troubleshooting - Common issues

Tips:

Install only what you need: pip install natural-pdf covers the core, and extras such as [paddle] or [export] can be added when you need them
Use context managers or pdf.close() in loops
OCR and layout analysis are the slow operations - profile before optimizing
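
One way to apply the pdf.close() tip in a batch loop is sketched below. It is a pattern under stated assumptions rather than the library's recommended pipeline: extract_texts, open_pdf, and open_with_natural_pdf are hypothetical names, and the opened object is assumed to expose .pages and .close() as described in this guide.

```python
def extract_texts(paths, open_pdf):
    """Map each path to its first-page text, closing every PDF that was opened.

    `open_pdf` is any callable returning an object with .pages and .close().
    Failures are recorded per file instead of stopping the whole batch.
    """
    results, errors = {}, {}
    for path in paths:
        pdf = None
        try:
            pdf = open_pdf(path)
            results[path] = pdf.pages[0].extract_text()
        except Exception as exc:  # keep going; report the failure afterwards
            errors[path] = repr(exc)
        finally:
            if pdf is not None:
                pdf.close()  # release file handles even when extraction failed
    return results, errors


def open_with_natural_pdf(path):
    # Imported lazily so extract_texts stays importable without the library.
    from natural_pdf import PDF
    return PDF(path)


# Hypothetical usage:
# results, errors = extract_texts(["a.pdf", "b.pdf"], open_with_natural_pdf)
```

Passing the opener as an argument keeps the loop easy to test with a stand-in object before pointing it at real files.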

Researcher / Data Scientist

You work in Jupyter and need to extract structured data from documents.

Recommended path:

Quickstart - See the basics
Finding Elements Tutorial - Master selectors
Layout Analysis - Auto-detect structure
Document QA - AI extraction
Regions & Flows - Complex extractions

Tips:

Use .show() constantly to visualize what you're finding
analyze_layout() can detect tables, figures, and sections automatically
For academic papers, start with Finding Sections

Quick Start by Document Type

Document Type	Start Here	Then Read
Invoices	Label-Value Extraction	Messy Tables
Forms	Label-Value Extraction	OCR Then Navigate
Reports	Finding Sections	Multipage Content
Scanned docs	OCR Tutorial	OCR Then Navigate
Tables	Tables Tutorial	Messy Tables
FOIA responses	Finding Sections	Batch Processing
Contracts	Finding Sections	Multipage Content

See the Idea Gallery for 30+ document types mapped to patterns.

Still Not Sure?

Try these steps:

Run the Quickstart - it takes 5 minutes
Load your own PDF and try page.extract_text()
Use .show() to visualize what's in your document
Browse the Idea Gallery for similar documents

The best way to learn is to experiment with your actual documents.