
Choose Your Path

Different backgrounds and goals need different starting points. Find your path below.


By Your Goal

"I need to extract data from PDFs quickly"

Best path: Start with the Quickstart, then go straight to a Cookbook recipe.

  1. Quickstart (5 min)
  2. Idea Gallery - Find your document type
  3. Jump to the matching Cookbook recipe

You'll learn: Basic extraction and how to find the right pattern for your document.


"I need to extract tables"

Best path: Focus on table extraction techniques.

  1. Quickstart (5 min)
  2. Tables Tutorial
  3. Messy Tables - When default extraction fails

You'll learn: Table detection, table extraction, and how to handle complex tables.


"I need to process scanned/image PDFs"

Best path: Learn OCR integration.

  1. Quickstart (5 min)
  2. OCR Tutorial
  3. OCR Then Navigate

You'll learn: Applying OCR, choosing engines, extracting from scanned documents.


"I need to build a repeatable pipeline"

Best path: Learn the core patterns, then move on to batch processing.

  1. Quickstart (5 min)
  2. Core Concepts
  3. Batch Processing
  4. Idea Gallery - Find your document pattern

You'll learn: Batch patterns, error handling, processing multiple files.


"I need to extract specific sections"

Best path: Master spatial navigation.

  1. Quickstart (5 min)
  2. Spatial Navigation Tutorial
  3. Finding Sections

You'll learn: .below(), .above(), extracting content between markers.
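
As a rough preview of what "extracting content between markers" can look like, here is a minimal sketch. The helper name text_between is hypothetical, and the text:contains("...") selector syntax, the until= argument to .below(), and calling extract_text() on the resulting region are assumptions about the library's API; check the Spatial Navigation Tutorial for the exact forms.

```python
def text_between(page, start_text, end_text):
    """Return the text between two marker phrases, or None if the start is missing.

    Hypothetical helper. `page` is assumed to be a natural-pdf page object.
    """
    # Locate the starting marker (selector syntax is an assumption).
    start = page.find(f'text:contains("{start_text}")')
    if start is None:
        return None

    # Take everything below the start marker, stopping at the end marker.
    # The `until=` keyword is an assumption about .below().
    region = start.below(until=f'text:contains("{end_text}")')
    return region.extract_text()
```

With a real page this might be called as text_between(page, "Methods", "Results"); .above() would work the same way in the opposite direction.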


"I need to use AI for extraction"

Best path: Learn the AI features.

  1. Quickstart (5 min)
  2. Document QA Tutorial
  3. Layout Analysis

You'll learn: Document Q&A, structured extraction, layout detection.


By Your Background

Python Beginner

You've done a Python tutorial but aren't comfortable with the language yet.

Recommended path:

  1. Quickstart - Copy-paste working code
  2. Selectors 101 - Understand the selector syntax
  3. Idea Gallery - Find a recipe for your document type
  4. Pick one Cookbook recipe and follow it step by step

Tips:

  • Don't try to understand everything at once
  • Copy the examples, then modify small pieces
  • Use element.show() to see what you're finding
  • Always check whether find() returned None before using the result
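
The last tip is the one that most often trips people up, so here is a small sketch of the None check. The function name text_below_label is made up for illustration, and the text:contains("...") selector form is an assumption; the Quickstart and Selectors 101 show the syntax the library actually expects.

```python
def text_below_label(page, label):
    """Return the text under a label, or None if the label is not on the page.

    Hypothetical helper. `page` is assumed to be a natural-pdf page object.
    """
    # find() returns a single match, or None when nothing matches.
    heading = page.find(f'text:contains("{label}")')
    if heading is None:
        # Stop here instead of crashing on None.below().
        return None

    # Safe to use the result now: grab everything below the label.
    return heading.below().extract_text()
```

While experimenting, calling heading.show() just before the return is a quick way to confirm that the element found is the one you meant.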

Data Analyst (Pandas Expert)

You're comfortable with Python and pandas, but new to PDFs.

Recommended path:

  1. Quickstart - See the API style
  2. Core Concepts - Understand the object model
  3. Tables Tutorial - Get data into DataFrames
  4. Batch Processing - Build pipelines

Tips:

  • Tables export directly to DataFrames with .to_df()
  • ElementCollections offer pandas-style .filter() and .apply() methods
  • Use layout=True in extract_text() for readable output
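
A minimal sketch of the first and third tips together, for readers who think in DataFrames. The helper name table_and_text is hypothetical, and page.extract_table() as the call that produces the table object is an assumption; .to_df() and layout=True come from the tips above.

```python
def table_and_text(page):
    """Return (DataFrame of the page's table, layout-preserving page text).

    Hypothetical helper. `page` is assumed to be a natural-pdf page object.
    """
    # Assumed call: extract the table on the page.
    table = page.extract_table()

    # Convert the extracted table to a pandas DataFrame.
    df = table.to_df()

    # layout=True keeps the visual spacing, which makes the text easier to read.
    text = page.extract_text(layout=True)
    return df, text
```

From here the DataFrame behaves like any other: df.head(), df.to_csv(...), and so on.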

Software Developer

You build production systems and need to evaluate or integrate the library.

Recommended path:

  1. Installation - See dependency options
  2. Core Concepts - Understand the architecture
  3. Patterns & Pitfalls - API reference
  4. Batch Processing - Error handling patterns
  5. Troubleshooting - Common issues

Tips:

  • Install only what you need: pip install natural-pdf for core, add extras like [paddle] or [export] as needed
  • Use context managers or pdf.close() in loops
  • OCR and layout analysis are the slow operations - profile before optimizing
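
The pdf.close() tip can be sketched as a loop that always releases the file, even when one document fails. The function name extract_first_pages and the open_pdf parameter are hypothetical (the parameter exists so the loop can be exercised without real files), and the import path from natural_pdf import PDF and the pdf.pages[0] indexing are assumptions; see Batch Processing for the library's recommended patterns.

```python
def extract_first_pages(paths, open_pdf=None):
    """Extract first-page text from each file, closing every PDF that was opened.

    Hypothetical helper. Returns (texts, errors), both keyed by path.
    """
    if open_pdf is None:
        # Assumed import path; imported lazily so the sketch loads without the library.
        from natural_pdf import PDF
        open_pdf = PDF

    texts, errors = {}, {}
    for path in paths:
        pdf = None
        try:
            pdf = open_pdf(path)
            texts[path] = pdf.pages[0].extract_text()
        except Exception as exc:
            # Record the failure and keep going with the remaining files.
            errors[path] = str(exc)
        finally:
            # Close only if the file was actually opened.
            if pdf is not None:
                pdf.close()
    return texts, errors
```

Collecting errors per file rather than raising on the first one is a design choice that suits long batch runs; a stricter pipeline might prefer to fail fast.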

Researcher / Data Scientist

You work in Jupyter and need to extract structured data from documents.

Recommended path:

  1. Quickstart - See the basics
  2. Finding Elements Tutorial - Master selectors
  3. Layout Analysis - Auto-detect structure
  4. Document QA - AI extraction
  5. Regions & Flows - Complex extractions

Tips:

  • Use .show() constantly to visualize what you're finding
  • analyze_layout() can detect tables, figures, and sections automatically
  • For academic papers, start with Finding Sections

Quick Start by Document Type

Document Type  | Start Here             | Then Read
Invoices       | Label-Value Extraction | Messy Tables
Forms          | Label-Value Extraction | OCR Then Navigate
Reports        | Finding Sections       | Multipage Content
Scanned docs   | OCR Tutorial           | OCR Then Navigate
Tables         | Tables Tutorial        | Messy Tables
FOIA responses | Finding Sections       | Batch Processing
Contracts      | Finding Sections       | Multipage Content

See the Idea Gallery for 30+ document types mapped to patterns.


Still Not Sure?

If you're not sure where to start:

  1. Run the Quickstart - it's 5 minutes
  2. Load your own PDF and try page.extract_text()
  3. Use .show() to visualize what's in your document
  4. Browse the Idea Gallery for similar documents

The best way to learn is to experiment with your actual documents.