# Install required packages
!pip install --quiet ipywidgets pandas 'pydantic-ai-slim[openrouter]' python-dotenv tqdm
print('✓ Packages installed!')
import os
from getpass import getpass
from dotenv import load_dotenv
load_dotenv()
if not os.getenv("OPENROUTER_API_KEY"):
    os.environ["OPENROUTER_API_KEY"] = getpass("OpenRouter API key: ")
Pydantic AI - not to be confused with Pydantic! - is a library for talking to LLMs. It isn't tied to any single provider (OpenAI, Anthropic, Google), so it's often more flexible than provider-specific tools. The people who make it have a track record of quality involvement with the open-source ecosystem, so I also trust it to stick around a lot more than I trust flashy startups.
We'll start by asking an LLM a nice, simple question.
from pydantic_ai import Agent
agent = Agent('openrouter:anthropic/claude-haiku-4-5')
result = await agent.run('Where does "hello world" come from?')
print(result.output)
Instead of talking directly to Anthropic or OpenAI, we're using OpenRouter. OpenRouter offers a zillion and one models, along with much better API key management than dealing with the providers themselves. If you want to talk directly to OpenAI, you definitely can - just use openai:gpt-5-nano as the model name.
from pydantic_ai import Agent
agent = Agent('openrouter:openai/gpt-5.4-nano')
result = await agent.run('Where does "hello world" come from?')
print(result.output)
from pydantic_ai import Agent
agent = Agent(
    'openrouter:openai/gpt-5.4-nano',
    instructions="Be very, very terse in your responses.",
)
result = await agent.run('Where does "hello world" come from?')
print(result.output)
One of the best use cases for AI is asking for structured data from something unstructured, like court cases. Maybe we have some text we extracted from a PDF of a lawsuit:
lawsuit = """
Case No. 23STCV12345
Let it be known that a LAWSUIT has been filed in the Superior Court of California,
County of Los Angeles, on July 5 2028.
Barnaby Rutherford vs. Tamper Media LLC
on condition of fraud, breach of contract, and negligence.
The plaintiff alleges that the defendant failed to deliver the
agreed-upon services, resulting in financial losses and emotional distress.
The lawsuit seeks compensation for damages incurred and any additional relief
deemed appropriate by the court.
"""
A naive approach to pulling those details out with an LLM might look like the code below.
from pathlib import Path
from pydantic import BaseModel
from pydantic_ai import Agent, BinaryContent
MODEL = "openrouter:google/gemini-3.1-flash-lite"
prompt = """List the following about this lawsuit:
- case number
- court
- state
- filing date
- plaintiff
- defendant
- claims
"""
agent = Agent(MODEL)
result = await agent.run([prompt, lawsuit])
print(result.output)
That's easier to read, but it's still not perfect. We want something nice and programmatic, like JSON or dictionaries! This is where Pydantic comes in. You build a model that describes what you want the response to look like.
from pathlib import Path
from pydantic import BaseModel
from pydantic_ai import Agent
MODEL = "openrouter:google/gemini-3.1-flash-lite"
class LawsuitInfo(BaseModel):
    case_number: str
    court: str
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]

agent = Agent(
    MODEL,
    instructions="Extract the lawsuit information",
    output_type=LawsuitInfo,
)
result = await agent.run(lawsuit)
print(result.output)
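To get a feel for what the model buys you without spending any API credits, here's a small sketch that pushes hand-written JSON through the same LawsuitInfo shape. The JSON string is made up for illustration (it stands in for whatever the LLM sends back), and this only exercises plain Pydantic, not Pydantic AI.

```python
from pydantic import BaseModel, ValidationError


class LawsuitInfo(BaseModel):
    case_number: str
    court: str
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]


# Hand-written stand-in for a model response (an assumption, not real LLM output)
raw = """{
  "case_number": "23STCV12345",
  "court": "Superior Court of California, County of Los Angeles",
  "state": "California",
  "filing_date": "July 5 2028",
  "plaintiff": "Barnaby Rutherford",
  "defendant": "Tamper Media LLC",
  "claims": ["fraud", "breach of contract", "negligence"]
}"""

info = LawsuitInfo.model_validate_json(raw)
print(info.plaintiff)     # attribute access instead of string parsing
print(info.model_dump())  # plain dictionary, ready for pandas or JSON

# A response that is missing fields fails loudly instead of slipping through
try:
    LawsuitInfo.model_validate_json('{"case_number": "23STCV12345"}')
    missing_fields_rejected = False
except ValidationError as err:
    missing_fields_rejected = True
    print(f"Rejected: {err.error_count()} problems found")
```

The point is that once the data is in a model, you work with real attributes and real lists, and anything that doesn't fit the shape raises an error right away.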
Pydantic and structured outputs really shine when you have a lot of data, like a whole stack of lawsuits instead of just one.
You make the same setup as before.
While we're at it, I'm also going to get very detailed about what we're asking for. We could have done this before, but I was trying to keep things simple!
from pathlib import Path
from pydantic import BaseModel
from pydantic_ai import Agent
MODEL = "openrouter:google/gemini-3.1-flash-lite"
class LawsuitInfo(BaseModel):
    case_number: str
    court: str
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]

agent = Agent(
    MODEL,
    instructions="Extract the lawsuit information",
    output_type=LawsuitInfo,
)
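Getting "very detailed" could look something like the sketch below: attach a description to each field so it's clear exactly what belongs where. The class name DetailedLawsuitInfo and the wording of every description are my own guesses at what would be useful, not part of the setup above. Pydantic includes these descriptions in the JSON schema it generates; my understanding is that this schema is what the LLM gets to see when you pass the model as output_type, but check the Pydantic AI docs for the specifics.

```python
from pydantic import BaseModel, Field


# Hypothetical, more detailed variant of LawsuitInfo (descriptions are illustrative)
class DetailedLawsuitInfo(BaseModel):
    """Key facts pulled from the text of a lawsuit filing."""

    case_number: str = Field(description="Case number exactly as written, e.g. 23STCV12345")
    court: str = Field(description="Full name of the court where the suit was filed")
    state: str = Field(description="US state the court is in, spelled out")
    filing_date: str = Field(description="Filing date in YYYY-MM-DD format")
    plaintiff: str = Field(description="Name of the person or company suing")
    defendant: str = Field(description="Name of the person or company being sued")
    claims: list[str] = Field(description="One short entry per legal claim, e.g. fraud")


# The descriptions end up in the JSON schema generated from the model
schema = DetailedLawsuitInfo.model_json_schema()
for name, spec in schema["properties"].items():
    print(f"{name}: {spec['description']}")
```

To try it, you would pass output_type=DetailedLawsuitInfo to the Agent in place of LawsuitInfo.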
...and then you just loop through it, collecting the outputs and pushing them into a dataframe.
import pandas as pd
from tqdm import tqdm
rows = []
lawsuits = [
]
for lawsuit in tqdm(lawsuits):
    # Get the result
    result = await agent.run(lawsuit)
    # Save the result
    row = result.output.model_dump()
    rows.append(row)
print(f"Processed {len(rows)} lawsuits.")
df = pd.DataFrame(rows)
df
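With a big batch, one bad document can kill the whole loop partway through. Here's a rough sketch of one way to guard against that: catch the error, write down which document failed, and keep going. To keep it runnable without an API key, fake_extract is a hypothetical stand-in for the agent.run call, and the sample documents (including the second case number) are made up.

```python
import asyncio


async def fake_extract(text: str) -> dict:
    """Hypothetical stand-in for `(await agent.run(text)).output.model_dump()`."""
    if not text.strip():
        raise ValueError("empty document")
    return {"case_number": text.split()[-1]}


async def process_all(documents: list[str]) -> tuple[list[dict], list[dict]]:
    rows, failures = [], []
    for index, text in enumerate(documents):
        try:
            rows.append(await fake_extract(text))
        except Exception as err:
            # Record which document failed and why, then keep going
            failures.append({"index": index, "error": str(err)})
    return rows, failures


# Made-up inputs: two usable documents and one blank one
documents = ["Case No. 23STCV12345", "   ", "Case No. 24STCV00001"]

# In a notebook, use `rows, failures = await process_all(documents)` instead
rows, failures = asyncio.run(process_all(documents))
print(f"Processed {len(rows)} lawsuits, {len(failures)} failed.")
```

The successful rows can go into pd.DataFrame(rows) exactly as before, and the failures list tells you which documents to look at by hand or retry.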