
In [ ]:

# Install required packages
!pip install --quiet ipywidgets pandas 'pydantic-ai-slim[openrouter]' python-dotenv tqdm

print('✓ Packages installed!')

In [26]:

import os
from getpass import getpass

from dotenv import load_dotenv

load_dotenv()

if not os.getenv("OPENROUTER_API_KEY"):
    os.environ["OPENROUTER_API_KEY"] = getpass("OpenRouter API key: ")

Out[26]:

True

Pydantic AI basics

Pydantic AI (not to be confused with Pydantic!) is a Python library for talking to LLMs. It isn't married to any individual provider (OpenAI, Anthropic, Google), so it's often more flexible and independent than provider-specific tools. The people who make it have a track record of quality involvement with the open-source ecosystem, so I also trust it to stick around a lot more than tools from flashier startups.

We'll start by asking a nice simple question to an LLM.

In [2]:

from pydantic_ai import Agent

agent = Agent('openrouter:anthropic/claude-haiku-4-5')

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

# Origins of "Hello World"

The first known "Hello World" program appeared in **1974** in a C tutorial by **Brian Kernighan**. It was a simple program demonstrating how to print text.

However, the phrase became widely popularized through **"The C Programming Language"** (1978), the famous textbook co-written by Kernighan and Dennis Ritchie. This book became the standard reference for C, and their "Hello World" example was adopted by countless programming tutorials that followed.

## Why "Hello World"?

There's no deep reason—it was simply a **convenient, recognizable phrase** for demonstrating output. The choice was somewhat arbitrary, though it's friendly and memorable. Some suggest it may have been influenced by an earlier computer science paper that used "hello, world" casually, but the exact inspiration isn't definitively documented.

## Why it stuck

Once it appeared in such an influential textbook, it became the **de facto first program** for learning any new language. The tradition persists today because:
- It's simple enough for beginners
- It's easy to verify the program works
- It's become an informal standard and cultural touchstone in programming

So while Kernighan didn't necessarily *invent* the idea, he popularized it at just the right moment in computing history.

Instead of talking directly to Anthropic or OpenAI, we're going through OpenRouter. OpenRouter offers a zillion and one models, along with much better API key management than dealing with each provider separately. If you want to talk directly to OpenAI, you definitely can: just use openai:gpt-5-nano as the model name.

In [3]:

from pydantic_ai import Agent

agent = Agent('openrouter:openai/gpt-5.4-nano')

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

“Hello, world!” is a classic example used to teach basic programming concepts. It comes from programming culture rather than a single modern invention.

- **Origin (historical):** The phrase traces back to **the early days of the C language**. It was popularized by **Brian Kernighan and Dennis Ritchie** in **the book _The C Programming Language_ (1978)**, which famously used the “Hello, world” program as the first example.
- **Why it stuck:** It’s simple to type, easy to verify (“does it run?”), and demonstrates the basic idea of producing output without requiring much prior knowledge.
- **Earlier roots:** Variants of “hello world”-style messages existed even before C (in other languages and tutorials), but **Kernighan & Ritchie’s C book is what cemented it as the standard default example** for many languages.

So, while “hello world” may have appeared in earlier contexts, its widely recognized origin is **the C book (1978)** that made it the go-to beginner example.
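Since the provider is just a prefix on the model string, switching between OpenRouter and a direct provider is a one-line change. Here is a small sketch of picking the string based on which API key is set. The helper `pick_model` is hypothetical (it is not part of Pydantic AI), the fallback order is arbitrary, and I'm assuming `OPENAI_API_KEY` is the variable you'd set for direct OpenAI access.

```python
import os
from collections.abc import Mapping


def pick_model(env: Mapping[str, str] = os.environ) -> str:
    """Return a model string based on which API key is available.

    Hypothetical helper: the model names are the ones used in this
    notebook, and the fallback order is an arbitrary choice.
    """
    if env.get("OPENROUTER_API_KEY"):
        # Route through OpenRouter: "openrouter:<vendor>/<model>"
        return "openrouter:openai/gpt-5.4-nano"
    if env.get("OPENAI_API_KEY"):
        # Talk to OpenAI directly: "openai:<model>" (assumed env var name)
        return "openai:gpt-5-nano"
    raise RuntimeError("Set OPENROUTER_API_KEY or OPENAI_API_KEY first.")


# Pass a plain dict instead of os.environ to try it without real keys
print(pick_model({"OPENAI_API_KEY": "sk-example"}))
```

The returned string can be handed straight to `Agent(...)` in place of the hard-coded model name.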

In [4]:

from pydantic_ai import Agent

agent = Agent(
    'openrouter:openai/gpt-5.4-nano',
    instructions="Be very, very terse in your responses.")

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

“Hello, world!” originated from **computer programming examples**—first popularized by **Brian Kernighan** in the early **1970s** (notably in *The C Programming Language*, 1978). It became a standard phrase to demonstrate that a language/toolchain is working.

Structured output

One of the best use cases for AI is asking for structured data from something unstructured, like court cases. Maybe we have some text we extracted from a PDF of a lawsuit:

In [ ]:

lawsuit = """
Case No. 23STCV12345
Let it be known that a LAWSUIT has been filed in the Superior Court of California,
County of Los Angeles, on July 5 2028.

Barnaby Rutherford vs. Tamper Media LLC

on condition of fraud, breach of contract, and negligence.

The plaintiff alleges that the defendant failed to deliver the
agreed-upon services, resulting in financial losses and emotional distress.
The lawsuit seeks compensation for damages incurred and any additional relief
deemed appropriate by the court.
"""

A naive approach to extracting the details with an LLM might look like the code below.

In [6]:

from pydantic_ai import Agent

MODEL = "openrouter:google/gemini-3.1-flash-lite"

prompt = """List the following about this lawsuit:
- case number
- court
- state
- filing date
- plaintiff
- defendant
- claims
"""

agent = Agent(MODEL)

result = await agent.run([prompt, lawsuit])

print(result.output)

Based on the information provided, here are the lawsuit details:

*   **Case Number:** 23STCV12345
*   **Court:** Superior Court of California, County of Los Angeles
*   **State:** California
*   **Filing Date:** July 5, 2028
*   **Plaintiff:** Barnaby Rutherford
*   **Defendant:** Tamper Media LLC
*   **Claims:** Fraud, breach of contract, and negligence

That's easy to read, but it's still just text. We want something nice and programmatic, like JSON or dictionaries! This is where Pydantic comes in: you build a model that describes what you want the response to look like.

In [7]:

from pydantic import BaseModel
from pydantic_ai import Agent

MODEL = "openrouter:google/gemini-3.1-flash-lite"

class LawsuitInfo(BaseModel):
    case_number: str
    court: str
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]

agent = Agent(MODEL,
              instructions="Extract the lawsuit information",
              output_type=LawsuitInfo)

result = await agent.run(lawsuit)

print(result.output)

case_number='23STCV12345' court='Superior Court of California, County of Los Angeles' state='California' filing_date='July 5 2028' plaintiff='Barnaby Rutherford' defendant='Tamper Media LLC' claims=['fraud', 'breach of contract', 'negligence']
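What comes back is a `LawsuitInfo` instance rather than a blob of text, so you can use the fields directly. Here is a minimal sketch that needs only Pydantic: the object is built by hand from the lawsuit text (no LLM call) to stand in for `result.output`, and the date format string is an assumption about how the date comes back.

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class LawsuitInfo(BaseModel):
    case_number: str
    court: str
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]


# Built by hand from the lawsuit text; stands in for result.output
info = LawsuitInfo(
    case_number="23STCV12345",
    court="Superior Court of California, County of Los Angeles",
    state="California",
    filing_date="July 5 2028",
    plaintiff="Barnaby Rutherford",
    defendant="Tamper Media LLC",
    claims=["fraud", "breach of contract", "negligence"],
)

# Fields are plain Python attributes
print(info.defendant)
print(len(info.claims))

# Convert to a dict (handy as a dataframe row) or a JSON string (handy for saving)
row = info.model_dump()
as_json = info.model_dump_json()

# The date is still a string, so parse it if you need to sort or filter by it.
# The "%B %d %Y" format is an assumption; an LLM may format dates differently.
filed = datetime.strptime(info.filing_date, "%B %d %Y").date()

# Pydantic rejects data that doesn't fit the model
try:
    LawsuitInfo(case_number="23STCV12345")
except ValidationError as err:
    missing = len(err.errors())  # one error per missing required field
```

If the date format matters to you, it's usually easier to ask for a specific format up front than to guess at parsing afterwards.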

Working with a lot of inputs

Pydantic and structured outputs shine when you have a lot of data, like a whole stack of lawsuits.

You make the same setup as before.

While we're at it I'm also going to get very detailed about what we're asking for. We could have done this before but I was trying to keep things simple!

In [ ]:

from pydantic import BaseModel
from pydantic_ai import Agent

MODEL = "openrouter:google/gemini-3.1-flash-lite"

class LawsuitInfo(BaseModel):
    case_number: str
    court: str
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]

agent = Agent(MODEL,
              instructions="Extract the lawsuit information",
              output_type=LawsuitInfo)
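One way to get more detailed about what we're asking for is to attach a description to each field with Pydantic's `Field`. This is a sketch under one assumption: that the descriptions reach the LLM as part of the JSON schema generated from the model, which is how I understand `output_type` to work. The class name `DetailedLawsuitInfo` and the wording of each description are made up for illustration.

```python
from pydantic import BaseModel, Field


class DetailedLawsuitInfo(BaseModel):
    # Each description ends up in the JSON schema generated for this model
    case_number: str = Field(description="Case number exactly as written in the filing")
    court: str = Field(description="Full name of the court, including the county")
    state: str = Field(description="US state where the case was filed, spelled out")
    filing_date: str = Field(description="Date the lawsuit was filed, as YYYY-MM-DD")
    plaintiff: str = Field(description="Name of the person or company suing")
    defendant: str = Field(description="Name of the person or company being sued")
    claims: list[str] = Field(description="Each legal claim as a separate lowercase item")


# Inspect the schema to confirm the descriptions are attached
schema = DetailedLawsuitInfo.model_json_schema()
print(schema["properties"]["filing_date"]["description"])
```

To use it, pass `output_type=DetailedLawsuitInfo` when creating the agent instead of `LawsuitInfo`.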

...and then you just loop through it, collecting the outputs and pushing them into a dataframe.

In [ ]:

import pandas as pd
from tqdm import tqdm

rows = []

lawsuits = [
    # Add the text of each lawsuit here, one string per item
]

for lawsuit in tqdm(lawsuits):
    # Get the result
    result = await agent.run(lawsuit)

    # Save the result
    row = result.output.model_dump()
    rows.append(row)

print(f"Processed {len(rows)} lawsuits.")

5it [00:13,  2.62s/it]

Processed 5 lawsuits.
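The loop above waits for each response before sending the next request. Because `agent.run` is a coroutine, the calls can overlap instead. Here is a minimal sketch using only `asyncio`: `fake_extract` is a hypothetical stand-in for the real call so the example runs without an API key, and the limit of 3 simultaneous requests is an arbitrary number, not a documented rate limit.

```python
import asyncio


async def fake_extract(text: str) -> dict:
    """Hypothetical stand-in for agent.run(text); sleeps instead of calling an LLM."""
    await asyncio.sleep(0.01)
    return {"chars": len(text)}


async def extract_all(texts, extract=fake_extract, max_concurrent=3):
    """Run extract() on every text, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text):
        # Wait here if max_concurrent calls are already running
        async with semaphore:
            return await extract(text)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(worker(text) for text in texts))


# In a script, asyncio.run() starts the event loop for us
rows = asyncio.run(extract_all(["a", "bb", "ccc", "dddd"]))
print(rows)
```

In a notebook, use `rows = await extract_all(...)` rather than `asyncio.run(...)`, since the notebook already has an event loop running. To use the real agent, swap `fake_extract` for a small async function that awaits `agent.run(text)` and returns `result.output.model_dump()`.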

In [11]:

df = pd.DataFrame(rows)
df

Out[11]:

	make	model	color	year_estimate	vehicle_type	confidence	license_plate	filename
0	Toyota	Yaris	yellow	2018	sedan	1.0	7กถ 4059	28262480.jpg
1	Toyota	Camry	silver	2014	sedan	1.0	0050 60A	28246768.jpg
2	Volkswagen	Multivan (T6)	black	2018	van	1.0	K02VV	28246634.jpg
3	Tesla	Model Y	white	2022	SUV	1.0	沪A AB2295	28266737.jpg
4	Lexus	LX 570	black	2018	SUV	1.0	ZJA 777	28262472.jpg