Download notebook

In [ ]:

# Install required packages
!pip install --quiet ipywidgets pandas 'pydantic-ai-slim[duckduckgo,mcp,openai,openrouter,web-fetch]' python-dotenv braintrust requests tqdm

import os
import urllib.request
import zipfile

# Download and extract data files
url = 'https://github.com/jsoma/workshop-ai-agents/raw/main/docs/01-pydantic-ai-basics/01-pydantic-ai-basics-data.zip'
print(f'Downloading data from {url}...')
urllib.request.urlretrieve(url, '01-pydantic-ai-basics-data.zip')

print('Extracting 01-pydantic-ai-basics-data.zip...')
with zipfile.ZipFile('01-pydantic-ai-basics-data.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

os.remove('01-pydantic-ai-basics-data.zip')
print('✓ Data files extracted!')

In [1]:

import os
from getpass import getpass

from dotenv import load_dotenv

load_dotenv()

for key, label in [
    ("OPENROUTER_API_KEY", "OpenRouter API key"),
    ("OPENAI_API_KEY", "OpenAI API key, for the OpenAI web search examples"),
    ("BRAINTRUST_API_KEY", "Braintrust API key, for tracing examples"),
]:
    if not os.getenv(key):
        os.environ[key] = getpass(f"{label}: ")

Pydantic AI basics¶

Pydantic AI - not to be confused with Pydantic! - is a library for interfacing with AI. It's not married to any individual provider (OpenAI, Anthropic, Google), so it's often more flexible and independent than other tools. The people who make it have a track record of quality involvement with the open-source ecosystem so I also trust its continued existence a lot more than other flashy startups.

We'll start by asking a nice simple question to an LLM.

In [ ]:

from pydantic_ai import Agent

agent = Agent('openrouter:anthropic/claude-haiku-4-5')

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

Instead of talking directly to Anthropic or OpenAI, we're using OpenRouter instead. OpenRouter offers a zillion and one models, along with much better API key management than dealing directly with the providers themselves. If you wanted to talk directly to openai, you definitely can - just use openai:gpt-5-nano instead.

In [ ]:

from pydantic_ai import Agent

agent = Agent('openrouter:openai/gpt-5.4-nano')

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

In [ ]:

from pydantic_ai import Agent

agent = Agent(
    'openrouter:openai/gpt-5.4-nano',
    instructions="Be very, very terse in your responses.")

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

Your mission: Check OpenRouter's top weekly model lists to try with a different model.

In [ ]:

from pydantic_ai import Agent

agent = Agent(
    '',
    instructions="")

result = await agent.run('Where does "hello world" come from?')  
print(result.output)

Structured output¶

One of the best use cases for AI is asking for structured data from something unstructured, like court cases. Maybe we have some text we extracted from a PDF of a lawsuit:

In [ ]:

lawsuit = """
Case No. 23STCV12345
Let it be known that a LAWSUIT has been filed in the Superior Court of California,
County of Los Angeles, on July 5 2028.

Barnaby Rutherford vs. Tamper Media LLC

on condition of fraud, breach of contract, and negligence.

The plaintiff alleges that the defendant failed to deliver the
agreed-upon services, resulting in financial losses and emotional distress.
The lawsuit seeks compensation for damages incurred and any additional relief
deemed appropriate by the court.
"""

A naive approach to extract from an LLM might look like the code below.

In [ ]:

from pathlib import Path

from pydantic import BaseModel
from pydantic_ai import Agent, BinaryContent

MODEL = "openrouter:google/gemini-3.1-flash-lite"

prompt = """List the following about this lawsuit:
- case number
- court
- state
- filing date
- plaintiff
- defendant
- claims
"""

agent = Agent(MODEL)

result = await agent.run([prompt, lawsuit])

print(result.output)

That's easier to read, but not perfect, though. We want something nice and programmatic, JSON or dictionaries! This is where Pydantic comes in. You build a model around what you want your response to look like.

In [ ]:

from pathlib import Path

from pydantic import BaseModel, Field
from pydantic_ai import Agent

MODEL = "openrouter:google/gemini-3.1-flash-lite"

class LawsuitInfo(BaseModel):
    case_number: str
    court: str = Field(description="The court where the lawsuit was filed")
    state: str
    filing_date: str
    plaintiff: str
    defendant: str
    claims: list[str]

agent = Agent(MODEL,
              instructions="Extract the lawsuit information",
              output_type=LawsuitInfo)

result = await agent.run(lawsuit)

print(result.output)

Your mission: Change the above code so it gives me the two-letter abbreviation for the state - for example, "NY" instead of "New York."

...and it even works with images!¶

Take for example this car:

No description has been provided for this image

Just like the lawsuit, we can feed it directly to an LLM and ask questions about it.

In [ ]:

from pathlib import Path
from pydantic import BaseModel
from pydantic_ai import Agent, BinaryContent

MODEL = "openrouter:google/gemini-3.1-flash-lite"
DATA = Path("data")

class VehicleInfo(BaseModel):
    make: str | None
    model: str | None
    type: str | None
    color: str | None
    license_plate: str | None

image = BinaryContent(
    data=(DATA / "car.jpg").read_bytes(),
    media_type="image/jpeg",
)

agent = Agent(MODEL, 
              instructions="Extract the appropriate vehicle information",
              output_type=VehicleInfo)

result = await agent.run([prompt, image])

print(result.output)

Our question: I want to add an estimated_year attribute. Try adding it as an str, then try adding it as an int. What's the difference?

Working with a lot of inputs¶

Pydantic and structured outputs shine when you have a lot of data, like all of these car photos.

You make the same setup as before.

While we're at it I'm also going to get very detailed about what we're asking for. We could have done this before but I was trying to keep things simple!

In [ ]:

from pathlib import Path

import pandas as pd
from pydantic import BaseModel, Field
from pydantic_ai import Agent, BinaryContent
from typing import Literal

MODEL = "openrouter:google/gemini-3.1-flash-lite"
DATA = Path("data")

class Vehicle(BaseModel):
    make: str = Field(description="Vehicle manufacturer")
    model: str = Field(description="Vehicle model name")
    color: str = Field(description="Primary color")
    year_estimate: int = Field(description="Estimated model year")
    vehicle_type: Literal[
        "sedan", "SUV", "truck", "van", "motorcycle", "other"
    ] = Field(description="Type of vehicle")
    confidence: float = Field(description="Confidence in identification, 0.0 to 1.0")
    license_plate: str | None

agent = Agent(MODEL, output_type=Vehicle)

...and then you just loop through it, collecting the outputs and pushing them into a dataframe.

In [ ]:

from tqdm import tqdm

PROMPT = "Analyze the vehicle in this image. Fill in all fields."

rows = []
image_paths = sorted((DATA / "cars").glob("*.jpg"))
for image_path in tqdm(image_paths):
    # Get the result
    image = BinaryContent(data=image_path.read_bytes(), media_type="image/jpeg")
    result = await agent.run([PROMPT, image])

    # Save the result
    row = result.output.model_dump()
    row["filename"] = image_path.name
    rows.append(row)

print(f"Processed {len(rows)} images.")

In [ ]:

df = pd.DataFrame(rows)
df

Adding tools¶

Talking to an LLM one step at a time is fine, but that isn't what makes something agentic.

Agentic work is about giving the LLM options and letting it work independently until it decides it has come to an answer.

We'll start with WebSearch, which... searches the web.

In [ ]:

from pydantic_ai import Agent
from pydantic_ai.capabilities import WebSearch, WebFetch

MODEL = 'openrouter:anthropic/claude-haiku-4-5'

# Uses the web search built-in to the LLM provider
agent = Agent(
    MODEL,
    capabilities=[WebSearch(local=False)],
)

prompt = """
Research Jonathan Soma and provide a two-sentence summary about who he likely is.
"""

result = await agent.run(prompt)
print(result.output)

You can provide all sorts of options to WebSearch, the most useful are probably site-specific search and location-specific search.

We'll talk about this more, but WebSearch by default is a provider tool, not something you can infinitely customize and have control over. It runs on OpenAI's servers or Anthropic's servers (or whoever else's), and most of the customization is only available in this "native" format, not with the "local" (on your computer) approach.

In [ ]:

from pydantic_ai import Agent, WebSearchTool, WebSearchUserLocation
from pydantic_ai.capabilities import NativeTool

# Different LLM providers have different options
# allowed_domains= does not work with openrouter!

agent = Agent(
    # 'openrouter:anthropic/claude-haiku-4-5',
    'openai-responses:gpt-5.4',
    capabilities=[
        NativeTool(
            WebSearchTool(
                search_context_size='medium',
                user_location=WebSearchUserLocation(
                    city='New York',
                    country='US',
                    region='NY'
                ),
                allowed_domains=['brooklynbrainery.com'],
            )
        )
    ],
)

prompt = """
Research Jonathan Soma and provide a two-sentence summary about who he likely is.
"""

result = await agent.run(prompt)
print(result.output)

Your mission: Edit the code below to see where you should eat dinner after the conference.

In [ ]:

from pydantic_ai import Agent, WebSearchTool, WebSearchUserLocation
from pydantic_ai.capabilities import NativeTool

# Different LLM providers have different options
# allowed_domains= does not work with openrouter!

agent = Agent(
    # 'openrouter:anthropic/claude-haiku-4-5',
    'openai-responses:gpt-5.4',
    capabilities=[
        NativeTool(
            WebSearchTool(
                search_context_size='medium',
                user_location=WebSearchUserLocation(
                    city='New York',
                    country='US',
                    region='NY'
                ),
            )
        )
    ],
)

prompt = """
Research Jonathan Soma and provide a two-sentence summary about who he likely is.
"""

result = await agent.run(prompt)
print(result.output)

Run your searches independently¶

Instead of relying on OpenAI or Anthropic, there are plenty of independent search providers: Perplexity, Exa, Tavily, DuckDuckGo. Some of these are built-in to Pydantic AI, like the duckduckgo API (which we're using because it doesn't require a key!).

In [ ]:

from pydantic_ai import Agent
from pydantic_ai.common_tools.duckduckgo import duckduckgo_search_tool

agent = Agent(
    'openrouter:anthropic/claude-haiku-4-5',
    tools=[
        duckduckgo_search_tool(max_results=10)
    ]
)

result = await agent.run("""
Research Jonathan Soma and provide a two-sentence summary of who he likely is and what he's teaching this semester.
""")
print(result.output)

We'll see another way to customize search below, but... are we satisfied with the response?

Custom tools¶

So far we've only seen tools that search the web. Maybe you have another source of information: an API, local documents, a Slack channel, etc etc etc.

Custom tools allow you to enable the agent to do things it can't do out-of-the-box. Acquiring information is just one tiny slice of opportunity!

In this case, we might want to know the date. We could just put this in the instructions, but... that's not as much fun.

In [2]:

from datetime import datetime

datetime.now().strftime("%Y-%m-%d")

Out[2]:

'2026-05-30'

If we want to be able to have the agent access this, it needs to be able to be wrapped in a function.

In [ ]:

import requests
from datetime import datetime
from typing import Literal

# The docstring explains to the agent what the tool does

def get_date() -> str:
    """Get the current date."""
    return datetime.now().strftime("%Y-%m-%d")

def get_jrn_schedule(semester: Literal["Fall", "Spring", "Summer"], year: int) -> str:
    """Get Columbia Journalism School's schedule details for a given semester, and year."""

    # https://doc.sis.columbia.edu/sel/JOUR_Spring2026_text.html
    response = requests.get(f"https://doc.sis.columbia.edu/sel/JOUR_{semester}{year}_text.html")
    if response.status_code != 200:
        return "Schedule not found."
    return response.text

Let's test them out (I am a little lazy with the schedule one).

In [5]:

get_date()

Out[5]:

'2026-05-30'

In [6]:

text = get_jrn_schedule("Fall", 2023)
text[:2000]

Out[6]:

'<!DOCTYPE html>\n<html><head>\n<title>Department Listing: Journalism Courses in the Fall 2023 Semester</title>\n<meta charset="utf-8">\n<meta name="robots" content="noindex,nofollow">\n<link rel="stylesheet" href="../doc_main.v02.css">\n</head>\n<body><div id="text-header">\n<h1>Department Listing: Journalism Courses in the Fall 2023 Semester</h1></div>\n<pre>\n     Number Sec  Call#      Pts  Title                           Day Time          Room Building        Faculty\n                 L App Activity  Subject\n\nREGI J0001  001  12878     1-12  REGISTERED FOR JOURNALISM                                              \n                       DUMMY CO  Registered                                                             \nRESI J0001  001  12879        0  1-RESIDENCE UNIT F-T JOUR                                              \n                       INDEPEND  Residence Unit                                                         \nRSRH J0001  001  12880        0  RESEARCH-JOURNALISM                                                    \n                       INDEPEND  Research                                                               \nJOUR J0004  001  15231        0  Journalism Now                  T 6:00pm-7:30pm   301 Journalism Buil  O\'Kelley, Winnie\n                       REGISTRA  Journalism                                                             \nJOUR J0005  001  15355        0  MASTER\'S THESIS WORKSHOP        F 10:00am-11:30a  201 Journalism Buil  Belkin, Lisa\n                       DUMMY CO  Journalism                                                             \nJOUR J0008  001  15697        0  LAB FEE REG                                                            \n                       DUMMY CO  Journalism                                                             \n                            Note: $50 lab fee for Photo I\nJOUR J0008  002  15698        0  VC Still Photo FALL                                                    \n                       DUMMY CO  Journali'

Let's give it a try: this agent has a handful of tools. See what happens if you add or remove them!

In [ ]:

from pydantic_ai import Agent
from pydantic_ai.common_tools.duckduckgo import duckduckgo_search_tool

agent = Agent(
    'openrouter:anthropic/claude-haiku-4-5',
    tools=[
        duckduckgo_search_tool(max_results=10),
        # get_date,
        # get_jrn_schedule
    ]
)

result = await agent.run("""
    Research Jonathan Soma and provide a two-sentence summary of who he is and what he's teaching this semester.
""")
print(result.output)

Seeing what tools were called, and with what¶

You can look under the hood to see what tools were called, but it isn't very attractive.

In [ ]:

from pydantic_ai import ToolCallPart

for message in result.all_messages():
    for part in message.parts:
        if isinstance(part, ToolCallPart):
            print("\n## TOOL CALLED")
            print("name:", part.tool_name)
            print("args:", part.args_as_dict())

To do this "properly" we're going to instrument our agent.

Adding instrumentation¶

Observability is the ability to see what's going on inside of your agent. Instrumentation is the process of adding the tools to your agent that allow it to be observed.

We're going to use Braintrust, but LogFire, LangFuse, Arize Phoenix are all other alternatives. The space is crowded!

In [ ]:

from braintrust.wrappers.pydantic_ai import setup_pydantic_ai

# You'll need to use your own API key so they go into your
# Braintrust account. Sign up at https://www.braintrust.dev/

if not os.getenv(key):
    os.environ[key] = getpass(f"{label}: ")

if os.getenv("BRAINTRUST_API_KEY"):
    setup_pydantic_ai(project_name="dataharvest-2026")

In [ ]:

from pydantic_ai import Agent
from pydantic_ai.common_tools.duckduckgo import duckduckgo_search_tool

agent = Agent(
    'openrouter:anthropic/claude-haiku-4-5',
    tools=[
        duckduckgo_search_tool(max_results=10),
        get_date,
        get_jrn_schedule
    ]
)

result = await agent.run("""
    Research Jonathan Soma and provide a two-sentence summary of who he is and what he taught last semester.
""")
print(result.output)

Now I can go see what happened in my Braintrust logs.

As a point of comparison, try it with the following:

In [ ]:

from pydantic_ai import Agent

agent = Agent(
    'openrouter:anthropic/claude-haiku-4-5',
    capabilities=[WebSearch(), WebFetch()]
)

result = await agent.run("""
    Research who Jonathan Soma is and provide a two-sentence summary of who he is and what he taught last semester.
""")
print(result.output)

Capabilities and customization¶

Pydantic can be confusing - tools, toolsets, capabilities, MCP servers... and it doesn't even stay stable! We accept Pydantic's faults, though, and move on.

Capabilities¶

Pydantic has some built-in abilities, but they recently moved some of them into the Pydantic AI harness and let random folks take over some of the implementations and it's just... a little messy at the moment. Take a breath.

Capabilities: https://pydantic.dev/docs/ai/core-concepts/capabilities
Some common ones: https://github.com/vstorm-co/pydantic-ai-backend

vstorm seems to have done a lot with Pydantic but that repo has under a hundred stars, so I don't feel great about it. This is why below we're using the official modelcontextprotocol filesystem MCP server instead of the Pydantic AI Backend one, even though the backend one claims to sandbox etc.

MCP Servers¶

MCP servers are your ability to talk to some other piece of software. Bridges or APIs, if you will.

Below we use one for the filesystem and one for Powerpoint.

Point of discussion: why an MCP Server for Powerpoint instead of a Powerpoint library that runs in Python?

In [ ]:

# Adding the PATH for npx/uvx because otherwise it won't
# be able to find my npx/uvx installation and the tools won't work
# It would work fine if I had opened VS Code from the Terminal

from pathlib import Path
import os

os.environ["PATH"] = os.pathsep.join([
    str(Path.home() / ".volta/bin"),
    str(Path.home() / ".local/bin"),
    os.environ.get("PATH", ""),
])

THESE WILL ALWAYS FAIL THE FIRST TIME. You need to run them twice. It's silly but it's the easiest way to do it.

In [7]:

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPToolset

file_server = MCPToolset({
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": [
                "-y",
                "@modelcontextprotocol/server-filesystem",
                str("."),
            ],
        }
    }
})

ppt_server = MCPToolset({
  "mcpServers": {
    "ppt": {
      "command": "uvx",
      "args": [
        "--from", "office-powerpoint-mcp-server", "ppt_mcp_server"
      ],
      "env": {}
    }
  }
})

# We'll start without toolsets
agent = Agent(
    'openrouter:anthropic/claude-haiku-4-5',
    toolsets=[file_server, ppt_server],
)

instructions = f"""
You are a helpful assistant.
"""

prompt = f"""
What files are in the current directory?
Make a powerpoint presentation about how you did it.
"""

result = await agent.run(prompt,
                        instructions=instructions)

print(result.output)

Perfect! Here's what I did:

## Summary

**Files in the Current Directory:**
The directory contains 26+ items including:
- **Jupyter Notebooks**: 3 notebooks on Pydantic AI
- **Python Files**: 2 teaching example scripts
- **Configuration Files**: .env, .python-version, pyproject.toml, uv.lock
- **Documentation**: README.md, talk.md
- **Data Files**: PDFs and text files
- **Directories**: .venv, .git, data, docs, outputs, research, etc.

**How I Created the Presentation:**

1. **Step 1**: Called `list_allowed_directories()` to find which directories I have access to
2. **Step 2**: Used `list_directory()` to retrieve all files and folders in that location
3. **Step 3**: Created a new PowerPoint presentation with 6 slides:
   - Slide 0: Title slide
   - Slide 1: Step 1 explanation (finding allowed directories)
   - Slide 2: Step 2 explanation (listing contents)
   - Slide 3-4: The files found
   - Slide 5: Summary
4. **Step 4**: Populated each slide with bullet points explaining the process
5. **Step 5**: Saved the presentation as `How_I_Listed_Files.pptx`

The presentation is now saved in your Pydantic AI project directory!

Check out this automated scraper writer for an example of a lot of tools.