In [ ]:
# Install required packages
!pip install --upgrade --quiet python-dotenv
!pip install --upgrade --quiet pydantic-ai
!pip install --upgrade --quiet perplexityai

print('✓ Packages installed!')

AI agents and observability

We'll be using Pydantic AI as our framework.

In [ ]:
import nest_asyncio
from dotenv import load_dotenv
import os

load_dotenv()
nest_asyncio.apply()

# These should all be in .env
# Get your Braintrust API key from https://www.braintrust.dev
os.environ["BRAINTRUST_API_URL"] = "https://api.braintrust.dev"
os.environ["BRAINTRUST_API_KEY"] = ""
os.environ["TAVILY_API_KEY"] = ""
os.environ["ANTHROPIC_API_KEY"] = ""
os.environ["OPENAI_API_KEY"] = ""
os.environ["PERPLEXITY_API_KEY"] = ""

tavily_key = os.getenv('TAVILY_API_KEY')

You can see all the possible models here (though openai:gpt-5.1-mini doesn't seem to work, even though it's listed).
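The model strings used throughout this notebook follow a provider:model-name pattern (that's how all the examples below are written; the split here is just an illustration, not part of Pydantic AI's API):

```python
# Pydantic AI model strings combine a provider and a model name,
# separated by the first colon
model_id = 'anthropic:claude-haiku-4-5'
provider, model_name = model_id.split(':', 1)

print(provider)    # anthropic
print(model_name)  # claude-haiku-4-5
```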

In [ ]:
from pydantic_ai import Agent

agent = Agent(  
    'anthropic:claude-haiku-4-5',
    instructions='Be concise, reply with one sentence.',  
)

result = await agent.run('Where does "hello world" come from?')  
print(result.output)
In [ ]:
from pydantic_ai import Agent

agent = Agent(  
    'openai:gpt-5-nano',
    instructions='Be concise, reply with one sentence.',  
)

result = agent.run_sync('Where does "hello world" come from?')  
print(result.output)

Adding search capability

Search is available as a built-in tool, one provided by the model provider itself. You can read more about built-in tools here.

In [ ]:
from pydantic_ai import Agent, WebSearchTool

agent = Agent(
    'anthropic:claude-haiku-4-5',
    builtin_tools=[WebSearchTool()],
)

result = agent.run_sync('Research who Jonathan Soma is and provide a two-sentence summary of who he likely is.')
print(result.output)

You can provide options to the tool; most of them depend on the provider.

In [ ]:
from pydantic_ai import Agent, WebSearchTool, WebSearchUserLocation

agent = Agent(
    'anthropic:claude-haiku-4-5',
    builtin_tools=[
        WebSearchTool(
            search_context_size='medium',
            user_location=WebSearchUserLocation(
                city='New York',
                country='US',
                region='NY'
            ),
            allowed_domains=['brooklynbrainery.com'],
        )
    ],
)

result = agent.run_sync('Research who Jonathan Soma is and provide a two-sentence summary of who he likely is.')
print(result.output)

Adding custom tools

If you want to do something not covered by a built-in tool, you can build it yourself. In this case, we want to use Perplexity to do research.

If we used Perplexity by itself, it would look something like this:

query = f"latest news on {topic} 2025"

search = client.search.create(
    query=query,
    max_results=5
)

But because we're offering it as a tool to Pydantic AI, we need to wrap it in a function. I'll explain the parts below.

In [ ]:
from pydantic_ai import Agent, WebSearchTool
from perplexity import Perplexity

client = Perplexity()

agent = Agent(
    'anthropic:claude-haiku-4-5',
    builtin_tools=[WebSearchTool()],
)

@agent.tool_plain
async def research_topic(topic: str) -> list[dict]:
    """Perform deep research on a topic.

    Args:
        topic: topic to search for
    """
    search = client.search.create(
        query=f"latest news on {topic} 2025",
        max_results=5
    )

    return search.results # type: ignore


topic = "traditional Japanese crafts"

result = agent.run_sync(
    f'Research the best places to learn about {topic}.'
)
print(result.output)

The custom tool has several parts:

@agent.tool_plain means "we are going to let the function below be a tool."

async def research_topic(topic: str) -> list[dict]

This makes research_topic the name of the tool and tells the agent that it accepts a single string, topic, and gives back a list of dictionaries when it's done. You could remove the -> list[dict] part if you wanted and it wouldn't break anything.

The part after that is the docstring, which fully explains the tool in text. The LLM gets told something like, "You can use the tool research_topic for when you want to Perform deep research on a topic." If you find the LLM isn't using your tool when it should, making your docstring more descriptive is a good first step!

If you have multiple custom tools they'll each need their own unique name.
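To get a feel for where all this information ends up, here's a rough sketch, using only the standard library and not Pydantic AI's actual internals, of what a framework can extract from a plain function like ours:

```python
import inspect
from typing import get_type_hints

async def research_topic(topic: str) -> list[dict]:
    """Perform deep research on a topic.

    Args:
        topic: topic to search for
    """
    ...

# The function's name becomes the tool name
tool_name = research_topic.__name__

# The docstring becomes the description the LLM reads
description = inspect.getdoc(research_topic)

# The annotations describe the input and output types
hints = get_type_hints(research_topic)

print(tool_name)                    # research_topic
print(description.splitlines()[0])  # Perform deep research on a topic.
print(hints['topic'])               # <class 'str'>
```

This is why the function name, type hints, and docstring all matter: they're the only things the agent knows about your tool.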

Adding instrumentation

What's the difference between a built-in tool and a custom tool? How much we can peek inside what happens.

When we use Claude with WebSearchTool, it's allowed to search the internet, but it's hard for us to see exactly what's going on. If we build our own search tool, we can see what happens with much more granularity.

In [ ]:
# Load my API keys from my .env file so you don't steal them :)
# My .env looks something like:
# BRAINTRUST_API_URL=https://api.braintrust.dev
# BRAINTRUST_API_KEY=sk-XXXXXXXXX
# PROJECT_NAME=My Project
from dotenv import load_dotenv

load_dotenv()

We'll start by connecting to Braintrust, our favorite observability platform.

The code below sets things up so all Pydantic AI calls get reported to Braintrust. Every time we call Claude, use a tool, etc., it gets logged to our Braintrust account.

In [ ]:
from braintrust.otel import BraintrustSpanProcessor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from pydantic_ai.agent import Agent
from braintrust import current_span, start_span, traced

# Set up tracing for the agent to automatically log to Braintrust
provider = TracerProvider()
trace.set_tracer_provider(provider)

provider.add_span_processor(BraintrustSpanProcessor())

Agent.instrument_all()

We'll start with the hello world example. When you run this code you can refresh your Braintrust project page and see the request details, including the prompt, the response, and the token usage.

In [ ]:
from pydantic_ai import Agent

agent = Agent(  
    'anthropic:claude-haiku-4-5',
    instructions='Be concise, reply with one sentence.',  
)

result = agent.run_sync('Where does "hello world" come from?')  
print(result.output)

Now let's try a more complicated one with a built-in tool! When you browse the log in Braintrust notice how difficult it is to see the details of the web search (not impossible, just... not easy).

In [ ]:
from pydantic_ai import Agent, WebSearchTool, WebSearchUserLocation

agent = Agent(
    'anthropic:claude-haiku-4-5',
    builtin_tools=[
        WebSearchTool(
            search_context_size='medium',
            user_location=WebSearchUserLocation(
                city='New York',
                country='US',
                region='NY'
            ),
            allowed_domains=['brooklynbrainery.com'],
        )
    ],
)

result = agent.run_sync('Research who Jonathan Soma is and provide a two-sentence summary of who he likely is.')
print(result.output)

Now we'll use a custom tool - notice it looks exactly the same as what we had up above! This is the magic of instrumentation - the Agent.instrument_all() we ran above handles all of the reporting, letting Braintrust see deep inside Pydantic AI.

In [ ]:
from pydantic_ai import Agent
from perplexity import Perplexity
from opentelemetry import trace
import json

client = Perplexity()

agent = Agent('anthropic:claude-haiku-4-5')
tracer = trace.get_tracer(__name__)

@agent.tool_plain
async def research_topic(topic: str) -> list[dict]:
    """Do deep research on a topic.

    Args:
        topic: topic to search for
    """
    # We want to specifically track the Perplexity search step
    # with the full prompt and everything - the research_topic tool
    # call only tracks the fact that 'topic' came in.
    # We name it "perplexity_search" because it's... a perplexity search.
    with tracer.start_as_current_span("perplexity_search") as span:
        query = f"latest news on {topic} 2025"

        search = client.search.create(
            query=query,
            max_results=5
        )
        
    return search.results # type: ignore


topic = "traditional Japanese crafts"

result = agent.run_sync(
    f'Where should I learn about {topic}? Use deep research.'
)
print(result.output)

The problem with custom tools is you might want to log something different.

In this example, our agent is sending the topic to the tool to tell Perplexity what to search. But that isn't really enough information for us to successfully understand what's going on - we need to log the actual Perplexity search query.

To create more detailed traces, we make a custom span that logs both the query and the max_results that we eventually send to Perplexity. When you run this you'll see a perplexity_search span nested inside of research_topic.

In [ ]:
from pydantic_ai import Agent
from perplexity import Perplexity
from opentelemetry import trace
import json

client = Perplexity()

agent = Agent('anthropic:claude-haiku-4-5')
tracer = trace.get_tracer(__name__)

@agent.tool_plain
async def research_topic(topic: str) -> list[dict]:
    """Do deep research on a topic.

    Args:
        topic: topic to search for
    """
    # We want to specifically track the Perplexity search step
    # with the full prompt and everything - the research_topic tool
    # call only tracks the fact that 'topic' came in.
    # We name it "perplexity_search" because it's... a perplexity search.
    with tracer.start_as_current_span("perplexity_search") as span:
        query = f"latest news on {topic} 2025"

        # What are we sending to perplexity?
        # the full query and max_results
        # so what do we log?
        # the full query and max_results!
        span.set_attribute("braintrust.input_json", json.dumps([
            {"query": query, "max_results": 5},
        ]))

        # Do the perplexity search
        search = client.search.create(
            query=query,
            max_results=5
        )
        
        # Log everything that comes back from perplexity
        span.set_attribute("braintrust.output_json", search.model_dump_json())

        # If we wanted to log other specific things, we could do this:
        span.set_attribute("perplexity.sample_extra_info", "example_value")
        # We use 'perplexity.' before the name because it's good practice to
        # put attributes in a namespace, not because we have to

    return search.results # type: ignore


topic = "traditional Japanese crafts"

result = agent.run_sync(
    f'Where should I learn about {topic}? Use deep research.'
)
print(result.output)

Prompts

We'll work on this next class!

But in the spirit of not deleting anything in this notebook: I'll create one at https://www.braintrust.dev/app/Little%20Columns/p/default-otel-project/prompts

In [ ]:
import braintrust

prompt = braintrust.load_prompt("default-otel-project", "research-request-c393")
prompt
In [ ]:
# It gives back a whole set of options, not just a single prompt
prompt.build(topic='traditional japanese crafts')

Datasets

If you'd like to fiddle around with datasets for homework, go for it! You'll want to upload them through the Braintrust web interface and reference the details here. The output is... a little wilder than you'd expect.

I'll create one at https://www.braintrust.dev/app/Little%20Columns/p/default-otel-project/datasets

In [ ]:
dataset = braintrust.init_dataset(
    project="default-otel-project",
    name="tips")

for row in dataset:
    print(row['input'])