Video

You've got image tools and audio tools. Video is both (frames plus a soundtrack), so you already have everything you need! Yes, you could send it to Gemini or another LLM (which we do at the end), but you can also split it up and reuse what you've learned. Public meetings may come as videos, but nothing stops you from treating them as audio, and the same way you count or detect objects in images, you can count or detect them in video frames.

Download

yt-dlp is the best tool for downloading video content on the internet. It's gotten less effective at YouTube in the past 6 months or so, but it's still great for older videos and non-YT sites. Always try it out first!

If you'd prefer something that has a "normal" non-programming-y interface, Stacher is great.

video/download.py — Download a video from YouTube with yt-dlp

from pathlib import Path
import yt_dlp

DATA = Path("data")
URL = "https://www.youtube.com/watch?v=rDXubdQdJYs"

DATA.mkdir(parents=True, exist_ok=True)
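# Take the value of the v= parameter, dropping any trailing parameters such as &t=10s
VIDEO_ID = URL.split("v=")[1].split("&")[0]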

existing = list(DATA.glob(f"{VIDEO_ID}.*"))
if existing:
    print(f"Already downloaded: {existing[0].name}")
else:
    ydl_opts = {
        "outtmpl": str(DATA / "%(id)s.%(ext)s"),
        "quiet": True,
        "no_warnings": True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([URL])
    print(f"Downloaded: {VIDEO_ID}")

Already downloaded: rDXubdQdJYs.mp3
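
Splitting the URL on "v=" works for plain watch links, but shared links often come as youtu.be short links or carry extra parameters. As a minimal sketch, assuming you only need those two URL shapes, the standard library can parse the ID out more carefully. The helper name `youtube_video_id` is made up for this example and is not part of yt-dlp.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> str:
    """Return the video ID from a YouTube watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    # Short links look like https://youtu.be/<id>
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    # Watch links carry the ID in the "v" query parameter
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"No video ID found in {url!r}")
    return ids[0]


print(youtube_video_id("https://www.youtube.com/watch?v=rDXubdQdJYs&t=10s"))
# prints rDXubdQdJYs
```

Other URL shapes (embeds, playlists, non-YouTube sites) are not handled here, so treat this as a starting point rather than a complete parser.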

Extract frames

Since you know how to analyze images, why not just separate out the frames in a video? Below we extract one frame every two seconds. Now you have images: use the image tools!

video/frames.py — Extract frames from a video at 1 frame every 2 seconds using ffmpeg-python

import ffmpeg
from pathlib import Path

DATA = Path("data")
VIDEO = DATA / "rDXubdQdJYs.mp4"
OUTPUT = DATA / "debate"
OUTPUT.mkdir(parents=True, exist_ok=True)

(
    ffmpeg
    .input(str(VIDEO))
    .filter("fps", fps=0.5)
    .output(str(OUTPUT / "frame-%03d.jpg"), **{"qscale:v": 2})
    .overwrite_output()
    .run(quiet=True)
)

frames = sorted(OUTPUT.glob("frame-*.jpg"))
print(f"Extracted {len(frames)} frames to {OUTPUT}")

Extracted 30 frames to data/debate

Preview: a sample of what we just extracted.

import matplotlib.pyplot as plt
from PIL import Image

sample = frames[:10]
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for ax, path in zip(axes.flat, sample):
    ax.imshow(Image.open(path))
    ax.set_title(path.stem, fontsize=8)
    ax.axis("off")
plt.tight_layout()
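
Because the frames were sampled at a fixed rate, each filename can be mapped back to an approximate position in the video, which helps when you want to jump to that moment and check it. The sketch below assumes the `frame-NNN.jpg` naming used above and assumes that `frame-001` sits at roughly 0 seconds; the exact alignment depends on how ffmpeg's fps filter picks frames, so read the result as "within a couple of seconds". The helper names are made up for this example.

```python
from pathlib import Path

FPS = 0.5  # same sampling rate as the fps filter above: one frame every 2 seconds


def frame_timestamp(path: Path, fps: float = FPS) -> float:
    """Approximate time in seconds for a file named frame-NNN.jpg."""
    number = int(path.stem.split("-")[1])  # "frame-007" -> 7
    return (number - 1) / fps  # assumes frame-001 sits at about 0 seconds


def as_clock(seconds: float) -> str:
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


print(as_clock(frame_timestamp(Path("data/debate/frame-007.jpg"))))
# prints 00:12
```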

Extract audio

Same thing for the audio. Now you have audio: use the audio tools!

video/audio.py — Extract audio track from a video file using ffmpeg-python

import ffmpeg
from pathlib import Path

DATA = Path("data")
VIDEO = DATA / "rDXubdQdJYs.mp4"
OUTPUT = Path("outputs")
OUTPUT.mkdir(parents=True, exist_ok=True)

(
    ffmpeg
    .input(str(VIDEO))
    .output(str(OUTPUT / "rDXubdQdJYs.mp3"), acodec="libmp3lame", vn=None)
    .overwrite_output()
    .run(quiet=True)
)

print(f"Audio saved to {OUTPUT / 'rDXubdQdJYs.mp3'}")

Audio saved to outputs/rDXubdQdJYs.mp3
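
If you'd rather not depend on ffmpeg-python, the same extraction can be written as a plain ffmpeg command. The sketch below only builds and prints the command; the function name `audio_extract_command` is made up for this example. The flags are standard ffmpeg options: `-y` overwrites the output, `-vn` drops the video stream, and `-acodec libmp3lame` encodes the audio as MP3.

```python
import shlex
from pathlib import Path


def audio_extract_command(video: Path, audio: Path) -> list[str]:
    """Build an ffmpeg command that writes the video's soundtrack as an MP3."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video),         # input video
        "-vn",                    # no video stream in the output
        "-acodec", "libmp3lame",  # encode audio as MP3
        str(audio),
    ]


cmd = audio_extract_command(
    Path("data") / "rDXubdQdJYs.mp4",
    Path("outputs") / "rDXubdQdJYs.mp3",
)
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed and the input file exists.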

Answering questions the slightly risky way

Just like you can ask AI about images, you can also ask about videos! ...at least, for some of the models. Gemini is probably the best: ask a question about the video, get a confident answer. It... may or may not be accurate.

video/vibe-answer.py — The risky way: ask Gemini "who got more screen time?" -- confident answer, no evidence

import time
from pathlib import Path

from pydantic_ai import Agent, VideoUrl
from pydantic_ai.providers.google import GoogleProvider

DATA = Path("data")
VIDEO = DATA / "rDXubdQdJYs.mp4"
MODEL = "google-gla:gemini-2.5-flash"

provider = GoogleProvider()
video_file = provider.client.files.upload(file=str(VIDEO))

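# Uploaded videos are processed asynchronously; poll until Gemini is done
while video_file.state.name == "PROCESSING":
    time.sleep(5)
    video_file = provider.client.files.get(name=video_file.name)

# Stop early if processing failed instead of sending a broken file reference
if video_file.state.name == "FAILED":
    raise RuntimeError(f"Gemini could not process {VIDEO.name}")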

agent = Agent(MODEL)
result = agent.run_sync([
    "Who got more screen time in this debate video? "
    "Give me a breakdown of approximately how much time each person was on screen.",
    VideoUrl(url=video_file.uri, media_type=video_file.mime_type),
])

print(result.output)

Based on a manual timing of the video:

*   **Joe Biden** was on screen for approximately **29 seconds**.
*   **Donald Trump** was on screen for approximately **23.5 seconds**.

Therefore, **Joe Biden** got more screen time in this debate video.

Gemini's raw API docs: video, audio.

While it's convenient, an answer from Gemini is harder to fact-check. It's harder to show your editor the work, and errors are hidden behind a wall of "just trust me!" It's a vibe, for better or worse.

The boring (auditable) way

Instead of getting Gemini to go whole-hog on the video analysis, we use our other skills: split the video into frames, then classify each frame with an LLM. This produces an auditable CSV, where every row links to a frame you can check.

video/decompose-classify.py — The auditable way: classify each frame with Pydantic AI, produce an auditable CSV

from pathlib import Path
import pandas as pd
from pydantic import BaseModel
from pydantic_ai import Agent, BinaryContent

DATA = Path("data")
MODEL = "google-gla:gemini-2.5-flash"
# MODEL = "openai:gpt-5-nano"
FRAMES_DIR = DATA / "debate"
OUTPUT = Path("outputs") / "frame_classifications.csv"

class FrameClassification(BaseModel):
    subject: str
    confidence: float
    speaking: bool
    description: str

agent = Agent(
    MODEL,
    output_type=FrameClassification,
    system_prompt="Classify frames from a political debate. Identify who is on screen, confidence 0-1, whether they are speaking.",
)

frames = sorted(FRAMES_DIR.glob("*.jpg"))
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

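rows = []
# Note: i is 0-based, while ffmpeg's filenames start at frame-001,
# so row 0 in the CSV corresponds to frame-001.jpg
for i, path in enumerate(frames):
    # Send one frame per request so each classification stands on its own
    r = agent.run_sync([
        f"Frame {i}",
        BinaryContent(data=path.read_bytes(), media_type="image/jpeg"),
    ])
    # Flatten the structured output into a plain dict for the DataFrame
    rows.append({"frame": i, **r.output.model_dump()})
    print(f"{path.name}: {r.output.subject} ({r.output.confidence:.2f})")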

frame-001.jpg: Joe Biden (0.95)
frame-002.jpg: Joe Biden (1.00)
frame-003.jpg: Donald Trump (1.00)
frame-004.jpg: Joe Biden (0.95)
frame-005.jpg: Donald Trump (1.00)
frame-006.jpg: Donald Trump (0.90)
frame-007.jpg: Donald Trump (1.00)
frame-008.jpg: Joe Biden (1.00)
frame-009.jpg: Joe Biden (0.95)
frame-010.jpg: Donald Trump (1.00)
frame-011.jpg: Joe Biden (1.00)
frame-012.jpg: Joe Biden (1.00)
frame-013.jpg: Joe Biden (0.95)
frame-014.jpg: Joe Biden (1.00)
frame-015.jpg: Joe Biden (0.95)
frame-016.jpg: Donald Trump (1.00)
frame-017.jpg: Donald Trump (0.95)
frame-018.jpg: Joe Biden (1.00)
frame-019.jpg: Joe Biden (1.00)
frame-020.jpg: Joe Biden (1.00)
frame-021.jpg: Donald Trump (0.95)
frame-022.jpg: Joe Biden (0.95)
frame-023.jpg: Joe Biden (1.00)
frame-024.jpg: Joe Biden (1.00)
frame-025.jpg: Donald Trump (1.00)
frame-026.jpg: Donald Trump (0.99)
frame-027.jpg: Donald Trump (0.95)
frame-028.jpg: Joe Biden (1.00)
frame-029.jpg: Joe Biden (1.00)
frame-030.jpg: Joe Biden (1.00)

Every frame is now a row. You can sort, filter, and fact-check any one of them.

df = pd.DataFrame(rows)
df.to_csv(OUTPUT, index=False)
print(f"Saved {len(df)} frames to {OUTPUT}")

df

Saved 30 frames to outputs/frame_classifications.csv

	frame	subject	confidence	speaking	description
0	0	Joe Biden	0.95	False	Joe Biden is on screen, walking onto the stage.
1	1	Joe Biden	1.00	True	Joe Biden is speaking at a political debate ho...
2	2	Donald Trump	1.00	True	Donald Trump is on screen during a political d...
3	3	Joe Biden	0.95	True	Joe Biden is on screen, likely speaking, as a ...
4	4	Donald Trump	1.00	True	Donald Trump, former president, is on screen, ...
5	5	Donald Trump	0.90	True	Donald Trump, former president, is speaking du...
6	6	Donald Trump	1.00	True	Donald Trump is speaking at a political debate...
7	7	Joe Biden	1.00	True	Joe Biden is on screen, speaking. His mouth is...
8	8	Joe Biden	0.95	True	Joe Biden is speaking during a political debat...
9	9	Donald Trump	1.00	True	Donald Trump is speaking, visible from the che...
10	10	Joe Biden	1.00	True	Joe Biden is on screen, appearing to be speaki...
11	11	Joe Biden	1.00	True	Joe Biden is on screen, speaking during a poli...
12	12	Joe Biden	0.95	True	Joe Biden is speaking at a political debate, w...
13	13	Joe Biden	1.00	True	Joe Biden is on screen, speaking during a poli...
14	14	Joe Biden	0.95	True	Joe Biden is on screen, mid-sentence, wearing ...
15	15	Donald Trump	1.00	True	Donald Trump is on screen, speaking during a p...
16	16	Donald Trump	0.95	True	Donald Trump is on screen and speaking. Subtit...
17	17	Joe Biden	1.00	True	Joe Biden is speaking, with his eyes closed an...
18	18	Joe Biden	1.00	True	Joe Biden is speaking at a political debate. T...
19	19	Joe Biden	1.00	True	Joe Biden is speaking at a debate, looking dow...
20	20	Donald Trump	0.95	True	Donald Trump is on screen, wearing a suit and ...
21	21	Joe Biden	0.95	True	Joe Biden is on screen, speaking into a microp...
22	22	Joe Biden	1.00	True	Joe Biden is on screen and appears to be speak...
23	23	Joe Biden	1.00	False	Joe Biden is on screen. Based on the text over...
24	24	Donald Trump	1.00	True	Donald Trump is speaking at a debate, wearing ...
25	25	Donald Trump	0.99	True	Donald Trump is speaking during a political de...
26	26	Donald Trump	0.95	True	Donald Trump is on screen, speaking. His mouth...
27	27	Joe Biden	1.00	True	Joe Biden is speaking during a political debat...
28	28	Joe Biden	1.00	True	Joe Biden is on screen, speaking during a poli...
29	29	Joe Biden	1.00	True	Joe Biden is speaking at a political debate, w...
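
With one row per frame, the screen-time question becomes a count: each frame stands in for about two seconds of video, so frames per subject times two gives a rough estimate. The sketch below is one way to do that tally, under the assumption that a frame represents the whole two-second interval around it. The function name `screen_time` and the sample rows are made up for illustration; the rows only mimic the shape of the CSV above.

```python
from collections import Counter

SECONDS_PER_FRAME = 2.0  # frames were sampled once every two seconds


def screen_time(rows, min_confidence=0.0):
    """Estimate seconds on screen per subject from classified frames."""
    counts = Counter(
        row["subject"] for row in rows if row["confidence"] >= min_confidence
    )
    return {subject: n * SECONDS_PER_FRAME for subject, n in counts.items()}


# Hypothetical rows in the same shape as the CSV
sample_rows = [
    {"frame": 0, "subject": "Speaker A", "confidence": 0.95},
    {"frame": 1, "subject": "Speaker A", "confidence": 1.00},
    {"frame": 2, "subject": "Speaker B", "confidence": 0.60},
]

print(screen_time(sample_rows))
# prints {'Speaker A': 4.0, 'Speaker B': 2.0}
```

With the DataFrame from above, `screen_time(df.to_dict("records"))` should give the per-person estimate, and raising `min_confidence` leaves out frames the model was less sure about so you can review those by hand. The estimate is coarse: a two-second sampling interval can miss quick cuts entirely.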