BIRN Summer School 2025: Data Journalism

Jonathan Soma, Columbia University

Contact: js4571@columbia.edu, @dangerscarf

Sites: Lede Program, jonathansoma.com, Practical AI for Investigative Journalism, Bad PDFs

Learn to scrape, extract and analyze data using Python

Adv Data Journalism I: Structured Data

📥 Download PDF

Why can't we just use AI for everything???

Exploring what AI can and can't do when working with structured data

📦 Data: 00-cheating-data.zip
📑 Slides: structured-data.pdf

Data analysis basics with pandas

Pandas is the most common tool programmers use to analyze data in Python. And if that isn't enough for you: AI tools use it, too!

🚀 Live coding worksheet ✓ Completed version
📑 Slides: structured-data.pdf
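For a taste of what pandas analysis looks like, here is a minimal sketch that builds a small DataFrame, then sorts, groups and filters it. The company names and amounts are made up for illustration; with real data you would usually load a file instead, for example with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical example data: a few made-up bids, for illustration only
df = pd.DataFrame({
    "company": ["Alpha", "Beta", "Alpha", "Gamma", "Beta"],
    "amount": [1200, 800, 950, 400, 1100],
})

# Sort so the largest bids come first
largest = df.sort_values("amount", ascending=False)
print(largest.head(3))

# Add up the bid amounts for each company
totals = df.groupby("company")["amount"].sum()
print(totals)

# Keep only the rows where the bid is over 1,000
big_bids = df[df["amount"] > 1000]
print(big_bids)
```

These three moves (sort, group, filter) answer questions like who bid the most, how much each company bid in total, and which rows cross a threshold.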

Extracting Bid Data from PDFs

Working with data in the real world is an awful, awful experience. Let's work on some spreadsheets about Kosovo's privatisation efforts.

🚀 Live coding worksheet ✓ Completed version
📑 Slides: structured-data.pdf
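Real-world numbers often arrive as text that pandas can't do math on. The sketch below assumes hypothetical bid amounts written with euro signs, stray spaces, dots as thousands separators and commas as decimal marks (a common European convention; the actual files may look different), and converts them to floats.

```python
import pandas as pd

# Hypothetical messy values, like those that often come out of
# PDF or spreadsheet extraction: stray spaces, currency signs,
# "." as thousands separator and "," as decimal mark
raw = pd.DataFrame({
    "company": [" Alpha LLC", "Beta Ltd ", "Gamma"],
    "bid": ["1.200,50 €", "€ 980,00", "12.000"],
})

clean = raw.copy()
# Trim whitespace around company names
clean["company"] = clean["company"].str.strip()
# Drop the euro sign and spaces, remove thousands dots,
# then turn the decimal comma into a dot and convert to numbers
clean["bid"] = (
    clean["bid"]
    .str.replace("€", "", regex=False)
    .str.strip()
    .str.replace(".", "", regex=False)
    .str.replace(",", ".", regex=False)
    .astype(float)
)
print(clean)
```

The order matters: the thousands dots have to be removed before the decimal commas become dots, otherwise every number would lose its decimal point.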

AI Do's and Don'ts

📥 Download PDF

AI Do's and Don'ts

How can AI help with newsroom tasks?

📦 Data: ai-dos-and-donts-data.zip
📑 Slides: birn-ai-dos-donts.pdf

Adv Data Journalism II: Unstructured Data

📥 Download PDF

Scraping web sites and unstructured data

Making the best of data from the internet and bad, ugly data from PDFs

📦 Data: scraping-web-data.zip
📑 Slides: unstructured-data.pdf
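Scraping mostly means pulling structured pieces out of HTML. As a rough sketch, the code below uses Python's built-in `html.parser` module to turn a small, hypothetical HTML table into lists of rows. Libraries such as BeautifulSoup make this more convenient; the standard library is used here only so the example runs without extra installs, and in practice you would first download the page (for example with `requests`).

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet standing in for a downloaded web page
PAGE = """
<table id="tenders">
  <tr><th>Tender</th><th>Winner</th></tr>
  <tr><td>Road repair</td><td>Alpha LLC</td></tr>
  <tr><td>School roof</td><td>Beta Ltd</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: give it an empty list of cells
            self.rows.append([])
        elif tag in ("td", "th"):
            # A new cell starts: add an empty string to fill with text
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Ignore whitespace between tags; keep only text inside cells
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


scraper = TableScraper()
scraper.feed(PAGE)
header, *records = scraper.rows
print(header)   # column names from the <th> row
print(records)  # one list per data row
```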

Created for Balkan Investigative Reporting Network Summer School by Jonathan Soma