Investigating Documents using AI
July 13, 2024, Abraji Congress, São Paulo
Jonathan Soma, Knight Chair in Data Journalism, Columbia University.
Director of the MS in Data Journalism (a one-year master's degree) and the Lede Program (a 10-week summer program).
Find me at @dangerscarf or js4571@columbia.edu.
Notes
This covers only a very small portion of what I talked about at Abraji! If you're interested in more in-depth or broader uses of AI in journalism, look at my 12-hour video series Practical AI for Investigative Journalism.
If you’re interested in more programming stuff, take a look at jonathansoma.com and Everything I Know, two places where I put a lot of my content.
Slides
You can find the slides on the slides page.
Links
- OpenAI Playground to see probabilities for different texts. You’ll need to select “Completions” on the left and “Show probabilities: Full Spectrum” in the bottom right-hand corner.
- Chatbots may hallucinate more often than many realize, from The New York Times
- Hallucination Leaderboard
- AcaraJazz Instagram page
- Anthropic’s Claude (similar to ChatGPT)
- Washington Post investigation into app store complaints
- The CITY’s coverage map of stories
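The Playground's probability view is a reminder that a language model picks each next token by likelihood, not by checking facts. APIs that expose this information commonly report it as log probabilities rather than percentages, and converting between the two is just `exp`. The sketch below assumes you already have a `{token: log probability}` dictionary; the candidate tokens and numbers are hypothetical, made up for illustration, not real model output.

```python
import math

def logprobs_to_probs(logprobs: dict[str, float]) -> dict[str, float]:
    """Convert {token: log probability} into {token: probability between 0 and 1}."""
    return {token: math.exp(lp) for token, lp in logprobs.items()}

def most_likely(logprobs: dict[str, float]) -> str:
    """Return the highest-probability token, the one greedy decoding would pick."""
    return max(logprobs, key=logprobs.get)

# Hypothetical numbers for illustration only: possible next tokens after "São".
candidates = {" Paulo": math.log(0.90), " José": math.log(0.06), " Luís": math.log(0.04)}

for token, p in logprobs_to_probs(candidates).items():
    print(f"{token!r}: {p:.0%}")
print("Most likely:", repr(most_likely(candidates)))
```

A fluent but false sentence can be built entirely from high-probability tokens, which is one way to think about why hallucinations happen.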
AI in a Spreadsheet
- Claude for Sheets
- API keys
- OpenAI API key page
- Anthropic’s (Claude) API key page
- If you want to try it out without signing up, just email me!
- A Google Sheet that uses =CLAUDE
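Filling a =CLAUDE formula down a column runs the same instructions once per row, with each row's cell swapped in. Below is a rough Python sketch of that pattern, not a description of how Claude for Sheets works internally. The prompt is a hypothetical example, and `fake_ask` is a hypothetical offline placeholder you would replace with a real API call using one of the keys above.

```python
from typing import Callable

# Hypothetical instructions; {text} is replaced by one row's cell value, the way a
# spreadsheet formula joins fixed instructions with a cell reference.
TEMPLATE = (
    "Is this app review a complaint about billing? Answer YES or NO only.\n\n"
    "Review: {text}"
)

def run_column(rows: list[str], ask: Callable[[str], str]) -> list[str]:
    """Build one prompt per row, send it with `ask`, and collect the trimmed answers."""
    return [ask(TEMPLATE.format(text=text)).strip() for text in rows]

def fake_ask(prompt: str) -> str:
    """Placeholder for a real model call so this runs offline: a simple keyword check."""
    return "YES" if "charged" in prompt.lower() else "NO"

reviews = ["I was charged twice this month!", "Love the new design."]
print(run_column(reviews, fake_ask))  # ['YES', 'NO']
```

Keeping the instructions fixed and varying only the row text is what makes answers comparable across hundreds of rows, the same reason the spreadsheet approach works.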
AI in Python
- Instructor for using Python to extract structured data
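The point of structured extraction is to get data back in a fixed shape (named fields with known types) instead of free text, and to fail loudly when the model doesn't comply. The sketch below shows that idea with only the standard library; it is not Instructor's API. Instructor itself works with Pydantic models and handles the model call for you. The `Complaint` schema and the sample reply are hypothetical.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Complaint:
    # Hypothetical schema: the fields you want pulled out of each document.
    app_name: str
    amount_charged: float
    refunded: bool

def parse_reply(reply: str) -> Complaint:
    """Turn a model's JSON reply into a Complaint, raising on missing or mistyped fields."""
    data = json.loads(reply)  # raises json.JSONDecodeError if the reply isn't JSON
    values = {}
    for f in fields(Complaint):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # JSON doesn't distinguish 30 from 30.0, so accept ints for float fields (but not bools).
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, f.type):
            raise ValueError(f"{f.name} should be {f.type.__name__}, got {value!r}")
        values[f.name] = value
    return Complaint(**values)

reply = '{"app_name": "ExampleApp", "amount_charged": 30, "refunded": false}'
print(parse_reply(reply))
```

Rejecting bad output, rather than quietly accepting it, is what lets you trust a column of extracted values enough to analyze it.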
RAG (“chatting with your PDFs”)
- AnythingLLM for chatting with your documents without Python
- LlamaIndex for chatting with your documents in Python
- MyCity NYC chatbot
- NYC’s AI Chatbot Tells Businesses to Break the Law, from The Markup
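"Chatting with your PDFs" usually means retrieval-augmented generation: find the passages most related to the question, paste them into the prompt, and ask the model to answer from them. Tools like LlamaIndex and AnythingLLM typically use embedding models for the "find" step; this minimal sketch swaps in plain word-count cosine similarity so it runs with the standard library. The document chunks are hypothetical.

```python
import math
import re
from collections import Counter

def words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = words(question)
    return sorted(chunks, key=lambda chunk: cosine(q, words(chunk)), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Paste the retrieved chunks into the prompt so the model answers from them."""
    excerpts = "\n\n".join(retrieve(question, chunks, k))
    return (
        "Answer using only the excerpts below. If they don't contain the answer, say so.\n\n"
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}"
    )

# Hypothetical document chunks; real tools split your PDFs into pieces like these.
chunks = [
    "The permit office is open Monday to Friday.",
    "Food carts need a separate vending license.",
    "Parking permits are renewed every year.",
]
print(build_prompt("When is the permit office open?", chunks, k=1))
```

Note that "permits" doesn't match "permit" here; real embeddings handle that kind of fuzziness better. Either way, retrieval only controls what the model sees. It doesn't guarantee the answer is right, as the MyCity chatbot coverage shows.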
Other links
- Tabula for extracting tables from PDFs
- OCRmyPDF for adding a text layer to image-based PDFs
- Semantra, a “semantic search” tool that lets you do fuzzy, non-exact, vibes-based searching across documents
- Investigate.ai: Data Science for Journalists, a website on traditional machine learning that I built around 2018