Find this page at bit.ly/ds-dojo-2024

Data Scientist DOJO 2024

Hi, I’m Jonathan Soma! This page will host all of the content for Data Scientist Dojo 2024.

Monday

Tuesday

Wednesday

Thursday

Friday

Troubleshooting and project work time

Monday

Working with PDFs

The best tool for working with PDFs is pdfplumber. There are plenty of videos on YouTube about how to use it (although I haven’t watched any of them myself).

You can also use this interactive demo to test out the cropping/table selection.
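If you’d like to see roughly what that looks like in code, here is a minimal sketch of cropping a page and pulling out a table with pdfplumber. The function names (`rows_to_records`, `extract_table_records`), the file name, and the crop coordinates are all made up for illustration, and it assumes the first row of the table is the header. Your PDF will almost certainly need different numbers.

```python
def rows_to_records(table):
    """Turn a table (a list of rows, header first) into a list of dicts."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def extract_table_records(pdf_path, page_number=0, bbox=None):
    """Open a PDF, optionally crop one page, and return its first table.

    bbox is (x0, top, x1, bottom) in PDF points, the same order
    pdfplumber uses for page.crop().
    """
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        if bbox is not None:
            # Cropping first helps when the page has text around the table
            page = page.crop(bbox)
        table = page.extract_table()  # list of rows, or None if nothing found
    return rows_to_records(table)
```

You would call it with something like `extract_table_records("report.pdf", bbox=(0, 100, 600, 400))`, where both the file name and the coordinates are placeholders. The interactive demo is a good way to find crop values that actually fit your document.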

Automatic scraping

This tutorial walks through building an automatic scraper, although you’ll need to change things a little bit to make it work with Playwright. You’ll also want to make your repository private if the data isn’t something you’d like to make public! I recommend following the tutorial as written first to learn how it works, then doing it again separately with your “real” data.
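To give a sense of what the Playwright version of the scraping step might look like, here is a rough sketch using Playwright’s Python sync API. The function names (`fetch_texts`, `append_rows`), the URL, the CSS selector, and the CSV layout are assumptions for illustration, not something taken from the tutorial. The idea is that the script runs on a schedule and adds timestamped rows to a file each time.

```python
import csv
from datetime import datetime, timezone


def fetch_texts(url, selector):
    """Load a page in headless Chromium and return the text of matching elements."""
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        texts = page.locator(selector).all_inner_texts()
        browser.close()
    return texts


def append_rows(csv_path, texts, scraped_at=None):
    """Append one row per scraped item, stamped with when it was collected."""
    if scraped_at is None:
        scraped_at = datetime.now(timezone.utc).isoformat()
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for text in texts:
            writer.writerow([scraped_at, text.strip()])
    return len(texts)
```

A run would look something like `append_rows("headlines.csv", fetch_texts("https://example.com", "h2"))`, with the URL, selector, and file name swapped for your own. Wherever the scraper runs, it needs the browser installed as well as the package, so `playwright install chromium` has to happen before the script does.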

Deeper data analysis and text analysis

If you’re interested in traditional analysis, you might want to try out investigate.ai, a website I made with a series of tutorials about data science for non-data-science people (journalists). It predates ChatGPT, so it might not be the best answer these days, but it can still give you some ideas about things like regression and text analysis.

About the instructor