Scraping with Python and Playwright

Find this page at bit.ly/birn-data

Slides

Getting started

Let’s get started!

When we learned to do data analysis in Python, it was possible to skip setup and run all of our code in the cloud. But not this time! You’ll need to install Jupyterlab Desktop on your computer.

First, download Jupyterlab Desktop: scroll down to the Installation section and pick the correct version for your computer.

Windows folks, download the Windows version. If you have an older Mac, you want the x64 Installer (Intel chip). The Apple silicon version is for an M1, M2, or M3 Mac.

The startup instructions are ridiculous:

Jupyterlab Desktop can be launched from the GUI of your operating system by clicking the application’s icon or by using jlab command from the command line. Double clicking .ipynb files is also supported and it will launch Jupyterlab Desktop and load the notebook file.

What? Just open it like you would any other software:

Open up Jupyterlab Desktop

Once it’s open, move on to the next step.

Creating a new notebook

When data scientists program in Python, they use notebooks. To create a new notebook, click the New Session link in Jupyterlab, then Python 3 under Notebook.

Your Jupyterlab Desktop might take some time to finish installing! It needs to download Python and set some things up. Give it some time to complete the process after you open it.

New session

New notebook

Running code

To give your code a try, type the following Python code and press the play ▶️ button.

print("Hello")

Inspecting our pages

Opening the web inspector

Right-click (or Control-click on a Mac) anywhere on the page to bring up the menu, then select Inspect. It might be slightly different if you aren’t using Chrome!

Using the web inspector

Move your mouse over the HTML code in the panel on the right. As you hover over each line, the matching part of the page is highlighted.

Getting the code

To get the HTML code for part of the page, first find it using the Web Inspector. Then right-click it and choose Copy, then Copy outerHTML. You can paste this into ChatGPT to help write your scraper.
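
If you want a quick look at what you copied before pasting it anywhere, here is a minimal sketch using only Python's built-in html.parser. The HTML snippet and the TagLister name are made up for illustration; your copied outerHTML will look different. It lists each tag with its class, which are the pieces a scraper usually targets.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for what "Copy outerHTML" gives you.
copied_html = """
<tr class="result-row">
  <td class="name">Example Company</td>
  <td class="status">Active</td>
</tr>
"""

class TagLister(HTMLParser):
    """Collects each opening tag along with its class attribute, if any."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("class", "name")]
        css_class = dict(attrs).get("class")
        self.found.append((tag, css_class))

lister = TagLister()
lister.feed(copied_html)

# Print one line per tag so you can see the structure at a glance.
for tag, css_class in lister.found:
    print(tag, css_class)
```

Replace the contents of copied_html with your own pasted HTML and run the cell again.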

Writing your scraper

Use this custom prompt to see if you can put together a scraper! We’ll walk through it several times in class.

The example we’ll use is Appraisal Companies from https://ia-plb.my.site.com/LicenseSearchPage
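
As a rough starting point, most Playwright scrapers begin by opening the page in a browser. The sketch below assumes Playwright and its browser are already installed (pip install playwright, then playwright install). The function name fetch_page_html is made up for illustration, and it only grabs the page's HTML; it does not search for appraisal companies. It uses Playwright's async API, since that is the one that generally works inside a notebook.

```python
async def fetch_page_html(url):
    """Open url in a visible Chromium window and return the page's HTML."""
    # Imported inside the function so this cell still runs
    # even if Playwright is not installed yet.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # headless=False shows the browser window so you can watch it work.
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
    return html

# In a notebook cell, you can await the function directly:
# html = await fetch_page_html("https://ia-plb.my.site.com/LicenseSearchPage")
# print(html[:500])
```

Uncomment the last two lines in a notebook cell to try it. The scraper you build with the custom prompt will likely add steps after page.goto, such as choosing a license type and clicking a search button.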