Scraping with Python and Playwright
Find this page at bit.ly/birn-data
Getting started
Let’s get started!
When we learned to do data analysis in Python, it was possible to skip setup and run all of our code in the cloud. But not this time! You’ll need to install JupyterLab Desktop on your computer.
First, go to the JupyterLab Desktop page, scroll down to the Installation section, and download the correct version for your computer.
Windows folks, download the Windows version. If you have an older Mac, you want the x64 Installer (Intel chip). The Apple silicon version is for a Mac with an M1, M2, or M3 chip.
The startup instructions are ridiculous:
“JupyterLab Desktop can be launched from the GUI of your operating system by clicking the application’s icon or by using jlab command from the command line. Double clicking .ipynb files is also supported and it will launch JupyterLab Desktop and load the notebook file.”
What? Just open it like you would any other software.
Once it’s open, move on to the next step.
Creating a new notebook
When data scientists program in Python, they use notebooks. To create a new notebook, click the New Session link in JupyterLab Desktop, then Python 3 under Notebook.
JupyterLab Desktop might take some time to finish installing! It needs to download Python and set some things up, so give it some time to complete the process after you open it.
Running code
To give your code a try, type the following Python code and press the play ▶️ button.
print("Hello")
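Since the scraper relies on Playwright, a second cell can check whether it is available. This is a minimal sketch, not part of the original setup steps: it assumes the package is importable under the name playwright, and the install commands in the comments are the usual ones for a notebook rather than anything this page specifies.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be found,
# so this check never raises an error on a fresh install.
playwright_installed = importlib.util.find_spec("playwright") is not None

if playwright_installed:
    print("Playwright is installed")
else:
    # Assumption: typical notebook install commands, run in their own cells:
    #   %pip install playwright
    #   !playwright install chromium
    print("Playwright is not installed yet")
```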
Inspecting our pages
Opening the web inspector
Right-click (or Control-click on a Mac) anywhere on the page to bring up the menu, then select Inspect. It might be slightly different if you aren’t using Chrome!
Using the web inspector
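Move your mouse over the code on the right. As you hover over each element, the matching part of the page is highlighted, which shows you which piece of HTML produces which piece of the page.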
Getting the code
To get the HTML code for part of the page, first find it using the Web Inspector. Then right-click the element and choose Copy, then Copy outerHTML. You can paste this into ChatGPT to help write your scraper.
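To get a feel for what a copied outerHTML snippet contains, here is a small sketch that pulls the cell text out of a table using only Python's built-in html.parser module. The HTML below is made up for illustration (it is not taken from any real page), and the CellCollector class is a hypothetical helper, not something Playwright or ChatGPT provides.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for whatever you copied with "Copy outerHTML"
copied_html = """
<table class="results">
  <tr><th>Name</th><th>License</th></tr>
  <tr><td>Example Appraisal Co.</td><td>12345</td></tr>
  <tr><td>Sample Valuation LLC</td><td>67890</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Each <tr> starts a new row
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                # Tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            # Trim surrounding whitespace once the cell is complete
            self.rows[-1][-1] = self.rows[-1][-1].strip()

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._in_cell:
            self.rows[-1][-1] += data


parser = CellCollector()
parser.feed(copied_html)
for row in parser.rows:
    print(row)
```

The tag names and class names you see in the inspector (table, tr, td, and so on) are exactly the details a scraper needs, which is why pasting the outerHTML into your prompt is so helpful.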
Writing your scraper
Use this custom prompt to see if you can put together a scraper! We’ll walk through it several times in class.
The example we’ll use is Appraisal Companies from https://ia-plb.my.site.com/LicenseSearchPage
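As a starting point, here is a rough sketch of the first step most Playwright scrapers share: open the page in a browser and grab its HTML. It is not the scraper we build in class, just an illustration. The function name fetch_page_html is made up, and the sketch assumes Playwright and its Chromium browser are already installed. It uses Playwright's async API because notebooks already run an event loop, which the sync API does not work inside.

```python
# URL of the example page from this handout
URL = "https://ia-plb.my.site.com/LicenseSearchPage"


async def fetch_page_html(url):
    """Open url in a headless Chromium browser and return the page's HTML."""
    # Imported inside the function so this cell still runs
    # even if Playwright has not been installed yet.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        # content() returns the full HTML of the page as the browser sees it
        html = await page.content()
        await browser.close()
    return html


# In a notebook cell you can await the function directly:
#   html = await fetch_page_html(URL)
#   print(html[:500])
```

From here, a real scraper would click buttons, fill in the search form, and pick out specific elements, using the details you found with the web inspector.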