Data analysis and APIs
Pandas is Python’s equivalent to Excel. According to the official documentation:
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Session content
Today I am providing notebooks! Please download them below.
Code and data
Copy these files somewhere you can find them (your Desktop, maybe). Then right-click and choose extract to get the files inside.
If you'd rather just watch and listen instead of typing, each notebook has a COMPLETED version that already contains my code.
Slides
Homework
- pandas homework (There is a LOT of it! Don’t worry about finishing it all, even one notebook is good)
- api homework - answer keys
Setup
Pandas is installed using pip. You can run the code below in a JupyterLab Desktop notebook cell:
%pip install pandas
The %pip command only works inside a Jupyter notebook, which is also where we will be using pandas in this session.
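A quick way to check that the install worked is to import pandas in a fresh cell and look at its version. This is only a minimal check, assuming the install finished without errors; the version number you see depends on when you installed it.

```python
# Import pandas under its conventional short name
import pandas as pd

# Any version string printed here means pandas is installed and importable
print(pd.__version__)
```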
Completed work
Completed notebooks will be posted after class.
Additional links
- Real-world data analysis with pandas and Python, a video series I produced within the past few years
- First Python Notebook, a Jupyter/pandas tutorial by data journalist Ben Welsh
- Inspect Element, a series of tutorials about Undocumented APIs
- Using paginated APIs with Python (four ways!), a video from me about going through APIs that have multiple pages of results
AI Prompts
Pandas
When asking your AI tool a question, it is usually useful to give an example of your data. You can automatically copy part of your dataframe to the clipboard with the code below:
df.sample(5).to_clipboard()
I have a dataframe that looks like the below. I want to _____. (paste sample of dataframe)
It can sometimes be useful to ask "is there a simpler approach?" after you ask a question like the one above.
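If copying to the clipboard does not work on your machine (this can happen on remote or headless setups), one possible fallback is to print a small sample as text and copy it by hand. The sketch below uses a made-up dataframe with placeholder numbers standing in for your own df.

```python
import pandas as pd

# Hypothetical stand-in for your own dataframe (values are placeholders)
df = pd.DataFrame({
    "city": ["Albany", "Buffalo", "Rochester", "Syracuse", "Yonkers"],
    "population": [100, 200, 300, 400, 500],
})

# Take a small random sample; random_state makes it repeatable
sample_text = df.sample(3, random_state=1).to_csv(index=False)

# Print it so you can select and copy it into your prompt
print(sample_text)
```

CSV text is compact and most AI tools read it without trouble, but any plain-text form of the sample would do.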
APIs
You can get a description of the structure of your data like this:
def describe_data(data):
    if isinstance(data, dict):
        return {key: describe_data(value) for key, value in data.items()}
    elif isinstance(data, list) and data:
        return [describe_data(data[0])]
    else:
        return type(data)

describe_data(data)
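To see what the description looks like, here is the same function run on a small example. The data dictionary is made up for illustration and does not come from a real API.

```python
def describe_data(data):
    # Same helper as above, repeated so this cell runs on its own
    if isinstance(data, dict):
        return {key: describe_data(value) for key, value in data.items()}
    elif isinstance(data, list) and data:
        # Only the first item of a list is inspected
        return [describe_data(data[0])]
    else:
        return type(data)

# Hypothetical API response, made up for illustration
data = {
    "count": 2,
    "results": [
        {"name": "first", "tags": ["a", "b"]},
        {"name": "second", "tags": []},
    ],
}

description = describe_data(data)
print(description)
# {'count': <class 'int'>, 'results': [{'name': <class 'str'>, 'tags': [<class 'str'>]}]}
```

Because only the first item of each list is described, a field that is missing from the first item will not show up in the description.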
This prompt is a little more complicated, because usually ChatGPT tries to do something too crazy. Feel free to adjust the prompt until it works for you.
I have API output that is a python dictionary, saved as data. The data structure is described as below. Provide code to convert the data into a dataframe of **______**. This probably just means pd.json_normalize with the appropriate key. Don't use meta. If you need to ask clarifying questions about the data itself, provide code snippets I can run to help answer them. You don't need to use print since I'm in a Jupyter notebook. Use functions only if absolutely necessary. (paste description of data)
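For reference, the kind of answer this prompt is aiming for often looks like the sketch below. The data dictionary and its "results" key are made up for illustration; the key that holds the list of records in your own API output will probably be different.

```python
import pandas as pd

# Hypothetical API output; the "results" key and its fields are assumptions
data = {
    "count": 2,
    "results": [
        {"name": "first", "location": {"city": "Albany", "state": "NY"}},
        {"name": "second", "location": {"city": "Newark", "state": "NJ"}},
    ],
}

# Point json_normalize at the list of records; nested dictionaries
# become dotted column names such as location.city
df = pd.json_normalize(data["results"])
df
```

Each item in the list becomes one row, which is usually what "a dataframe of ______" means in the prompt.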