Query an LLM for each pandas row

Read in your dataset.

import pandas as pd
pd.options.display.max_colwidth = None

df = pd.read_csv("bills.csv")
df

	title
0	Relates to wellness programs under life and accident and health insurance policies
1	Requires certain information be provided to prospective maternity patients
2	Relates to the distribution of fines from speed violation in work zones
3	Relates to emergency response plans relating to the notification of downed wires.

Set up a function that you will call for every row. We are going to build a prompt with a fill-in-the-blank for the title name, and try our best to demand only the category name in response. We’re also using temperature=0 because we do not want the model to act “creatively.”

from openai import OpenAI

client = OpenAI(api_key="ABC123")

prompt_template = """
Categorize the following legislative bill as ENVIRONMENT, HEALTHCARE, IMMIGRATION, TAXES/FINES, or OTHER. Only respond with the category name.

Bill title: {text}
"""

def llm_request(row):
    prompt = prompt_template.format(text=row['title'])
    
    messages = [
        { "role": "system", "content": "You are a legislative assistant."},
        { "role": "user", "content": prompt}
    ]

    chat_completion = client.chat.completions.create(
        messages=messages,
        model="gpt-3.5-turbo",
        temperature=0
    )

    return chat_completion.choices[0].message.content

We’ll start by testing with the first row, which has the title Relates to wellness programs under life and accident and health insurance policies.

llm_request(df.iloc[0])

'HEALTHCARE'

We can also make a fake row to test things that aren’t in our data set.

llm_request({'title': 'A bill to drill for oil on every streetcorner'})

'ENVIRONMENT'

Seems reasonable! Now let’s use it for every single row.

# Add the new column
df['category'] = df.apply(llm_request, axis=1)

df

	title	category
0	Relates to wellness programs under life and accident and health insurance policies	HEALTHCARE
1	Requires certain information be provided to prospective maternity patients	HEALTHCARE
2	Relates to the distribution of fines from speed violation in work zones	TAXES/FINES
3	Relates to emergency response plans relating to the notification of downed wires.	ENVIRONMENT

Looks great! You might want to force a JSON response if you want anything more complex (a list of categories, for example).