Query an LLM for each pandas row

Read in your dataset.

import pandas as pd
pd.options.display.max_colwidth = None

df = pd.read_csv("bills.csv")
df
title
0 Relates to wellness programs under life and accident and health insurance policies
1 Requires certain information be provided to prospective maternity patients
2 Relates to the distribution of fines from speed violation in work zones
3 Relates to emergency response plans relating to the notification of downed wires.

Set up a function that you will call for every row. We are going to build a prompt with a fill-in-the-blank for the title name, and try our best to demand only the category name in response. We’re also using temperature=0 because we do not want the model to act “creatively.”

from openai import OpenAI

client = OpenAI(api_key="ABC123")

prompt_template = """
Categorize the following legislative bill as ENVIRONMENT, HEALTHCARE, IMMIGRATION, TAXES/FINES, or OTHER. Only respond with the category name.

Bill title: {text}
"""

def llm_request(row):
    prompt = prompt_template.format(text=row['title'])
    
    messages = [
        { "role": "system", "content": "You are a legislative assistant."},
        { "role": "user", "content": prompt}
    ]

    chat_completion = client.chat.completions.create(
        messages=messages,
        model="gpt-3.5-turbo",
        temperature=0
    )

    return chat_completion.choices[0].message.content

We’ll start by testing with the first row, which has the title Relates to wellness programs under life and accident and health insurance policies.

llm_request(df.iloc[0])
'HEALTHCARE'

We can also make a fake row to test things that aren’t in our data set.

llm_request({'title': 'A bill to drill for oil on every streetcorner'})
'ENVIRONMENT'

Seems reasonable! Now let’s use it for every single row.

# Add the new column
df['category'] = df.apply(llm_request, axis=1)

df
title category
0 Relates to wellness programs under life and accident and health insurance policies HEALTHCARE
1 Requires certain information be provided to prospective maternity patients HEALTHCARE
2 Relates to the distribution of fines from speed violation in work zones TAXES/FINES
3 Relates to emergency response plans relating to the notification of downed wires. ENVIRONMENT

Looks great! You might want to force a JSON response if you want anything more complex (a list of categories, for example).