Download notebook

Converting long data to wide data in pandas¶

There are two major representations of data: one where you have a lot of columns (wide data), and another where you have a lot of rows (long data). Long data is also known as tidy data.

It's pretty easy to switch between the two once you know what you're doing!

Simple example¶

This dataset is long because each row has a date along with a food and a price, with repeating dates in separate rows. An alternative way to present it would be a single date with a column for each food's price.

In [2]:

import pandas as pd

df = pd.DataFrame([
    { 'date': '2017-01-01', 'food': 'beef', 'price': 1001.2 },
    { 'date': '2017-02-01', 'food': 'beef', 'price': 1129.0 },
    { 'date': '2017-01-01', 'food': 'ofada', 'price': 377.4 },
    { 'date': '2017-02-01', 'food': 'ofada', 'price': 472.3 },
    { 'date': '2017-01-01', 'food': 'cafish', 'price': 899.6 },
    { 'date': '2017-02-01', 'food': 'cafish', 'price': 916.5 }
])
df.head()
import pandas as pd

df = pd.DataFrame([
    { 'date': '2017-01-01', 'food': 'beef', 'price': 1001.2 },
    { 'date': '2017-02-01', 'food': 'beef', 'price': 1129.0 },
    { 'date': '2017-01-01', 'food': 'ofada', 'price': 377.4 },
    { 'date': '2017-02-01', 'food': 'ofada', 'price': 472.3 },
    { 'date': '2017-01-01', 'food': 'cafish', 'price': 899.6 },
    { 'date': '2017-02-01', 'food': 'cafish', 'price': 916.5 }
])
df.head()

Out[2]:

	date	food	price
0	2017-01-01	beef	1001.2
1	2017-02-01	beef	1129.0
2	2017-01-01	ofada	377.4
3	2017-02-01	ofada	472.3
4	2017-01-01	cafish	899.6

To convert from long to wide, you'll use df.pivot_table on your dataframe. You tell it how you want the rows identified (by the date), what column should be converted into multiple columns (the food) and what value should fill in the table (in this case, the price).

In [3]:

df.pivot_table(index='date', columns='food', values='price')
df.pivot_table(index='date', columns='food', values='price')

Out[3]:

food	beef	cafish	ofada
date
2017-01-01	1001.2	899.6	377.4
2017-02-01	1129.0	916.5	472.3

You might be used to pivot tables being about aggregation - counts and medians and things like that. In this case we're just using single values to fill in the table.

There's one problem with this transform! Right now the date column is set up as an index, so we can't do df.date. To convert it to a normal column we will use .reset_index().

In [8]:

df.pivot_table(index='date', columns='food', values='price').reset_index()
df.pivot_table(index='date', columns='food', values='price').reset_index()

Out[8]:

food	date	beef	cafish	ofada
0	2017-01-01	1001.2	899.6	377.4
1	2017-02-01	1129.0	916.5	472.3

Transforming from long to wide with multiple columns¶

Converting from long to wide while preserving multiple columns is almost the same as doing it for one column. Let's start with the same data as above.

In [5]:

df = pd.DataFrame([
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'beef', 'price': 1001.2 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'beef', 'price': 1129.0 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'ofada', 'price': 377.4 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'ofada', 'price': 472.3 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'cafish', 'price': 899.6 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'cafish', 'price': 916.5 },
])
df
df = pd.DataFrame([
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'beef', 'price': 1001.2 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'beef', 'price': 1129.0 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'ofada', 'price': 377.4 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'ofada', 'price': 472.3 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'cafish', 'price': 899.6 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'cafish', 'price': 916.5 },
])
df

Out[5]:

	date	scale	food	price
0	2017-01-01	regional	beef	1001.2
1	2017-02-01	regional	beef	1129.0
2	2017-01-01	regional	ofada	377.4
3	2017-02-01	regional	ofada	472.3
4	2017-01-01	regional	cafish	899.6
5	2017-02-01	regional	cafish	916.5

The only difference is when using pivot_table you pass two columns instead of one.

In [6]:

df.pivot_table(index=['date', 'scale'], columns='food', values='price')
df.pivot_table(index=['date', 'scale'], columns='food', values='price')

Out[6]:

	food	beef	cafish	ofada
date	scale
2017-01-01	regional	1001.2	899.6	377.4
2017-02-01	regional	1129.0	916.5	472.3

To turn the axes - the bold parts - into normal columns, all you need to do is reset_index().

In [7]:

df.pivot_table(index=['date', 'scale'], columns='food', values='price').reset_index()
df.pivot_table(index=['date', 'scale'], columns='food', values='price').reset_index()

Out[7]:

food	date	scale	beef	cafish	ofada
0	2017-01-01	regional	1001.2	899.6	377.4
1	2017-02-01	regional	1129.0	916.5	472.3

In [ ]: