Download notebook

Converting wide data to long data¶

There are two major representations of data: one where you have a lot of columns (wide data), and another where you have a lot of rows (long data). Long data is also known as tidy data.

It's pretty easy to switch between the two once you know what you're doing!

Simple example¶

This dataset is wide data because beef, ofada and catfish are all different measurements that can be associated with the date.

In [35]:

import pandas as pd

df = pd.DataFrame([
    { 'date': '2017-01-01', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},
    { 'date': '2017-02-01', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},
])
df
import pandas as pd

df = pd.DataFrame([
    { 'date': '2017-01-01', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},
    { 'date': '2017-02-01', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},
])
df

Out[35]:

	date	beef	ofada	cafish
0	2017-01-01	1001.2	377.4	899.6
1	2017-02-01	1129.0	472.3	916.5

The long version of this dataset would just have rows with a date, a food name, and a price. Beef, ofada and catfish prices would all be on separate rows!

To transform wide data to long data you use the pandas method df.melt. You tell it the columns that should be preserved and it separates everything else out into new rows.

In [9]:

df.melt(id_vars='date', var_name='food', value_name='price')
df.melt(id_vars='date', var_name='food', value_name='price')

Out[9]:

	date	food	price
0	2017-01-01	beef	1001.2
1	2017-02-01	beef	1129.0
2	2017-01-01	ofada	377.4
3	2017-02-01	ofada	472.3
4	2017-01-01	cafish	899.6
5	2017-02-01	cafish	916.5

Converting from wide to long with extra columns¶

In the previous example we only re-used date in each row. Sometimes you need to keep more than one column!

With the data below, we want to keep most the date and scale for each measurement.

In [12]:

import pandas as pd

df = pd.DataFrame([
    { 'date': '2017-01-01', 'scale': 'regional', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},
    { 'date': '2017-02-01', 'scale': 'regional', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},
])
df
import pandas as pd

df = pd.DataFrame([
    { 'date': '2017-01-01', 'scale': 'regional', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},
    { 'date': '2017-02-01', 'scale': 'regional', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},
])
df

Out[12]:

	date	scale	beef	ofada	cafish
0	2017-01-01	regional	1001.2	377.4	899.6
1	2017-02-01	regional	1129.0	472.3	916.5

You juts need to pass a list to id_vars and you're good to go! Both date and scale will be carried over to the new rows, while the remaining columns will turn into a single column.

In [13]:

df.melt(id_vars=['date', 'scale'], var_name='food', value_name='price')
df.melt(id_vars=['date', 'scale'], var_name='food', value_name='price')

Out[13]:

	date	scale	food	price
0	2017-01-01	regional	beef	1001.2
1	2017-02-01	regional	beef	1129.0
2	2017-01-01	regional	ofada	377.4
3	2017-02-01	regional	ofada	472.3
4	2017-01-01	regional	cafish	899.6
5	2017-02-01	regional	cafish	916.5

Converting from long to wide¶

Sometimes you want to do the opposite and transform long data to wide data. Let's use the same data as above:

In [14]:

df = pd.DataFrame([
    { 'date': '2017-01-01', 'food': 'beef', 'price': 1001.2 },
    { 'date': '2017-02-01', 'food': 'beef', 'price': 1129.0 },
    { 'date': '2017-01-01', 'food': 'ofada', 'price': 377.4 },
    { 'date': '2017-02-01', 'food': 'ofada', 'price': 472.3 },
    { 'date': '2017-01-01', 'food': 'cafish', 'price': 899.6 },
    { 'date': '2017-02-01', 'food': 'cafish', 'price': 916.5 }
])
df.head()
df = pd.DataFrame([
    { 'date': '2017-01-01', 'food': 'beef', 'price': 1001.2 },
    { 'date': '2017-02-01', 'food': 'beef', 'price': 1129.0 },
    { 'date': '2017-01-01', 'food': 'ofada', 'price': 377.4 },
    { 'date': '2017-02-01', 'food': 'ofada', 'price': 472.3 },
    { 'date': '2017-01-01', 'food': 'cafish', 'price': 899.6 },
    { 'date': '2017-02-01', 'food': 'cafish', 'price': 916.5 }
])
df.head()

Out[14]:

	date	food	price
0	2017-01-01	beef	1001.2
1	2017-02-01	beef	1129.0
2	2017-01-01	ofada	377.4
3	2017-02-01	ofada	472.3
4	2017-01-01	cafish	899.6

To convert from long to wide, you'll use df.pivot_table on your dataframe.

In [28]:

df.pivot_table(index='date', columns='food', values='price')
df.pivot_table(index='date', columns='food', values='price')

Out[28]:

food	beef	cafish	ofada
date
2017-01-01	1001.2	899.6	377.4
2017-02-01	1129.0	916.5	472.3

Right now the date column is set up as an index. You can convert it to a normal column with .reset_index().

In [30]:

df.pivot_table(index='date', columns='food', values='price').reset_index()
df.pivot_table(index='date', columns='food', values='price').reset_index()

Out[30]:

food	date	beef	cafish	ofada
0	2017-01-01	1001.2	899.6	377.4
1	2017-02-01	1129.0	916.5	472.3

Transforming from long to wide with multiple columns¶

Converting from long to wide while preserving multiple columns is practically the same as doing it for one column.

In [20]:

df = pd.DataFrame([
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'beef', 'price': 1001.2 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'beef', 'price': 1129.0 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'ofada', 'price': 377.4 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'ofada', 'price': 472.3 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'cafish', 'price': 899.6 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'cafish', 'price': 916.5 },
])
df
df = pd.DataFrame([
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'beef', 'price': 1001.2 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'beef', 'price': 1129.0 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'ofada', 'price': 377.4 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'ofada', 'price': 472.3 },
    {'date': '2017-01-01', 'scale': 'regional', 'food': 'cafish', 'price': 899.6 },
    {'date': '2017-02-01', 'scale': 'regional', 'food': 'cafish', 'price': 916.5 },
])
df

Out[20]:

	date	scale	food	price
0	2017-01-01	regional	beef	1001.2
1	2017-02-01	regional	beef	1129.0
2	2017-01-01	regional	ofada	377.4
3	2017-02-01	regional	ofada	472.3
4	2017-01-01	regional	cafish	899.6
5	2017-02-01	regional	cafish	916.5

The only difference is when using pivot_table you pass two columns instead of one.

In [32]:

df.pivot_table(index=['date', 'scale'], columns='food', values='price')
df.pivot_table(index=['date', 'scale'], columns='food', values='price')

Out[32]:

	food	beef	cafish	ofada
date	scale
2017-01-01	regional	1001.2	899.6	377.4
2017-02-01	regional	1129.0	916.5	472.3

To turn the axes - the bold parts - into normal columns, all you need to do is reset_index().

In [34]:

df.pivot_table(index=['date', 'scale'], columns='food', values='price').reset_index()
df.pivot_table(index=['date', 'scale'], columns='food', values='price').reset_index()

Out[34]:

food	date	scale	beef	cafish	ofada
0	2017-01-01	regional	1001.2	899.6	377.4
1	2017-02-01	regional	1129.0	916.5	472.3

In [ ]: