Transposing data in pandas (switching columns and rows)¶
Switching the columns and rows in a dataset is called transposing, and is one of the simplest forms of reshaping data.
import pandas as pd
df = pd.DataFrame([
{ 'date': '2017-01-01', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},
{ 'date': '2017-01-01', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},
{ 'date': '2017-01-01', 'beef': 1067.7, 'ofada': 319.9, 'cafish': 919.8}
])
df
date | beef | ofada | cafish | |
---|---|---|---|---|
0 | 2017-01-01 | 1001.2 | 377.4 | 899.6 |
1 | 2017-01-01 | 1129.0 | 472.3 | 916.5 |
2 | 2017-01-01 | 1067.7 | 319.9 | 919.8 |
Simple transposition¶
To switch columns and rows, you can just use df.T
df.T
0 | 1 | 2 | |
---|---|---|---|
date | 2017-01-01 | 2017-01-01 | 2017-01-01 |
beef | 1001.2 | 1129.0 | 1067.7 |
ofada | 377.4 | 472.3 | 319.9 |
cafish | 899.6 | 916.5 | 919.8 |
The problem we can see is the first row should be the column names! You could manually the first row and rename your column headers...
df.T.iloc[1:].rename(columns=df.date)
2017-01-01 | 2017-01-01 | 2017-01-01 | |
---|---|---|---|
beef | 1001.2 | 1129.0 | 1067.7 |
ofada | 377.4 | 472.3 | 319.9 |
cafish | 899.6 | 916.5 | 919.8 |
...but there's another way, too.
Transpose but fix column issue with .set_index
¶
With our starting dataset, the date is just the same as any other column.
df
date | beef | ofada | cafish | |
---|---|---|---|---|
0 | 2017-01-01 | 1001.2 | 377.4 | 899.6 |
1 | 2017-01-01 | 1129.0 | 472.3 | 916.5 |
2 | 2017-01-01 | 1067.7 | 319.9 | 919.8 |
When we transpose, it's the same as any other row.
df.T
0 | 1 | 2 | |
---|---|---|---|
date | 2017-01-01 | 2017-01-01 | 2017-01-01 |
beef | 1001.2 | 1129.0 | 1067.7 |
ofada | 377.4 | 472.3 | 319.9 |
cafish | 899.6 | 916.5 | 919.8 |
Where do the numbers on the columns come from? When we transpose, the index of the original dataframe becomes the column headers. To allow our dates to become column headers, we just need to set it as the index before we transpose.
df.set_index('date')
beef | ofada | cafish | |
---|---|---|---|
date | |||
2017-01-01 | 1001.2 | 377.4 | 899.6 |
2017-01-01 | 1129.0 | 472.3 | 916.5 |
2017-01-01 | 1067.7 | 319.9 | 919.8 |
df.set_index('date').T
date | 2017-01-01 | 2017-01-01 | 2017-01-01 |
---|---|---|---|
beef | 1001.2 | 1129.0 | 1067.7 |
ofada | 377.4 | 472.3 | 319.9 |
cafish | 899.6 | 916.5 | 919.8 |
But the opposite happened, too, which is why our foods are bold: the column headers became the index of our rows. To turn them into normal rows, we'll use reset_index()
.
df.set_index('date').T.reset_index()
date | index | 2017-01-01 | 2017-01-01 | 2017-01-01 |
---|---|---|---|---|
0 | beef | 1001.2 | 1129.0 | 1067.7 |
1 | ofada | 377.4 | 472.3 | 319.9 |
2 | cafish | 899.6 | 916.5 | 919.8 |
But they have a ridiculous name now! There's no solution to this that makes me happy.
df.set_index('date').T.reset_index().rename(columns={'index': 'food'})
date | food | 2017-01-01 | 2017-01-01 | 2017-01-01 |
---|---|---|---|---|
0 | beef | 1001.2 | 1129.0 | 1067.7 |
1 | ofada | 377.4 | 472.3 | 319.9 |
2 | cafish | 899.6 | 916.5 | 919.8 |
If we want to get rid of the index label saying date
when it's just a side effect of the transposition, we can use rename_axis
.
df.set_index('date').T.reset_index().rename(columns={'index': 'food'}).rename_axis(None, axis=1)
food | 2017-01-01 | 2017-01-01 | 2017-01-01 | |
---|---|---|---|---|
0 | beef | 1001.2 | 1129.0 | 1067.7 |
1 | ofada | 377.4 | 472.3 | 319.9 |
2 | cafish | 899.6 | 916.5 | 919.8 |
Sigh. This is so much more complicated than the original answer.