{
"cells": [
{
"cell_type": "markdown",
"id": "bf43418a",
"metadata": {},
"source": [
"# Converting wide data to long data\n",
"\n",
"There are two major representations of data: one where you have a lot of *columns* (wide data), and another where you have a lot of *rows* (long data). Long data is also known as *tidy* data.\n",
"\n",
"It's pretty easy to switch between the two once you know what you're doing!\n",
"\n",
"## Simple example\n",
"\n",
"This dataset is wide data because beef, ofada and catfish are all different measurements that can be associated with the date."
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "03b8faa5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" beef | \n",
" ofada | \n",
" cafish | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" 1001.2 | \n",
" 377.4 | \n",
" 899.6 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" 1129.0 | \n",
" 472.3 | \n",
" 916.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date beef ofada cafish\n",
"0 2017-01-01 1001.2 377.4 899.6\n",
"1 2017-02-01 1129.0 472.3 916.5"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame([\n",
" { 'date': '2017-01-01', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},\n",
" { 'date': '2017-02-01', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},\n",
"])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "364e8219",
"metadata": {},
"source": [
"The long version of this dataset would just have rows with a date, a food name, and a price. Beef, ofada and catfish prices would all be on separate rows!\n",
"\n",
"To transform wide data to long data you use the pandas method `df.melt`. You tell it the columns that should be preserved and it separates everything else out into new rows."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b649f95f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" food | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" beef | \n",
" 1001.2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" beef | \n",
" 1129.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 2017-01-01 | \n",
" ofada | \n",
" 377.4 | \n",
"
\n",
" \n",
" 3 | \n",
" 2017-02-01 | \n",
" ofada | \n",
" 472.3 | \n",
"
\n",
" \n",
" 4 | \n",
" 2017-01-01 | \n",
" cafish | \n",
" 899.6 | \n",
"
\n",
" \n",
" 5 | \n",
" 2017-02-01 | \n",
" cafish | \n",
" 916.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date food price\n",
"0 2017-01-01 beef 1001.2\n",
"1 2017-02-01 beef 1129.0\n",
"2 2017-01-01 ofada 377.4\n",
"3 2017-02-01 ofada 472.3\n",
"4 2017-01-01 cafish 899.6\n",
"5 2017-02-01 cafish 916.5"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.melt(id_vars='date', var_name='food', value_name='price')"
]
},
{
"cell_type": "markdown",
"id": "ecd23f00",
"metadata": {},
"source": [
"## Converting from wide to long with extra columns\n",
"\n",
"In the previous example we only re-used `date` in each row. Sometimes you need to keep more than one column!\n",
"\n",
"With the data below, we want to keep most the `date` and `scale` for each measurement."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "86fe2e02",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" scale | \n",
" beef | \n",
" ofada | \n",
" cafish | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" regional | \n",
" 1001.2 | \n",
" 377.4 | \n",
" 899.6 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" regional | \n",
" 1129.0 | \n",
" 472.3 | \n",
" 916.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date scale beef ofada cafish\n",
"0 2017-01-01 regional 1001.2 377.4 899.6\n",
"1 2017-02-01 regional 1129.0 472.3 916.5"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame([\n",
" { 'date': '2017-01-01', 'scale': 'regional', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},\n",
" { 'date': '2017-02-01', 'scale': 'regional', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},\n",
"])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "a37e6736",
"metadata": {},
"source": [
"You juts need to pass a list to `id_vars` and you're good to go! Both `date` and `scale` will be carried over to the new rows, while the remaining columns will turn into a single column."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "805d5a96",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" scale | \n",
" food | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" regional | \n",
" beef | \n",
" 1001.2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" regional | \n",
" beef | \n",
" 1129.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 2017-01-01 | \n",
" regional | \n",
" ofada | \n",
" 377.4 | \n",
"
\n",
" \n",
" 3 | \n",
" 2017-02-01 | \n",
" regional | \n",
" ofada | \n",
" 472.3 | \n",
"
\n",
" \n",
" 4 | \n",
" 2017-01-01 | \n",
" regional | \n",
" cafish | \n",
" 899.6 | \n",
"
\n",
" \n",
" 5 | \n",
" 2017-02-01 | \n",
" regional | \n",
" cafish | \n",
" 916.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date scale food price\n",
"0 2017-01-01 regional beef 1001.2\n",
"1 2017-02-01 regional beef 1129.0\n",
"2 2017-01-01 regional ofada 377.4\n",
"3 2017-02-01 regional ofada 472.3\n",
"4 2017-01-01 regional cafish 899.6\n",
"5 2017-02-01 regional cafish 916.5"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.melt(id_vars=['date', 'scale'], var_name='food', value_name='price')"
]
},
{
"cell_type": "markdown",
"id": "633e2f2f",
"metadata": {},
"source": [
"## Converting from long to wide\n",
"\n",
"Sometimes you want to do the opposite and transform long data to wide data. Let's use the same data as above:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "84cf59f9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" food | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" beef | \n",
" 1001.2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" beef | \n",
" 1129.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 2017-01-01 | \n",
" ofada | \n",
" 377.4 | \n",
"
\n",
" \n",
" 3 | \n",
" 2017-02-01 | \n",
" ofada | \n",
" 472.3 | \n",
"
\n",
" \n",
" 4 | \n",
" 2017-01-01 | \n",
" cafish | \n",
" 899.6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date food price\n",
"0 2017-01-01 beef 1001.2\n",
"1 2017-02-01 beef 1129.0\n",
"2 2017-01-01 ofada 377.4\n",
"3 2017-02-01 ofada 472.3\n",
"4 2017-01-01 cafish 899.6"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame([\n",
" { 'date': '2017-01-01', 'food': 'beef', 'price': 1001.2 },\n",
" { 'date': '2017-02-01', 'food': 'beef', 'price': 1129.0 },\n",
" { 'date': '2017-01-01', 'food': 'ofada', 'price': 377.4 },\n",
" { 'date': '2017-02-01', 'food': 'ofada', 'price': 472.3 },\n",
" { 'date': '2017-01-01', 'food': 'cafish', 'price': 899.6 },\n",
" { 'date': '2017-02-01', 'food': 'cafish', 'price': 916.5 }\n",
"])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "0a4c3c3f",
"metadata": {},
"source": [
"To convert from long to wide, you'll use `df.pivot_table` on your dataframe."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "c6b9c608",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" food | \n",
" beef | \n",
" cafish | \n",
" ofada | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2017-01-01 | \n",
" 1001.2 | \n",
" 899.6 | \n",
" 377.4 | \n",
"
\n",
" \n",
" 2017-02-01 | \n",
" 1129.0 | \n",
" 916.5 | \n",
" 472.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"food beef cafish ofada\n",
"date \n",
"2017-01-01 1001.2 899.6 377.4\n",
"2017-02-01 1129.0 916.5 472.3"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pivot_table(index='date', columns='food', values='price')"
]
},
{
"cell_type": "markdown",
"id": "962e5828",
"metadata": {},
"source": [
"Right now the `date` column is set up as an index. You can convert it to a normal column with `.reset_index()`."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "7189fb23",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" food | \n",
" date | \n",
" beef | \n",
" cafish | \n",
" ofada | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" 1001.2 | \n",
" 899.6 | \n",
" 377.4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" 1129.0 | \n",
" 916.5 | \n",
" 472.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"food date beef cafish ofada\n",
"0 2017-01-01 1001.2 899.6 377.4\n",
"1 2017-02-01 1129.0 916.5 472.3"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pivot_table(index='date', columns='food', values='price').reset_index()"
]
},
{
"cell_type": "markdown",
"id": "e7124ac9",
"metadata": {},
"source": [
"## Transforming from long to wide with multiple columns\n",
"\n",
"Converting from long to wide while preserving multiple columns is practically the same as doing it for one column."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c93ebc83",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" scale | \n",
" food | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" regional | \n",
" beef | \n",
" 1001.2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" regional | \n",
" beef | \n",
" 1129.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 2017-01-01 | \n",
" regional | \n",
" ofada | \n",
" 377.4 | \n",
"
\n",
" \n",
" 3 | \n",
" 2017-02-01 | \n",
" regional | \n",
" ofada | \n",
" 472.3 | \n",
"
\n",
" \n",
" 4 | \n",
" 2017-01-01 | \n",
" regional | \n",
" cafish | \n",
" 899.6 | \n",
"
\n",
" \n",
" 5 | \n",
" 2017-02-01 | \n",
" regional | \n",
" cafish | \n",
" 916.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date scale food price\n",
"0 2017-01-01 regional beef 1001.2\n",
"1 2017-02-01 regional beef 1129.0\n",
"2 2017-01-01 regional ofada 377.4\n",
"3 2017-02-01 regional ofada 472.3\n",
"4 2017-01-01 regional cafish 899.6\n",
"5 2017-02-01 regional cafish 916.5"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame([\n",
" {'date': '2017-01-01', 'scale': 'regional', 'food': 'beef', 'price': 1001.2 },\n",
" {'date': '2017-02-01', 'scale': 'regional', 'food': 'beef', 'price': 1129.0 },\n",
" {'date': '2017-01-01', 'scale': 'regional', 'food': 'ofada', 'price': 377.4 },\n",
" {'date': '2017-02-01', 'scale': 'regional', 'food': 'ofada', 'price': 472.3 },\n",
" {'date': '2017-01-01', 'scale': 'regional', 'food': 'cafish', 'price': 899.6 },\n",
" {'date': '2017-02-01', 'scale': 'regional', 'food': 'cafish', 'price': 916.5 },\n",
"])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "e5f4b238",
"metadata": {},
"source": [
"The only difference is when using `pivot_table` you pass *two* columns instead of one."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "7ad7af63",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" food | \n",
" beef | \n",
" cafish | \n",
" ofada | \n",
"
\n",
" \n",
" date | \n",
" scale | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2017-01-01 | \n",
" regional | \n",
" 1001.2 | \n",
" 899.6 | \n",
" 377.4 | \n",
"
\n",
" \n",
" 2017-02-01 | \n",
" regional | \n",
" 1129.0 | \n",
" 916.5 | \n",
" 472.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"food beef cafish ofada\n",
"date scale \n",
"2017-01-01 regional 1001.2 899.6 377.4\n",
"2017-02-01 regional 1129.0 916.5 472.3"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pivot_table(index=['date', 'scale'], columns='food', values='price')"
]
},
{
"cell_type": "markdown",
"id": "60c05d1d",
"metadata": {},
"source": [
"To turn the axes - the bold parts - into normal columns, all you need to do is `reset_index()`."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "f132fb73",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" food | \n",
" date | \n",
" scale | \n",
" beef | \n",
" cafish | \n",
" ofada | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2017-01-01 | \n",
" regional | \n",
" 1001.2 | \n",
" 899.6 | \n",
" 377.4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2017-02-01 | \n",
" regional | \n",
" 1129.0 | \n",
" 916.5 | \n",
" 472.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"food date scale beef cafish ofada\n",
"0 2017-01-01 regional 1001.2 899.6 377.4\n",
"1 2017-02-01 regional 1129.0 916.5 472.3"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pivot_table(index=['date', 'scale'], columns='food', values='price').reset_index()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8e6cd2a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}