{ "cells": [ { "cell_type": "markdown", "id": "6074dfcc", "metadata": {}, "source": [ "# Transposing data in pandas (switching columns and rows)\n", "\n", "Switching the columns and rows in a dataset is called **transposing**, and is one of the simplest forms of reshaping data." ] }, { "cell_type": "code", "execution_count": 3, "id": "531a3a9b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datebeefofadacafish
02017-01-011001.2377.4899.6
12017-01-011129.0472.3916.5
22017-01-011067.7319.9919.8
\n", "
" ], "text/plain": [ " date beef ofada cafish\n", "0 2017-01-01 1001.2 377.4 899.6\n", "1 2017-01-01 1129.0 472.3 916.5\n", "2 2017-01-01 1067.7 319.9 919.8" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame([\n", " { 'date': '2017-01-01', 'beef': 1001.2, 'ofada': 377.4, 'cafish': 899.6},\n", " { 'date': '2017-01-01', 'beef': 1129.0, 'ofada': 472.3, 'cafish': 916.5},\n", " { 'date': '2017-01-01', 'beef': 1067.7, 'ofada': 319.9, 'cafish': 919.8}\n", "])\n", "df" ] }, { "cell_type": "markdown", "id": "01d3ffcc", "metadata": {}, "source": [ "## Simple transposition\n", "\n", "To switch columns and rows, you can just use `df.T`" ] }, { "cell_type": "code", "execution_count": 8, "id": "e745432b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
date2017-01-012017-01-012017-01-01
beef1001.21129.01067.7
ofada377.4472.3319.9
cafish899.6916.5919.8
\n", "
" ], "text/plain": [ " 0 1 2\n", "date 2017-01-01 2017-01-01 2017-01-01\n", "beef 1001.2 1129.0 1067.7\n", "ofada 377.4 472.3 319.9\n", "cafish 899.6 916.5 919.8" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.T" ] }, { "cell_type": "markdown", "id": "b8ddecad", "metadata": {}, "source": [ "The problem we can see is the first row should be the column names! You could manually the first row and rename your column headers..." ] }, { "cell_type": "code", "execution_count": 41, "id": "69b165b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2017-01-012017-01-012017-01-01
beef1001.21129.01067.7
ofada377.4472.3319.9
cafish899.6916.5919.8
\n", "
" ], "text/plain": [ " 2017-01-01 2017-01-01 2017-01-01\n", "beef 1001.2 1129.0 1067.7\n", "ofada 377.4 472.3 319.9\n", "cafish 899.6 916.5 919.8" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.T.iloc[1:].rename(columns=df.date)" ] }, { "cell_type": "markdown", "id": "79d1df6a", "metadata": {}, "source": [ "...but there's another way, too." ] }, { "cell_type": "markdown", "id": "68fe2986", "metadata": {}, "source": [ "## Transpose but fix column issue with `.set_index`\n", "\n", "With our starting dataset, the date is just the same as any other column." ] }, { "cell_type": "code", "execution_count": 69, "id": "7efc7041", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datebeefofadacafish
02017-01-011001.2377.4899.6
12017-01-011129.0472.3916.5
22017-01-011067.7319.9919.8
\n", "
" ], "text/plain": [ " date beef ofada cafish\n", "0 2017-01-01 1001.2 377.4 899.6\n", "1 2017-01-01 1129.0 472.3 916.5\n", "2 2017-01-01 1067.7 319.9 919.8" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "9a4253bd", "metadata": {}, "source": [ "When we transpose, it's the same as any other row." ] }, { "cell_type": "code", "execution_count": 70, "id": "03f9d0bd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
date2017-01-012017-01-012017-01-01
beef1001.21129.01067.7
ofada377.4472.3319.9
cafish899.6916.5919.8
\n", "
" ], "text/plain": [ " 0 1 2\n", "date 2017-01-01 2017-01-01 2017-01-01\n", "beef 1001.2 1129.0 1067.7\n", "ofada 377.4 472.3 319.9\n", "cafish 899.6 916.5 919.8" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.T" ] }, { "cell_type": "markdown", "id": "98d8729b", "metadata": {}, "source": [ "Where do the numbers on the columns come from? When we transpose, the **index of the original dataframe becomes the column headers**. To allow our dates to become column headers, we just need to set it as the index before we transpose." ] }, { "cell_type": "code", "execution_count": 71, "id": "db323f54", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beefofadacafish
date
2017-01-011001.2377.4899.6
2017-01-011129.0472.3916.5
2017-01-011067.7319.9919.8
\n", "
" ], "text/plain": [ " beef ofada cafish\n", "date \n", "2017-01-01 1001.2 377.4 899.6\n", "2017-01-01 1129.0 472.3 916.5\n", "2017-01-01 1067.7 319.9 919.8" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('date')" ] }, { "cell_type": "code", "execution_count": 76, "id": "fcc11918", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
date2017-01-012017-01-012017-01-01
beef1001.21129.01067.7
ofada377.4472.3319.9
cafish899.6916.5919.8
\n", "
" ], "text/plain": [ "date 2017-01-01 2017-01-01 2017-01-01\n", "beef 1001.2 1129.0 1067.7\n", "ofada 377.4 472.3 319.9\n", "cafish 899.6 916.5 919.8" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('date').T" ] }, { "cell_type": "markdown", "id": "d7d2bc46", "metadata": {}, "source": [ "But the opposite happened, too, which is why our foods are bold: **the column headers became the index of our rows**. To turn them into normal rows, we'll use `reset_index()`." ] }, { "cell_type": "code", "execution_count": 75, "id": "7e4fc8b3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateindex2017-01-012017-01-012017-01-01
0beef1001.21129.01067.7
1ofada377.4472.3319.9
2cafish899.6916.5919.8
\n", "
" ], "text/plain": [ "date index 2017-01-01 2017-01-01 2017-01-01\n", "0 beef 1001.2 1129.0 1067.7\n", "1 ofada 377.4 472.3 319.9\n", "2 cafish 899.6 916.5 919.8" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('date').T.reset_index()" ] }, { "cell_type": "markdown", "id": "dbe8c8f4", "metadata": {}, "source": [ "But they have a ridiculous name now! There's no solution to this that makes me happy." ] }, { "cell_type": "code", "execution_count": 82, "id": "e540cebd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datefood2017-01-012017-01-012017-01-01
0beef1001.21129.01067.7
1ofada377.4472.3319.9
2cafish899.6916.5919.8
\n", "
" ], "text/plain": [ "date food 2017-01-01 2017-01-01 2017-01-01\n", "0 beef 1001.2 1129.0 1067.7\n", "1 ofada 377.4 472.3 319.9\n", "2 cafish 899.6 916.5 919.8" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('date').T.reset_index().rename(columns={'index': 'food'})" ] }, { "cell_type": "markdown", "id": "09b153a4", "metadata": {}, "source": [ "If we want to get rid of the index label saying `date` when it's just a side effect of the transposition, we can use `rename_axis`." ] }, { "cell_type": "code", "execution_count": 86, "id": "bca71350", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
food2017-01-012017-01-012017-01-01
0beef1001.21129.01067.7
1ofada377.4472.3319.9
2cafish899.6916.5919.8
\n", "
" ], "text/plain": [ " food 2017-01-01 2017-01-01 2017-01-01\n", "0 beef 1001.2 1129.0 1067.7\n", "1 ofada 377.4 472.3 319.9\n", "2 cafish 899.6 916.5 919.8" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('date').T.reset_index().rename(columns={'index': 'food'}).rename_axis(None, axis=1)" ] }, { "cell_type": "markdown", "id": "502234c1", "metadata": {}, "source": [ "Sigh. This is so much more complicated than the original answer." ] }, { "cell_type": "code", "execution_count": null, "id": "254fef84", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }