{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bf43418a",
   "metadata": {},
   "source": [
    "# Converting long data to wide data in pandas\n",
    "\n",
    "There are two major representations of data: one where you have a lot of *columns* (wide data), and another where you have a lot of *rows* (long data). Long data is also known as *tidy* data.\n",
    "\n",
    "It's pretty easy to switch between the two once you know what you're doing!\n",
    "\n",
    "## Simple example\n",
    "\n",
    "This dataset is long because each row has a date along with a food and a price, with repeating dates in separate rows. An alternative way to present it would be a single date with a column for each food's price."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "41e46c47",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>food</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>beef</td>\n",
       "      <td>1001.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>beef</td>\n",
       "      <td>1129.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>ofada</td>\n",
       "      <td>377.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>ofada</td>\n",
       "      <td>472.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>cafish</td>\n",
       "      <td>899.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date    food   price\n",
       "0  2017-01-01    beef  1001.2\n",
       "1  2017-02-01    beef  1129.0\n",
       "2  2017-01-01   ofada   377.4\n",
       "3  2017-02-01   ofada   472.3\n",
       "4  2017-01-01  cafish   899.6"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame([\n",
    "    { 'date': '2017-01-01', 'food': 'beef', 'price': 1001.2 },\n",
    "    { 'date': '2017-02-01', 'food': 'beef', 'price': 1129.0 },\n",
    "    { 'date': '2017-01-01', 'food': 'ofada', 'price': 377.4 },\n",
    "    { 'date': '2017-02-01', 'food': 'ofada', 'price': 472.3 },\n",
    "    { 'date': '2017-01-01', 'food': 'cafish', 'price': 899.6 },\n",
    "    { 'date': '2017-02-01', 'food': 'cafish', 'price': 916.5 }\n",
    "])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a5af954",
   "metadata": {},
   "source": [
    "To convert from long to wide, you'll use `df.pivot_table` on your dataframe. You tell it how you want the rows identified (by the `date`), what column should be converted into multiple columns (the `food`) and what value should fill in the table (in this case, the `price`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "fb4c23bd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>food</th>\n",
       "      <th>beef</th>\n",
       "      <th>cafish</th>\n",
       "      <th>ofada</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2017-01-01</th>\n",
       "      <td>1001.2</td>\n",
       "      <td>899.6</td>\n",
       "      <td>377.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-02-01</th>\n",
       "      <td>1129.0</td>\n",
       "      <td>916.5</td>\n",
       "      <td>472.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "food          beef  cafish  ofada\n",
       "date                             \n",
       "2017-01-01  1001.2   899.6  377.4\n",
       "2017-02-01  1129.0   916.5  472.3"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index='date', columns='food', values='price')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da087cb7",
   "metadata": {},
   "source": [
    "You might be used to pivot tables being about aggregation - counts and medians and things like that. In this case we're just using single values to fill in the table.\n",
    "\n",
    "There's one problem with this transform! Right now the `date` column is set up as an index, so we can't do `df.date`. To convert it to a normal column we will use `.reset_index()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "86507a23",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>food</th>\n",
       "      <th>date</th>\n",
       "      <th>beef</th>\n",
       "      <th>cafish</th>\n",
       "      <th>ofada</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>1001.2</td>\n",
       "      <td>899.6</td>\n",
       "      <td>377.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>916.5</td>\n",
       "      <td>472.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "food        date    beef  cafish  ofada\n",
       "0     2017-01-01  1001.2   899.6  377.4\n",
       "1     2017-02-01  1129.0   916.5  472.3"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index='date', columns='food', values='price').reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bbd389b",
   "metadata": {},
   "source": [
    "## Transforming from long to wide with multiple columns\n",
    "\n",
    "Converting from long to wide while preserving multiple columns is almost the same as doing it for one column. Let's start with the same data as above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "56c4e1a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>scale</th>\n",
       "      <th>food</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>beef</td>\n",
       "      <td>1001.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>beef</td>\n",
       "      <td>1129.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>ofada</td>\n",
       "      <td>377.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>ofada</td>\n",
       "      <td>472.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>cafish</td>\n",
       "      <td>899.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>cafish</td>\n",
       "      <td>916.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date     scale    food   price\n",
       "0  2017-01-01  regional    beef  1001.2\n",
       "1  2017-02-01  regional    beef  1129.0\n",
       "2  2017-01-01  regional   ofada   377.4\n",
       "3  2017-02-01  regional   ofada   472.3\n",
       "4  2017-01-01  regional  cafish   899.6\n",
       "5  2017-02-01  regional  cafish   916.5"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame([\n",
    "    {'date': '2017-01-01', 'scale': 'regional', 'food': 'beef', 'price': 1001.2 },\n",
    "    {'date': '2017-02-01', 'scale': 'regional', 'food': 'beef', 'price': 1129.0 },\n",
    "    {'date': '2017-01-01', 'scale': 'regional', 'food': 'ofada', 'price': 377.4 },\n",
    "    {'date': '2017-02-01', 'scale': 'regional', 'food': 'ofada', 'price': 472.3 },\n",
    "    {'date': '2017-01-01', 'scale': 'regional', 'food': 'cafish', 'price': 899.6 },\n",
    "    {'date': '2017-02-01', 'scale': 'regional', 'food': 'cafish', 'price': 916.5 },\n",
    "])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad2e4d6c",
   "metadata": {},
   "source": [
    "The only difference is when using `pivot_table` you pass *two* columns instead of one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "735c6d1a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>food</th>\n",
       "      <th>beef</th>\n",
       "      <th>cafish</th>\n",
       "      <th>ofada</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th>scale</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2017-01-01</th>\n",
       "      <th>regional</th>\n",
       "      <td>1001.2</td>\n",
       "      <td>899.6</td>\n",
       "      <td>377.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-02-01</th>\n",
       "      <th>regional</th>\n",
       "      <td>1129.0</td>\n",
       "      <td>916.5</td>\n",
       "      <td>472.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "food                   beef  cafish  ofada\n",
       "date       scale                          \n",
       "2017-01-01 regional  1001.2   899.6  377.4\n",
       "2017-02-01 regional  1129.0   916.5  472.3"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index=['date', 'scale'], columns='food', values='price')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4edf51ce",
   "metadata": {},
   "source": [
    "To turn the axes - the bold parts - into normal columns, all you need to do is `reset_index()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2f096aa8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>food</th>\n",
       "      <th>date</th>\n",
       "      <th>scale</th>\n",
       "      <th>beef</th>\n",
       "      <th>cafish</th>\n",
       "      <th>ofada</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-01-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>1001.2</td>\n",
       "      <td>899.6</td>\n",
       "      <td>377.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-02-01</td>\n",
       "      <td>regional</td>\n",
       "      <td>1129.0</td>\n",
       "      <td>916.5</td>\n",
       "      <td>472.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "food        date     scale    beef  cafish  ofada\n",
       "0     2017-01-01  regional  1001.2   899.6  377.4\n",
       "1     2017-02-01  regional  1129.0   916.5  472.3"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index=['date', 'scale'], columns='food', values='price').reset_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "294ae784",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}