{
"cells": [
{
"cell_type": "markdown",
"id": "05b12869",
"metadata": {},
"source": [
"# Making a bar graph in plotnine\n",
"\n",
"## Basic bar chart\n",
"\n",
"Making a bar graph in plotnine is a tiny bit more difficult than other charts.\n",
"\n",
"Usually you'll want your data organized so that each row is its own bar. In this dataset, each type of animal gets its own bar:"
]
},
{
"cell_type": "code",
"execution_count": 145,
"id": "8cc924b7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 146,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" ggplot(df)\n",
" + aes(x='species', y='num_animals')\n",
" + geom_bar(stat='identity')\n",
")"
]
},
{
"cell_type": "markdown",
"id": "944aa586",
"metadata": {},
"source": [
"The little extra bit we added was `stat='identity'`. It means \"use the value in the y column as the height of the bar, exactly as it is.\" Without it, `geom_bar` defaults to counting rows instead.\n",
"\n",
"What's the alternative? Glad you asked!"
]
},
{
"cell_type": "markdown",
"id": "047a3047",
"metadata": {},
"source": [
"## Aggregate bar chart (median)\n",
"\n",
"A lot of the time your data isn't set up the way you want it to be. Say, for example, we have a bunch of countries with life expectancies. What we want to plot is different, though: we want the median life expectancy per continent."
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "fbc6e19e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" country continent gdp_per_capita life_expectancy population\n",
"0 Afghanistan Asia 663 54.863 22856302\n",
"1 Albania Europe 4195 74.200 3071856"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"from plotnine import *\n",
"\n",
"df = pd.read_csv('countries.csv')\n",
"df.head(2)"
]
},
{
"cell_type": "markdown",
"id": "34b8782b",
"metadata": {},
"source": [
"Instead of telling `geom_bar` to plot the identity (the actual value in each row), we tell it to plot a **summary statistic**. Which summary statistic? The median, using numpy's `np.median`, which we pass in as `fun_y`."
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "792f39af",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"(\n",
" ggplot(df)\n",
" + aes(x='continent', y='life_expectancy')\n",
" + geom_bar(stat='summary', fun_y=np.median)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a9f1f383",
"metadata": {},
"source": [
"## Aggregate bar chart (count)\n",
"\n",
"Plotting a bar chart of counts is similar to how we did the median, but we use `stat='count'` instead of `stat='summary'`."
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "fc5b0ffa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" ggplot(df)\n",
" + aes(x='continent')\n",
" + geom_bar(stat='count')\n",
")"
]
},
{
"cell_type": "markdown",
"id": "87b445f7",
"metadata": {},
"source": [
"## Sorting your bars\n",
"\n",
"By default, `geom_bar` orders the bars alphabetically. To sort your bars in plotnine, you need to use `reorder` inside `aes`."
]
},
{
"cell_type": "code",
"execution_count": 152,
"id": "8464e2e0",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 152,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" ggplot(df)\n",
" + aes(x='reorder(species, num_animals)', y='num_animals')\n",
" + geom_bar(stat='identity')\n",
")"
]
},
{
"cell_type": "markdown",
"id": "36a4f35e",
"metadata": {},
"source": [
"`x='reorder(species, num_animals)'` means \"I want you to use `species` for the x axis, but I really want you to order it based on the `num_animals` column.\""
]
},
{
"cell_type": "markdown",
"id": "85b989f1",
"metadata": {},
"source": [
"### Reversing your bar order\n",
"\n",
"Instead of sorting with `num_animals`, you're going to sort with `-num_animals`. Negating the values flips the order, so the biggest bar comes first instead of last."
]
},
{
"cell_type": "code",
"execution_count": 155,
"id": "0611dcb3",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 155,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" ggplot(df)\n",
" + aes(x='reorder(species, -num_animals)', y='num_animals')\n",
" + geom_bar(stat='identity')\n",
")"
]
},
{
"cell_type": "markdown",
"id": "312165a3",
"metadata": {},
"source": [
"## Stacked bar graph\n",
"\n",
"To make a stacked bar graph, you use one of your columns to specify `fill`, which is the color of the bar. Within each bar, the filled segments get stacked on top of each other."
]
},
{
"cell_type": "code",
"execution_count": 173,
"id": "9bf4b9c5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"