import pandas as pd
from plotnine import *
df = pd.DataFrame([
{ 'species': 'cat', 'num_animals': 8 },
{ 'species': 'dog', 'num_animals': 22 },
{ 'species': 'rabbit', 'num_animals': 2 },
])
df
| | species | num_animals |
|---|---|---|
| 0 | cat | 8 |
| 1 | dog | 22 |
| 2 | rabbit | 2 |
To draw a bar chart, we use geom_bar.
(
ggplot(df)
+ aes(x='species', y='num_animals')
+ geom_bar(stat='identity')
)
<ggplot: (325275832)>
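The little extra bit we added was stat='identity'. It means "use the values in the y column as the bar heights, exactly as they are." Without it, geom_bar falls back to its default, stat='count', which counts rows instead of reading a y column.
What's the alternative? Glad you asked!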
Aggregate bar chart (median)
A lot of the time your data isn't set up like you want it to be set up. Say, for example, we have a bunch of countries with life expectancies. What we want to plot is different, though: we want the median life expectancy per continent.
import pandas as pd
from plotnine import *
df = pd.read_csv('countries.csv')
df.head(2)
| | country | continent | gdp_per_capita | life_expectancy | population |
|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 663 | 54.863 | 22856302 |
| 1 | Albania | Europe | 4195 | 74.200 | 3071856 |
Instead of telling geom_bar to plot the identity (the actual value in each row), we tell it to plot a summary statistic with stat='summary'. Which summary statistic? The median, by passing numpy's np.median as fun_y.
import numpy as np
(
ggplot(df)
+ aes(x='continent', y='life_expectancy')
+ geom_bar(stat='summary', fun_y=np.median)
)
<ggplot: (322494702)>
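If you'd like to double-check the numbers behind those bars, a groupby in pandas gets you roughly the same thing. This is only a sketch: the toy table and its values are made up for illustration and are not the real countries.csv.

```python
import pandas as pd

# Made-up rows for illustration only (NOT the real countries.csv)
toy = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe'],
    'life_expectancy': [50.0, 60.0, 70.0, 72.0, 78.0],
})

# One median per continent, which is what each bar's height represents
medians = toy.groupby('continent')['life_expectancy'].median()
print(medians)
```

Each value in medians lines up with one bar in the summary chart.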
Aggregate bar chart (count)
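Plotting a bar chart of counts is similar to how we did the median, but we use stat='count' instead of stat='summary'. We only give it an x this time, because the bar heights come from counting the rows in each group. Since stat='count' is the default for geom_bar, a plain geom_bar() does the same thing.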
df = pd.read_csv('countries.csv')
df.head(2)
| | country | continent | gdp_per_capita | life_expectancy | population |
|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 663 | 54.863 | 22856302 |
| 1 | Albania | Europe | 4195 | 74.200 | 3071856 |
(
ggplot(df)
+ aes(x='continent')
+ geom_bar(stat='count')
)
<ggplot: (324986974)>
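The counting itself is nothing fancy. As a rough pandas equivalent, value_counts gives the same numbers the bars show. The toy table below is invented for illustration, not taken from countries.csv.

```python
import pandas as pd

# Made-up rows for illustration only (NOT the real countries.csv)
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'continent': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe'],
})

# Number of rows per continent: the height of each bar with stat='count'
counts = toy['continent'].value_counts()
print(counts)
```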
Horizontal bar charts
To make a horizontal bar graph, you just add coord_flip() to the end of your plot.
df = pd.DataFrame([
{ 'species': 'cat', 'num_animals': 8 },
{ 'species': 'dog', 'num_animals': 22 },
{ 'species': 'rabbit', 'num_animals': 2 },
])
df
| | species | num_animals |
|---|---|---|
| 0 | cat | 8 |
| 1 | dog | 22 |
| 2 | rabbit | 2 |
(
ggplot(df)
+ aes(x='species', y='num_animals')
+ geom_bar(stat='identity')
+ coord_flip()
)
<ggplot: (325331706)>
Sorting your bars
By default, the bars show up in alphabetical order. To sort them by value instead, you can use reorder inside aes.
(
ggplot(df)
+ aes(x='reorder(species, num_animals)', y='num_animals')
+ geom_bar(stat='identity')
)
<ggplot: (325451499)>
x='reorder(species, num_animals)' means "I want you to use species for the x axis, but I really want you to order it based on the num_animals column."
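If you'd rather do the sorting in pandas before you plot, one option is to turn species into an ordered categorical, since plotnine draws categorical columns in their category order. A minimal sketch with the same animals data (the order variable is just a name picked for this example):

```python
import pandas as pd

df = pd.DataFrame([
    { 'species': 'cat', 'num_animals': 8 },
    { 'species': 'dog', 'num_animals': 22 },
    { 'species': 'rabbit', 'num_animals': 2 },
])

# Species names from smallest to largest num_animals
order = df.sort_values('num_animals')['species'].tolist()

# Store that order in the column itself as an ordered categorical
df['species'] = pd.Categorical(df['species'], categories=order, ordered=True)
print(order)
```

With the column set up this way, a plain aes(x='species', y='num_animals') should come out in the same order as the reorder version.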
Reversing your bar order
Instead of sorting with num_animals, you're going to sort with -num_animals, the negated version. Negating every value flips the order, so the biggest bar comes first instead of last.
(
ggplot(df)
+ aes(x='reorder(species, -num_animals)', y='num_animals')
+ geom_bar(stat='identity')
)
<ggplot: (325532546)>
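To see why the minus sign reverses things, here is a small pandas sketch that sorts the same data by the values and then by the negated values. The neg column is a throwaway helper made up for this example.

```python
import pandas as pd

df = pd.DataFrame([
    { 'species': 'cat', 'num_animals': 8 },
    { 'species': 'dog', 'num_animals': 22 },
    { 'species': 'rabbit', 'num_animals': 2 },
])

# Sorting by the values puts the smallest first
ascending = df.sort_values('num_animals')['species'].tolist()

# Sorting by the negated values puts the largest first
descending = (
    df.assign(neg=-df['num_animals'])
    .sort_values('neg')['species']
    .tolist()
)

print(ascending)
print(descending)
```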
Stacked bar graph
To make a stacked bar graph, you use one of your columns to specify fill, which is the color of the bar. Within each x value, the filled bars get stacked on top of each other.
df = pd.DataFrame([
{ 'species': 'cat', 'num_animals': 8, 'county': 'Kings' },
{ 'species': 'dog', 'num_animals': 22, 'county': 'Kings' },
{ 'species': 'cat', 'num_animals': 3, 'county': 'Queens' },
{ 'species': 'dog', 'num_animals': 2, 'county': 'Queens' },
])
df
| | species | num_animals | county |
|---|---|---|---|
| 0 | cat | 8 | Kings |
| 1 | dog | 22 | Kings |
| 2 | cat | 3 | Queens |
| 3 | dog | 2 | Queens |
(
ggplot(df)
+ aes(x='species', y='num_animals', fill='county')
+ geom_bar(stat='identity')
)
<ggplot: (322829960)>
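If it helps to see the stack as numbers, a pivot in pandas lays out the pieces of each bar, and summing across a row gives the bar's total height. This is a sketch with the same data; pieces and totals are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame([
    { 'species': 'cat', 'num_animals': 8, 'county': 'Kings' },
    { 'species': 'dog', 'num_animals': 22, 'county': 'Kings' },
    { 'species': 'cat', 'num_animals': 3, 'county': 'Queens' },
    { 'species': 'dog', 'num_animals': 2, 'county': 'Queens' },
])

# One row per species, one column per county: the pieces of each stack
pieces = df.pivot(index='species', columns='county', values='num_animals')

# Total height of each stacked bar
totals = pieces.sum(axis=1)

print(pieces)
print(totals)
```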
Grouped bar graph
To make a grouped bar graph, you'll specify a fill color in the aesthetics, then use position='dodge' along with your geom_bar.
df = pd.DataFrame([
{ 'species': 'cat', 'num_animals': 8, 'county': 'Kings' },
{ 'species': 'dog', 'num_animals': 22, 'county': 'Kings' },
{ 'species': 'cat', 'num_animals': 3, 'county': 'Queens' },
{ 'species': 'dog', 'num_animals': 2, 'county': 'Queens' },
])
df
| | species | num_animals | county |
|---|---|---|---|
| 0 | cat | 8 | Kings |
| 1 | dog | 22 | Kings |
| 2 | cat | 3 | Queens |
| 3 | dog | 2 | Queens |
(
ggplot(df)
+ aes(x='species', y='num_animals', fill='county')
+ geom_bar(stat='identity', position='dodge')
)
<ggplot: (324847595)>
Extra spacing
You can also use position='dodge2', which adds a little padding between the bars in each group. It's an enhanced version of dodge that was designed to also cope with things like box plots, whose widths can vary, but for plain bars the extra spacing is the main difference you'll see.
(
ggplot(df)
+ aes(x='species', y='num_animals', fill='county')
+ geom_bar(stat='identity', position='dodge2')
)
<ggplot: (322783726)>
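100% stacked bar graph
For a 100% stacked bar, you'll change the position of geom_bar to fill. Each stack gets rescaled so it adds up to 1, which means you're comparing proportions instead of raw counts.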
(
ggplot(df)
+ aes(x='species', y='num_animals', fill='county')
+ geom_bar(stat='identity', position='fill')
)
<ggplot: (323090249)>
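The proportions that position='fill' draws can be worked out by hand in pandas, which is a handy sanity check. A minimal sketch with the same data; species_total and the share column are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame([
    { 'species': 'cat', 'num_animals': 8, 'county': 'Kings' },
    { 'species': 'dog', 'num_animals': 22, 'county': 'Kings' },
    { 'species': 'cat', 'num_animals': 3, 'county': 'Queens' },
    { 'species': 'dog', 'num_animals': 2, 'county': 'Queens' },
])

# Total animals per species, repeated on every row of that species
species_total = df.groupby('species')['num_animals'].transform('sum')

# Each county's share of its species: the segment heights in a 100% stack
df['share'] = df['num_animals'] / species_total
print(df)
```

The shares for each species add up to 1, which is why every bar in the chart reaches the same height.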