import pandas as pd
from plotnine import *
df = pd.DataFrame([
{ 'lang': 'jp', 'word': 'たこ焼き', 'number': 300 },
{ 'lang': 'cn', 'word': '火鍋', 'number': 400 },
{ 'lang': 'ko', 'word': '김치', 'number': 200 },
])
Displaying East Asian text (Chinese, Japanese, Korean) in plotnine
When you try to display Chinese, Japanese, or Korean text in a plotnine graphic, you might get warnings like these, and the affected characters typically show up as empty boxes:
/Users/username/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:240: RuntimeWarning: Glyph 12383 missing from current font.
/Users/username/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:240: RuntimeWarning: Glyph 12371 missing from current font.
/Users/username/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:240: RuntimeWarning: Glyph 28988 missing from current font.
This is because many fonts do not include glyphs for these characters, and the font matplotlib uses by default is one of them. To display them, you'll need to pick a font that does include them.
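The numbers in these warnings are Unicode code points, so one quick way to see which characters are affected is to convert them back with Python's built-in chr(). A minimal sketch, using the three code points from the warnings above:

```python
# Code points copied from the "Glyph ... missing" warnings
missing_code_points = [12383, 12371, 28988]

# chr() turns a Unicode code point back into its character
missing_chars = [chr(code_point) for code_point in missing_code_points]

print(missing_chars)
# ['た', 'こ', '焼']
```

These are characters from たこ焼き, the Japanese word in the dataset.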
Using Arial Unicode MS
If we have a dataset with all three languages, Arial Unicode MS is a good option because it supports characters from around the world. We'll use theme(text=element_text(family='Arial Unicode MS')) to set all of the text in the graphic to this font.
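Arial Unicode MS may not be installed on every machine, so it can help to check before relying on it. The sketch below is one possible way to do that with matplotlib's font manager; the helper name font_available is hypothetical and not part of plotnine or matplotlib.

```python
from matplotlib import font_manager


def font_available(family):
    """Return True if matplotlib can find a font with this family name.

    Hypothetical helper for illustration only.
    """
    # Passing the family as a list avoids it being parsed as a fontconfig pattern
    props = font_manager.FontProperties(family=[family])
    try:
        # With fallback_to_default=False, findfont raises ValueError
        # instead of silently substituting the default font
        font_manager.findfont(props, fallback_to_default=False)
        return True
    except ValueError:
        return False


print(font_available('Arial Unicode MS'))
```

If this prints False, the theme() call still runs, but matplotlib falls back to its default font and the glyph warnings come back.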
df
| lang | word | number |
---|---|---|---|
0 | jp | たこ焼き | 300 |
1 | cn | 火鍋 | 400 |
2 | ko | 김치 | 200 |
(
ggplot(df)
+ aes(x='word', y='number')
+ geom_bar(stat='identity')
+ theme(text=element_text(family='Arial Unicode MS'))
)
<ggplot: (310861300)>
Using ONLY Chinese or Japanese character sets (SimHei, Heiti)
Usually you only have one language in your dataset. Instead of using a font that has every single language in it, you might pick something more specific. For Chinese, there are specific fonts on both OS X and Windows that are built-in and work well.
- Windows: SimHei
- OS X: Heiti TC
They won't display Korean correctly, but they'll be fine for Chinese and Japanese.
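If the same notebook has to run on both operating systems, the font name can be chosen at runtime. This is only a sketch: the function name cjk_font_family is hypothetical, and it returns None on any system other than the two listed above, since no font is named for those here.

```python
import platform


def cjk_font_family(system=None):
    """Pick one of the built-in fonts listed above for the current OS.

    Hypothetical helper for illustration only.
    """
    # platform.system() returns 'Windows' on Windows and 'Darwin' on a Mac
    system = system or platform.system()
    built_in_fonts = {
        'Windows': 'SimHei',
        'Darwin': 'Heiti TC',
    }
    # None means no built-in font is known for this system
    return built_in_fonts.get(system)


print(cjk_font_family())
```

The result can then be passed along as theme(text=element_text(family=cjk_font_family())) once you have confirmed it is not None.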
# I'm on a Mac, so I use Heiti TC
(
ggplot(df)
+ aes(x='word', y='number')
+ geom_bar(stat='identity')
+ theme(text=element_text(family='Heiti TC'))
)
/Users/soma/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:240: RuntimeWarning: Glyph 44608 missing from current font.
/Users/soma/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:240: RuntimeWarning: Glyph 52824 missing from current font.
/Users/soma/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:203: RuntimeWarning: Glyph 44608 missing from current font.
/Users/soma/.pyenv/versions/3.9.7/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:203: RuntimeWarning: Glyph 52824 missing from current font.
<ggplot: (309137818)>
Other fonts and character sets
To figure out all of your other options, it's usually best to display a list of supported fonts.
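One way to get that list is through matplotlib's font manager, which plotnine relies on for rendering. The following is a minimal sketch; the exact names printed depend on what is installed on your machine.

```python
from matplotlib import font_manager

# fontManager.ttflist holds one entry per font file matplotlib has found;
# a set removes duplicates such as bold and italic variants of one family
font_names = sorted({font.name for font in font_manager.fontManager.ttflist})

for name in font_names:
    print(name)
```

Any name in this list can be tried as the family in element_text(family=...). Whether a given font actually covers Chinese, Japanese, or Korean characters still has to be checked by plotting with it.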