Download notebook (.ipynb)

Categorical Data Type#

Categoricals can only take on a limited number of possible values (categories) and
can be sorted according to the custom order of the categories.

To harness Categorical data type in Lets-Plot you can either add a pandas.Categotical variable to
your pandas.DataFrame or annotate any variable in your dataset as Categorical using
Lets-Plot as_discrete() function and the levels parameter.

import pandas as pd

from lets_plot import *
LetsPlot.setup_html()
mpg_df = pd.read_csv ("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
mpg_df.head(4)
Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy fl class
0 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
1 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
2 3 audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
3 4 audi a4 2.0 2008 4 auto(av) f 21 30 p compact

1. Data Type of the “manufacturer” is Unordered Discrete by Default .#

ggplot(mpg_df) + geom_bar(aes(x='manufacturer')) + coord_flip()
# 
# Create a list of categories sorted according to a num. of vehicles in the dataset.
# 

brands_by_count = mpg_df['manufacturer'].value_counts().index.tolist()
brands_by_count
['dodge',
 'toyota',
 'volkswagen',
 'ford',
 'chevrolet',
 'audi',
 'hyundai',
 'subaru',
 'nissan',
 'honda',
 'jeep',
 'pontiac',
 'land rover',
 'mercury',
 'lincoln']

2. First Option: Add a pandas.Categorical Variable#

manufacturer_cat = pd.Categorical(mpg_df['manufacturer'], categories=brands_by_count, ordered=True)
mpg_df['manufacturer_cat'] = manufacturer_cat
ggplot(mpg_df) + \
    geom_bar(aes(x='manufacturer_cat'),
             labels=layer_labels(['..count..']),
             tooltips='none') + \
    coord_flip()

3. Second Option: Annotate “manufacturer” as a Categorical Using as_discrete(levels=..)#

ggplot(mpg_df) + \
    geom_bar(aes(x=as_discrete('manufacturer', levels=brands_by_count)),
             labels=layer_labels(['..count..']),
             tooltips='none') + \
    coord_flip()

4. Faceted Plot with a Categorical as a Facet Variable#

When the facet variable is of Categorical data type, plot facets are ordered according to the order of categories.

ggplot(mpg_df) + \
    geom_pie(aes(fill='drv', size='..sum..')) + \
    facet_wrap(facets='manufacturer_cat', ncol=5, order=0) + \
    scale_size(range=[2, 10]) + \
    guides(size='none') + \
    theme_void()