
Handling overplotting on a scatter plot: geom_count()/stat_sum()

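geom_count() counts the number of observations at each location (each distinct x/y pair) and maps that count to point size, so locations where many points overlap stand out. stat_sum() computes the same statistic and draws the same plot.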

Computed variables:

  • ..n.. - number of observations at location

  • ..prop.. - share of the group's observations at the location, in the range 0..1

  • ..proppct.. - the same share expressed as a percentage, in the range 0..100
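The computed variables above can be approximated in plain Python. This is a minimal sketch, not Lets-Plot's implementation: it assumes every observation belongs to a single group, and the helper name count_by_location and the toy data are hypothetical.

```python
from collections import Counter

def count_by_location(points):
    """Hypothetical helper approximating ..n.., ..prop.. and ..proppct..
    for a single group.

    points: list of (x, y) pairs. Returns {(x, y): (n, prop, proppct)}.
    """
    counts = Counter(points)        # ..n..: observations at each (x, y)
    total = sum(counts.values())    # total observations in the group
    return {
        loc: (n, n / total, 100 * n / total)  # (..n.., ..prop.., ..proppct..)
        for loc, n in counts.items()
    }

# Hypothetical toy data (not taken from mpg.csv): (class, drv) pairs.
toy = [("compact", "f"), ("compact", "f"), ("compact", "4"), ("suv", "4")]
stats = count_by_location(toy)
print(stats[("compact", "f")])  # (2, 0.5, 50.0)
```

Because prop divides by the group total, the prop values of all locations in one group add up to 1.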

import pandas as pd

from lets_plot import *
LetsPlot.setup_html()
# Load the mpg dataset and preview its first five rows.
mpg_df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
mpg_df.head()
   Unnamed: 0  manufacturer  model  displ  year  cyl       trans  drv  cty  hwy  fl    class
0           1          audi     a4    1.8  1999    4    auto(l5)    f   18   29   p  compact
1           2          audi     a4    1.8  1999    4  manual(m5)    f   21   29   p  compact
2           3          audi     a4    2.0  2008    4  manual(m6)    f   20   31   p  compact
3           4          audi     a4    2.0  2008    4    auto(av)    f   21   30   p  compact
4           5          audi     a4    2.8  1999    6    auto(l5)    f   16   26   p  compact
# Treat 'class' and 'drv' as discrete variables and sort their categories in ascending order.
p = ggplot(mpg_df, aes(x=as_discrete('class', order=1), y=as_discrete('drv', order=1)))

1. Plot the Observation Count by Location

p + geom_count()
p + stat_sum()

2. Plot the Share of Observations by Location

p + geom_count(aes(size='..prop..'))

3. Plot the Share of Each Drivetrain Type within Each Vehicle Class

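Note: mapping group='class' makes ..prop.. be computed separately within each vehicle class, so the drivetrain shares within each class add up to 1.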

p + geom_count(aes(size='..prop..', group='class'))
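The effect of the group aesthetic on ..prop.. can be sketched in plain Python. This is an illustration under assumptions, not Lets-Plot's code: the helper prop_within_groups and the toy data are hypothetical, and the group is derived from the location itself, which works here only because 'class' is also the x variable.

```python
from collections import Counter, defaultdict

def prop_within_groups(points, group_of):
    """Hypothetical helper: share of each (x, y) location within its group.

    points:   list of (x, y) pairs
    group_of: function mapping an (x, y) location to its group key
    """
    counts = Counter(points)      # observations per location
    totals = defaultdict(int)     # observations per group
    for loc, n in counts.items():
        totals[group_of(loc)] += n
    return {loc: n / totals[group_of(loc)] for loc, n in counts.items()}

# Hypothetical toy data (not taken from mpg.csv): (class, drv) pairs.
toy = [("compact", "f"), ("compact", "f"), ("compact", "4"), ("suv", "4")]

# Group by class (the x value): drivetrain shares add up to 1 within each class.
by_class = prop_within_groups(toy, group_of=lambda loc: loc[0])
# A single group for all data: shares add up to 1 across the whole plot.
overall = prop_within_groups(toy, group_of=lambda loc: None)

print(by_class[("compact", "f")])  # 0.6666666666666666
print(overall[("compact", "f")])   # 0.5
```

The same location gets a larger share when it is measured against its class total (3 compact cars) instead of the total over all data (4 cars).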