Handling an overplotting on a scatter plot: geom_count()/stat_sum()#
The geom_count() counts the number of observations at each location.
Computed variables:
..n..- number of observations at location..prop..- value in range 0..1 : share of observations at location..proppct..- value in range 0..100 : % of observations at location
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
mpg_df = pd.read_csv ("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
mpg_df.head()
| Unnamed: 0 | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 1 | 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 2 | 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 3 | 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 4 | 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
p = ggplot(mpg_df, aes(x=as_discrete('class', order=1), y=as_discrete('drv', order=1)))
1. Plot an Observation Count by Location#
p + geom_count()
p + stat_sum()