Bar Geometry

import pandas as pd

from lets_plot import *
LetsPlot.setup_html()
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
print(df.shape)
df.head()
(234, 12)
   Unnamed: 0  manufacturer  model  displ  year  cyl       trans  drv  cty  hwy  fl    class
0           1          audi     a4    1.8  1999    4    auto(l5)    f   18   29   p  compact
1           2          audi     a4    1.8  1999    4  manual(m5)    f   21   29   p  compact
2           3          audi     a4    2.0  2008    4  manual(m6)    f   20   31   p  compact
3           4          audi     a4    2.0  2008    4    auto(av)    f   21   30   p  compact
4           5          audi     a4    2.8  1999    6    auto(l5)    f   16   26   p  compact

Default usage:

ggplot(df, aes(x="class")) + geom_bar()

Proportions instead of counts:

ggplot(df, aes(x="class")) + geom_bar(aes(y="..sumprop.."))
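
To make the meaning of `..sumprop..` concrete, here is a plain-Python sketch of what it appears to compute: each category's count divided by the total number of rows. The toy `classes` list and the helper name `sum_proportions` are illustrative assumptions, not part of lets-plot.

```python
from collections import Counter


def sum_proportions(values):
    """Return each distinct value's share of all rows (hypothetical helper)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Toy data standing in for the "class" column (not the real mpg values).
classes = ["compact", "compact", "suv", "midsize"]
shares = sum_proportions(classes)
print(shares)
```

Because every row belongs to exactly one bar, the shares add up to 1 across all bars.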

Order bars by count:

ggplot(df, aes(x=as_discrete("class", order_by="..count.."))) + geom_bar()

Total engine displacement of each class:

ggplot(df, aes(x="class")) + geom_bar(aes(weight="displ"))
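
The `weight` aesthetic changes what each row contributes to its bar: instead of adding 1 per row, the count statistic adds that row's weight. A minimal plain-Python sketch of that bookkeeping follows; the rows are made up rather than taken from the real dataset, and the helper `weighted_counts` is hypothetical.

```python
from collections import defaultdict


def weighted_counts(rows):
    """Sum the weight of every row per category (hypothetical helper)."""
    totals = defaultdict(float)
    for category, weight in rows:
        totals[category] += weight
    return dict(totals)


# Toy (class, displ) pairs; not the real mpg values.
rows = [("compact", 2.0), ("compact", 2.5), ("suv", 4.0), ("suv", 5.5)]
totals = weighted_counts(rows)
print(totals)
```

With every weight equal to 1, the same function reduces to an ordinary count.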

Annotations instead of tooltips:

ggplot(df, aes(x="class")) + geom_bar(labels=layer_labels(["..count.."]), tooltips='none')

Additional grouping by drivetrain (groups are stacked by default):

ggplot(df, aes(x="class")) + geom_bar(aes(fill="drv"))

Change group positioning:

ggplot(df, aes(x="class")) + geom_bar(aes(fill="drv"), position='dodge')

Flip coordinates:

ggplot(df, aes(y="class")) + geom_bar()

The default statistic for bars is 'count'.

The same statistic can be drawn with a different geometry:

ggplot(df, aes(x="class")) + geom_area(stat='count')

Conversely, you can draw bars with alternative statistics. For example, use the 'identity' statistic to draw precomputed averages:

avg_df = df.groupby("class")["cty"].mean().to_frame().reset_index()

ggplot(avg_df, aes("class", "cty")) + geom_bar(stat='identity') + ylab("mean cty")

You can get the same result without hand calculations by using stat_summary() with the bar geometry:

ggplot(df, aes(as_discrete("class", order=1), "cty")) + stat_summary(geom='bar')

Custom order using a scale function:

manufacturers = ["audi", "volkswagen",
                 "chevrolet", "dodge", "ford", "jeep", "lincoln", "mercury", "pontiac",
                 "honda", "nissan", "subaru", "toyota",
                 "hyundai",
                 "land rover"]
ggplot(df, aes(x="manufacturer")) + geom_bar() + scale_x_discrete(breaks=manufacturers)

Discrete coordinates correspond to integers starting at 0, and we can use this in other geometries that require numerical values for drawing:

countries = ["Germany", "US", "Japan", "South Korea", "UK"]
colors = ["gold", "palevioletred", "orangered", "mediumaquamarine", "mediumpurple"]
# The coordinates are known because the middle of the first bar is at 0,
# the middle of the second at 1, and so on.
# Band edges sit halfway between neighboring bars (e.g. 1.5 between bars 1 and 2).
# Bar widths are set to 0.5.
xs = [.5, 5, 10.5, 13, 14]
xmins = [-.5, 1.5, 8.5, 12.5, 13.5]
xmaxs = [1.5, 8.5, 12.5, 13.5, 14.5]

p = ggplot()
for i in range(len(countries)):
    p += geom_band(xmin=xmins[i], xmax=xmaxs[i], fill=colors[i], size=0, alpha=2/3) + \
         geom_text(x=xs[i], y=35, label=countries[i], angle=90)
p + \
    geom_bar(aes(x="manufacturer"), data=df, color="black", fill="white", width=.5) + \
    scale_x_discrete(breaks=manufacturers) + \
    scale_y_continuous(limits=[0, 40])
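
The hard-coded `xs`, `xmins`, and `xmaxs` above can also be derived from the number of manufacturers in each country band. The sketch below assumes bars are centered on consecutive integers starting at 0, as described above; the helper `band_coordinates` and the `group_sizes` list are illustrative names, not part of lets-plot.

```python
def band_coordinates(group_sizes):
    """Return (centers, left edges, right edges) for consecutive bar groups.

    Assumes bar i is centered at integer i, so each band spans from half a
    unit before its first bar to half a unit after its last bar.
    """
    xs, xmins, xmaxs = [], [], []
    first = 0  # index of the first bar in the current group
    for size in group_sizes:
        last = first + size - 1
        xs.append((first + last) / 2)   # label position: middle of the group
        xmins.append(first - 0.5)       # left edge of the band
        xmaxs.append(last + 0.5)        # right edge of the band
        first = last + 1
    return xs, xmins, xmaxs


# Manufacturers per country, in the order of the `manufacturers` list:
# Germany 2, US 7, Japan 4, South Korea 1, UK 1.
group_sizes = [2, 7, 4, 1, 1]
xs, xmins, xmaxs = band_coordinates(group_sizes)
print(xs, xmins, xmaxs)
```

Computing the coordinates this way keeps the bands aligned if a manufacturer is later added to or removed from a group.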