Histogram

The geom_histogram() geometry visualizes the distribution of a numeric variable using binned counts.

import pandas as pd

from lets_plot import *
LetsPlot.setup_html()
mpg_df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
print(mpg_df.shape)
mpg_df.head()

(234, 12)
   Unnamed: 0  manufacturer  model  displ  year  cyl       trans  drv  cty  hwy  fl    class
0           1          audi     a4    1.8  1999    4    auto(l5)    f   18   29   p  compact
1           2          audi     a4    1.8  1999    4  manual(m5)    f   21   29   p  compact
2           3          audi     a4    2.0  2008    4  manual(m6)    f   20   31   p  compact
3           4          audi     a4    2.0  2008    4    auto(av)    f   21   30   p  compact
4           5          audi     a4    2.8  1999    6    auto(l5)    f   16   26   p  compact

iris_df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/iris.csv")
print(iris_df.shape)
iris_df.head()

(150, 5)
   sepal_length  sepal_width  petal_length  petal_width  species
0           5.1          3.5           1.4          0.2   setosa
1           4.9          3.0           1.4          0.2   setosa
2           4.7          3.2           1.3          0.2   setosa
3           4.6          3.1           1.5          0.2   setosa
4           5.0          3.6           1.4          0.2   setosa

Default Histogram

mpg_p = ggplot(mpg_df, aes(x="hwy", fill=as_discrete("year"))) + scale_fill_discrete(format="d")
mpg_p + geom_histogram()

Binning

Fixed Number of Bins

mpg_p + geom_histogram(bins=10)

Fixed Bin Width

mpg_p + geom_histogram(binwidth=2)
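
The bins and binwidth settings fix the same quantity from two directions. The following is a minimal sketch of that relationship, assuming equal-width bins that span the data range exactly. The range values and helper names are hypothetical, and lets-plot may place its bin edges differently.

```python
import math

def width_from_bins(lo, hi, bins):
    """Bin width implied by a fixed number of equal-width bins over [lo, hi]."""
    return (hi - lo) / bins

def bins_from_width(lo, hi, binwidth):
    """Number of equal-width bins needed to cover [lo, hi] at a fixed width."""
    return max(1, math.ceil((hi - lo) / binwidth))

# Hypothetical data range, chosen only for illustration.
lo, hi = 12, 44
print(width_from_bins(lo, hi, bins=10))     # 3.2
print(bins_from_width(lo, hi, binwidth=2))  # 16
```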

Custom Bins Using the breaks Parameter

mpg_p + geom_histogram(breaks=[10, 16, 19, 24, 26, 30, 45])
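
To see what explicit breaks mean for the counts, the sketch below assigns values to bins of unequal width in plain Python. It assumes left-closed, right-open bins with the last bin also including its right edge. That convention, the helper name, and the sample values are assumptions made for illustration and are not taken from lets-plot.

```python
from bisect import bisect_right

def count_in_breaks(values, breaks):
    """Count values per bin; bins are [b[i], b[i+1]), and the last bin is closed on the right."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if v < breaks[0] or v > breaks[-1]:
            continue  # value lies outside every bin
        # Index of the bin whose left edge is the largest edge <= v.
        i = min(bisect_right(breaks, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts

breaks = [10, 16, 19, 24, 26, 30, 45]
sample = [12, 16, 18, 20, 25, 26, 29, 31, 44, 45, 50]  # hypothetical values
print(count_in_breaks(sample, breaks))  # [1, 2, 1, 1, 2, 3]
```

Because the bins differ in width, a tall bar may simply reflect a wide bin rather than a dense region of the data.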

Bin Alignment

gggrid([
    mpg_p + geom_histogram(binwidth=5) + ggtitle("Default Bin Alignment"),
    mpg_p + geom_histogram(binwidth=5, center=10) + ggtitle("Bins Centered on 10"),
    mpg_p + geom_histogram(binwidth=5, boundary=10) + ggtitle("Bin Boundary at 10"),
], ncol=1)
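
The difference between center and boundary can be sketched by computing the bin edges directly. This assumes the usual convention that boundary names a position where a bin edge falls and center names the midpoint of a bin, so the two differ by half a bin width. The helper names and the data range are hypothetical, and the default alignment used by lets-plot is not modeled here.

```python
import math

def boundary_from_center(center, binwidth):
    """A bin centered on `center` has its left edge half a width below it."""
    return center - binwidth / 2

def bin_edges(lo, hi, binwidth, boundary):
    """Edges of equal-width bins covering [lo, hi], one edge falling on `boundary`."""
    # Shift the anchor to the last edge that is not above lo.
    first = boundary + math.floor((lo - boundary) / binwidth) * binwidth
    edges = [first]
    while edges[-1] < hi:
        edges.append(first + len(edges) * binwidth)
    return edges

lo, hi = 12, 44  # hypothetical data range
print(bin_edges(lo, hi, 5, boundary=10))
# [10, 15, 20, 25, 30, 35, 40, 45]
print(bin_edges(lo, hi, 5, boundary_from_center(10, 5)))
# [7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
```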

Position Adjustments

Interleaved Histogram

mpg_p + geom_histogram(position='dodge')

Overlaid Semi-Transparent Histogram

mpg_p + geom_histogram(position='identity', alpha=.5)

The threshold Parameter

The threshold parameter controls which bins the bin statistic retains. When threshold=None (the default), all bins are retained, including empty ones.

When threshold is 0 or another numeric value, bins with counts less than or equal to that value are dropped, but only if they are on the left or right edge of the histogram.

Dropping empty or low-count edge bins is particularly useful for faceted plots with free scales.
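
The edge-trimming rule described above can be sketched in plain Python. This is an illustration and not the lets-plot implementation: it assumes that "on the edge" means a contiguous run of low-count bins starting from either end, and the helper name and sample counts are hypothetical.

```python
def trim_edge_bins(counts, threshold=None):
    """Return (bin_index, count) pairs that survive edge trimming."""
    bins = list(enumerate(counts))
    if threshold is None:
        return bins  # default: keep every bin
    start, end = 0, len(bins)
    # Drop low-count bins from the left edge.
    while start < end and counts[start] <= threshold:
        start += 1
    # Drop low-count bins from the right edge.
    while end > start and counts[end - 1] <= threshold:
        end -= 1
    return bins[start:end]

counts = [0, 0, 3, 0, 5, 1, 0]  # hypothetical per-bin counts
print(trim_edge_bins(counts))               # all seven bins are kept
print(trim_edge_bins(counts, threshold=0))  # [(2, 3), (3, 0), (4, 5), (5, 1)]
print(trim_edge_bins(counts, threshold=1))  # [(2, 3), (3, 0), (4, 5)]
```

The empty bin at index 3 survives in both trimmed results because it lies between retained bins and is therefore not on an edge.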

iris_p = ggplot(iris_df, aes(x="petal_length", fill="species")) + facet_wrap(facets="species", scales='free_x')

Default Faceted Histogram with Free X-axis

Despite scales='free_x', the x-axis still appears to be shared across the plot facets: with the default threshold=None, the empty bins at the edges are retained in every facet.

iris_p + geom_histogram(alpha=.5)

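With a Threshold

With threshold=0, empty bins on the left and right edges are dropped, so each facet's x-axis can adapt to its own data.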

iris_p + geom_histogram(threshold=0, alpha=.5)