Histogram

The geom_histogram() geometry visualizes the distribution of a numeric variable by dividing its range into bins and drawing a bar for the number of observations in each bin.
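As a rough illustration of what "binned counts" means, the sketch below counts values into bins defined by a list of edges. It uses only the Python standard library and is not lets-plot's implementation. The helper bin_counts is hypothetical. Its rule that bins are half-open with a closed last bin is an assumption borrowed from NumPy's convention, and lets-plot may close bins on the other side. The sample values are the hwy column of the first five rows of the mpg dataset loaded below.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values into bins [edges[i], edges[i + 1]); the last bin also includes its right edge.

    Hypothetical helper: which side of a bin is closed is an assumption, not lets-plot's rule.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # value falls outside the binned range
        i = bisect_right(edges, v) - 1
        i = min(i, len(counts) - 1)  # a value equal to the last edge goes into the last bin
        counts[i] += 1
    return counts


hwy_sample = [29, 29, 31, 30, 26]  # hwy values of the first five mpg rows
print(bin_counts(hwy_sample, [25, 28, 31, 34]))  # [1, 3, 1]
```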

import pandas as pd

from lets_plot import *

LetsPlot.setup_html()

mpg_df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
print(mpg_df.shape)
mpg_df.head()

(234, 12)

	Unnamed: 0	manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
0	1	audi	a4	1.8	1999	4	auto(l5)	f	18	29	p	compact
1	2	audi	a4	1.8	1999	4	manual(m5)	f	21	29	p	compact
2	3	audi	a4	2.0	2008	4	manual(m6)	f	20	31	p	compact
3	4	audi	a4	2.0	2008	4	auto(av)	f	21	30	p	compact
4	5	audi	a4	2.8	1999	6	auto(l5)	f	16	26	p	compact

iris_df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/iris.csv")
print(iris_df.shape)
iris_df.head()

(150, 5)

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

Default Histogram

mpg_p = ggplot(mpg_df, aes(x="hwy", fill=as_discrete("year"))) + scale_fill_discrete(format="d")

mpg_p + geom_histogram()

Binning

Fixed Number of Bins

mpg_p + geom_histogram(bins=10)
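With bins=10, the bin width is derived from the data range rather than given directly. The sketch below shows one simple equal-width scheme, in which the range is divided by the number of bins. The helper edges_for_bin_count is hypothetical. lets-plot's exact rule for placing the outermost edges may differ, so the bars it draws need not line up with these edges.

```python
def edges_for_bin_count(values, bins):
    """Split [min, max] into `bins` equal-width intervals (hypothetical helper).

    Assumes at least two distinct values; lets-plot may place its outer edges differently.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # the width follows from the range and the bin count
    return [lo + i * width for i in range(bins + 1)]


print(edges_for_bin_count([29, 29, 31, 30, 26], 5))
# [26.0, 27.0, 28.0, 29.0, 30.0, 31.0]
```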

Fixed Bin Width

mpg_p + geom_histogram(binwidth=2)
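With binwidth=2, the width is fixed, and the number of bins depends on the data range instead. The minimal sketch below assumes that the edges start at the smallest value. lets-plot's actual starting edge is governed by its alignment rules, which the Bin Alignment section covers. The helper edges_for_bin_width is hypothetical.

```python
import math


def edges_for_bin_width(values, binwidth):
    """Edges spaced `binwidth` apart, starting at the minimum value (hypothetical helper).

    Starting at the minimum is a simplifying assumption, not lets-plot's alignment rule.
    """
    lo, hi = min(values), max(values)
    n_bins = max(1, math.ceil((hi - lo) / binwidth))  # enough bins to reach the maximum
    return [lo + i * binwidth for i in range(n_bins + 1)]


print(edges_for_bin_width([29, 29, 31, 30, 26], 2))  # [26, 28, 30, 32]
```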

Custom Bins Using the `breaks` Parameter

mpg_p + geom_histogram(breaks=[10, 16, 19, 24, 26, 30, 45])
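Custom breaks usually produce bins of unequal width, so the raw count in a wide bin is not directly comparable with the count in a narrow one. The sketch below computes both counts and the standard density, count / (n * width), for the breaks used above. The helper counts_and_density is hypothetical, and so is its rule for closing bins. The claim that the default bars show counts rather than densities is an assumption based on the default count statistic.

```python
from bisect import bisect_right


def counts_and_density(values, breaks):
    """Per-bin counts and densities for explicit, possibly unequal, breaks.

    Hypothetical helper; bins are [a, b) except the last, which is [a, b] (an assumption).
    """
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if breaks[0] <= v <= breaks[-1]:
            i = min(bisect_right(breaks, v) - 1, len(counts) - 1)
            counts[i] += 1
    n = sum(counts)
    widths = [b - a for a, b in zip(breaks, breaks[1:])]
    # Density divides each count by n times its bin width, so the bar areas sum to 1.
    density = [c / (n * w) if n else 0.0 for c, w in zip(counts, widths)]
    return counts, density


breaks = [10, 16, 19, 24, 26, 30, 45]
counts, density = counts_and_density([29, 29, 31, 30, 26], breaks)
print(counts)  # [0, 0, 0, 0, 3, 2]
```

Here the 30 to 45 bin holds two values but is almost four times wider than the 26 to 30 bin, so its density is much lower than its count alone suggests.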

Bin Alignment

gggrid([
    mpg_p + geom_histogram(binwidth=5) + ggtitle("Default Bin Alignment"),
    mpg_p + geom_histogram(binwidth=5, center=10) + ggtitle("Bins Centered on 10"),
    mpg_p + geom_histogram(binwidth=5, boundary=10) + ggtitle("Bin Boundary at 10"),
], ncol=1)
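The center and boundary parameters shift the bin grid without changing the bin width. boundary places an edge at the given value. center places the middle of a bin there, which is equivalent to placing a boundary half a bin width away. The sketch below illustrates that relationship. The function aligned_edges is hypothetical. Two of its rules are assumptions, not lets-plot's behavior: it starts at the minimum when neither parameter is given, and it lets boundary win when both are given.

```python
import math


def aligned_edges(lo, hi, binwidth, boundary=None, center=None):
    """Edges of width `binwidth` covering [lo, hi], aligned to `boundary` or `center`.

    Hypothetical helper. A bin centered on `center` has an edge at center - binwidth / 2,
    so `center` is converted to an equivalent `boundary`.
    """
    if boundary is None:
        # Fallback when neither is given: start at the minimum (an assumption).
        boundary = lo if center is None else center - binwidth / 2
    # Largest grid point of the form boundary + k * binwidth that is <= lo.
    first = boundary + math.floor((lo - boundary) / binwidth) * binwidth
    edges = [first]
    while edges[-1] < hi or len(edges) < 2:
        edges.append(edges[-1] + binwidth)
    return edges


print(aligned_edges(12, 44, 5, boundary=10))  # [10, 15, 20, 25, 30, 35, 40, 45]
print(aligned_edges(12, 44, 5, center=10))
# [7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
```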

Position Adjustments

Interleaved Histogram

mpg_p + geom_histogram(position='dodge')
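With position='dodge', the bars for the two year groups share each bin's horizontal slot instead of stacking. The sketch below is a simplified model of that layout. It assumes the slot is split evenly between groups with no padding. lets-plot may apply its own width and spacing rules, and dodge_positions is a hypothetical helper.

```python
def dodge_positions(bin_center, bin_width, n_groups):
    """Centers and width of side-by-side bars that share one bin (simplified dodge).

    Hypothetical helper: assumes the bin's slot is split evenly with no padding.
    """
    bar_width = bin_width / n_groups
    left = bin_center - bin_width / 2
    centers = [left + (i + 0.5) * bar_width for i in range(n_groups)]
    return centers, bar_width


# Two groups (1999 and 2008) sharing a bin centered at 30 with width 2.
print(dodge_positions(30, 2, 2))  # ([29.5, 30.5], 1.0)
```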

Overlaid Semi-Transparent Histogram

mpg_p + geom_histogram(position='identity', alpha=.5)
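The default stacked layout and position='identity' differ in where each bar starts. Stacked bars sit on top of one another, while identity bars all start at zero and overlap, which is why the example above sets alpha=.5. The sketch below compares bar tops for made-up per-bin counts of two groups; the numbers are not computed from mpg_df. bar_tops is a hypothetical helper, and it stacks groups in the order given, which need not match lets-plot's stacking order.

```python
def bar_tops(counts_by_group, position="stack"):
    """Top of each group's bar in every bin, for 'stack' or 'identity' positioning.

    Hypothetical helper; with 'stack', groups are stacked in dictionary order (an assumption).
    """
    tops = {}
    running = None
    for group, counts in counts_by_group.items():
        if position == "stack":
            # Each group's bars sit on top of the groups before it.
            running = list(counts) if running is None else [r + c for r, c in zip(running, counts)]
            tops[group] = running
        elif position == "identity":
            # Every bar starts at zero; overlapping bars need alpha < 1 to stay visible.
            tops[group] = list(counts)
        else:
            raise ValueError(f"unsupported position: {position}")
    return tops


made_up_counts = {"1999": [3, 5, 2], "2008": [1, 4, 6]}  # hypothetical counts per bin
print(bar_tops(made_up_counts, "stack"))     # {'1999': [3, 5, 2], '2008': [4, 9, 8]}
print(bar_tops(made_up_counts, "identity"))  # {'1999': [3, 5, 2], '2008': [1, 4, 6]}
```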

Parameter `threshold`

The threshold parameter controls which bins the computed bin statistic keeps. With threshold=None (the default), all bins are retained.

When threshold is set to a number (for example, 0), bins whose count is less than or equal to that number are dropped, but only if they lie at the left or right edge of the histogram; interior bins are kept regardless of their count.

Dropping empty or low-count edge bins is particularly useful for faceted plots with free scales, because it lets each facet's axis cover only the range where that facet actually has data.
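The edge-trimming rule described above can be sketched as follows. This illustration assumes that trimming continues inward through consecutive low-count edge bins and stops at the first bin whose count exceeds the threshold. drop_edge_bins is a hypothetical helper, not part of lets-plot.

```python
def drop_edge_bins(edges, counts, threshold=None):
    """Trim low-count bins from both ends; interior bins are always kept.

    Hypothetical helper: assumes trimming proceeds inward until a bin's count exceeds threshold.
    """
    if threshold is None:
        return list(edges), list(counts)  # default: keep every bin
    start, end = 0, len(counts)
    while start < end and counts[start] <= threshold:
        start += 1
    while end > start and counts[end - 1] <= threshold:
        end -= 1
    if start == end:
        return [], []  # every bin was at or below the threshold
    return edges[start:end + 1], counts[start:end]


edges = [0, 1, 2, 3, 4, 5, 6]
counts = [0, 0, 3, 0, 2, 0]
print(drop_edge_bins(edges, counts, threshold=0))  # ([2, 3, 4, 5], [3, 0, 2])
```

Note that the empty bin between the counts 3 and 2 survives, because it is not at an edge.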

iris_p = ggplot(iris_df, aes(x="petal_length", fill="species")) + facet_wrap(facets="species", scales='free_x')

Default Faceted Histogram with Free X-axis

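Despite scales='free_x', the x-axis still appears to be shared across the facets: each facet retains its empty edge bins, so its x-range extends over values covered only by the other species.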

iris_p + geom_histogram(alpha=.5)

With a Threshold

iris_p + geom_histogram(threshold=0, alpha=.5)