
Sina Plot#

A sina plot visualizes a single variable across classes, with jitter width reflecting the data’s density in each class.
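The idea of density-scaled jitter can be sketched without any plotting library. The snippet below is only an illustration, not the way lets-plot computes its layout: the helper names `gaussian_kde` and `sina_offsets`, the Gaussian kernel, the bandwidth, and the toy values are all assumptions made for this example. Each point receives a random horizontal offset whose maximum size is proportional to the estimated density at its y-value.

```python
import math
import random


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density of `values` at a point."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in values)

    return density


def sina_offsets(values, bandwidth=1.0, max_width=0.4, seed=42):
    """Horizontal offsets for one class: uniform jitter scaled by local density."""
    rng = random.Random(seed)  # fixed seed, so the layout is reproducible
    density = gaussian_kde(values, bandwidth)
    dens = [density(v) for v in values]
    peak = max(dens)
    # A point may move at most max_width * (its density / peak density).
    return [rng.uniform(-1.0, 1.0) * max_width * d / peak for d in dens]


# Toy data (made up for illustration): a dense cluster near 20 plus one outlier.
hwy = [19, 20, 20, 21, 20, 19, 21, 20, 35]
offsets = sina_offsets(hwy)
for y, dx in zip(hwy, offsets):
    print(f"y={y:>2}  x offset={dx:+.3f}")
```

Points in the dense cluster can spread across most of the available width, while the isolated value at 35 stays close to the class axis.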

import pandas as pd

from lets_plot import *
LetsPlot.setup_html()
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv")
print(df.shape)
df.head()
(234, 12)
   Unnamed: 0  manufacturer  model  displ  year  cyl       trans  drv  cty  hwy  fl    class
0           1          audi     a4    1.8  1999    4    auto(l5)    f   18   29   p  compact
1           2          audi     a4    1.8  1999    4  manual(m5)    f   21   29   p  compact
2           3          audi     a4    2.0  2008    4  manual(m6)    f   20   31   p  compact
3           4          audi     a4    2.0  2008    4    auto(av)    f   21   30   p  compact
4           5          audi     a4    2.8  1999    6    auto(l5)    f   16   26   p  compact

Default View#

g = ggplot(df, aes("drv", "hwy"))
g + geom_sina(seed=42)

When to Use#

gggrid([
    g + geom_boxplot() + ggtitle("geom_boxplot()", "Show distribution but not sample size"),
    g + geom_violin() + ggtitle("geom_violin()", "Show distribution but not sample size"),
    g + geom_jitter(seed=42) + ggtitle("geom_jitter()", "Show sample size but not distribution"),
    g + geom_sina(seed=42) + ggtitle("geom_sina()", "Show both distribution and sample size"),
], ncol=2)

Applying Jitter Position#

Adjusting points vertically can help in two situations:

  • overlapping values, where multiple observations share the exact same y-value;

  • integerish banding, where values are close to integers and appear artificially grouped into horizontal bands.

In these cases, you may consider using a position adjustment.

gggrid([
    g + geom_sina(seed=42) + ggtitle("Default position"),
    g + geom_sina(seed=42, position=position_jitter(width=0, seed=42)) + ggtitle("'jitter' position"),
])
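Before reaching for a vertical position adjustment, it can be useful to check whether a variable really shows ties or integer banding. The following is a rough sketch under stated assumptions: `overplotting_report` is a hypothetical helper, the 0.05 tolerance is an arbitrary choice, and the values are made up rather than read from the dataset.

```python
from collections import Counter


def overplotting_report(values, tol=0.05):
    """Summarize exact ties and near-integer values in a numeric sequence."""
    n = len(values)
    counts = Counter(values)
    # Observations whose exact value occurs more than once.
    tied = sum(c for c in counts.values() if c > 1)
    # Observations lying within `tol` of a whole number.
    integerish = sum(1 for v in values if abs(v - round(v)) <= tol)
    return {"tied_share": tied / n, "integerish_share": integerish / n}


# Toy values (made up): whole-number mileage readings with repeats.
hwy = [29, 29, 31, 30, 26, 26, 27, 26]
report = overplotting_report(hwy)
print(report)
```

High shares for either measure suggest that points will stack on the same horizontal lines, which is the case where a small vertical jitter tends to pay off.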

Use the 'jitterdodge' position adjustment if additional grouping is required:

gggrid([
    g + geom_sina(aes(color=as_discrete("year")), seed=42) + \
        scale_color_discrete(format="d") + \
        ggtitle("Default position"),
    g + geom_sina(aes(color=as_discrete("year")), seed=42,
                  position=position_jitterdodge(jitter_width=0, seed=42)) + \
        scale_color_discrete(format="d") + \
        ggtitle("'jitterdodge' position"),
])

Connection with Violins#

In a sina plot, points are positioned randomly within the outline of the corresponding violin, so geom_sina() and geom_violin() produce matching shapes when they are given the same parameters.

Same Shape#

g + \
    geom_violin(bw=1.5) + \
    geom_sina(bw=1.5, seed=42)

Same Quantiles#

g + \
    geom_violin(aes(color='..quantile..', fill='..quantile..'), alpha=.5) + \
    geom_sina(aes(color='..quantile..'), size=2, seed=42) + \
    scale_continuous(['color', 'fill'], low="#1a9641", high="#d7191c")

Same scale Values#

gggrid([
    g + \
        geom_violin(scale='width') + \
        geom_sina(scale='width', size=1.5, seed=42) + \
        ggtitle("scale='width'"),
    g + \
        geom_violin(scale='area') + \
        geom_sina(scale='area', size=1.5, seed=42) + \
        ggtitle("scale='area'"),
    g + \
        geom_violin(scale='count') + \
        geom_sina(scale='count', size=1.5, seed=42) + \
        ggtitle("scale='count'"),
])
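The difference between the three scale values can be illustrated with plain numbers. This sketch assumes the convention commonly used for ggplot2-style violins ('width': every group reaches the same maximum width; 'area': widths follow the density relative to the highest density overall; 'count': the 'area' widths are additionally weighted by group size). The exact computation in lets-plot may differ, and the function name `violin_max_widths` and all inputs are made up for illustration.

```python
def violin_max_widths(peak_density, n_obs, scale="area"):
    """Relative maximum width per group under each scaling rule (1.0 = widest possible)."""
    top_density = max(peak_density.values())
    top_n = max(n_obs.values())
    widths = {}
    for group, peak in peak_density.items():
        if scale == "width":
            # Every group is stretched to the full width.
            widths[group] = 1.0
        elif scale == "area":
            # Width follows the density relative to the highest peak overall.
            widths[group] = peak / top_density
        elif scale == "count":
            # Same as 'area', then shrunk in proportion to the group size.
            widths[group] = (peak / top_density) * (n_obs[group] / top_n)
        else:
            raise ValueError(f"unknown scale: {scale!r}")
    return widths


# Made-up peak densities and group sizes for three classes.
peaks = {"4": 0.10, "f": 0.08, "r": 0.05}
sizes = {"4": 100, "f": 100, "r": 25}
for rule in ("width", "area", "count"):
    print(rule, violin_max_widths(peaks, sizes, scale=rule))
```

Under these assumptions, a small group such as "r" keeps the full width with 'width', shrinks with 'area' because its density is flatter, and shrinks further with 'count' because it holds fewer observations.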

Compatible Stats#

gggrid([
    g + geom_violin() + ggtitle("Violin\nstat='ydensity' (default)"),
    g + geom_violin(stat='sina') + ggtitle("Violin\nstat='sina'"),
    g + geom_sina(size=1.5, seed=42, stat='ydensity') + ggtitle("Sina\nstat='ydensity'"),
    g + geom_sina(size=1.5, seed=42) + ggtitle("Sina\nstat='sina' (default)"),
], ncol=2)

show_half Parameter#

g + \
    geom_violin(show_half=-1, size=0, fill="gray85") + \
    geom_sina(show_half=1, seed=42)

Raincloud Plot#

g + \
    geom_violin(aes(fill="drv"), show_half=1, size=0, position=position_nudge(x=.07)) + \
    geom_boxplot(aes(fill="drv"), color="white", width=.1, outlier_alpha=0, show_legend=False) + \
    geom_sina(aes(color="drv"), show_half=-1, seed=42,
              position=position_nudge(x=-.07), show_legend=False) + \
    scale_color_brewer(palette="Set2") + \
    scale_fill_brewer(palette="Pastel2") + \
    facet_grid(x="year", x_format="d") + \
    coord_flip() + \
    theme_light() + flavor_solarized_dark()