
Data Visualization

Chapter 2. The Grammar of Graphs in R

Iñaki Úcar

Department of Statistics | uc3m-Santander Big Data Institute

Master in Computational Social Science

Licensed under Creative Commons Attribution CC BY 4.0
Last generated: 2023-11-14

1 / 52
2 / 52

Building Graphs Layer by Layer

3 / 52

An Object-Oriented Graphics System

Wilkinson, L. (2005). The Grammar of Graphics. Springer, New York.

  • Graphics are collections of objects that follow a set of rules, a grammar, so that they behave consistently and flexibly.

  • The specification of the formal language is expressed in six statements:

    1. DATA: a set of data operations that create variables from datasets,
    2. TRANS: variable transformations (e.g., rank),
    3. SCALE: scale transformations (e.g., log),
    4. COORD: a coordinate system (e.g., polar),
    5. ELEMENT: marks (e.g., points) and their aesthetic attributes (e.g., color),
    6. GUIDE: one or more guides (axes, legends, etc.).

  • These components link data to (visual) objects and specify a scene containing them.
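The correspondence between Wilkinson's six statements and ggplot2 is loose (ggplot2 reorganizes them into layers, scales, coordinates, facets and themes). The following sketch is our own illustrative mapping, not Wilkinson's notation; it uses the mpg dataset bundled with ggplot2, and the rank transformation is done with base R before plotting.

```r
library(ggplot2)

# DATA: start from the mpg dataset bundled with ggplot2
d <- mpg
# TRANS: a variable transformation (rank), computed here with base R
d$hwy_rank <- rank(d$hwy)

p <- ggplot(d, aes(x = displ, y = hwy_rank)) +
  geom_point(aes(colour = drv)) +   # ELEMENT: point marks + colour attribute
  scale_x_log10() +                 # SCALE: log transformation of the x scale
  coord_polar() +                   # COORD: polar coordinate system
  labs(x = "Displacement (log10)",  # GUIDE: axis and legend titles
       y = "Rank of highway mpg",
       colour = "Drive")
```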

4 / 52

An Object-Oriented Graphics System

5 / 52

An Object-Oriented Graphics System

6 / 52

About ggplot2

  • An R package for producing statistical graphics

  • Underlying grammar based on the Grammar of Graphics (thus GG)

  • Instead of being limited to a set of predefined graphics, it lets you compose graphs by combining (adding, +) components

  • Simple set of core principles (plus very few special cases)

  • Carefully chosen defaults

  • Good for quick prototyping, designed to work iteratively

  • But also publication-quality graphics, with a comprehensive theming system

  • Lots of extensions!
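To illustrate the iterative style, a minimal sketch using the bundled mpg dataset: a plot is an ordinary R object, so a base specification can be stored and extended with + as many times as needed.

```r
library(ggplot2)

# Store the base object: data plus the general mapping
base <- ggplot(mpg, aes(x = displ, y = hwy))

# Each "+" returns a new plot object; 'base' itself is left unchanged
p_points <- base + geom_point()
p_both   <- p_points + geom_smooth(method = "loess", formula = y ~ x)

# Typing p_both at the console (or calling print(p_both)) draws it
```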

7 / 52

ggplot2 Basics

  • Requires tidy data: 1 observation per row, 1 variable per column:
    country    continent  year  lifeExp       pop   gdpPercap
    Mauritius  Africa     1962   60.246    701016   2529.0675
    Indonesia  Asia       1957   39.918  90124000    858.9003
    Italy      Europe     1977   73.480  56059245  14255.9847

  • All plots are composed of data and a mapping: the description of how data attributes are mapped to aesthetic attributes (channels).

  • Basic workflow:

ggplot(data) +                   # create the graphic object with the data
  aes(x=..., y=..., color=...) + # add the general mapping
  ...                            # add more components (geoms, scales, coords, facets, themes...)
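The template can be made concrete with the three example rows, typed in by hand (they appear to come from the gapminder dataset; loading the full data would need that package, which we do not assume here). Mapping x, y and color in the general aes() call makes them apply to every layer added afterwards; the extra size mapping and the log scale are illustrative choices.

```r
library(ggplot2)

# The three example rows from the table, as a tidy data frame
# (one observation per row, one variable per column)
d <- data.frame(
  country   = c("Mauritius", "Indonesia", "Italy"),
  continent = c("Africa", "Asia", "Europe"),
  year      = c(1962, 1957, 1977),
  lifeExp   = c(60.246, 39.918, 73.480),
  pop       = c(701016, 90124000, 56059245),
  gdpPercap = c(2529.0675, 858.9003, 14255.9847)
)

p <- ggplot(d) +                                       # graphic object with the data
  aes(x = gdpPercap, y = lifeExp, color = continent) + # general mapping
  geom_point(aes(size = pop)) +                        # a layer with an extra mapping
  scale_x_log10()                                      # a scale component
```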
8 / 52

ggplot2 Basics

There are five types of components:

  • A layer is a collection of geometric elements (points, lines...) and statistical transformations (binning, counting...).

  • A scale controls a channel: it adds or modifies how data attributes are mapped to it (position, color, shape, size...).

  • A coordinate system describes how data coordinates are mapped to the plane of the graphic. It also provides axes and gridlines.

  • A facet specifies how to break up and display subsets of data as small multiples (AKA conditioning, latticing or trellising).

  • A theme controls the finer points of display to create attractive plots (background, fonts, guide aspect and positioning...).
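A sketch combining one component of each type, assuming only ggplot2 and its mpg dataset; the particular choices (Set2 palette, y limits, minimal theme) are arbitrary illustrations.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +                          # layer: point geom (identity stat)
  scale_colour_brewer(palette = "Set2") + # scale: controls the colour channel
  coord_cartesian(ylim = c(10, 45)) +     # coordinate system: Cartesian, zoomed on y
  facet_wrap(~ drv) +                     # facet: one small multiple per drive type
  theme_minimal()                         # theme: non-data display details
```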

9 / 52

Aesthetics Specification

Read the comprehensive guide on aesthetics.

  • Mastering data mappings is an important (the most important?) skill.
  • Each geom is affected by a different set of aesthetics:

From ?geom_point (x and y are required):

  • x
  • y
  • alpha
  • colour
  • fill
  • group
  • shape
  • size
  • stroke

From ?geom_line (x and y are required):

  • x
  • y
  • alpha
  • colour
  • group
  • linetype
  • linewidth (size in ggplot2 versions before 3.4.0)
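Required aesthetics can also be inspected programmatically: each geom is backed by a ggproto object (GeomPoint, GeomLine, ...) whose required_aes field lists them. The sketch below also maps a few of the optional aesthetics listed above; economics_long is a dataset bundled with ggplot2, and the specific mappings are arbitrary.

```r
library(ggplot2)

# Required aesthetics stored in each geom's ggproto object
GeomPoint$required_aes
GeomLine$required_aes

# geom_point() understands shape, size and alpha...
p1 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(shape = drv, size = cyl), alpha = 0.5)

# ...while geom_line() understands linetype (one line per variable)
p2 <- ggplot(economics_long, aes(date, value01)) +
  geom_line(aes(linetype = variable))
```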
11 / 52

Individual Geoms

Geom             Result        Details
geom_point()     scatterplot   Understands shape.
geom_text()                    Helper for text.
geom_label()                   Helper for labels.
geom_line()      line plot     Connects points from left to right, understands linetype.
geom_path()                    Connects points in order.
geom_step()                    Produces a stairstep plot.
geom_function()                Connects points of a given function of x.
geom_bar()       bar chart     stat="count" by default!
geom_col()                     Uses y as given; multiple bars are stacked by default.
geom_area()      area plot     Line plot filled from 0 to y.
geom_polygon()                 Filled path.
geom_rect()                    Rectangle by xmin, xmax, ymin, ymax.
geom_tile()                    Rectangle by center (x, y) and size (width, height).
geom_raster()                  Faster tiles with constant size.
12 / 52

Individual Geoms

  • Two-dimensional: they require x and y, and understand color and size.
  • Some of them can be filled.
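The geom_bar()/geom_col() note in the table is a common source of confusion. A minimal sketch with the mpg dataset: geom_bar() counts the rows itself, whereas geom_col() expects the heights to be precomputed (here with base R's table()).

```r
library(ggplot2)

# geom_bar(): counts rows itself (stat = "count"), so only x is mapped
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# geom_col(): heights are taken as given (stat = "identity"),
# so the counts must be computed beforehand
counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(counts, aes(x = class, y = Freq)) + geom_col()
```

Both produce the same bars; which one to use depends on whether the data are raw observations or already aggregated.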

13 / 52

Collective Geoms

  • Dealing with point overplotting
Geom             Result  Details
geom_jitter()            geom_point(), but adds some random jitter to each point.
geom_count()             Maps the count of overlapping points to size.
geom_bin_2d()            Maps the count of points in each rectangular bin to fill.
geom_hex()               Same, but using hexagonal bins.
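A sketch of these remedies on the mpg dataset, where many cars share the same (cty, hwy) values. geom_hex() is left out because it relies on the hexbin package, which we do not assume is installed; the jitter amounts and number of bins are arbitrary.

```r
library(ggplot2)

base <- ggplot(mpg, aes(cty, hwy))

# Random jitter so that identical (cty, hwy) pairs do not overlap exactly
p_jitter <- base + geom_jitter(width = 0.3, height = 0.3)

# Number of overlapping points mapped to point size (computed column: n)
p_count <- base + geom_count()

# Points binned into rectangles; the count in each bin is mapped to fill
p_bin <- base + geom_bin_2d(bins = 15)
```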

14 / 52

Collective Geoms

  • Dealing with uncertainty
Geom                Result  Details
geom_pointrange()           Various ways of representing a vertical interval
geom_linerange()            defined by x, ymin and ymax.
geom_errorbar()
geom_crossbar()
geom_ribbon()               Generalization of geom_area() with ymin too
                            (geom_area() is the special case ymin = 0).
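All the interval geoms need x, ymin and ymax, usually computed beforehand. A minimal sketch, using mean ± one standard deviation of highway mileage per vehicle class as an (arbitrary) interval, computed with base R.

```r
library(ggplot2)

# Mean and standard deviation of highway mpg per class (base R)
stats <- do.call(rbind, lapply(split(mpg$hwy, mpg$class), function(v) {
  data.frame(mean = mean(v), sd = sd(v))
}))
stats$class <- rownames(stats)

# Interval defined by x, ymin and ymax; y gives the point in the middle
p <- ggplot(stats, aes(x = class, y = mean,
                       ymin = mean - sd, ymax = mean + sd)) +
  geom_pointrange()
```

Swapping geom_pointrange() for geom_errorbar(), geom_linerange() or geom_crossbar() keeps the same mapping.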

15 / 52

Collective Geoms

  • Arbitrary segments
Geom            Result  Details
geom_segment()          Straight line between points (x, y) and (xend, yend).
geom_curve()            Same, but with a curved line.
geom_spoke()            Polar parametrization of geom_segment() (angle and radius).
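A sketch contrasting the two parametrizations on a small hypothetical data frame: geom_segment() takes both end points, whereas geom_spoke() takes a start point, an angle (in radians) and a radius.

```r
library(ggplot2)

# Hypothetical toy data: three start points with end points and angles
d <- data.frame(x = c(1, 2, 3), y = c(1, 2, 1),
                xend = c(2, 3, 4), yend = c(2, 1, 3),
                angle = c(0, pi / 4, pi / 2), radius = 0.5)

# geom_segment(): from (x, y) to (xend, yend)
p_seg <- ggplot(d) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend))

# geom_spoke(): same start points, end point derived from angle and radius
p_spoke <- ggplot(d) +
  geom_spoke(aes(x = x, y = y, angle = angle, radius = radius))
```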

16 / 52

Collective Geoms

  • Distributions
Geom              Result        Details
geom_histogram()  histogram     Distribution of a continuous variable by bins.
geom_freqpoly()                 Displays the same counts with lines instead of bars.
geom_dotplot()                  Histogram of stacked dots.
geom_density()    density plot  Smoothed version of the histogram.
geom_rug()                      Draws ticks for marginal distributions.
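For histograms and frequency polygons the bin width often matters more than the geom itself. A sketch on the mpg dataset with an arbitrary binwidth of 2 mpg (by default, stat_bin() uses 30 bins).

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = hwy))

# Histogram with an explicit bin width
p_hist <- base + geom_histogram(binwidth = 2)

# The same counts drawn as a line
p_poly <- base + geom_freqpoly(binwidth = 2)

# Smoothed density, with a rug of the individual observations
p_dens <- base + geom_density() + geom_rug()
```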

17 / 52

Collective Geoms

  • Boxplots
Geom            Result   Details
geom_boxplot()  boxplot  Compact display of the distribution of a continuous variable.
geom_violin()            Mirrored density, displayed in the same layout as a boxplot.
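A minimal sketch on the mpg dataset, with one box or violin per drive type.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = drv, y = hwy))

# Boxplot: median, quartiles (hinges), whiskers and outliers per group
p_box <- base + geom_boxplot()

# Violin: mirrored density estimate per group
p_violin <- base + geom_violin()
```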

18 / 52

Collective Geoms

  • Smoothing lines
Geom             Result  Details
geom_smooth()            Fits a model and draws a smoothing line.
geom_quantile()          Fits a quantile regression and draws the quantiles.
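A sketch fitting a linear model with geom_smooth(); method = "lm" and the 95% band are choices made here for illustration (the default method depends on the number of observations). geom_quantile() is not shown since it relies on the quantreg package.

```r
library(ggplot2)

# Points plus a linear fit with a 95% confidence band
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95)

# The smoothing line is the fitted model evaluated on a grid of x values
sm  <- layer_data(p, 2)
fit <- lm(hwy ~ displ, data = mpg)
```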

19 / 52

Collective Geoms

  • Contours
Geom                      Result        Details
geom_contour()            contour plot  2D contours of a 3D surface given on a regular x, y grid.
geom_contour_filled()                   Filled version.
geom_density_2d()                       2D contours after computing the density.
geom_density_2d_filled()                Filled version.
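The difference between the two families lies in the input: geom_contour() expects a z value on a regular grid, while geom_density_2d() estimates the density from raw points first. A sketch using faithfuld (bundled with ggplot2) and faithful (base R); the number of bins is arbitrary.

```r
library(ggplot2)

# z is already given on a regular (waiting, eruptions) grid
p_surface <- ggplot(faithfuld, aes(waiting, eruptions, z = density)) +
  geom_contour(bins = 8)

# The 2D density is estimated from the raw observations
p_points <- ggplot(faithful, aes(waiting, eruptions)) +
  geom_point() +
  geom_density_2d()
```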

20 / 52

Collective Geoms

  • Maps
Geom             Result  Details
geom_map()       map     Old way to plot polygons as a map.
geom_sf()                Current recommended way, via sf.
geom_sf_text()           Similar to geom_text(), but for sf.
geom_sf_label()          Similar to geom_label(), but for sf.

21 / 52

Geom vs. Stat

ggplot(mpg, aes(displ, hwy)) +
  geom_point(stat="identity")

ggplot(mpg, aes(displ, hwy)) +
  stat_identity(geom="point")

22 / 52

Geom vs. Stat

ggplot(mpg, aes(hwy)) +
  geom_bar(stat="count")

ggplot(mpg, aes(hwy)) +
  stat_count(geom="bar")

23 / 52

Geom vs. Stat

ggplot(mpg, aes(displ, hwy)) +
  geom_smooth(stat="smooth")

ggplot(mpg, aes(displ, hwy)) +
  stat_smooth(geom="smooth")
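Each pair above describes the same layer: every geom has a default stat and every stat has a default geom, so a layer can be specified from either side. A quick check (a sketch comparing only the computed x and count columns) is to compare the layer data that ggplot2 computes for both versions of the bar chart.

```r
library(ggplot2)

p_geom <- ggplot(mpg, aes(hwy)) + geom_bar(stat = "count")
p_stat <- ggplot(mpg, aes(hwy)) + stat_count(geom = "bar")

# Data computed by ggplot2 for each layer (after the stat has run)
d_geom <- layer_data(p_geom)
d_stat <- layer_data(p_stat)
all.equal(d_geom[, c("x", "count")], d_stat[, c("x", "count")])
```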

24 / 52

Scales and Guides

25 / 52

Scale Specification

A scale is a procedure that performs the mapping of data attributes into channels (position, color, size...):

  • sets the limits;

  • sets an optional transformation (without modifying the data);

  • sets a guide.

A guide lets the reader invert the procedure and recover the data:

  • an axis or a legend, depending on the channel;

  • has a name, breaks, labels...

26 / 52

Scale Specification

Naming: scale_<aes>_<type>(<arguments>)

Element         Argument    Shortcut function
Title           name=...    labs(x=..., y=..., color=..., ...)
Limits          limits=...  lims(x=..., y=..., color=..., ...)
Breaks          breaks=...
Labels          labels=...
Guide           guide=...   guides(x=..., y=..., color=..., ...)
Transformation  trans=...
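A sketch of the same plot configured through scale functions and through the shortcuts, on the mpg dataset. scale_x_log10() is used here as the convenience form of a log10 transformation (instead of passing trans=... to scale_x_continuous()); the titles, limits, breaks and labels are arbitrary.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy, colour = drv)) + geom_point()

# Everything set through scale_<aes>_<type>() functions
p_scales <- base +
  scale_x_log10(name = "Displacement (L)", limits = c(1, 8),
                breaks = c(2, 4, 6, 8)) +
  scale_colour_discrete(name = "Drive", labels = c("4wd", "front", "rear"))

# Title and limits through the shortcut functions instead
p_short <- base +
  labs(x = "Displacement (L)", colour = "Drive") +
  lims(x = c(1, 8))
```

Note that the transformation changes only the scale: the mapped positions are log10(displ), while the mpg data frame itself is untouched.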
27 / 52

Tutorial 02

Scales and Guides

28 / 52

Coordinate Systems

29 / 52

Cartesian Coordinates

coord_cartesian(): default, no need to be specified

  • ... although it is useful for setting axis limits (via the xlim and ylim arguments): this zooms in without removing data, unlike scale limits.

  • Position given by orthogonal distances, x and y, to an origin.

Some helper functions:

  • coord_flip(): helper to flip the axes.

  • coord_fixed(): helper to fix the aspect ratio.

  • coord_trans(): helper to transform the axes.
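The distinction between coordinate limits and scale limits matters whenever a stat is computed on the data (e.g., a smoothing line): scale limits treat out-of-range observations as missing before anything is computed, while coordinate limits only zoom the view. A sketch on the mpg dataset, with an arbitrary range of [2, 4] for x:

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Scale limits: values outside [2, 4] become NA (dropped, with a warning, when drawn)
p_scale <- p + xlim(2, 4)

# Coordinate limits: the view is zoomed, all observations are kept
p_coord <- p + coord_cartesian(xlim = c(2, 4))
```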

30 / 52

Other Coordinates

  • coord_polar(): x is the angle, y is the radius (can be swapped with theta = "y").

  • coord_map(): projections of the sphere into a plane.

    • Mercator, sinusoidal, cylindrical, rectangular...
    • Anything supported by the mapproj package.
  • coord_sf(): modern way to deal with maps via simple features
    (from sf package).
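For instance, a pie chart is a single stacked bar in polar coordinates with the stacked counts (y) mapped to the angle. A sketch on the mpg dataset:

```r
library(ggplot2)

# A single stacked bar of class counts...
bar <- ggplot(mpg, aes(x = "", fill = class)) + geom_bar(width = 1)

# ...becomes a pie chart when y (the counts) is mapped to the angle
pie <- bar + coord_polar(theta = "y")
```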

31 / 52

Tutorial 03

Coordinate Systems

32 / 52

Faceting

33 / 52

Facet Specification

34 / 52

Facet Specification

35 / 52

Tutorial 04

Faceting

36 / 52

Themes

37 / 52

Theme Specification

Source: ggplot2 Theme Elements Demonstration by Henry Wang

38 / 52

Tutorial 05

Themes

39 / 52

Annotations

40 / 52

Types of Annotations

  • Guides (axes and legend)

  • Titles (title, subtitle and caption)

  • Text labels

  • Reference lines

  • Reference areas

  • Direct labeling
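A sketch adding titles, a reference line, a reference area and a text label to a scatterplot of the mpg dataset; the title texts, the coordinates of the shaded area and the label are arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # Titles
  labs(title = "Fuel economy vs. engine size",
       subtitle = "mpg dataset", caption = "Data: mpg (ggplot2)") +
  # Reference line: overall mean of highway mpg
  geom_hline(yintercept = mean(mpg$hwy), linetype = "dashed") +
  # Reference area: a shaded rectangle
  annotate("rect", xmin = 5, xmax = 7, ymin = 20, ymax = 30, alpha = 0.2) +
  # Text label
  annotate("text", x = 6, y = 31, label = "large engines")
```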

47 / 52

Tutorial 06

Annotations

48 / 52

Arranging Plots

49 / 52

Types of Arrangements

  • Compositions

  • Insets

50 / 52

Panel Alignment

  • None

51 / 52

Tutorial 07

Arranging Plots

52 / 52