When constructing a graph, it is often necessary to make annotations to the data displayed. Annotations provide the necessary context to direct the reader’s attention and build an effective story. This metadata, as a form of data, will use the same geoms and tools that we already know. However, there are some helpers, in ggplot2 as well as other extension packages, that may be useful for certain special use cases.
Source: _tutorials/06/06.Rmd
The most simple and useful form of annotation is to set a proper title to the plot and the guides (axes and legend), which can be done in one go by using the labs()
helper. Additionally, the plot subtitle is a good opportunity to further help the interested reader, and the caption is an unobtrusive place to put your data sources.
p <- ggplot(mpg) +
aes(displ, hwy) +
geom_point(aes(color=factor(cyl)))
p +
labs(
x = "Engine displacement [litres]",
y = "Highway mileage [miles/gallon]",
color = "Number of\ncylinders",
title = "Mileage by engine size and cylinders",
subtitle = paste(
"Highway mileage seems to be correlated with engine size,",
"but differences are mostly driven by the number of cylinders",
sep="\n"),
caption = "Source: http://fueleconomy.gov"
)
Note that newline characters \n
must be set to avoid text overflow. For arbitrary text, given a maximum width size in characters, this can be automated:
long_string <- stringi::stri_rand_lipsum(1)
long_string
short_strings <- strwrap(long_string, width=60)
short_strings
# collapse with newlines
p + labs(subtitle = paste(short_strings, collapse="\n"))
The ggtext
package provides improved text rendering capabilities via Markdown and HTML. To enable this functionality, the relevant elements must be set to ggtext::element_markdown()
.
p +
labs(title = "A title with *italics* and **boldface**") +
theme(plot.title = ggtext::element_markdown())
Adding text annotations at particular (x
, y
) coordinates is one of the most common annotation tasks. The main tool for this is the geom_text()
function, which obeys the label
aesthetic.
p + geom_text(aes(label=model))
This geom automatically draws every single label in the dataset, which is rarely appropriate. There is an automated way of checking overlapping labels and remove some of them.
p + geom_text(aes(label=model), check_overlap=TRUE)
However, labeling must be carefully considered to highlight specific data points, such as outliers or other data of interest. To this end, the best approach is to filter the original data to the subset of interest.
annotation <- dplyr::filter(mpg, hwy > 40)
p + geom_text(aes(label=model), annotation)
Usually, proper placement must be adjusted via hjust
and/or vjust
, which is a number between 0 and 1. By default, text is centered (vertically and horizontally), i.e. hjust=0.5
and vjust=0.5
.
p + geom_text(aes(label=model), annotation, hjust=0)
It is possible to go beyond this 0-1 (0-100%) range to set some space between the point and the label.
p + geom_text(aes(label=model), annotation, hjust=-0.2)
But, being a percentage, this approach would set different spaces for label with different lengths. A better approach is to use the nudge_*
argument to add a small space that does not depend on the label length.
p + geom_text(aes(label=model), annotation, hjust=0, nudge_x=0.1)
Even after filtering, in the example above we still have overlapping labels, because there are overlapping outliers. In these situations, the utilities offered by the ggrepel
package come in handy.
p + ggrepel::geom_text_repel(aes(label=model), annotation)
To better distinguish the label from the background, sometimes it is useful to use geom_label()
instead, or the _repel
version. The only difference is that these functions draw an actual label around the text.
p + ggrepel::geom_label_repel(aes(label=model), annotation)
If the result still looks a bit confusing, there are further options to customize ggrepel
’s behavior.
p + ggrepel::geom_label_repel(aes(label=model), annotation,
direction="x", nudge_x=0.1)
Using the mpg
dataset, make a barplot of the number of vehicles per class. Remove the axis and annotate the bars with the counts instead.
Investigate how to do this using the ggfittext
package instead.
Many geoms can be used to make custom annotations, including points, lines, rectangles and other shapes. This, together with the technique of filtering the original data to produce an annotation dataset, or even building an entirely new dataset for this purpose, opens up a world of possibilities.
Specific points may be highlighted by changing the color, size, shape, or stroke.
p + geom_point(data=annotation, shape=4, size=3)
Any line can be used as an annotation, but horizontal, vertical and oblique reference lines that span the entire plot are commonly used to divide regions of the plot. This is achieved via the special geoms geom_hline()
, geom_vline()
and geom_abline()
respectively.
# line to highlight the outliers above y=40
p + geom_hline(yintercept=40, linetype="dashed")
# divide regions approximately by cyl
p + geom_vline(xintercept=c(2.6, 4.1), linetype="dashed")
The division of the panel area into regions can be further stressed by adding slightly colored rectangles that expand towards infinity. In this case, such rectangles should be drawn before the actual data to avoid masking it.
divisions <- c(2.6, 4.1)
df.rect <- data.frame(
xmin = c(-Inf, divisions),
xmax = c(divisions, Inf),
ymin = -Inf,
ymax = Inf,
cyl = c(4, 6, 8)
)
ggplot(mpg) +
aes(fill=factor(cyl)) +
geom_rect(aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),
df.rect, alpha=0.2, show.legend=FALSE) +
geom_point(aes(displ, hwy, color=factor(cyl))) +
theme_bw()
The special function annotate()
allows us to select any geom to make an individual annotation. The arguments should match the aesthetics required by the geom as well as other optional parameters.
p +
annotate("segment", x=2, xend=3, y=unique(annotation$hwy), yend=42) +
annotate("text", x=3, y=42, hjust=0, nudge_x=0.1, label="These are outliers")
Using the mpg
dataset, make a scatterplot of y=hwy
vs. x=displ
, and highlight some points by placing a yellowish shadow under them.
Using the previous plot as the base, investigate how to apply geom_abline()
to trace an oblique dashed line that goes through the coordinates (2, 10) and (3, 30).
Using the previous plot as the base, draw a small rectangle around the points with cyl=5
, put a label somewhere in the empty space of the panel, and connect the rectangle with the label with a curved segment ending in an arrow.
Annotations on maps need to be specified in the same units as the Coordinate Reference System of the underlying geometries. In the following example, a segment is drawn from Ottawa to Melbourne using their approximate longitude/latitude positions:
world <- giscoR::gisco_get_countries()
ottawa <- c(-75, 45)
melbourne <- c(145, -38)
ggplot(world) +
geom_sf() +
annotate("segment", x=ottawa[1], y=ottawa[2],
xend=melbourne[1], yend=melbourne[2])
However, if we switch e.g. to the Robinson projection, it does not work anymore:
ggplot(world) +
geom_sf() +
annotate("segment", x=ottawa[1], y=ottawa[2],
xend=melbourne[1], yend=melbourne[2]) +
coord_sf(crs="+proj=robin")
Instead, we need to transform those coordinates to the new projection beforehand:
ottawa <- sf::st_sfc(sf::st_point(ottawa), crs=4326)
melbourne <- sf::st_sfc(sf::st_point(melbourne), crs=4326)
ottawa <- sf::st_transform(ottawa, "+proj=robin")[[1]]
melbourne <- sf::st_transform(melbourne, "+proj=robin")[[1]]
ggplot(world) +
geom_sf() +
annotate("segment", x=ottawa[1], y=ottawa[2],
xend=melbourne[1], yend=melbourne[2]) +
coord_sf(crs="+proj=robin")
Direct labeling is the art of using annotations, instead of a legend, to identify shapes.
library(dplyr)
df <- ggstream::blockbusters
df.labels <- ggstream::blockbusters %>%
group_by(genre) %>%
arrange(year) %>%
slice(n())
ggplot(df) +
aes(year, box_office, color=genre) +
geom_line() +
geom_text(aes(label=genre), df.labels, hjust=0, nudge_x=0.2) +
expand_limits(x=2023) +
theme(legend.position="none")
The broader ggplot2 ecosystem provide many tools to make this kind of task easier, such as the directlabels
package:
ggplot(df) +
aes(year, box_office, color=genre) +
geom_line() +
directlabels::geom_dl(aes(label=genre), method="last.points") +
expand_limits(x=2023) +
theme(legend.position="none")
The ggforce::geom_mark_*()
functions are useful for annotating groups of points:
ggplot(mpg) +
aes(displ, hwy, color=factor(cyl)) +
geom_point() +
ggforce::geom_mark_ellipse(aes(label=cyl, group=cyl)) +
theme(legend.position="none")
The gghighlight
package makes it easy to highlight filtered data while preserving the context:
ggplot(nlme::Oxboys) +
aes(age, height, group=Subject) +
geom_line() +
geom_point() +
gghighlight::gghighlight(Subject %in% 1:3)
And also in facets:
ggplot(mpg) +
aes(displ, hwy, color=factor(cyl)) +
geom_point() +
facet_wrap(~cyl) +
gghighlight::gghighlight()
Investigate how to use ggforce::geom_mark_rect()
to make an annotation similar to checkpoint 2.3.
Using line plot with the blockbusters dataset, investigate how to annotate the lines as a secondary axis using gghighlight::gghighlight()
.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ucar (2022, Oct. 5). Data visualization | MSc CSS: 06. Annotations. Retrieved from https://csslab.uc3m.es/dataviz/tutorials/06/
BibTeX citation
@misc{ucar202206., author = {Ucar, Iñaki}, title = {Data visualization | MSc CSS: 06. Annotations}, url = {https://csslab.uc3m.es/dataviz/tutorials/06/}, year = {2022} }