This article walks through recreating a bar chart with small multiples using the ggplot2 package in R. The original chart shows, by age, the distribution of whom we spend our time with. The replication and the enhanced version use new data from the year 2020.
I found this chart in an article called “Who We Spend Time with as We Get Older” by Nathan Yau. His article uses data from the American Time Use Survey from 2011 to 2019. Yau did not include 2020 data because the 2020 data are missing a couple of months due to the COVID-19 pandemic.
The chart in Figure 1 stood out because it packs a lot of information into static small multiples. The X-axis shows the age of the survey respondents, from 15 to 80 years old. The Y-axis shows the percentage of respondents at each age who spent time with the person named in each facet title. While the X-axis is fixed, the Y-axis is free, so bar heights cannot be compared directly from one chart to another.
The point of the chart is to compare the distributions by age. The facets are sorted by peak age from youngest to oldest, meaning that “Parent” peaks at the minimum age of 15. Meanwhile, the peak for “Neighbors or Acquaintances” is at the maximum age of 80, which is why its bar chart is positioned at the bottom right corner.
Comparing the different charts, I noticed that the work-related facets (co-workers, clients, managers, and customers) have similar distributions and peak around the late 20s. The other work-related facet, “Person Whom I Supervise,” peaks later but keeps a similar shape.
The most difficult aspect of putting together the plot was collecting and transforming the data. Instead of using the website that Nathan Yau linked in the article, I went to the U.S. Bureau of Labor Statistics website, which has all the data for the American Time Use Survey.
After combining three datasets downloaded from the BLS.gov site using tidyverse joins and transformations, I exported the data to a CSV file. You can see an overview of the data below.
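library(tidyverse)

data <- read_csv("data.csv")

# Turn `who` into a factor ordered by the minimum of Y within each group;
# this factor is later used as the facet variable
data$whofct <- forcats::fct_reorder(
  .f = data$who, .x = data$Y, .fun = min
)
glimpse(data)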
Rows: 6,432
Columns: 9
$ who <chr> "Alone", "Alone", "Alone", "Alone", "Alone", "Alone…
$ TEAGE <dbl> 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18,…
$ category <chr> "Alone", "Family", "Friend", "Work", "Alone", "Fami…
$ n <dbl> 287, 0, 0, 0, 390, 0, 0, 0, 508, 0, 0, 0, 357, 0, 0…
$ N <dbl> 1071, 0, 0, 0, 1391, 0, 0, 0, 1561, 0, 0, 0, 1028, …
$ p <dbl> 0.2679739, 0.0000000, 0.0000000, 0.0000000, 0.28037…
$ X <dbl> 0.7260842, 0.7260842, 0.7260842, 0.7260842, 0.72608…
$ Y <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1…
$ whofct <fct> Alone, Alone, Alone, Alone, Alone, Alone, Alone, Al…
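The output suggests that p is the share n / N at each age, for example 287 / 1071 ≈ 0.268 for 15-year-olds in the “Alone” rows. Below is a minimal sketch of that calculation. The meaning of the columns (n as the number of respondents who spent time with the person, N as all respondents of that age) is inferred from the output rather than stated in the article, and the handling of rows with N equal to 0 is an assumption.

```r
# Toy rows copied from the glimpse output above: the "Alone" rows for
# ages 15 and 16, plus one of the all-zero rows.
toy <- data.frame(
  who      = c("Alone", "Alone", "Alone"),
  TEAGE    = c(15, 16, 15),
  category = c("Alone", "Alone", "Family"),
  n        = c(287, 390, 0),
  N        = c(1071, 1391, 0)
)

# Share of respondents; rows with N == 0 are set to 0 instead of 0 / 0 = NaN
# (assumed here to match the zeros shown in the output).
toy$p <- ifelse(toy$N > 0, toy$n / toy$N, 0)

print(toy)
```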
To obtain the colors for the plot, I uploaded the screenshot of the plot to an online color generator. These are the codes for the four colors in the original article.
colors <- c("#CCCCCC", "#A8A0D6", "#C4BA81", "#7AB486" )
The first step was to set up the ggplot with the right coordinates and axes.
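The setup code for this step is not shown. A minimal sketch of what it might look like follows, assuming TEAGE on the X-axis, p on the Y-axis, and fill = category. The fill mapping is mentioned later in the text; the other mappings, the object name plt_sketch, and the toy data frame standing in for data.csv are assumptions for illustration.

```r
library(ggplot2)

# Toy stand-in for the real data; the values are illustrative only.
toy <- data.frame(
  TEAGE    = c(15, 16, 17, 15, 16, 17),
  p        = c(0.27, 0.28, 0.33, 0.10, 0.12, 0.15),
  category = rep(c("Alone", "Family"), each = 3),
  whofct   = factor(rep(c("Alone", "Parent"), each = 3))
)

# Base plot with the aesthetic mappings only; layers and facets are added later.
plt_sketch <- ggplot(toy, aes(x = TEAGE, y = p, fill = category))
```

Because no geom has been added yet, printing plt_sketch at this point would draw only an empty panel.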
The second step was to incorporate the small multiples using facet_wrap and rename the axes.
The code scales = "free" removes the restriction that all facets share the same axis scales. You can see that the Y-axis has a different scale for each facet. Since the X-axis variable (TEAGE) ranges from 15 to 80 in every facet (whofct), this setting does not change the X-axis’s scale.
plt <- plt +
  facet_wrap(
    ~ whofct, scales = "free", ncol = 5
  )
plt
The third step was to add the geom_col.
plt <- plt +
  geom_col(
    width = .8
  ) +
  geom_hline(yintercept = 0, linewidth = .4)
plt
I used theme_void() to remove all the formatting.
plt <- plt +
  theme_void()
The chunk below shows the formatting code that sets the theme. I tried to match the fonts of the original chart as closely as possible using the fonts available on my system.
Note that strip.clip = "off" allows the facet titles to extend beyond the border of the “strip background.” I also added vjust = -.2 to the strip text settings so that the titles sit slightly below the strip background and overlap the chart, as in the original.
plt <- plt +
  theme(
    # axis settings
    axis.text = element_text(
      family = "Menlo",
      size = 6,
      color = "gray30"
    ),
    axis.ticks = element_line(
      color = "gray30"
    ),
    axis.ticks.length = unit(-1.0, "mm"),
    # set spacing and margins
    panel.spacing.x = unit(4, "mm"),
    panel.spacing.y = unit(1, "mm"),
    plot.margin = margin(4, 4, 4, 4, "mm"),
    plot.title.position = "plot",
    # strip/facet placement and format
    strip.text = element_text(
      family = "Menlo",
      hjust = 0.3,
      vjust = -.2,
      size = 10,
      face = "bold"
    ),
    strip.clip = "off",
    # background color
    plot.background = element_rect(fill = "white"),
    # remove legend
    legend.position = "none",
    # format title, subtitle, and caption
    plot.title = element_text(
      family = "Optima",
      size = 18,
      hjust = 0.5,
      margin = margin(4, 4, 1, 4, "mm")
    ),
    plot.subtitle = element_text(
      size = 12,
      margin = margin(1, 4, 4, 4, "mm"),
      family = "Publico Text",
      hjust = 0.5
    ),
    plot.caption = element_text(
      family = "Menlo",
      color = "gray30",
      size = 6,
      margin = margin(8, 0, 0, 0, unit = "mm")
    )
  )
plt
I created a table for the annotations with the X and Y coordinates, the facet title (whofct) and category, the label text, and the horizontal justification.
x | y | whofct | label | category | hj |
---|---|---|---|---|---|
20 | 0.30 | Alone | Most people get some alone time during a day. | Alone | 0 |
20 | 0.04 | Grandchild | Grandkids enter the picture. | Family | 0 |
60 | 0.30 | Own household child | Kids enter the picture. | Family | 0 |
85 | 0.04 | Co-Worker | Gotta pay the bills. | Work | 1 |
85 | 0.18 | Parent | While young, we commonly spend time with immediate family. | Family | 1 |
85 | 0.07 | Friends | Time with friends is also most common at a younger age. | Friend | 1 |
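The article does not show how this table was created. One way to build it directly in R is sketched below with a base data.frame; the values are copied from the table above, and the choice of data.frame() over reading a file is an assumption.

```r
# Annotation table: one row per annotated facet, values taken from the table above.
annotations <- data.frame(
  x = c(20, 20, 60, 85, 85, 85),
  y = c(0.30, 0.04, 0.30, 0.04, 0.18, 0.07),
  whofct = c(
    "Alone", "Grandchild", "Own household child",
    "Co-Worker", "Parent", "Friends"
  ),
  label = c(
    "Most people get some alone time during a day.",
    "Grandkids enter the picture.",
    "Kids enter the picture.",
    "Gotta pay the bills.",
    "While young, we commonly spend time with immediate family.",
    "Time with friends is also most common at a younger age."
  ),
  category = c("Alone", "Family", "Family", "Work", "Family", "Friend"),
  # hj: 0 = left-aligned text, 1 = right-aligned text
  hj = c(0, 0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
```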
I made the whofct variable a factor so that it is consistent with the same variable in the main dataset.
annotations$whofct <- factor(annotations$whofct)
I added annotations using geom_text and set the color scale to the variable colors using scale_fill_manual(...).
plt <- plt +
  # annotations come from their own data frame, one label per facet
  geom_text(
    data = annotations,
    aes(x = x, y = y, label = label),
    color = "gray10",
    size = 3,
    family = "Publico Text",
    fontface = "italic",
    hjust = annotations$hj,
    vjust = 1
  ) +
  # show the Y-axis as percentages with one decimal place
  scale_y_continuous(
    labels = scales::label_percent(accuracy = .1)
  ) +
  scale_fill_manual(
    values = colors
  ) +
  labs(
    # TITLES
    title = "WHO WE SPEND TIME WITH, BY AGE",
    subtitle = "Sorted by peak age from youngest to oldest.",
    caption = "Created by Eric Hausken | Data source: American Time Use Survey 2020\n
Chart inspired by: FlowingData",
    x = "YEARS OLD"
  )
Finally, I saved the chart using ggsave() to a .png with width and height of 11 and 8 inches.
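The call itself is not shown, so here is a minimal sketch of saving a plot at those dimensions. The file name, the dpi value, and the placeholder plot are assumptions; in the article the plot object would be plt.

```r
library(ggplot2)

# Placeholder plot so the sketch runs on its own.
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) +
  geom_col()

# Hypothetical output path; the article does not give the file name.
out_file <- file.path(tempdir(), "who_we_spend_time_with.png")

# Width and height follow the text (11 x 8 inches); dpi is an assumed value.
ggsave(out_file, plot = p, width = 11, height = 8, units = "in", dpi = 300)
```

Passing plot explicitly avoids relying on the last plot displayed, which is the default when plot is omitted.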
For the enhanced version, I changed the color scheme so that the fill represents the magnitude of each column in percentage points. Instead of fill = category, the code now uses fill = p.
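As a rough illustration of that change, the sketch below maps fill to the continuous variable p on a toy data frame. The object name plt_2_sketch and the data values are hypothetical, and the article's own base code for plt_2 is not shown.

```r
library(ggplot2)

# Toy stand-in for one facet of the real data; values are illustrative only.
toy <- data.frame(
  TEAGE = c(15, 30, 45, 60, 75),
  p     = c(0.05, 0.20, 0.40, 0.25, 0.10)
)

# Mapping fill to a numeric column gives a continuous color scale,
# so taller columns are also shaded differently.
plt_2_sketch <- ggplot(toy, aes(x = TEAGE, y = p, fill = p)) +
  geom_col(width = .8)
```

With a numeric fill, ggplot2 applies its default continuous gradient until a scale such as scale_fill_gradient2() is added.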
Most of the theme and code is the same for the enhanced version. The differences are in the color scale and legend. Here is the code for those enhancements.
plt_2 <- plt_2 +
  labs(
    # TITLES
    title = "WHO WE SPEND TIME WITH, BY AGE",
    subtitle = "Sorted by peak age from youngest to oldest.
Color coded by percentage of age group that spent time with this person.",
    caption = "Created by Eric Hausken | Data source: American Time Use Survey 2020\n
Chart inspired by: FlowingData"
  ) +
  theme(
    # legend format
    # Note: ggplot2 3.5.0 deprecates a numeric legend.position (in favor of
    # legend.position.inside) and legend.title.align (in favor of hjust in
    # legend.title); both still run but give a warning.
    legend.position = c(.95, 0.08),
    legend.justification = 1,
    legend.direction = "vertical",
    legend.title = element_text(size = 8, family = "Menlo"),
    legend.text = element_text(size = 6, family = "Menlo"),
    legend.title.align = 1,
    legend.key.height = unit(.15, "in")
  ) +
  # color scale for columns and legend
  scale_fill_gradient2(
    low = colors[1],
    mid = colors[3],
    high = colors[4],
    midpoint = .1,
    guide = guide_colorbar(
      title.position = "left",
      title = "Percentage\nof Age Group"
    ),
    labels = c("0.0%", "20.0%", "40.0%", "60.0%"),
    breaks = c(0, .2, .4, .6)
  )
Once again, I saved it to a .png file with dimensions 11x8 inches.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hausken-Brates (2024, Jan. 19). Data visualization | MSc CSS: With Whom Did We Spend Our Time in 2020?. Retrieved from https://csslab.uc3m.es/dataviz/projects/2023/100510783/
BibTeX citation
@misc{hausken-brates2024with, author = {Hausken-Brates, Eric}, title = {Data visualization | MSc CSS: With Whom Did We Spend Our Time in 2020?}, url = {https://csslab.uc3m.es/dataviz/projects/2023/100510783/}, year = {2024} }