Car accident rate is directly related to the antigüity of the vehicle. The graph represents car fleet and its age along 10 years. I will replicate it as closely as possible, as weel as providing a visualization alternative.
This graph is from an online newspaper article from “El Motor” (El País). The author talks about the higher mortality rate when older cars are involved in the accident.
The data is not openly available, we should have a look at the source mentioned in the plot: Traffic General Direction. In this case, data needs to be transformed into percentages and organized in a dataframe manually.
data <- data.frame(Año = c(2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019,
2020),
"0-4 años" = c("22%", "18%", "16%", "15%", "15%", "16%",
"18%", "20%", "21%", "20%"),
"5-9 años" = c("31%", "32%", "31%", "28%", "25%", "21%",
"16%", "14%", "14%", "14%"),
"10-14 años" = c("24%", "26%", "27%", "27%", "27%", "28%",
"28%", "27%", "24%", "22%"),
"Más de 15 años" = c("23%", "25%", "26%", "29%", "33%",
"36%", "37%", "39%", "41%", "44%"),
check.names = FALSE)
mygraph <- data %>% pivot_longer(-"Año",
names_to = "Antigüedad",
values_to = "Valor")
mygraph <- mygraph %>% mutate(Año = as.factor(Año),
Antigüedad = as.factor(Antigüedad),
Valor = as.numeric(str_sub(mygraph$Valor, 1, 2)))
mygraph <- mygraph %>% mutate(Año = fct_reorder(Año, desc(Año)),
Antigüedad = fct_relevel(Antigüedad,
"Más de 15 años",
"10-14 años", "5-9 años",
"0-4 años"))
Here we set x and y variables. We introduce “Antigüedad” to differentiate segments in the bar according to age of the vehicles.
Assigning colour to each level of “Antigüedad” and flip coordinates.
p <- p + scale_fill_manual(values=c(rgb(0.43, 0.89, 0.99), rgb(0, 0.66, 0.91),
rgb(0.02, 0.4, 0.67), rgb(0, 0.2, 0.53))) +
coord_flip()
p
Now, we label the plot, remove both axis labels and background, and organize the legend.
p <- p + labs(title = "El parque automovilístico por antigüedad",
caption = "Fuente: Dirección General de Tráfico. EL MOTOR") +
theme(plot.title = element_text(family = "encodesans", face = "bold",
size = 22),
plot.title.position = "plot",
plot.caption = element_text(family = "encodesans",
color = "gray", hjust = 1,
size = 14,
margin = margin(20,0,0,0)),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_text(family = "chivo mono", size = 20),
axis.ticks.y = element_blank(),
legend.position = "top",
legend.title = element_blank(),
legend.justification = c(-0.36,1),
legend.text = element_text(family = "encodesans", size = 18),
legend.key.size = unit(0.7, "cm"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank()
) +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0)) +
guides(fill = guide_legend(reverse=TRUE))
p
Once we have the basic structure of the plot, it is time to introduce labels inside each segment of the bars. This is the most tricky part, as we have to bear in mind several aspects:
p <- p + geom_bar_text(aes(label = paste0(`Valor`, "%"), colour = Antigüedad),
position = "stack",
reflow = TRUE,
place = "left",
family = "chivo mono",
size = 20) +
scale_colour_manual(values = c(`0-4 años` = "White",
`5-9 años` = "White",
`10-14 años` = "White",
`Más de 15 años` = "Black"),
guide = "none")
p
To make the graph interactive I have used the package ggiraph(). Because it is very recent and there is not much documentation and users’ feedback, it has some functions that are difficult to use or are not extended a lot.
The idea that ggiraph proposes is to redefine ggplot functions with the suffix _interactive. At the same time, key data has to be provided inside each function in order to ggiraph make the interaction. With this mechanism, ggiraph identifies which parts of the plot have to interact and adds this movement to the plot.
Particularly for this graph, I find the presence f interactivity very important. The way that information inside bars is plotted does not allow to see clearly increases or decreases of percentages all along years (in the case of the two central columns: “5-9 años” and “10 -14 años”). Interactivity allows the reader to select individually these segments and makes them more easy to read.
interactive <- ggplot(mygraph, aes(Año, Valor, fill = Antigüedad,
data_id = Antigüedad)) +
geom_bar_interactive(stat = "identity", aes(`data-id`= Antigüedad),
extra_interactive_params = "data-id") +
scale_fill_manual_interactive(extra_interactive_params = "data-id",
`data-id`= unique(mygraph$Antigüedad),
values=c(rgb(0.43, 0.89, 0.99),
rgb(0, 0.66, 0.91),
rgb(0.02, 0.4, 0.67),
rgb(0, 0.2, 0.53)),
guide = guide_legend_interactive(reverse=TRUE),
data_id = function(breaks) as.character(breaks)) +
coord_flip() +
labs(title = "El parque automovilístico por antigüedad",
caption = "Fuente: Dirección General de Tráfico. EL MOTOR") +
theme(plot.title = element_text(family = "encodesans",
face = "bold",
size = 22),
plot.title.position = "plot",
plot.caption = element_text(family = "encodesans",
color = "darkgray",
face = "bold",
size = 14,
hjust = 1,
margin = margin(20,0,0,0)),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_text(family = "chivo mono", size = 20),
axis.ticks.y = element_blank(),
legend.position = "top",
legend.title = element_blank(),
legend.justification = c(-0.36,1),
legend.text = element_text(family = "encodesans", size = 18),
legend.key.size = unit(0.7, "cm"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank()) +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0)) +
guides(guide_legend_interactive(reverse=TRUE)) +
geom_bar_text(aes(label = paste0(`Valor`, "%"),
colour = Antigüedad),
position = "stack",
reflow = TRUE,
place = "left",
family = "chivo mono",
size = 20) + # cambiar tipo de letra
scale_colour_manual(values = c(`0-4 años` = "White", `5-9 años` = "White",
`10-14 años` = "White",
`Más de 15 años` = "Black"), guide = "none")
final <- girafe(ggobj = interactive,
width_svg = 10,
height_svg = 6,
options = list(
opts_hover(css = girafe_css("stroke-width: 1; opacity: 1;")),
opts_hover_key(css = girafe_css("stroke-width: 1; opacity: 0.6;")),
opts_hover_inv(css = girafe_css("stroke-width: 1; opacity: 0.1;"))
))
final
The original graph has some aspects to improve that can be solved with another type of graph. Especially, I am trying to improve readability of data and information organisation.
The original graph has some problems that need to be solved. First, and most important, is the data itself used. As it can be visually appreciated, there are several bars which are longer, and others fall short. This is due to a fail in the rounding of data: 2012 and 2016 account for 101%, and 2014 and 2017 only reach 99%.
So, first step is to recalculate data, adjusting roundings properly. I have also taken the opportunity to translate it into English.
alternative <- data.frame(Year =
as.factor(c("2011", "2011", "2011", "2011", "2012", "2012",
"2012", "2012", "2013", "2013", "2013", "2013", "2014",
"2014", "2014", "2014", "2015", "2015", "2015", "2015",
"2016", "2016", "2016", "2016", "2017", "2017", "2017",
"2017", "2018", "2018", "2018", "2018", "2019", "2019",
"2019", "2019", "2020", "2020", "2020", "2020")),
Antiquity =
as.factor(c("0-4 years", "5-9 years", "10-14 years",
"More than 15 years", "0-4 years", "5-9 years",
"10-14 years", "More than 15 years", "0-4 years",
"5-9 years", "10-14 years", "More than 15 years",
"0-4 years", "5-9 years", "10-14 years",
"More than 15 years", "0-4 years", "5-9 years",
"10-14 years", "More than 15 years", "0-4 years",
"5-9 years", "10-14 years", "More than 15 years",
"0-4 years", "5-9 years", "10-14 years",
"More than 15 years", "0-4 years", "5-9 years",
"10-14 years", "More than 15 years", "0-4 years",
"5-9 years", "10-14 years", "More than 15 years",
"0-4 years", "5-9 years", "10-14 years",
"More than 15 years")),
Value = c(22, 31, 24, 23, 18, 32, 26, 24, 16, 31, 27,
26, 15, 28, 27, 30, 15, 25, 27, 33, 16, 20,
28, 36, 19, 16, 28, 37, 20, 14, 27, 39, 21,
14, 24, 41, 20, 14, 22, 44))
Then, I decided to make an alternative that allowed the reader to see the changes all along years, which was one of the objectives that were expressed in the news from where the original graph was taken. For this reason, firstly, I decided to make a simple line graph. I changed axis, so that years are shown in the x axis and the reader can see clearly the tendency that all age groups are following. In addition, I assigned darker colors to older age groups, instead of what the original graph shows (going from dark to bright).
fullgraph <- alternative %>% mutate(Antiquity = fct_relevel(Antiquity,
"0-4 years",
"5-9 years",
"10-14 years",
"More than 15 years")) %>%
ggplot(aes(x = Year, y = Value, group = Antiquity, color = Antiquity)) +
geom_line(linewidth= 1.2) +
labs(title = "Car park by age in Spain") +
theme_minimal() +
theme(
plot.title.position = "plot",
plot.title = element_text(face = "bold"),
legend.position = c(0.25, 0.87),
legend.title = element_blank(),
legend.margin = margin(c(0,8,5,5), "pt"),
axis.title.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_text(face = "bold"),
axis.title.y = element_blank(),
axis.text.y = element_text(face = "bold"),
axis.ticks.y = element_blank(),
legend.background = element_rect(fill = "white")) +
scale_y_continuous(labels = function(x) paste0(x, "%"), limits = c(0, 50)) +
scale_color_manual(values=c(rgb(0.43, 0.89, 0.99), rgb(0, 0.66, 0.91),
rgb(0.02, 0.4, 0.67), rgb(0, 0.2, 0.53))) +
scale_x_discrete(expand = c(0.01, 0.05)) +
guides(color=guide_legend(nrow=2, byrow=TRUE))
fullgraph
I also created facets to plot individually each group of age, in order for the reader to have an individual view of each of the age segments.
alternative2 <- alternative %>% mutate(Antiquity2 = Antiquity)
facetsalternative <- alternative %>% ggplot(aes(x = Year,
y = Value,
group = 1)) +
geom_line(data = alternative2 %>% select(-Antiquity), aes(group = Antiquity2),
color = rgb(1, 0.65, 0.31), size = 0.7, alpha = 0.3) +
ylim(0,50) +
geom_line(aes(color = Antiquity), size = 1.2) +
theme(
axis.title.x = element_blank(),
axis.text.x = element_text(face = "bold"),
axis.title.y = element_blank(),
axis.text.y = element_text(face = "bold"),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_line(color = rgb(0.97, 0.97, 0.97)),
panel.grid.minor = element_line(color = rgb(0.97, 0.97, 0.97)),
legend.position = "none"
) +
scale_y_continuous(labels = function(x) paste0(x, "%"), limits = c(0, 50)) +
scale_x_discrete(expand = c(0.04, 0.04)) +
scale_color_manual(values=c(rgb(0.43, 0.89, 0.99), rgb(0, 0.66, 0.91),
rgb(0.02, 0.4, 0.67), rgb(0, 0.2, 0.53))) +
facet_wrap(~factor(Antiquity, levels = c("0-4 years", "5-9 years",
"10-14 years", "More than 15 years")))
facetsalternative
With patchwork it is easy to join both graphs together.
definitive_alternative <- fullgraph / facetsalternative
definitive_alternative
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Martorell (2023, Jan. 13). Data visualization | MSc CSS: Car fleet by antiquity in Spain. Retrieved from https://csslab.uc3m.es/dataviz/projects/2022/100483851/
BibTeX citation
@misc{martorell2023car, author = {Martorell, Maria del Mar Escalas}, title = {Data visualization | MSc CSS: Car fleet by antiquity in Spain}, url = {https://csslab.uc3m.es/dataviz/projects/2022/100483851/}, year = {2023} }