This project visualizes the ten most listened-to songs on Spotify between June and August 2022.
For this project, I selected Spotify’s worldwide data on the most listened-to songs between June and August, taken from Newtral’s webpage. First comes the code for the data cleaning, the replication plot and the alternative plot. Then I go step by step through each decision I took for the replication plot and the alternative plot.
Here, I noticed that I needed to pivot the dataframe into long format, so that I get one column each for the date, the name of the song and the ranking position:
# Pivot the song columns into long format
spoti <- spoti %>%
  tidyr::pivot_longer(-Date,
                      names_to = "Song",
                      values_to = "Ranking")
spoti
# A tibble: 294 × 3
   Date   Song                                  Ranking
   <chr>  <chr>                                   <dbl>
 1 06-jul As It Was                                   2
 2 06-jul Despechá                                   NA
 3 06-jul Me Porto Bonito                             4
 4 06-jul Pink Venom                                 NA
 5 06-jul Quevedo: Bzrp Music Sessions, Vol. 52      NA
 6 06-jul Tití Me Preguntó                            5
 7 07-jul As It Was                                   2
 8 07-jul Despechá                                   NA
 9 07-jul Me Porto Bonito                             4
10 07-jul Pink Venom                                 NA
# … with 284 more rows
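To see what the pivot does in isolation, here is a minimal sketch on a made-up two-song table. The `wide` and `long` names and the values are illustrative, not the Newtral data:

```r
library(tidyr)
library(tibble)

# Toy wide table: one row per date, one column per song (values are made up)
wide <- tibble(
  Date         = c("06-jul", "07-jul"),
  `As It Was`  = c(2, 2),
  `Pink Venom` = c(NA_real_, NA_real_)
)

# Every column except Date is stacked into a (Song, Ranking) pair
long <- pivot_longer(wide, -Date, names_to = "Song", values_to = "Ranking")
long
```

Each date now contributes one row per song, which is why the real table has 294 rows for six songs.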
Then, I wanted the Date column to be of type Date so that it is easier to handle.
# Transform the Date column into Date type (dmy() comes from lubridate)
spoti$Date <- paste(spoti$Date, "2022", sep = "-")
spoti$Date <- gsub("jul", "07", spoti$Date) # quoted so the leading zero is kept
spoti$Date <- gsub("ago", "08", spoti$Date)
spoti$Date <- lubridate::dmy(spoti$Date)
spoti
# A tibble: 294 × 3
   Date       Song                                  Ranking
   <date>     <chr>                                   <dbl>
 1 2022-07-06 As It Was                                   2
 2 2022-07-06 Despechá                                   NA
 3 2022-07-06 Me Porto Bonito                             4
 4 2022-07-06 Pink Venom                                 NA
 5 2022-07-06 Quevedo: Bzrp Music Sessions, Vol. 52      NA
 6 2022-07-06 Tití Me Preguntó                            5
 7 2022-07-07 As It Was                                   2
 8 2022-07-07 Despechá                                   NA
 9 2022-07-07 Me Porto Bonito                             4
10 2022-07-07 Pink Venom                                 NA
# … with 284 more rows
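The month replacement can also be written as a lookup, which extends to other months without extra `gsub()` calls. This is a minimal base R sketch; the `meses` vector and the `parse_fecha()` helper are names invented for illustration:

```r
# Hypothetical lookup from Spanish month abbreviations to month numbers
meses <- c(jun = "06", jul = "07", ago = "08", sep = "09")

# Hypothetical helper: turns "06-jul" into the Date 2022-07-06
parse_fecha <- function(x, year = 2022) {
  parts <- strsplit(x, "-", fixed = TRUE)
  day   <- vapply(parts, `[`, character(1), 1)
  month <- meses[vapply(parts, `[`, character(1), 2)]
  # An explicit format makes unknown months come back as NA instead of an error
  as.Date(paste(year, month, day, sep = "-"), format = "%Y-%m-%d")
}

parse_fecha(c("06-jul", "23-ago"))
```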
Also, I realized that the font was Roboto, which is available from Google Fonts, so I loaded it as we learnt in the course.
#### First theme settings
theme_set(theme_minimal()) #Set the minimal theme
sysfonts::font_add_google("Roboto") #Adding a google font type
Next, I realized that the y axis of the plot is reversed, with position 1 at the top, so I needed to adjust the axis scales:
p <- p +
  # reverse the y axis so that position 1 is at the top
  scale_y_continuous(trans = "reverse", n.breaks = 10) +
  scale_x_date(
    # set the date format of the x tick marks
    date_labels = "%d-%b",
    breaks = seq(as.Date("2022-07-11"), as.Date("2022-08-25"), by = 6))
p
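The base plot `p` is not shown in this section. As a self-contained sketch of the two scale settings, assuming a simple line chart and made-up rankings (the `toy` data frame and `p_toy` are illustrative names), the same idea looks like this. It uses `scale_y_reverse()`, which is ggplot2's shorthand for a continuous scale with the reverse transformation:

```r
library(ggplot2)

# Made-up rankings for two songs over 13 days
toy <- data.frame(
  Date    = rep(seq(as.Date("2022-07-11"), by = 1, length.out = 13), times = 2),
  Song    = rep(c("Song A", "Song B"), each = 13),
  Ranking = c(rep(1, 13), rep(c(3, 2), length.out = 13))
)

p_toy <- ggplot(toy, aes(Date, Ranking, colour = Song)) +
  geom_line() +
  # position 1 ends up at the top of the axis
  scale_y_reverse(n.breaks = 10) +
  # one tick every 6 days; %b prints the abbreviated month in the current locale
  scale_x_date(
    date_labels = "%d-%b",
    breaks = seq(as.Date("2022-07-11"), as.Date("2022-07-23"), by = 6)
  )
p_toy
```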
Then I added the title and subtitle and set their format. To get the code of the specific color, I used this very useful webpage: https://imagecolorpicker.com/. I found the size by trial and error until the format matched the original.
p <- p +
  labs(
    title = "Evolución diaria del ránking en Spotify",
    subtitle = "Posición de las seis canciones más escuchadas en la plataforma a nivel mundial") +
  theme(plot.title = element_text(face = "bold", size = 16.5)) + # Format of the title
  labs(x = NULL, y = NULL, color = NULL) # Remove the axis and legend labels
p
Here is the tricky part: the theme settings that make this graph look identical to the original one. I didn’t write this code all at once; I added settings as I advanced with the plot and noticed which theme features were still missing.
p <- p +
  # size and color of the axis components
  theme(axis.text.y = element_text(colour = "#a8a8a8", size = 10),
        axis.line.y = element_line(size = 0, color = "#eeeeee"),
        axis.text.x = element_text(colour = "#a8a8a8", size = 10),
        panel.grid.minor.y = element_blank(), # remove some grid lines
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        legend.position = "none") # remove the legend
p
Since I had removed the legend, I needed to add information showing which song each visualization mark represents. I started with annotations but then figured out that ‘geom_text’ was simpler. Once I had the labels, I only needed to expand the limits so that they fit:
# for the location of the legend annotations
df.labs <- spoti %>%
  filter(Date == "2022-08-23")
p <- p +
  geom_text(aes(label = Song, colour = Song),
            df.labs, hjust = 0, nudge_x = 0.7) + # Annotations that replace the legend
  expand_limits(x = as.Date("2022-9-1")) + # Create space for the annotations
  expand_limits(y = 9)
p
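The filter above fixes the labels to one hard-coded date. A possible variant, sketched here on made-up data, takes the last date with a non-missing ranking for each song, so a song that leaves the ranking before the final day still gets a label. This is an assumption about what might be useful, not what the replication does; `toy` is an illustrative name:

```r
library(dplyr)

# Made-up data: Song B has no ranking on the last day
toy <- data.frame(
  Date    = rep(as.Date("2022-08-21") + 0:2, times = 2),
  Song    = rep(c("Song A", "Song B"), each = 3),
  Ranking = c(1, 1, 1, 2, 3, NA)
)

# One row per song: the most recent date on which the song was ranked
df.labs <- toy %>%
  filter(!is.na(Ranking)) %>%
  group_by(Song) %>%
  slice_max(Date, n = 1) %>%
  ungroup()
df.labs
```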
Finally, I only needed to match the colors of the original plot.
p <- p +
  scale_color_manual(values = c(
    "#8dc7ad", "#5cb689", "#267d59", "#54b182", "#c71e1d", "#4ea07c"))
p
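`scale_color_manual()` assigns an unnamed vector to the songs in the order of the scale's levels, which for a character column is alphabetical. Naming the vector makes the pairing explicit and independent of that order. The mapping below is derived from the alphabetical order of the six song names, so it is worth checking against the plot; `song_cols` and `p_scale` are illustrative names:

```r
library(ggplot2)

# Colors paired with songs, following the alphabetical order of the names
song_cols <- c(
  "As It Was"                             = "#8dc7ad",
  "Despechá"                              = "#5cb689",
  "Me Porto Bonito"                       = "#267d59",
  "Pink Venom"                            = "#54b182",
  "Quevedo: Bzrp Music Sessions, Vol. 52" = "#c71e1d",
  "Tití Me Preguntó"                      = "#4ea07c"
)

# This scale can be added to a plot in place of the unnamed version
p_scale <- scale_color_manual(values = song_cols)
```

With this pairing, the red tone belongs to the Quevedo session, the Spanish song.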
At first, I thought I could add more information, since color was conveying almost nothing apart from which song is the Spanish one. As I didn’t have the resources to get more data from Spotify, I added the genre of each song by hand:
spotig <- spoti %>%
  mutate("Genre" = case_when(
    endsWith(Song, "52") ~ "Pop",
    endsWith(Song, "As It Was") ~ "Pop",
    endsWith(Song, "Pink Venom") ~ "K-Pop",
    endsWith(Song, "Me Porto Bonito") ~ "Reggaeton/Latino",
    endsWith(Song, "Tití Me Preguntó") ~ "Reggaeton/Latino",
    endsWith(Song, "Despechá") ~ "Reggaeton/Latino"
  ))
spotig
# A tibble: 294 × 4
   Date       Song                                  Ranking Genre
   <date>     <chr>                                   <dbl> <chr>
 1 2022-07-06 As It Was                                   2 Pop
 2 2022-07-06 Despechá                                   NA Reggaeton…
 3 2022-07-06 Me Porto Bonito                             4 Reggaeton…
 4 2022-07-06 Pink Venom                                 NA K-Pop
 5 2022-07-06 Quevedo: Bzrp Music Sessions, Vol. 52      NA Pop
 6 2022-07-06 Tití Me Preguntó                            5 Reggaeton…
 7 2022-07-07 As It Was                                   2 Pop
 8 2022-07-07 Despechá                                   NA Reggaeton…
 9 2022-07-07 Me Porto Bonito                             4 Reggaeton…
10 2022-07-07 Pink Venom                                 NA K-Pop
# … with 284 more rows
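The same hand-coded genres could also live in a small lookup table that is joined on, which keeps all the labels in one place. This is a sketch that assumes the song names match exactly; the `genres` table and the two-row `spoti_toy` stand-in are names introduced here:

```r
library(dplyr)

# Hand-coded genre per song, same labels as in the case_when() above
genres <- tibble::tribble(
  ~Song,                                   ~Genre,
  "As It Was",                             "Pop",
  "Quevedo: Bzrp Music Sessions, Vol. 52", "Pop",
  "Pink Venom",                            "K-Pop",
  "Me Porto Bonito",                       "Reggaeton/Latino",
  "Tití Me Preguntó",                      "Reggaeton/Latino",
  "Despechá",                              "Reggaeton/Latino"
)

# Two-row stand-in for the real spoti table
spoti_toy <- tibble::tibble(
  Date    = as.Date(c("2022-07-06", "2022-07-06")),
  Song    = c("As It Was", "Pink Venom"),
  Ranking = c(2, NA)
)

spotig_toy <- left_join(spoti_toy, genres, by = "Song")
spotig_toy
```

A song missing from the lookup would get `NA` as its genre, which makes typos in the names easy to spot.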
Creating an alternative plot that differed from the replication was tough, as the best way to visualize a timeline is a line with time on the x axis. I decided to use an animation, since I found that the second most common type of timeline is an animation with bubbles. Bubbles were okay, but I thought the best visualization mark would be the name of the song itself, drawn with ‘geom_text’. Time is represented by the frames of the animation and Genre by color. The hardest part was finding how to label the date of each frame, until I found the ‘frame_time’ variable, written between curly braces. Finally, I wanted the theme to be similar to the previous one, so I reused many of the same options. Here is the result:
pa <- ggplot(spotig) +
  aes(Song, Ranking) +
  geom_text(aes(color = Genre, label = Song)) + # Assign Genre as the color of the text
  scale_y_continuous(trans = "reverse", n.breaks = 9) +
  expand_limits(x = c(0, 7)) +
  labs(
    title = "Evolución diaria del ránking en Spotify",
    subtitle = "Evolución de las 6 canciones más escuchadas mundialmente en verano de 2022",
    tag = "Date:\n{frame_time}", # Show the date of the current frame as a tag
    x = NULL, y = NULL
  ) +
  transition_time(Date) + # Date as the variable represented by the animation
  ease_aes('linear') + # Make the animation smoother
  theme(
    plot.title = element_text(face = "bold", size = 16.5),
    plot.subtitle = element_text(size = 10.5, colour = "#6f6f6f"),
    plot.tag.position = c(0.836, 0.65),
    plot.tag = element_text(face = "bold", size = 12, hjust = 0),
    legend.title = element_text(face = "bold", size = 12, hjust = 0),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(color = "light grey"),
    axis.text.x = element_blank()
  )
pa
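Printing `pa` renders the animation with gganimate's default settings. To control the number of frames and save the result to a file, something like the following could be used. It is a sketch on made-up data: `toy`, `pa_toy`, the frame settings and the file name are all assumptions, and rendering a GIF needs a renderer package such as gifski to be installed:

```r
library(ggplot2)
library(gganimate)

# Made-up rankings for two songs over five days
toy <- data.frame(
  Date    = rep(as.Date("2022-08-01") + 0:4, times = 2),
  Song    = rep(c("Song A", "Song B"), each = 5),
  Ranking = c(1, 1, 2, 2, 1, 2, 2, 1, 1, 2)
)

pa_toy <- ggplot(toy, aes(Song, Ranking)) +
  geom_text(aes(label = Song)) +
  scale_y_reverse() +
  labs(tag = "Date: {frame_time}") + # date of the current frame
  transition_time(Date) +
  ease_aes("linear")

# Rendering takes a while, so it only runs in an interactive session
if (interactive()) {
  anim <- animate(pa_toy, nframes = 50, fps = 10)
  anim_save("spotify_toy.gif", anim)
}
```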
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Segovia (2023, Jan. 23). Data visualization | MSc CSS: Worldwide Top 10 Listened Songs in Spotify. Retrieved from https://csslab.uc3m.es/dataviz/projects/2022/100486421/
BibTeX citation
@misc{segovia2023worldwide,
  author = {Segovia, Jorge Pascual},
  title = {Data visualization | MSc CSS: Worldwide Top 10 Listened Songs in Spotify},
  url = {https://csslab.uc3m.es/dataviz/projects/2022/100486421/},
  year = {2023}
}