The course’s final goal is to publish a visualization project on our website, where you are reading this tutorial. The website is an R Markdown project that lives in a GitHub repository. This tutorial will help you fork the repository, create a new draft for your project, and submit it using GitHub’s Pull Request workflow.
Source: _tutorials/project/project.Rmd
This tutorial assumes that you have a working installation of R and RStudio. For starters, you can read about the big-picture motivation (Why Git? Why GitHub?), and here is a checklist of all the steps required to prepare for this tutorial.
It is also recommended that you read about some Git basics.
This web page is a Distill website, and as such it consists of a series of R Markdown source files that are compiled into HTML. Data projects, like any other software project, greatly benefit from version control, and this website is no exception. Its sources, along with the resulting HTML products, are hosted in (and published from) a GitHub repository. You can access this repository by clicking on the top-right logo.
Collaboration on GitHub, as in any other Git-based platform, is done via Pull Requests. Essentially, you work on a copy of the project, make changes, and then propose them to be included in the original repository. So in the first place, you need to fork the repository, meaning that you get a copy in your personal account that is linked to the original one, so that you can later submit your changes. To do this, just click the fork button in the dataviz repository.
By clicking Create fork, you will be redirected to your new repository. Once there, click the green Code button and copy the HTTPS URL.
Back in RStudio, start the dialog to create a new project.
Now, select a Version Control project, managed by Git, and then paste the repository URL you copied in the last step. Adjust the destination path if required.
If everything is properly configured, RStudio will download a copy of the GitHub repository in your personal account to your PC (in Git terms, it will clone the remote repository locally), and the project will be opened.
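For readers who prefer the console, the same cloning step can be sketched from R with a base `system2()` call to the `git` command-line client, assuming it is installed and on the `PATH`. The helper `fork_url()`, the user name `your-user`, and the destination folder are hypothetical placeholders. The repository name `dataviz` is taken from the text above and is assumed to carry over to the fork.

```r
# Build the HTTPS URL of a fork (hypothetical helper, for illustration only).
fork_url <- function(user, repo = "dataviz") {
  sprintf("https://github.com/%s/%s.git", user, repo)
}

url <- fork_url("your-user")  # placeholder GitHub user name

# Clone the fork into a local folder; uncomment to actually run it.
# system2("git", c("clone", url, "dataviz"))
```

The RStudio dialog does the same thing and additionally opens the folder as a project, so this is only an alternative, not a required step.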
Now it is time to generate some new content. To create the basic structure of a visualization project, first install the `distill` R package, and then execute the following code:
```r
# install.packages("distill") # if required
distill::create_post(
  title = "The Title of my Article",
  collection = paste0("projects/", format(Sys.Date(), "%Y")),
  author = "My name",
  slug = NIA,
  date_prefix = NULL
)
```
where `NIA` should be your student identifier (i.e. the 9-digit identifier of the form `100xxxxxx`). If you inspect RStudio’s Files pane, you will notice that a new directory `_projects/<year>/100xxxxxx` has been generated. The `Rmd` file with the same name under that directory is now open in the editor pane. Edit the file header as follows:

- Add `categories: "<year>"` below the description (replacing `<year>` with the current year, obviously).
- Set `` date: "`r Sys.Date()`" ``. In this way, the date of the article will be the date of the last compilation, which is convenient.
- Add `toc: true` to the `distill_article` options.

The header should look like this:
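Assuming the default template produced by `distill::create_post()`, a header edited in this way would look roughly as follows. The `title`, `description` and `author` values are placeholders, and the `self_contained: false` line is an assumption about what the template generates:

```yaml
---
title: "The Title of my Article"
description: |
  A short description of the post.
categories: "<year>"
author:
  - name: My name
date: "`r Sys.Date()`"
output:
  distill::distill_article:
    self_contained: false
    toc: true
---
```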
Also, remove the following part, because this prevents the R code chunks from being displayed in the article, and we want precisely the opposite:
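Assuming the default template produced by `distill::create_post()`, the part in question is the setup chunk that switches off code echoing, which would look like this:

```{r setup, include=FALSE}
# Hides the source code of every chunk in the rendered article.
knitr::opts_chunk$set(echo = FALSE)
```

Deleting the chunk (or the `echo = FALSE` setting inside it) restores knitr's default, which is to show the code.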
Then save and hit the knit button to compile the document.
As a result, the article `_projects/<year>/100xxxxxx/100xxxxxx.html` has been generated.
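A quick way to confirm that both the source and the knitted article are where they should be is to check the paths from the R console. This is only a sketch: `project_paths()` is a hypothetical helper (not part of distill), `100xxxxxx` is a placeholder for your NIA, and the code assumes that the working directory is the project root.

```r
# Expected locations of the source and the knitted article for one project.
# `project_paths()` is a hypothetical helper, not part of distill.
project_paths <- function(nia, year = format(Sys.Date(), "%Y")) {
  dir <- file.path("_projects", year, nia)
  c(
    rmd  = file.path(dir, paste0(nia, ".Rmd")),
    html = file.path(dir, paste0(nia, ".html"))
  )
}

paths <- project_paths("100xxxxxx")  # placeholder NIA
file.exists(paths)  # should be TRUE for both files once the document is knitted
```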
Now it is time to tell Git what changes we want to incorporate. There is a new Git pane in RStudio in the top-right panel set. Important: you must always select only the files that were modified under your NIA (i.e., the `_projects/<year>/100xxxxxx` directory), and nothing else.
In the case above, mark as Staged only the yellow directories, ignoring the other files. Once checked, click the Commit button.
In the new dialogue, check once again that all files under your NIA (and only those) are selected. Then, write a descriptive commit message and hit Commit.
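The "only files under your NIA" rule can also be checked programmatically. The following is a minimal sketch in base R: `outside_project()` is a hypothetical helper, and the file names in the example (including `dataviz.Rproj`) are placeholders rather than the actual contents of the repository.

```r
# Return the changed files that are NOT under the given project directory.
# `outside_project()` is a hypothetical helper, for illustration only.
outside_project <- function(files, project_dir) {
  prefix <- paste0(sub("/+$", "", project_dir), "/")
  files[!startsWith(files, prefix)]
}

# Example with a hand-written list of changed files (placeholder names).
changed <- c(
  "_projects/2022/100xxxxxx/100xxxxxx.Rmd",
  "_projects/2022/100xxxxxx/100xxxxxx.html",
  "dataviz.Rproj"
)
outside_project(changed, "_projects/2022/100xxxxxx")
```

Any file reported by this check lies outside your project directory and should be left unstaged.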
Close the dialogue. Currently, your local copy of the repository contains the changes, but not the remote one on GitHub (the Git pane reads “Your branch is ahead of ‘origin/main’ by 1 commit”). To synchronize the changes, you need to click the Push button.
Finally, you can contribute your article by opening a Pull Request. In your fork, you will see that GitHub notices that your copy is 1 commit ahead of the original repository, and offers you the option to contribute.
Click the Contribute button to start a new Pull Request.
In this new dialogue, set the title to `Project - <your_name>`. Then click Create a draft pull request. Congratulations! You have created your first Pull Request (i.e., something like this), and paved the way to contribute a visualization project for this course.
Usually, Pull Requests contain a more complete version of the final contribution, but here we are learning. So far, we just created a skeleton of the article in “Draft” mode, and we will be adding content gradually.
A Pull Request is a space where maintainers and collaborators work together to shape the final contribution.
Note that, once the Pull Request is open, every commit that the collaborator pushes to their fork is automatically added to the Pull Request. So the cycle repeats until everybody is happy with the result, and the maintainer finally merges the Pull Request.
The workflow is pretty straightforward:

- Edit the `Rmd` file with text, code and images (external images need to be placed under `_projects/<year>/100xxxxxx/`, and then referenced by name in the `Rmd`).
- Commit and push the changes under `_projects/<year>/100xxxxxx/`.

Finally, when the project is finished, please click the “Ready for review” button in the Pull Request (at the bottom, above the comment box). Then I will check that everything is OK, ask for minor changes if required, and then merge the Pull Request and your project will be live!
Commit messages are for future-you. Put something descriptive of what you have done, so that, if future-you finds an issue and wants to undo something, it is easy to remember what each commit did. If you want to communicate with me, commit messages are not a good way; instead, the PR has a nice web interface where you can add comments (with Markdown syntax, yay!).
Use level 2 headings (`## Title`) as the highest level for headings. In other words, please do not use level 1 headings in your posts (`# Title`), because the title of the post is already level 1, so it looks nicer if sections start one level below.
Note that you do not need to add `echo=TRUE` to every chunk, because they are shown by default.
It is good practice to add an initial chunk such as the following, right below the YAML header, to ensure that images use the whole width available and are centered:
```{r setup, include=FALSE}
knitr::opts_chunk$set(out.width="100%", fig.align="center")
```
```markdown
![This is a caption.](my-image.png){width="100%"}
![This is another caption.](external-image.png){.external width="100%"}
```
In `distill` articles, you can set images and tables that span a width larger than the text column. This looks nicer, for example, for images that are very wide. There are a couple of examples in the Gapminder project. You can see how this is done in the Rmd, here and here, just in case you want to use this feature.
Note also the `preview=TRUE` option here. Adding this option to a chunk makes the image produced by that chunk the preview image for your article in the “Projects” gallery.
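As a sketch of how these options fit together, a chunk along the following lines would produce a figure slightly wider than the text column and mark it as the article's preview. The chunk label `wide-figure` and the plotted dataset are placeholders. `layout="l-body-outset"` is one of the layout classes documented by distill, but whether the Gapminder project uses this particular class is an assumption.

```{r wide-figure, layout="l-body-outset", preview=TRUE}
# Built-in dataset, used only as a placeholder for the project's own data.
plot(pressure$temperature, pressure$pressure,
     xlab = "Temperature", ylab = "Pressure")
```

Only one chunk per article should carry `preview=TRUE`, since the gallery shows a single preview image per project.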
One of the most important best practices is to limit the width of your code. Very long lines of code are harder to read. In this case, moreover, they will overflow the width of the post and be lost in the margin, outside of the visible area. So in general it is a good idea to break your lines of code at a maximum width. The common practice is to use a maximum width of 80 characters. There is an option in RStudio to show a visual guide in the editor for this limit. You can activate this in Tools > Global Options > Code > Display > Show margin. There is also a package called `styler` that allows you to style your document automatically. After installing the package, if you click “Addins” in the RStudio toolbar, there is a new addin called “Style active file”. This could be a good starting point (although you may not like other styling options that this package applies).
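To find the offending lines without scanning the file by eye, a small base R check can list every line that exceeds the limit. This is a minimal sketch: `long_lines()` is a hypothetical helper (not part of RStudio or `styler`), and the path in the example is a placeholder.

```r
# Report the line numbers of a file whose width exceeds a limit.
# `long_lines()` is a hypothetical helper, not part of RStudio or styler.
long_lines <- function(path, width = 80) {
  which(nchar(readLines(path, warn = FALSE), type = "width") > width)
}

# Example (placeholder path):
# long_lines("_projects/<year>/100xxxxxx/100xxxxxx.Rmd")
```

Note that the check looks at every line of the file, so long prose lines in the `Rmd` are reported too. Only those inside code chunks matter for the rendered article.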
Once the project is finished and merged, you will need to prepare a short presentation. The document format should be a presentation, but I do not particularly care about the file format. If you wish to try an R Markdown presentation, that is great; but a PowerPoint or PDF would be fine too.
Suggested structure:
Be brief: you have 5 minutes, so do not try to explain everything. There will be a post on our website for that.
Do not show all the code; there will be a post on our website for that. Just highlight some small portions of the code that you found especially tricky, challenging, clever…
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ucar (2022, Oct. 5). Data visualization | MSc CSS: Creating and Submitting your Project. Retrieved from https://csslab.uc3m.es/dataviz/tutorials/project/
BibTeX citation
@misc{ucar2022creating, author = {Ucar, Iñaki}, title = {Data visualization | MSc CSS: Creating and Submitting your Project}, url = {https://csslab.uc3m.es/dataviz/tutorials/project/}, year = {2022} }