Data visualization | MSc CSS: Creating and Submitting your Project

Iñaki Ucar

Creating and Submitting your Project

The final goal of the course is to publish a visualization project on our website, where you are reading this tutorial. The website is an R Markdown project that lives in a GitHub repository. This tutorial will help you fork the repository, create a new draft for your project, and submit it using GitHub’s Pull Request workflow.

Author: Iñaki Ucar

Affiliation: Department of Statistics

Published: Oct. 4, 2022

Citation: Ucar, 2022

Source: _tutorials/project/project.Rmd

Prerequisites

This tutorial assumes that you have a working installation of R and RStudio. For starters, you can read about the big-picture motivation (Why Git? Why GitHub?). The following checklist covers all the steps required to prepare for this tutorial:

Register a free GitHub account if you do not have one already.
Install Git. There are specific instructions for Windows, macOS and Linux.
Introduce yourself to Git.
Configure a personal access token.
Follow this webinar to familiarize yourself with GitHub and RStudio’s interface to Git.

It is also recommended that you read about some Git basics.
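
As a quick sanity check before moving on, the following sketch (not part of the original checklist) asks Git, from within R, whether it is installed and whether a name and an email have been configured. It uses base R only; the helper name `git_config()` is hypothetical and introduced here just for illustration.

```r
# Locate the git executable; an empty string means it was not found.
git_path <- unname(Sys.which("git"))
has_git <- nzchar(git_path)

# Hypothetical helper: read one value from the global Git configuration.
# Returns NA if Git is missing or the key is not set.
git_config <- function(key) {
  if (!has_git) return(NA_character_)
  out <- suppressWarnings(
    system2("git", c("config", "--global", "--get", key),
            stdout = TRUE, stderr = FALSE)
  )
  if (length(out) == 0) NA_character_ else out[1]
}

setup <- c(
  git_installed = has_git,
  name_set      = !is.na(git_config("user.name")),
  email_set     = !is.na(git_config("user.email"))
)
print(setup)
```

If the name or email turn out to be missing, one option (assuming the usethis package is installed) is `usethis::use_git_config(user.name = "Your Name", user.email = "you@example.org")`.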

Creating the initial skeleton

Forking the source code

This web page is a Distill website, and as such it consists of a series of R Markdown source files that are compiled into HTML. Data projects, like any other software project, greatly benefit from version control, and this website is no exception. Its sources, along with the resulting HTML products, are hosted in (and published from) a GitHub repository. You can access this repository by clicking on the top-right logo.

Collaboration on GitHub, as in any other Git-based platform, is done via Pull Requests. Essentially, you work on a copy of the project, make changes, and then propose them to be included in the original repository. So in the first place, you need to fork the repository, meaning that you get a copy in your personal account that is linked to the original one, so that you can later submit your changes. To do this, just click the fork button in the dataviz repository.

By clicking Create fork, you will be redirected to your new repository. Once there, click the green Code button and copy the HTTPS URL.

Creating a new project

Back in RStudio, start the dialog to create a new project.

Now, select a Version Control project, managed by Git, and then paste the repository URL you copied in the last step. Adjust the destination path if required.

If everything is properly configured, RStudio will download to your PC a copy of the repository in your personal GitHub account (in Git terms, it will clone the remote repository locally), and the project will be opened.

Adding your own article

Now it is time to generate some new content. To create the basic structure of a visualization project, first install the distill R package, and then execute the following code:

# install.packages("distill") # if required

distill::create_post(
  title = "The Title of my Article",
  collection = paste0("projects/", format(Sys.Date(), "%Y")),
  author = "My name",
  slug = NIA,
  date_prefix = NULL
)

where NIA should be your student identifier (i.e. the 9-digit identifier of the form 100xxxxxx). If you inspect RStudio’s Files pane, you will notice that a new directory _projects/<year>/100xxxxxx has been generated. The Rmd file with the same name under that directory is now open in the editor pane. Edit the file header as follows:

Set a tentative title and description for your project. They do not need to be definitive, so do not spend too much time thinking about this right now.
Add categories: "<year>" below the description (substituting <year> by the current year, obviously).
Set your full name.
Set the date as follows: date: "`r Sys.Date()`". In this way, the date of the article will be the date of the last compilation, which is convenient.
Add toc: true to the distill_article options.

The header must look like this:

Also, remove the following part, because it prevents the R code chunks from being displayed in the article, and we want precisely the opposite:

Then save and hit the knit button to compile the document.

As a result, the article _projects/<year>/100xxxxxx/100xxxxxx.html has been generated.
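
To double-check the result, the naming scheme described above can be written down in a few lines of base R. This is only a sketch: the NIA below is a made-up value, `is_valid_nia()` is a hypothetical helper, and the existence check is only meaningful when run from the root of the project after knitting.

```r
nia <- "100123456"  # hypothetical identifier; replace with your own NIA

# The NIA is described as a 9-digit identifier of the form 100xxxxxx.
is_valid_nia <- function(x) grepl("^100[0-9]{6}$", x)
stopifnot(is_valid_nia(nia))

# Same year expression as in the create_post() call.
year <- format(Sys.Date(), "%Y")
project_dir <- file.path("_projects", year, nia)

# The source file and the compiled article share the NIA as file name.
expected <- file.path(project_dir, paste0(nia, c(".Rmd", ".html")))

# From the project root, both should exist once the article is knitted.
print(data.frame(file = expected, exists = file.exists(expected)))
```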

Committing and pushing the changes

Now it is time to tell Git what changes we want to incorporate. There is a new Git pane in RStudio in the top-right panel set. Important: you must always select only the files that were modified under your NIA (i.e., the _projects/<year>/100xxxxxx directory), and nothing else.

In the case above, mark as Staged only the yellow directories, ignoring the other files. Once checked, click the Commit button.

In the new dialogue, check once again that all files under your NIA (and only those) are selected. Then, write a descriptive commit message and hit Commit.

Close the dialogue. Currently, your local copy of the repository contains the changes, but the remote one on GitHub does not (the Git pane reads “Your branch is ahead of ‘origin/main’ by 1 commit”). To synchronize the changes, you need to click the Push button.
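
The rule "stage only what lives under your NIA" can also be expressed as a small filter. The sketch below uses base R and a made-up list of changed files; `files_under()` is a hypothetical helper, not something provided by RStudio or Git.

```r
# Keep only the paths that live under the given project directory.
files_under <- function(paths, project_dir) {
  # Normalize to exactly one trailing slash so that a sibling directory
  # such as ".../1001234567" does not match ".../100123456".
  prefix <- paste0(sub("/+$", "", project_dir), "/")
  paths[startsWith(paths, prefix)]
}

# Hypothetical list of changed files, as the Git pane might show them.
changed <- c(
  "_projects/2022/100123456/100123456.Rmd",
  "_projects/2022/100123456/100123456.html",
  "_projects/2022/1001234567/other.Rmd",
  "index.html"
)

to_stage <- files_under(changed, "_projects/2022/100123456")
print(to_stage)
```

If you prefer the console to the Git pane, the gert package offers `git_status()`, `git_add()`, `git_commit()` and `git_push()`, so the filtered paths could be passed to `gert::git_add()`. That is an alternative route, not the workflow this tutorial describes.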

Contributing your article

Finally, you can contribute your article by opening a Pull Request. In your fork, you will see that GitHub notices that your copy is 1 commit ahead of the original repository, and offers you the option to contribute.

Click the Contribute button to start a new Pull Request.

In this new dialogue,

set the title to something like Project - <your_name>;
you can write a description;
ensure that “Allow edits by maintainers” is checked;
click the down arrow in the green button to “create a draft pull request”.

Then click Create a draft pull request. Congratulations! You have created your first Pull Request (i.e., something like this), and paved the way to contribute a visualization project for this course.

Working on your project

Usually, Pull Requests contain a more complete version of the final contribution, but here we are learning. So far, we just created a skeleton of the article in “Draft” mode, and we will be adding content gradually.

The Pull Request workflow

A Pull Request is a space where maintainers and collaborators work together to shape the final contribution:

A collaborator commits some changes and asks for feedback.
The maintainer reviews the changes and proposes corrections.

Note that, once the Pull Request is open, every commit that the collaborator pushes to their fork is automatically added to the Pull Request. In other words, you do not need to open a new Pull Request (in fact, you cannot). So the cycle repeats until everybody is happy with the result, and the maintainer finally merges the Pull Request.

Adding changes to the article

The workflow is pretty straightforward:

Modify the Rmd file with text, code and images (external images need to be placed under _projects/<year>/100xxxxxx/, and then referenced by name in the Rmd).
Compile the article by clicking the Knit button.
Mark as Staged in the Git pane all (and only) the files under _projects/<year>/100xxxxxx/.
Commit (with a descriptive commit message) and push.

Finally, when the project is finished, please click the “Ready for review” button in the Pull Request (at the bottom, just above the comment box). I will then check that everything is ok, ask for minor changes if required, and merge the Pull Request, at which point your project will be live!

General guidelines

Project guidelines

Use level 2 headings (## Title) as the highest level for headings. In other words, please do not use level 1 headings in your posts (# Title), because the title of the post is already level 1, so it looks nicer if sections start one level below.
Note that you do not need to add echo=TRUE to every chunk, because they are shown by default. And we want to show the code, so do not add echo=FALSE either unless you have a good reason to hide some special piece of code.
It is good practice to set an initial chunk like the following one, right below the YAML header, to ensure that images use the whole width available and are centered. The fig.showtext=TRUE option is important only if you are using external fonts as described here.
````
```{r setup, include=FALSE}
knitr::opts_chunk$set(out.width="100%", fig.align="center", fig.showtext=TRUE)
```
````
Your first task is to show and discuss the chart you selected. For this, save a copy, put it close to the Rmd file, and then reference it with Markdown syntax as follows:
```
![Original chart. Source: <the source>](original-chart.png){.external width="100%"}
```
Load only the necessary libraries, i.e., the ones that you actually use. This is important because it makes the code cleaner and easier to understand, and saves readers from installing a lot of packages that they will not use. Also note that, if you load the tidyverse package, a bunch of packages are loaded with it (such as dplyr, ggplot2… see the output from tidyverse::tidyverse_packages() for a complete list), so you do not need to load them separately.
It is a good practice to name your chunks with a descriptive name, so that it is easier to understand what each chunk does. For example, if you are loading the data, you can name the chunk load-data.
````
```{r load-data}
data <- read.csv("mydata.csv")
```
````
Regarding the data, please include the required data (and only the required data) in the folder of your project. If the original dataset contains more than you need, please filter it (only the required rows and columns) and save it. Preferably, it should be a CSV or similar text-based format (if it’s another format, please load the data into R and then save it as CSV). If you need to use a large dataset, please let me know and we can discuss the best way to handle it. Then, you can read the data just using the name of your file, which should be close to the Rmd file. In other words, do not include paths from your own computer, because they will not work on anyone else’s computer.
Naming the chunks is especially important for those that produce a chart, because the image file will have the name of the chunk as file name. Note also that chunk names should not contain spaces or special characters.
Chunks that produce a chart should have the fig.width and fig.height options set to the desired size of the image (in inches; default values are 7 and 5 respectively). With these parameters, you can control the aspect ratio and also the relative size of the fonts. Play with them until you find a good balance.
You should not change the fig.dpi. The default is more than enough for a webpage. If your fonts don’t look sharp enough, this can be solved by changing the size of the image (see the previous point).
In distill articles, you can set images and tables that span a width larger than the text column. It looks nicer e.g. for images that are very wide. There are a couple of examples in the Gapminder project. See the documentation for further details.
Note also the preview=TRUE option here. Adding this option to a chunk makes the image produced by that chunk the preview image for your article in the “Projects” gallery. See the documentation for further details.
Do not repeat yourself. If you find yourself writing the same code in multiple places, it is a good idea to refactor your code, so that, if you need to change something, you only need to change it in one place:
- if the repeated piece of code is some data analysis, then you should write a function that you can apply in several places;
- if it’s a theme, some scales, annotations… that you keep adding to multiple charts, then you should add those layers together into a variable, then add that variable to subsequent charts (this is demonstrated in the Gapminder project).
One of the most important best practices is to limit the width of your code. If you write code lines that are very long, they are harder to read. And especially in this case, these long lines will go off the page and will be lost in the margin, outside of the visible area. So in general it is a good idea to break your lines of code at a maximum width. The common practice is to use a maximum width of 80 characters. There is an option in RStudio to show a visual guide in the editor for this limit. You can activate this in Tools > Global Options > Code > Display > Show margin. There is also a package called styler that allows you to style your document automatically. After installing the package, if you click “Addins” in the RStudio toolbar, there is a new addin called “Style active file”. This could be a good starting point (although you may not like other styling options that this package applies).
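
The 80-character guideline can also be checked programmatically. The following is a minimal sketch in base R; `long_lines()` is a hypothetical helper, and the demonstration runs on a temporary file so that it is self-contained.

```r
# Return the line numbers (and widths) of the lines that exceed max_width.
long_lines <- function(path, max_width = 80) {
  lines <- readLines(path, warn = FALSE)
  width <- nchar(lines, type = "width")
  too_long <- which(width > max_width)
  data.frame(line = too_long, width = width[too_long])
}

# Self-contained demonstration: one short line and one overly long line.
tmp <- tempfile(fileext = ".Rmd")
writeLines(c(
  "short <- 1",
  paste0("long <- c(", paste(rep("\"value\"", 12), collapse = ", "), ")")
), tmp)

result <- long_lines(tmp)
print(result)
```

Note that this simple version inspects every line of the file, so in a real Rmd it will also flag long lines of prose; only the ones inside code chunks matter for this guideline.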

Presentation guidelines

Once the project is finished and merged, you will need to prepare a short presentation. The document format should be a presentation, but I do not particularly care about the file format. If you wish to try an R Markdown presentation, that is great; but a PowerPoint or PDF would be fine too.
Suggested structure:
1. ~1 minute to present the selected chart, strengths and weaknesses;
2. ~2 minutes to present your replication work;
3. ~2 minutes to present enhancements and alternatives.
Be brief: you only have 5 minutes, so do not try to explain everything. There will be a post on our website for that.
Do not show all the code; again, there will be a post on our website for that. Just highlight some small portions of the code that you found especially tricky, challenging, clever…

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Ucar (2022, Oct. 5). Data visualization | MSc CSS: Creating and Submitting your Project. Retrieved from https://csslab.uc3m.es/dataviz/tutorials/project/

BibTeX citation

@misc{ucar2022creating,
  author = {Ucar, Iñaki},
  title = {Data visualization | MSc CSS: Creating and Submitting your Project},
  url = {https://csslab.uc3m.es/dataviz/tutorials/project/},
  year = {2022}
}
