Using GitHub
Introduction to Git/GitHub
When you engage in any kind of data science or programming, there comes a (frustrating) point at which you need to understand how Git and GitHub work. Learning how to use Git and GitHub is especially important for keeping versions of your work (think something like Dropbox + MS Word’s Track Changes) and collaborating with others.
Git is essentially a boring time machine. Remember when you worked on a Word file and saved it by adding the date, or calling it “mywork-version1”, “mywork-final”, “mywork-final-final”, etc.?

Before we go on, let us make the distinction between Git and GitHub:
- Git is open source software for version control, namely tracking changes in any set of files. Using Git, you can keep track of all previous versions of your work on a project; Git likes to refer to projects as repositories. It is also a useful collaboration tool, as it allows distributed and non-linear workflows, with many parallel branches of a main repository running on different systems.
- GitHub is the most popular hosting service for Git repositories, but there are alternatives like GitLab and Bitbucket.
Git is organised around repositories, or repos; repos are folders where you keep a project with all necessary files (code, data, images, etc.). So you first need to tell Git which files/folders to keep track of for any changes you will be making.
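As a rough illustration of "telling Git what to track", here is a minimal sketch run in a throwaway folder. The folder name `my-project` and the two file names are made up for the example; nothing here touches GitHub.

```shell
# Hypothetical project folder in a temporary location (names are assumptions)
repo="$(mktemp -d)/my-project"
mkdir -p "$repo" && cd "$repo"

git init --quiet                 # turn the folder into a Git repository

echo 'x <- 1' > analysis.R       # a file we want Git to track
echo 'scratch' > notes.txt       # a file we have not told Git about yet

git add analysis.R               # tell Git to start tracking analysis.R

# Short status: "A" marks a newly added (staged) file, "??" an untracked one
status="$(git status --short)"
echo "$status"
```

Until you `git add` a file, Git leaves it alone, which is why `notes.txt` shows up as untracked.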
As you keep adding code to your project, assignment, etc., you commit changes to your repository and add an explanatory comment, or message to yourself, briefly describing the changes, additions, or new work you have done.
When you commit changes, it’s as though you take a snapshot of your work and write a short comment to yourself; it would be the same as saving your Word document adding today’s date in the filename, or v1, v2, final, final-final, etc.
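To make the snapshot idea concrete, the sketch below makes two commits in a throwaway repository and then lists them. The file name, commit messages, and the placeholder identity are assumptions for the example only.

```shell
# Throwaway repository (folder and file names are assumptions)
repo="$(mktemp -d)/my-project"
mkdir -p "$repo" && cd "$repo"
git init --quiet

# Placeholder identity, set for this repository only
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis"

# Second snapshot, after some more work
echo 'y <- x + 1' >> analysis.R
git add analysis.R
git commit --quiet -m "Compute y from x"

# List the commit messages, newest first: one line per snapshot
history="$(git log --format=%s)"
echo "$history"
```

Each commit message plays the role that "v1", "final", or "final-final" played in the Word filename, except that Git stores the versions for you.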
After committing your changes, you need to pull first, so you get the latest copy from GitHub, and then push your commits to GitHub; this is when you actually upload your changes.
Git workflow
The following lists the main steps to create a repository and keep it updated:
- Create a repo on GitHub and initialize it with a README.
- Clone the repo to your local machine. You can either do it as an RStudio Project, or using a shell command:
  $ git clone REPOSITORY-URL
- Add or stage any changes you make:
  $ git add -A
- Commit your changes:
  $ git commit -m "Helpful message to yourself/collaborators"
- Pull from GitHub:
  $ git pull
- Push your changes to GitHub:
  $ git push
Repeat the add, commit, pull, and push steps (steps 3 to 6), but especially add and commit (steps 3-4), often.
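If you want to rehearse the whole cycle without touching a real GitHub account, something like the sketch below may help. It assumes a bare repository on your own disk (`remote.git`) as a stand-in for GitHub, seeded with a README to mimic the first step; the folder names, file names, and identity are all placeholders.

```shell
work="$(mktemp -d)" && cd "$work"

# Stand-in for "create a repo on GitHub with a README" (local assumption)
git init --quiet seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
echo '# My project' > README.md
git add README.md
git commit --quiet -m "Initial commit"
cd ..
git clone --quiet --bare seed remote.git   # plays the role of GitHub

# Clone the repo to your "local machine"
git clone --quiet remote.git my-project
cd my-project
git config user.name "Example User"
git config user.email "user@example.com"

# Do some work, then add, commit, pull, push
echo 'x <- 1' > analysis.R
git add -A
git commit --quiet -m "Add analysis script"
git pull --quiet    # nothing new on the remote here, so nothing changes
git push --quiet    # upload the new commit to the stand-in remote

# Check what the remote now has as its latest commit
remote_head="$(git --git-dir="$work/remote.git" log -1 --format=%s)"
echo "$remote_head"
```

With a real GitHub repository the only difference should be the URL you pass to `git clone`; the add, commit, pull, and push commands stay the same.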
Git keeps track of all the changes you have made in your repo, just in case you made a mistake and need to go back to an earlier version where things actually worked. GitHub is a website built on top of Git that allows you to collaborate on code with others, whether on code fixes, documentation, or anything else.
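Going back to an earlier version can be done in several ways; the sketch below shows one of them, restoring a single file from the previous commit with `git checkout <commit> -- <file>`. The repository, file name, and messages are invented for the example.

```shell
# Throwaway repository (folder and file names are assumptions)
repo="$(mktemp -d)/my-project"
mkdir -p "$repo" && cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# A version where things actually worked
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Working version"

# A later change that turned out to be a mistake
echo 'x <- "oops"' > analysis.R
git add analysis.R
git commit --quiet -m "Change that broke things"

# Look at the history to find the version you want
git log --oneline

# Bring back analysis.R as it was one commit ago (HEAD~1) and record that
git checkout HEAD~1 -- analysis.R
git commit --quiet -m "Restore working version of analysis.R"

cat analysis.R
```

Note that nothing is deleted: the broken version stays in the history, and the restore is just one more snapshot on top of it.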
Further resources
A great introduction to Git comes from this 2016 talk by the FT’s Alice Bartlett: Git for Humans – Alice Bartlett at UX Brighton 2016. You can find the slides here
For R users, Jenny Bryan et al. have created Happy Git with R, a brilliant resource that shows you how to use Git and GitHub in RStudio effectively.
One final thing: Git can be confusing and frustrating as hell (ask me for details). Add Git to the challenges of coding and you sometimes end up with people asking themselves interesting questions.
When things do go wrong (they will), have a look at https://ohshitgit.com/ and http://happygitwithr.com/burn.html