What is Git?

Git is version control software, which means it records changes to a file or set of files over time so that you can recall specific versions later. Git works by taking snapshots of your files’ contents over time, so that you can return to an old snapshot at some point in the future.


Git vocabulary


repository (repo): folder containing all tracked files as well as the version control history

local: a version of a repository stored on a personal computer

remote: a version of a repository stored on a remote server like GitHub

branch: a parallel version of the files in a repository

clone: download a copy of a remote repository to your personal computer

stage (noun): the staging area holds the files to be included in the next commit
stage (verb): to mark a file to be included in the next commit

commit (noun): a snapshot of changes made to the staged file(s)
commit (verb): save a snapshot of changes made to the staged file(s)

fork (noun): a copy of another user’s repository
fork (verb): to copy someone else’s repository

merge: update files by incorporating the changes from new commits

pull: retrieve commits from a remote repository and merge them into a local repository

push: send commits from a local repository to a remote repository

pull request: a message sent by one GitHub user to another with a request to merge their commits from their remote repository into another user’s remote repository
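
The terms clone, commit, push and pull can be sketched in a few commands. The following is only an illustration: a local "bare" repository stands in for a remote server like GitHub, and every folder and file name (remote.git, local_a, local_b, script.R) is made up for the example.

```shell
# A local bare repository stands in for a remote such as GitHub.
# Every path and file name below is hypothetical.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# clone: copy the (still empty) remote to a local folder
git clone -q "$work/remote.git" "$work/local_a" 2>/dev/null
cd "$work/local_a"
git config user.name "First Last"
git config user.email "email@domain.edu"

# stage and commit locally, then push the commit to the remote
echo "a <- 1" > script.R
git add script.R
git commit -q -m "created script"
git push -q origin HEAD

# a second clone receives that commit
git clone -q "$work/remote.git" "$work/local_b"

# push another commit from the first clone ...
echo "b <- 2" >> script.R
git commit -q -a -m "added b"
git push -q origin HEAD

# ... and pull it into the second clone
cd "$work/local_b"
git pull -q
pulled_log=$(git log --format=%s)
echo "$pulled_log"
```

With a real remote, the path to remote.git would be replaced by the URL of the repository on the server.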


Git commands

To begin, we will be using Git via the command line, which means having to type out each instruction. Every Git command begins with the word git. If you are running Mac OS, you’ll use Terminal. If you are running Windows, you’ll use the Git Bash app that comes with your installation of Git.

git help
This will bring up the 21 most common Git commands. You can also type git help blah where blah is the name of a specific command to get more specific information.

git config
Short for “configure”, this is how you set up Git for the first time.

git init
This initializes a new Git repository, which you will need to do inside a folder (directory) before Git will recognize it as something to track.

git status
Check the status of your repository. See which files are inside it and which changes still need to be committed. This command also offers helpful hints about possible next steps (e.g., unstaging a file).

git add
This is a bit misleading as it doesn’t actually add new files to your repository. Rather, it stages a file, which tells Git to include the file’s current contents in the next commit.

git commit
This is the most important command, as it tells Git to take a snapshot of the staged changes in the repository.

git push
This moves (“pushes”) changes from your local repo up to a remote repository like GitHub.

git pull
This moves (“pulls”) changes from a remote repository like GitHub to your local repo.

git branch
This creates a new branch in the current repo.

git checkout
This command has two uses: 1) switch to a different branch, and 2) discard any changes to a local file and revert it back to the way it was at the last commit.

git merge
When you’re done working on a branch, you can merge your changes back into the main branch. With the main branch checked out, git merge cats would take all the changes you made to the “cats” branch and add them to main.
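
The branch, checkout and merge commands described above fit together roughly as follows. This is a minimal sketch in a throwaway folder; the file notes.txt and its contents are invented for the example, and the starting branch is looked up rather than assumed because its name (main or master) depends on your Git version and settings.

```shell
# Scratch repository; "cats" and notes.txt are illustrative names only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"

echo "dogs" > notes.txt
git add notes.txt
git commit -q -m "created notes"
base=$(git symbolic-ref --short HEAD)   # name of the starting branch

git branch cats            # create the new branch
git checkout -q cats       # switch to it
echo "cats" >> notes.txt
git add notes.txt
git commit -q -m "added cats"

git checkout -q "$base"    # return to the starting branch
git merge -q cats          # bring in the commits made on "cats"
last_message=$(git log -1 --format=%s)
cat notes.txt
```

After the merge, the starting branch contains the commit that was made on the cats branch.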


Getting set up

Create a sandbox

We’ll begin by creating a so-called “sandbox” for playing around with the various features of Git. In this case, it is simply a folder (directory).

Task: Using the MacOS Finder or Windows Explorer, create a new folder (directory) on your Desktop and call it tryout.

Set working directory

Task: Open RStudio and set your working directory to the tryout folder (directory) you just created using the command line or menu options.

Option 1: command line

Tip: username below is your user name on your computer.

Mac

> setwd("/Users/username/Desktop/tryout")

Windows

> setwd("C:/Users/username/Desktop/tryout")


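Option 2: RStudio menu

Select Session > Set Working Directory > Choose Directory... from the RStudio menu and pick the tryout folder.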



Using the command line

We’ll start by using Git via its standard command line interface, which means you’ll have to do a bit of typing. In the future, you may very well choose to use Git via a graphical client (e.g., GitHub Desktop, GitKraken) or directly within RStudio, which you’ll see in the next lesson. You might then be asking yourself, “Why do it via the command line?” As you start using Git more in your workflows, there is a very real possibility that you will want some assistance to fix a problem or undo a change. The advice you receive, either from someone you know or via online searches, will very likely involve direct Git syntax.

Although you can use any suitable terminal interface (e.g., Terminal on Mac OS or Git Bash on Windows), we will use the terminal within RStudio itself.

Task: Click on the Terminal tab to switch from the R console to the terminal.


If you haven’t used the Terminal before, the text may look a bit confusing. The information you see is

[computer-name]:[directory] [user-name]$

where [computer-name] is the name of your computer (Marks-MacBook-Pro-4 in the example above), [directory] is the current terminal directory (~ indicates your home directory in the example above), and [user-name] is your user name on your computer (mark in the example above). The $ marks the beginning of the command line where you can start typing commands.

At this point our R working directory and our terminal directory are different, so we need to fix that.

Task: Click on the “Terminal 1” down arrow and select “Go to Current Directory”.


Tip: The terminal will echo the actual command you would use to change directories (cd ~/Desktop/tryout) and the directory has switched from [computer-name]:~ to [computer-name]:tryout.



Configuring Git

Before using Git for version control, you’ll need to configure it to use your name and email address.

Note: The following command line instructions will just begin with a $ instead of the full path [computer-name]:[directory] [user-name]$

The first step is to tell Git who you are.

Task: Set your user name. Replace first last with your first and last names, and be sure to include the quotation marks.

$ git config --global user.name "first last"

The next step is to give Git your email address.

Task: Set your email address. Replace your_email with your actual email address and again be sure to include the quotation marks.

$ git config --global user.email "your_email"

Task: Verify that your user name and email are set correctly.

$ git config --list

Tip: You should see something like the output below with your name and email address. Don’t be alarmed if Git also returns some additional information.

user.name=First Last
user.email=email@uw.edu
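
If the full list is long, a single setting can be read back with git config --get. The sketch below is only an illustration and uses a throwaway folder with repository-level settings (no --global), so that your real configuration is left untouched; the name and email are placeholders.

```shell
# Scratch repository so that your real --global settings are not changed.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, the values are stored for this repository only.
git config user.name "First Last"
git config user.email "email@domain.edu"

# --get prints one value instead of the whole list.
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"
```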

Initialize the repo

Before Git will start tracking the changes to files and folders, you need to “initialize” your project folder.

Task: Go ahead and initialize the folder.

Input

$ git init

Output

Initialized empty Git repository in /Users/[user-name]/Desktop/tryout/.git/

Tip: The command git status will report all kinds of information related to the contents of a repo.

Input

$ git status

Output

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)
  Note: Here Git is telling us 3 things of interest:
  1. Our branch is set to main
  2. We have not yet made any commits
  3. There is nothing in the folder to commit
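
One way to see what git init actually did is to look for the hidden .git folder, which is where Git keeps the history. A minimal sketch, assuming a throwaway folder rather than your tryout folder:

```shell
# Throwaway folder for the illustration.
repo=$(mktemp -d)
cd "$repo"

# Before initializing, the folder has no .git subfolder.
if [ -d .git ]; then before="yes"; else before="no"; fi

git init -q

# Afterwards the hidden .git subfolder exists and Git treats the folder as a repository.
if [ -d .git ]; then after="yes"; else after="no"; fi
inside=$(git rev-parse --is-inside-work-tree)
echo ".git present before init: $before, after init: $after"

# ls -a lists hidden entries such as .git
ls -a
```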

Adding files

Before you can commit a file to the tracking history, you need to “add” it to the list of things for Git to track. To do so, we’ll use git add as Git suggested when we typed git status above, but at the moment our folder is empty so we need to populate it with something.

Task: Open a new R script by selecting File > New File > R Script from the RStudio menu.


Task: Add the following lines of code to your new script.

## a test script
a <- 1
b <- 2


Task: Save your R script to your tryout folder as test_script.R .


Task: Now that we have a new file in our tryout folder, check the status of Git.

Input

$ git status

Output

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    test_script.R

nothing added to commit but untracked files present (use "git add" to track)

Note: Here Git is reporting that test_script.R is “untracked” and it suggests that you use git add to track it.

Task: Track the new file with git add filename, where filename is the name of our R script.

Input

$ git add test_script.R

Output

$

Note: Like git config, git add is a silent command that doesn’t return anything when it succeeds.

Task: Check the status of our repo.

Input

$ git status

Output

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file:   test_script.R

Note: Git reports back that our new file test_script.R is “staged” and ready to be committed to the Git history.

Other options for git add

So far we’ve added only one file at a time.

  Tip: Here are some options for git add that allow you to add multiple files at once:
  • git add *.R will stage all .R files
  • git add data/ will stage the folder data and its contents
  • git add . will stage all new, modified and deleted files in the current folder and its subfolders (in Git versions before 2.0 it did not stage deletions)
  • git add -u will stage all modified and deleted files, but does not stage any new files
  • git add -A stages all new, modified and deleted files throughout the repository
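
The difference between the -u and -A options is easiest to see with one change of each kind. The following sketch uses invented file names (keep.R, old.R, new.R) in a throwaway folder, and relies on git status --short, which prints a compact two-letter code per file (?? marks an untracked file).

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"

# Start with two committed files.
echo "a <- 1" > keep.R
echo "b <- 2" > old.R
git add keep.R old.R
git commit -q -m "created two scripts"

# Make one change of each kind: modify, delete, create.
echo "a <- 10" > keep.R
rm old.R
echo "print('a brand new script')" > new.R

# -u stages the modification and the deletion; new.R stays untracked (??).
git add -u
after_u=$(git status --short)
echo "$after_u"

# -A also stages the new file.
git add -A
after_A=$(git status --short)
echo "$after_A"
```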

Committing a file

Committing files in Git is the backbone of the whole version control process. To do so, we’ll use git commit with some additional information about what we’re committing via a “commit message”. Commit messages should be “short but informative”, which means they should include enough information to help you and others understand what was done without being verbose.

  Tip: Mark recommends beginning your commit message with a past tense verb. For example,
  • “created new R script for data ingest”
  • “started work on background info for analysis”
  • “added new year of data to samples.csv”

Task: Commit your new R script and don’t forget the quotes (the -m flag stands for “message”).

Input

$ git commit -m "created test R script"

Output

[main (root-commit) 5e40183] created test R script
 1 file changed, 3 insertions(+)
 create mode 100644 test_script.R

When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit and its short identifier is 5e40183 (note that your commit will likely have a different identifier).

Task: Check the status of your repo and see where things stand.

Input

$ git status

Output

On branch main
nothing to commit, working tree clean

Now Git is reporting that everything is up to date.

Tip: If you want a report of your recent activity, you can ask Git to show you the project’s history using git log.

Task: Check the log.

Input

$ git log

Output

commit 5e401831a92cf9e6980d0ee78a19966d6b310b78 (HEAD -> main)
Author: First Last <email@domain.edu>
Date:   Fri Jan 13 09:05:06 2023 -0800

    created test R script
  Note: The log lists all commits made to a repository in reverse chronological order, which includes:
  • a full 40-character identifier (i.e., the “SHA-1 checksum”), starting with the same 7 characters as the short identifier printed by the git commit command we used earlier
  • the commit author’s name and email address
  • the date/time stamp of when it was created
  • the commit message passed to Git
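
The relationship between the full and short identifiers can be checked directly. This is a sketch in a throwaway folder, so the identifiers it prints will differ from the ones shown above; git rev-parse and git log --oneline are standard Git commands that have not been introduced in this lesson.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"
echo "a <- 1" > test_script.R
git add test_script.R
git commit -q -m "created test R script"

full_id=$(git rev-parse HEAD)            # full identifier of the latest commit
short_id=$(git rev-parse --short HEAD)   # abbreviated identifier
echo "$full_id"
echo "$short_id"

# One line per commit: abbreviated identifier followed by the message.
git log --oneline
```

The --oneline form is handy once the history grows beyond a few commits.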

Summary

  Success: At this point you’ve seen how to
  • initialize a Git repository
  • add (stage) a file to be committed
  • commit a file to the repository

This figure from Blischak et al. (2018) shows that process graphically.



Making changes to a file

Let’s now imagine you wanted to return to the R script we created earlier and add some more lines of code, or make some changes to the existing code.


Task: Add the following lines of new code to test_script.R and save the file when you are done.

## an operation
a + b


Task: Check on the status of your repo.

Input

$ git status

Output

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   test_script.R

no changes added to commit (use "git add" and/or "git commit -a")

This output indicates that there is one file (test_script.R) that has been modified, but importantly, nothing has yet been added to the staging area. This is what the last line of the output is telling us.

Tip: Before adding a file to staging, it’s a good idea to inspect the changes that were made to it. To do this in Git, we’ll use git diff, which is short for “difference”.

Input

$ git diff

Output

diff --git a/test_script.R b/test_script.R
index 05922e5..4cdf1a7 100644
--- a/test_script.R
+++ b/test_script.R
@@ -1,3 +1,6 @@
 ## a test script
 a <- 1
 b <- 2
+## an operation
+a + b
+

Hmm, this output is definitely a bit cryptic.

  • The first line indicates that the Git output is similar to the Unix diff command comparing the old (a/test_script.R) and new (b/test_script.R) versions of the file.
  • The second line indicates which versions of the file Git is comparing (i.e., 05922e5 and 4cdf1a7 are their unique version-specific labels).
  • The third and fourth lines once again show the name of the file being changed (test_script.R).
  • The remaining lines show the actual differences and the lines on which they occur; the + sign in the first column shows the lines that were added most recently (note that the last line of the R script is blank).
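
One detail worth knowing is that plain git diff only reports changes that have not been staged yet. The sketch below, run in a throwaway folder, shows that the output becomes empty after git add and that git diff --staged then reports the same change instead.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"
printf '## a test script\na <- 1\nb <- 2\n' > test_script.R
git add test_script.R
git commit -q -m "created test R script"

# Modify the file without staging it.
printf '## an operation\na + b\n' >> test_script.R

unstaged=$(git diff)          # working file vs. staging area: shows the new lines
git add test_script.R
after_add=$(git diff)         # empty: the working file now matches the staging area
staged=$(git diff --staged)   # staging area vs. last commit: shows the new lines

echo "$staged"
```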

All of this is good information and everything is okay.

Task: Commit the new changes to our R script.

Input

$ git commit -m "added an addition operation to test R script"

Output

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   test_script.R

no changes added to commit (use "git add" and/or "git commit -a")

Note: We forgot to add our file to the staging area, so nothing happened. Git suggests that we either use git add and/or git commit -a.
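
The git commit -a alternative that Git suggests can be sketched as follows. The -a flag stages every modified file that Git is already tracking before committing, so a separate git add is not needed; it does not pick up brand new (untracked) files. The example runs in a throwaway folder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"
printf '## a test script\na <- 1\nb <- 2\n' > test_script.R
git add test_script.R
git commit -q -m "created test R script"

# Modify the tracked file, then stage and commit it in one step with -a.
printf '## an operation\na + b\n' >> test_script.R
git commit -q -a -m "added an addition operation to test R script"

remaining=$(git status --short)       # empty when nothing is left to commit
count=$(git rev-list --count HEAD)    # number of commits so far
echo "commits: $count"
```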

Task: Use git add to add the file to staging and then commit it.

Input

$ git add test_script.R
$ git commit -m "added an addition operation to test R script"

Output

[main 07eaad4] added an addition operation to test R script
 1 file changed, 3 insertions(+)

OK, it looks like that worked, but it’s a good idea to check on the status of the repo to be sure.

Input

$ git status

Output

On branch main
nothing to commit, working tree clean

Success: Everything seems to be in working order.

Tip: Once again, we can check out the history of what we’ve done so far.

Input

$ git log

Output

commit 07eaad4b2fcbe4be7068c47612dcc4f3f7c6373d (HEAD -> main)
Author: First Last <email@domain.edu>
Date:   Fri Jan 13 09:33:32 2023 -0800

    added an addition operation to test R script

commit 5e401831a92cf9e6980d0ee78a19966d6b310b78
Author: First Last <email@domain.edu>
Date:   Fri Jan 13 09:05:06 2023 -0800

    created test R script

Now we can see both of our commits, with the most recent one at the top.

Task: Add another operation to your R script with the following lines of code and save the file when done. (For now ignore the fact that this code is problematic–we’ll return to that later.)

## another operation
(a + b) / 0


Task: Add this file and commit it.

Input

$ git add test_script.R
$ git commit -m "added a division operation to R test script"

Output

[main 17c1a74] added a division operation to R test script
 1 file changed, 2 insertions(+), 1 deletion(-)

Reviewing file history

One of the major advantages to using a formal version control system like Git is that you can go back in time and examine changes that were made to files. We saw previously that we can use git diff to inspect the changes that were made to a modified file before staging it. Here we’ll expand that functionality to look back further in time.


Task: Examine the changes to our R script with git diff.

Input

$ git diff test_script.R

Output

$

Note: In this case there have been no new changes to our file, so the output is blank.

Git refers to the most recent commit on the current branch as HEAD. Earlier commits are referenced with the tilde ~ and an integer, such that HEAD~1 is the commit that immediately precedes the most recent one. Similarly, HEAD~10 refers to the commit 10 steps prior to the most recent one. Adding a file name, as in the commands below, limits the comparison to that file.
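
To get a feel for the HEAD~n notation, it can help to build a small history and ask Git what each reference points to. The sketch below uses a throwaway folder and made-up commit messages; git show with a commit:file argument prints a file as it was stored in that commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"

# Build three commits, each appending one line to the same file.
for msg in "first version" "second version" "third version"; do
  echo "## $msg" >> test_script.R
  git add test_script.R
  git commit -q -m "$msg"
done

newest=$(git log -1 --format=%s HEAD)
previous=$(git log -1 --format=%s HEAD~1)
oldest=$(git log -1 --format=%s HEAD~2)
echo "HEAD:   $newest"
echo "HEAD~1: $previous"
echo "HEAD~2: $oldest"

# Print the file as it was two commits ago.
git show HEAD~2:test_script.R
```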

Tip: We can use git diff HEAD filename to inspect changes to the current version, but in this case it will yield the same thing as git diff filename.

Input

$ git diff HEAD test_script.R

Output

$

Task: Take a peek back at the version prior to your current version and compare their differences.

Input

$ git diff HEAD~1 test_script.R

Output

diff --git a/test_script.R b/test_script.R
index 4cdf1a7..faed447 100644
--- a/test_script.R
+++ b/test_script.R
@@ -3,4 +3,5 @@
 b <- 2
 ## an operation
 a + b
-
+## another operation
+(a + b) / 0

Here we can see that we deleted the blank line at the end of the script as indicated by the -, and replaced it with the 2 lines beginning with ## another operation.

Task: Go back and look at the changes relative to your first version of the script.

Input

$ git diff HEAD~2 test_script.R

Output

diff --git a/test_script.R b/test_script.R
index 05922e5..faed447 100644
--- a/test_script.R
+++ b/test_script.R
@@ -1,3 +1,7 @@
 ## a test script
 a <- 1
 b <- 2
+## an operation
+a + b
+## another operation
+(a + b) / 0

Here you can see that the first version had only 3 lines of code and since then we’ve added 4 new lines of code.


Recovering an old version

Let’s imagine we weren’t happy with the current version of our test script because perhaps we broke something or we simply can’t get it to work. Because Git is a version control system, we can easily restore files to a state they were in at some previous commit.

Tip: We can use git checkout to restore a previous version of a file by referencing it with HEAD~n, where n is the number of commits to step back from the most recent one.

The last operation we added to our file will clearly create some problems for us because it contains a divide-by-zero.

Task: Let’s revert our script to the version just prior to that when everything was working properly.

Input

$ git checkout HEAD~1 test_script.R

Output

Updated 1 path from b478816

Hmm. That did something, but it’s not immediately clear if it was what we wanted.

Tip: You can use cat to inspect a file’s contents and see if it was indeed switched back to the prior version.

Input

$ cat test_script.R

Output

## a test script
a <- 1
b <- 2
## an operation
a + b

Success: Our script is back to the previous working version.

Note: Your open R script in RStudio should have also reverted to the previous version.


Task: Check on the status of the directory.

Input

$ git status

Output

On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    modified:   test_script.R

Tip: Once again Git is telling us that we still need to commit the changes to our file, so let’s do that.

Input

$ git commit -m "changed test script back to version before division op"

Output

[main 2743b47] changed test script back to version before division op
 1 file changed, 1 insertion(+), 2 deletions(-)

Task: Lastly, run git status again to make sure we’ve gotten everything cleaned up.

Input

$ git status

Output

On branch main
nothing to commit, working tree clean

Success: Everything seems to be in proper working order!
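
The recovery steps above can be condensed into one short sketch, run here in a throwaway folder with a simplified two-commit history. It also shows why git status reported the file under “Changes to be committed”: git checkout with a commit and a file name rewrites the file and stages the result in one go. The -- is an optional separator between the commit and the file name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "email@domain.edu"

printf 'a <- 1\nb <- 2\na + b\n' > test_script.R
git add test_script.R
git commit -q -m "created working script"

echo "(a + b) / 0" >> test_script.R
git add test_script.R
git commit -q -m "added a division operation"

# Restore the file as it was one commit back.
git checkout -q HEAD~1 -- test_script.R
state=$(git status --short)   # first-column M: the restored version is already staged
echo "$state"

git commit -q -m "changed test script back to version before division op"
count=$(git rev-list --count HEAD)
cat test_script.R
```

Note that the history now holds three commits: the problematic version is still recorded, and the restored version sits on top of it.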


Git clients

It can be tricky to learn all of the ins and outs of Git, especially when typing a bunch of cryptic text into the command line. Fortunately, there are several graphical user interfaces (GUIs) for Git that help visualize what is being done. We’ll see next time that RStudio offers a relatively simple interface to Git, but there are others that have much more functionality. I suggest reading Jenny Bryan’s treatment of them here.


Endnote

  Success: You’ve now done the following:
  • Initialized a Git repository
  • Added (staged) files to be tracked
  • Committed a file to the Git history
  • Compared changes to previous versions of a file
  • Reverted a file back to a previous version