Git is a version control software, which means it records changes to a file or set of files over time so that you can recall specific versions later. Git works by essentially taking snapshots of a file’s contents over time, such that you can return to a old snapshot at some point in the future.
repository (repo): folder containing all tracked files as well as the version control history
local: a version of a repository stored on a personal computer
remote: a version of a repository stored on a remote server like GitHub
branch: a parallel version of the files in a repository
clone: download a copy of a remote repository to your personal computer
stage (noun): the staging area holds the files to be
included in the next commit
stage (verb): to mark a file to be included in the next
commit
commit (noun): a snapshot of changes made to the
staged file(s)
commit (verb): save a snapshot of changes made to the
staged file(s)
fork (noun): a copy of another user’s
repository
fork (verb): to copy someone else’s repository
merge: update files by incorporating the changes from new commits
pull: retrieve commits from a remote repository and merge them into a local repository
push: send commits from a local repository to a remote repository
pull request: a message sent by one GitHub user to another with a request to merge their commits from their remote repository into another user’s remote repository
To begin, we will be using Git via the command line, which means
having to type out each instruction. Every Git command begins with the
word git
. If you are running Mac OS, you’ll use
Terminal. If you are running Windows, you’ll use the
Git Bash app that comes with your installation of
Git.
git help
This will bring up the 21 most common Git commands. You can also type
git help blah
where blah
is the name of a
specific command to get more specific information.
git config
Short for “configure”, this is how you set up Git for the first
time.
git init
This initializes a new Git repository, which you will need to do inside
a repository (directory) before Git will recognize it as something to
track.
git status
Check the status of your repository. See which files are inside it and
which changes still need to be committed. This command also offers
helpful hints about possible next steps (e.g., unstaging a commit).
git add
This is a bit misleading as it doesn’t actually add new files to your
repository. Rather, it merely alerts Git to start paying attention to a
file.
git commit
This is the most important command, as it tells Git to take a
snapshot of any changed files in the repository.
git push
This moves (“pushes”) changes from your local repo up to a remote
repository like GitHub.
git pull
This moves (“pulls”) changes from a remote repository like GitHub to
your local repo.
git branch
This creates a new branch in the current repo.
git checkout
This command has two uses: 1) inspect a new branch, and 2) discard any
changes to a local file and revert it back to the way it was at the last
commit.
git merge
When you’re done working on a branch, you can merge your changes back to
the master branch, which is visible to all collaborators. git merge cats
would take all the changes you made to the “cats” branch and add them to
the master.
We’ll begin by creating a so-called “sandbox” for playing around with the various features of Git. In this case, it is simply a folder (directory).
Task: Using the MacOS Finder or Windows Explorer,
create a new folder (directory) on your Desktop and call it
tryout
.
Task: Open RStudio and set your working directory to
the tryout
folder (directory) you just created using the
command line or menu options.
Tip: username
below is your user name
on your computer.
Mac
> setwd("/Users/username/Desktop/tryout")
Windows
> setwd("C:/Desktop/tryout")
We’ll start by using Git via its standard command line interface, which means you’ll have to do a bit of typing. In the future, you may very well choose to use Git via a graphical client (e.g., GitHub Desktop, GitKraken) or directly within Rstudio, which you’ll see in the next lesson. You might then be asking yourself, “Why do it via the command line?” As you start using Git more in your workflows, there is a very real possibility that you will want some assistance to fix a problem or undo a change. The advice you receive, either from someone you know or via online searches, will very likely involve direct Git syntax.
Although you can use any suitable terminal interface (e.g., Terminal on Mac OS or Windows), we will use the terminal within RStudio itself.
Task: Click on the Terminal tab to switch from the R console to the terminal.
If you haven’t used the Terminal before, the text may look a bit confusing. The information you see is
[computer-name]:[directory] [user-name]$
where [computer-name]
is the name of your computer
(Marks-MacBook-Pro-4
in the example above),
[directory]
is the current terminal directory
(~
indicates the root directory in the example above), and
[user-name]
is your user name on your computer
(mark
in the example above). The $
marks the
beginning of the command line where you can start typing commands.
At this point our R working directory and our terminal directory are different, so we need to fix that.
Task: Click on the “Terminal 1” downarrow and select “Go to Current Directory”.
Tip: The terminal will echo the actual command you
would use to change directories (cd ~/Desktop/tryout
) and
the directory has switched from [computer-name]:~
to
[computer-name]:tryout
.
Before using Git for version control, you’ll need to configure it to use your name and email address.
Note: The following command line instructions will
just begin with a $
instead of the full path
[computer-name]:[directory] [user-name]$
The first step is to tell Git who you are.
Task: Set your user name. Replace
first last
with your first and last names, and be sure to
include the quotation marks.
$ git config --global user.name "first last"
The next step is to give Git your email address.
Task: Set your email address. Replace
your_email
with your actual email address and again be sure
to include the quotation marks.
$ git config --global user.email "your_email"
Task: Verify that your user name and email are set correctly.
$ git config --list
Tip: You should see something like the output below with your name and email address. Don’t be alarmed if Git also returns some additional information.
user.name=First Last
user.email=email@uw.edu
Before Git will start tracking the changes to files and folders, you need to “initialize” your project folder.
Task: Go ahead and initialize the folder.
Input
$ git init
Output
Initialized empty Git repository in /Users/[user-name]/Desktop/tryout/.git/
Tip: The command git status
will report
all kinds of information related to the contents of a repo.
Input
$ git status
Output
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
main
Before you can commit a file to the tracking history, you need to
“add” it to the list of things for Git to track. To do so, we’ll use
git add
as Git suggested when we typed
git status
above, but at the moment our folder is empty so
we need to populate it with something.
Task: Open a new R script by selecting File > New File > R Script from the RStudio menu.
Task: Add the following lines of code to your new script.
## a test script
a <- 1
b <- 2
Task: Save your R script to your tryout
folder as test_script.R
.
Task: Now that we have a new file in our
tryout
folder, check the status of Git.
Input
$ git status
Output
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test_script.R
nothing added to commit but untracked files present (use "git add" to track)
Note: Here Git is reporting that
test_script.R
is “untracked” and it suggests that you use
git add
to track it.
Task: Track the new file with
git add filename
, where filename
is the name
of our R script.
Input
$ git add test_script.R
Output
$
Note: git add
is another silent command
that doesn’t return anything.
Task: Check the status of our repo.
Input
$ git status
Output
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test_script.R
Note: Git reports back that our new file
test_script.R
is “staged” and ready to be committed to the
Git history.
git add
So far we’ve added only one file at a time.
git add
that allow you to add multiple files at once:
git add *.R
will stage all .R files
git add data/
will stage the folder data
and
its contents
git add .
will stage all new and modified files, but does
not remove any deleted files
git add -u
will stage all modified and deleted files, but
does not stage any new files
git add -A
stages all new, modified and deleted files
Committing files in Git is the backbone of the whole version control
process. To do so, we’ll use git commit
with some
additional information about what we’re committing via a “commit
message”. Commit messages should be “short but informative”, which means
they should include enough information to help you and others understand
what was done without being verbose.
Task: Commit your new R script and don’t forget the
quotes (the -m
flag stands for “message”).
Input
$ git commit -m "created test R script"
Output
[main (root-commit) 5e40183] created test R script
1 file changed, 3 insertions(+)
create mode 100644 test_script.R
When we run git commit
, Git takes everything we have
told it to save by using git add
and stores a copy
permanently inside the special .git
directory. This
permanent copy is called a commit and its short identifier is
5e40183
(note that your commit will likely have a different
identifier).
Task: Check the status of your repo and see where things stand.
Input
$ git status
Output
On branch main
nothing to commit, working tree clean
Now Git is reporting that everything is up to date.
Tip: If you want a report of your recent activity,
you can ask Git to show you the project’s history using
git log
.
Task: Check the log.
Input
$ git log
Output
commit 5e401831a92cf9e6980d0ee78a19966d6b310b78 (HEAD -> main)
Author: First Last <email@domain.edu>
Date: Fri Jan 13 09:05:06 2023 -0800
created test R script
git commit
command we used earlier
This figure from Blischak et al. (2018) shows that process graphically.
Let’s now imagine you wanted to return to the R script we created earlier and add some more lines of code, or make some changes to the existing code.
Task: Add the following lines of new code to
test_script.R
and save the file when you are done.
## an operation
a + b
Task: Check on the status of your repo.
Input
$ git status
Output
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test_script.R
no changes added to commit (use "git add" and/or "git commit -a")
This output indicates that there is one file
(test_script.R
) that has been modified, but importantly,
nothing has yet been added to the staging area. This is what the last
line of the output is telling us.
Tip: Before adding a file to staging, it’s a good
idea to inspect the changes that were made to it. To do this in Git,
we’ll use git diff
, which is short for “difference”.
Input
$ git diff
Output
diff --git a/test_script.R b/test_script.R
index 05922e5..4cdf1a7 100644
--- a/test_script.R
+++ b/test_script.R
@@ -1,3 +1,6 @@
## a test script
a <- 1
b <- 2
+## an operation
+a + b
+
Hmm, this output is definitely a bit cryptic.
diff
command comparing the old
(a/test_script.R
) and new (b/test_script.R
)
versions of the file.05922e5
and 4cdf1a7
are their
unique version-specific labels).test_script.R
).+
sign in the first column shows the
lines that were added most recently (note that the last line of the R
script is blank).All of this is good information and everything is okay.
Task: Commit the new changes to our R script.
Input
$ git commit -m "added an addition operation to test R script"
Output
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test_script.R
no changes added to commit (use "git add" and/or "git commit -a")
Note: We forgot to add our file to the staging area,
so nothing happened. Git suggests that we either use
git add
and/or git commit -a
.
Task: Use git add
to add the file to
staging and then commit it.
Input
$ git add test_script.R
$ git commit -m "added an addition operation to test R script"
Output
[main 07eaad4] added an addition operation to test R script
1 file changed, 3 insertions(+)
OK, it looks like that worked, but it’s a good idea to check on the status of the repo to be sure.
Input
$ git status
Output
On branch main
nothing to commit, working tree clean
Success: Everything seems to be in working order.
Tip: Once again, we can check out the history of what we’ve done so far.
Input
$ git log
Output
commit 07eaad4b2fcbe4be7068c47612dcc4f3f7c6373d (HEAD -> main)
Author: First Last <email@domain.edu>
Date: Fri Jan 13 09:33:32 2023 -0800
added an addition operation to test R script
commit 5e401831a92cf9e6980d0ee78a19966d6b310b78
Author: First Last <email@domain.edu>
Date: Fri Jan 13 09:39:32 2023 -0800
created test R script
Now we can see both of our commits, with the most recent one at the top.
Task: Add another operation to your R script with the following lines of code and save the file when done. (For now ignore the fact that this code is problematic–we’ll return to that later.)
## another operation
(a + b) / 0
Task: Add this file and commit it.
Input
$ git add test_script.R
$ git commit -m "added a division operation to R test script"
Output
[main 17c1a74] added a division operation to R test script
1 file changed, 2 insertions(+), 1 deletion(-)
One of the major advantages to using a formal version control system
like Git is that you can go back in time and examine changes that were
made to files. We saw previously that we can use git diff
to inspect the changes that were made to a staged file. Here we’ll
expand that functionality to look back further in time.
Task: Examine the changes to our R script with
git diff
.
Input
$ git diff test_script.R
Output
$
Note: In this case there have been no new changes to our file, so the output is blank.
Git refers to the most recent version of a file as its
HEAD
. Earlier versions of a file are referenced with the
tilde ~
and an integer, such that HEAD~1
is
the version that immediately precedes the current version. Similarly,
HEAD~10
refers to the version 10 steps prior to the current
version.
Tip: We can use git diff HEAD filename
to inspect changes to the current version, but in this case it will
yield the same thing as git diff filename
.
Input
$ git diff HEAD test_script.R
Output
$
Task: Take a peek back at the version prior to your current version and compare their differences.
Input
$ git diff HEAD~1 test_script.R
Output
diff --git a/test_script.R b/test_script.R
index 4cdf1a7..faed447 100644
--- a/test_script.R
+++ b/test_script.R
@@ -3,4 +3,5 @@
## a test script
a <- 1
b <- 2
## an operation
a + b
-
+## another operation
+(a + b) / 0
Here we can see that we deleted the blank line at the end of the
script as indicated by the -
, and replaced it with the 2
lines beginning with ## another operation
.
Task: Go back and look at the changes relative to your first version of the script.
Input
$ git diff HEAD~2 test_script.R
Output
diff --git a/test_script.R b/test_script.R
index 05922e5..faed447 100644
--- a/test_script.R
+++ b/test_script.R
@@ -1,3 +1,7 @@
## a test script
a <- 1
b <- 2
+## an operation
+a + b
+## another operation
+(a + b) / 0
Here you can see that the first version had only 3 lines of code and since then we’ve added 4 new lines of code.
Let’s imagine we weren’t happy with the current version of our test script because perhaps we broke something or we simply can’t get it to work. Because Git is a version control system, we can easily restore files to a state they were in at some previous commit.
Tip: We can use git checkout
to restore
a previous version of a file by referencing it with HEAD~n
where n
refers to the version we’d like.
The last operation we added to our file will clearly create some problems for us because it contains a divide-by-zero.
Task: Let’s revert our script to the version just prior to that when everything was working properly.
Input
$ git checkout HEAD~1 test_script.R
Output
Updated 1 path from b478816
Hmm. That did something, but it’s not immediately clear if it was what we wanted.
Tip: You can use cat
to inspect a
file’s contents and see if it was indeed switched back to the prior
version.
Input
$ cat test_script.R
Output
## a test script
a <- 1
b <- 2
## an operation
a + b
Success: Our script is back to the previous working version.
Note: Your open R script in RStudio should have also reverted to the previous version.
Task: Check on the status of the directory.
Input
$ git status
Output
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: test_script.R
Tip: Once again Git is telling us that we still need to commit the changes to our file, so let’s do that.
Input
$ git commit -m "changed test script back to version before division op"
Output
[main 2743b47] changed test script back to version before division op
1 file changed, 1 insertion(+), 2 deletions(-)
Task: Lastly, run git status
again to
make sure we’ve gotten everything cleaned up.
Input
$ git status
Output
On branch main
nothing to commit, working tree clean
Success: Everything seems to be in proper working order!
It can be tricky to learn all of the ins and outs of Git, especially when typing a bunch of cryptic text into the command line. Fortunately, there are several graphical user interfaces (GUIs) for Git that help visualize what is being done. We’ll see next time that RStudio offers a relatively simple interface to Git, but there are others that have much more functionality. I suggest reading Jenny Bryan’s treatment of them here.