Getting a Git repository

You typically obtain a Git repository in one of two ways:

  • You can take a local directory that is currently not under version control, and turn it into a Git repository, or
  • You can clone an existing Git repository from elsewhere.

First, we'll cover how to start a repository from scratch.

Initializing a Repository from scratch

If you have a project directory that is currently not under version control and you want to start controlling it with Git, you first need to go to that project’s directory. For example, let's create one:

> mkdir project_tmp

Then, move into that directory and initialize a Git repository with git init:

> cd project_tmp
> git init

This creates a new subdirectory named .git that contains all of your necessary repository files (it is hidden, so use ls -a to see it). To start version-controlling files, you need to begin tracking them and make an initial commit:

> echo 'print("Hello World")' >> prova.py
> git add prova.py
> git commit -m "initial project version"
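To rehearse these steps without touching your own folders, the sketch below runs them inside a temporary directory. It assumes a POSIX shell with Git installed; the scratch directory, the example identity (Git refuses to commit without a user name and email), and the branch name main are choices made for this example, not requirements of Git.

```shell
# Sketch: create a repository from scratch and record a first commit.
set -eu

repo=$(mktemp -d)                        # scratch directory standing in for project_tmp
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" whatever Git's default is

# Hypothetical identity, stored only in this repository's .git/config
git config user.name "Example User"
git config user.email "user@example.com"

echo 'print("Hello World")' >> prova.py
git add prova.py                         # start tracking prova.py
git commit --quiet -m "initial project version"

ls -a            # the hidden .git directory sits next to prova.py
git ls-files     # files tracked by Git
git log --oneline
```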

We will shortly see what these commands do.

Recording Changes to the Repository

At this point, you should have a working Git repository on your local machine, and a checkout or working copy of all of its files in front of you. Typically, you’ll want to start making changes and committing snapshots of those changes into your repository each time the project reaches a state you want to record. To list all files tracked by Git, you can use git ls-files. Let's start by making a change to a file:

> echo 'print("Hello again")' >> prova.py

Three basic Git commands

Since the file you changed is tracked by Git, git status shows it as modified:

> git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   prova.py

no changes added to commit (use "git add" and/or "git commit -a")

If you want this edit to be part of the next commit, mark it (stage it) with git add:

> git add prova.py
> git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   prova.py

By using git status again, you should see that the modified file is staged for the next commit.
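The same two states can be read in a compact form with git status --short, which prints a two-letter code per file: the left column describes the staging area and the right column the working tree. The sketch below rebuilds a tiny repository in a temporary directory (the directory and the identity are example values) and prints the code before and after git add.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py
git commit --quiet -m "initial project version"

echo 'print("Hello again")' >> prova.py
git status --short     # " M prova.py": modified in the working tree, not staged yet
git add prova.py
git status --short     # "M  prova.py": the change is staged for the next commit
```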

Once you are satisfied with your edits, you can make these changes part of the repository by committing them:

> git commit -m "added a second greeting for politeness"
> git status
On branch main
nothing to commit, working tree clean

By running git status again, you will now see that there are no uncommitted changes. The edited files are now part of the current snapshot, so they are no longer considered changed. As a good practice, each commit should focus on a single topic and contain only the edits related to it. This makes it easy to find, inspect, move, delete, or edit any commit in the repository history.
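Staging is what lets you keep commits focused: when several files are modified, you can stage and commit only those related to one topic. In the sketch below, the temporary directory, the example identity, and the extra file notes.txt are all hypothetical; only prova.py goes into the commit, while notes.txt is left out of it.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py
git commit --quiet -m "initial project version"

echo 'print("Hello again")' >> prova.py        # change related to the greeting topic
echo 'remember to add tests' > notes.txt       # unrelated file, not meant for this commit
git add prova.py                               # stage only the related change
git commit --quiet -m "added a second greeting for politeness"

git show --stat --oneline HEAD   # the new commit touches prova.py only
git status --short               # "?? notes.txt": still outside any commit
```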

Three git areas

Let's try to understand what just happened here. Git has three main states that your files can reside in: modified, staged, and committed:

  • Modified means that you have changed the file but have not committed it to your database yet.
  • Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
  • Committed means that the data is safely stored in your local database.

This leads us to the three main sections of a Git project:

  • the working tree: a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
  • the staging area: a file, generally contained in your Git directory, that stores information about what will go into your next commit.
  • the Git directory: where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
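You can observe the three areas directly: git diff --staged compares the staging area with the last commit, while plain git diff compares the working tree with the staging area. In this sketch (temporary directory and example identity assumed), one edit is staged and a second one is left in the working tree, so each command reports a different line.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py
git commit --quiet -m "initial project version"   # Git directory: first snapshot

echo 'print("Hello again")' >> prova.py
git add prova.py                                  # staging area: this version of prova.py
echo 'print("Goodbye")' >> prova.py               # working tree: one more, unstaged line

git diff --staged     # staging area vs last commit: adds print("Hello again")
git diff              # working tree vs staging area: adds print("Goodbye")
git status --short    # "MM prova.py": staged changes plus further unstaged ones
```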

Figure 1. Git areas. Source: https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F

[!EXERCISE]

Make some changes to one of your files and repeat all these steps (edit, stage, commit) to get comfortable with these commands.

example

Edit prova.py so that it contains these lines:

print("Hello World")
print("Hello again")

a = 2.3
b = 7.01
print(f"{a=}, {b=}")
print(f"a^b = {a**b:.4f}")

and then commit again with git add prova.py; git commit -m "added some multiplications and prints"

Now, add an exit message so that prova.py contains:

print("Hello World")
print("Hello again")

a = 2.3
b = 7.01
print(f"{a=}, {b=}")
print(f"a^b = {a**b:.4f}")

print("program is exiting. Goodbye!")

and then commit again with git add prova.py; git commit -m "added exit message"

Viewing the Commit History

Now, what if you want to inspect some of the changes done in one of your commits? After you have created several commits, or if you have cloned a repository with an existing commit history, you’ll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.

> git log
commit 08a9aa3df553b1c3e05174831115e96ac4892c5e (HEAD -> main)
Author: scarpma <scarpma@gmail.com>
Date:   Thu Jun 5 15:29:54 2025 +0200

    added exit message

commit 1a8b02c05789ea602bf63129ae56cfd4649b0ac0
Author: scarpma <scarpma@gmail.com>
Date:   Thu Jun 5 15:29:07 2025 +0200

    added some multiplications and prints

commit 3506b463daff54b2e0bcd76ba1023bf567e1129e
Author: scarpma <scarpma@gmail.com>
Date:   Thu Jun 5 15:22:22 2025 +0200

    added a second greeting for politeness

commit 1d8cbc066272c9e91014357e345b0a93c56a6615
Author: scarpma <scarpma@gmail.com>
Date:   Thu Jun 5 15:16:48 2025 +0200

    initial project version

It lists all the commits on the current branch in reverse chronological order (most recent first) and prints some information for each one:

  • commit hash,
  • commit author,
  • commit date,
  • commit message.
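The default output is verbose, and git log accepts options that select what to print. The sketch below builds a three-commit history in a temporary directory (example identity, commit messages borrowed from this lecture) and shows a few common variants: --oneline, a custom --format with placeholders for the abbreviated hash (%h), author name (%an), author date (%ad) and subject (%s), and -n to limit the number of commits shown.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
for msg in "initial project version" "added a second greeting for politeness" "added exit message"; do
    echo "# $msg" >> prova.py                  # a placeholder edit for each commit
    git add prova.py
    git commit --quiet -m "$msg"
done

git log --oneline                                    # abbreviated hash and message, newest first
git log --format='%h | %an | %ad | %s' --date=short  # pick the fields you need
git log -n 1                                         # only the most recent commit
```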

[!NOTE]

Additionally, you can see that the first listed commit has a reference decoration, "(HEAD -> main)". It means that HEAD points to the main branch, which in turn points to that commit. More specifically,

  1. HEAD refers to the commit currently checked out in your working directory
  2. HEAD -> main means that HEAD is attached to the main branch: you are working on main, and your current commit is the latest commit of main
  3. these two references (HEAD and main) point to the same commit (snapshot of the repository)
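These references can be inspected with two plumbing commands: git symbolic-ref HEAD prints the branch HEAD is attached to, and git rev-parse turns a reference into the hash of the commit it points to. The sketch assumes a temporary repository with one commit on a branch explicitly named main (example values throughout).

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main          # make the branch name "main" explicit
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py
git commit --quiet -m "initial project version"

git symbolic-ref HEAD    # refs/heads/main: HEAD is attached to main
git rev-parse HEAD       # the commit HEAD resolves to...
git rev-parse main       # ...is the same commit main points to
```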

Undoing Things

At any stage, you may want to undo something. Here, we’ll review a few basic tools for undoing changes that you’ve made. We'll cover more powerful tools in the next lecture on Git branching. Be careful, because you can’t always undo some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.

Inspect commit content

The first thing you can do with commits is inspect them, and git show is the right tool for this. It takes as argument a reference to the commit you want to inspect, for example a commit id (without arguments, it shows the commit HEAD points to). In Git, commits are referenced mainly by the commit hash, i.e. the 40-character hexadecimal string displayed by the git log command. The hash is computed from the whole content of the commit (the snapshot, the parent commit, the author, the date, and the message), so in practice every commit gets its own unique hash.

> git show 1a8b02c05789ea602bf63129ae56cfd4649b0ac0
commit 1a8b02c05789ea602bf63129ae56cfd4649b0ac0
Author: scarpma <scarpma@gmail.com>
Date:   Thu Jun 5 15:29:07 2025 +0200

    added some multiplications and prints

diff --git a/prova.py b/prova.py
index d22ee64..a75df96 100644
--- a/prova.py
+++ b/prova.py
@@ -1,2 +1,7 @@
 print("Hello World")
 print("Hello again")
+
+a = 2.3
+b = 7.01
+print(f"{a=}, {b=}")
+print(f"a^b = {a**b:.4f}")
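git show also accepts other ways of naming a commit besides the full hash. The sketch below (temporary repository, example identity) shows three of them: no argument at all (HEAD is used), a relative reference such as HEAD~1 for the parent of HEAD, and an abbreviated hash, which works as long as the prefix is unique in the repository. The --stat option prints a per-file summary instead of the full diff.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py && git commit --quiet -m "initial project version"
echo 'print("Hello again")' >> prova.py
git add prova.py && git commit --quiet -m "added a second greeting for politeness"

git show                          # no argument: the commit HEAD points to
git show --stat HEAD~1            # the parent of HEAD, summarized per file
short=$(git rev-parse --short HEAD)
full=$(git rev-parse HEAD)
git show "$short"                 # a unique prefix is enough to identify the commit
echo "$short is a prefix of $full"
```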

[!CURIOSITY]

Why are Git hashes important? At its core, Git is a content-addressable filesystem: it uses the SHA-1 hash function to name content. For example, files, directories, and revisions are referred to by hash values, unlike other traditional version control systems where files or versions are referred to via sequential numbers. The use of a hash function to address its content delivers a few advantages:

  • Integrity checking is easy. If even a single byte in your repository changes, the resulting hash will change
  • Uniqueness: every commit in Git can be accessed via its unique hash. Identical content (for example, two files with the same bytes) always gets the same hash, and is therefore stored only once
  • Lookup of objects is fast

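Using a cryptographically secure hash function brings additional advantages (SHA-1 is no longer considered collision-resistant; Git mitigates this with a hardened SHA-1 implementation, and recent versions can also use SHA-256):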

  • Object names can be signed and third parties can trust the hash to address the signed object
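Content addressing can be checked with git hash-object, which prints the hash Git would give a piece of content (here read from standard input; without -w nothing is written to any repository). The sketch assumes a shell with sha1sum available (GNU coreutils; on macOS, shasum can be used instead). It shows that identical content yields identical hashes, that a one-byte change yields a completely different hash, and that the hash of a file's content is the SHA-1 of a short header (blob, a space, the size in bytes, a NUL byte) followed by the content itself.

```shell
set -eu
h1=$(printf 'hello\n' | git hash-object --stdin)
h2=$(printf 'hello\n' | git hash-object --stdin)    # same content...
h3=$(printf 'hello!\n' | git hash-object --stdin)   # ...and one extra byte
echo "$h1"
echo "$h2"    # identical to the line above
echo "$h3"    # a completely different hash

# Recompute the first hash by hand: SHA-1 of "blob 6", a NUL byte, then the 6 content bytes.
manual=$(printf 'blob 6\0hello\n' | sha1sum | cut -d ' ' -f 1)
echo "$manual"    # same as $h1
```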

git commit --amend

One of the common undos takes place when you commit too early and forget to add some files, or you mess up your commit message. If you want to redo that commit, make the additional changes you forgot, stage them, and commit again with the --amend option (git commit --amend).

This command takes your staging area and adds it to the last commit. If you’ve made no changes since your last commit (for instance, you run this command immediately after your previous commit), then your snapshot will look exactly the same, and all you’ll change is your commit message. The same commit-message editor fires up, but it already contains the message of your previous commit. You can edit the message the same as always, but it overwrites your previous commit.

As an example, if you commit and then realize you forgot to stage the changes in a file you wanted to add to this commit, you can do something like this:

> echo 'print("See you soon!")' >> prova.py
> git add prova.py
> git commit --amend

[!NOTE]

It’s important to understand that when you’re amending your last commit, you’re not so much fixing it as replacing it entirely with a new, improved commit. Effectively, it’s as if the previous commit never happened, and it won’t show up in your repository history.

The obvious value to amending commits is to make minor improvements to your last commit, without cluttering your repository history with commit messages of the form, “Oops, forgot to add a file” or “Darn, fixing a typo in last commit”.

Only amend commits that are still local and have not been pushed somewhere. Amending previously pushed commits and force pushing the branch will cause problems for your collaborators.
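The replacement is visible in the commit hash. In the sketch below (temporary repository, example identity), the hash of HEAD is recorded, a forgotten line is staged, and the commit is amended with --no-edit, an option that keeps the previous message instead of opening the editor. The number of commits stays the same, but HEAD now has a different hash.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py && git commit --quiet -m "initial project version"
echo 'print("program is exiting. Goodbye!")' >> prova.py
git add prova.py && git commit --quiet -m "added exit message"

before=$(git rev-parse HEAD)
echo 'print("See you soon!")' >> prova.py       # the forgotten change
git add prova.py
git commit --quiet --amend --no-edit            # replace the last commit, keep its message
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"        # a new commit object with a new hash
git rev-list --count HEAD    # still 2: the old commit is no longer part of the branch history
```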

Unstaging a staged file

If you used git add <file> unintentionally, you may want to remove a file change from the staging area. Fortunately, the command you use to determine the state of the working tree and the staging area (git status) also reminds you how to undo changes to them:

> touch tmp
> git add tmp
> git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   tmp

Right below the “Changes to be committed” text, it says use git restore --staged <file>... to unstage. Running git restore --staged tmp removes tmp from the staging area but leaves the file untouched in your working directory; since tmp is a new file, it simply becomes untracked again.
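The sketch below reproduces this example in a temporary repository (example identity, and Git 2.23 or later assumed for git restore). The short status code changes from A (added to the staging area) to ?? (untracked), while the file itself stays on disk.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py && git commit --quiet -m "initial project version"

touch tmp
git add tmp
git status --short            # "A  tmp": new file, staged
git restore --staged tmp      # remove it from the staging area only
git status --short            # "?? tmp": untracked again
ls tmp                        # the file is still in the working tree
```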

[!NOTE] If you are using an older version of Git, it may suggest a different command to unstage the file:

On branch main
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file:   tmp

This is because Git version 2.23.0 introduced a new command: git restore. It’s basically an alternative to git reset, which will be covered in the next lecture. From Git version 2.23.0 onwards, Git recommends restore instead of reset for many undo operations.
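For a file that is already tracked, the older form git reset HEAD <file> has the same effect as git restore --staged <file>: the staged change goes back to being an unstaged modification, and the edit itself is kept. A minimal sketch, assuming a temporary repository and an example identity:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py && git commit --quiet -m "initial project version"

echo 'print("Hello again")' >> prova.py
git add prova.py
git status --short                   # "M  prova.py": staged
git reset --quiet HEAD -- prova.py   # older spelling of: git restore --staged prova.py
git status --short                   # " M prova.py": modified but no longer staged
grep 'Hello again' prova.py          # the edit itself is kept in the working tree
```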

A taste of remote repositories

We finish this lecture with a taste of remote repositories. You can download a copy of an existing remote repository, such as the one for this course, with the git clone command:

> git clone https://github.com/scarpma/git_course.git
> cd git_course

That creates a directory named git_course, initializes a .git directory inside it, pulls down all the data for that repository (including its history), and checks out a working copy of the latest version. We'll see in the following lectures how to work with remote repositories (git remote add, git pull and git push).
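git clone also accepts a path to a repository on your own disk, which is a convenient way to try it without network access. In the sketch below (temporary directories and example identity assumed), the clone has the same history as the source and a remote named origin pointing back to it; git remote -v is used here only to list it.

```shell
set -eu
src=$(mktemp -d) && cd "$src"
git init --quiet
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.com"
echo 'print("Hello World")' > prova.py
git add prova.py && git commit --quiet -m "initial project version"

dst="$(mktemp -d)/git_course_copy"             # hypothetical destination directory
git clone --quiet "$src" "$dst"
cd "$dst"
git log --oneline     # same commit (same hash) as in the source repository
git remote -v         # origin: the place the repository was cloned from
```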

[!NOTE]

Git has a number of different transfer protocols you can use. The previous example uses the https:// protocol, but you may also see git:// or user@server:path/to/repo.git, which uses the SSH transfer protocol. For private repositories, the SSH protocol is recommended.