Git Guide, Part 1: Everything you need to know about the .git directory



Starting to use Git is like visiting a new country whose language you do not speak. As long as you know where you are and where you are going, everything is fine, but once you get lost, the real trouble begins.

There are plenty of tutorials on Git commands on the Internet, but this article looks deeper, at how Git actually works rather than at the commands alone.

This is the first part of the Git guide from Pierre de Wulf's blog, translated by the Mail.ru Cloud Solutions team.

New users find it hard to get comfortable with Git. It is a powerful tool but, unfortunately, not an easy one to learn: there are many new concepts, commands that behave differently depending on whether a file is passed as an argument, unclear feedback...

Probably the only way to overcome these difficulties is to learn a little more than just git commit and git push, and to understand how Git works underneath.

The .git folder


When you create a new repository with the git init command, Git creates a magic folder named .git. It contains everything Git needs to work. If you want to remove Git from your project but keep the project files on disk, simply delete the .git folder. Though who would want to do that?

    ├── HEAD
    ├── branches
    ├── config
    ├── description
    ├── hooks
    │   ├── pre-commit.sample
    │   ├── pre-push.sample
    │   └── ...
    ├── info
    │   └── exclude
    ├── objects
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags


That is what a typical .git folder looks like before your first commit. Its main entries are:

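  1. HEAD - we will look at this later.
  2. config - holds the settings of your repository, such as the URL of the remote and your email. Values you set for this repository with git config are written here.
  3. description - used by gitweb to display a description of the repository.
  4. hooks - scripts that Git can run automatically before or after events such as commit/rebase/pull… A pre-push hook, for example, runs before a push.
  5. info - exclude - files that you do not want to include in the repository are listed here. This file works the same way as .gitignore, except that it is not itself stored in the repository. In practice, .gitignore is usually enough for every task.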

What is inside a commit?


Each time you create a file and commit it, Git compresses the file and stores it in its own data structure. The compressed object gets a unique name (a hash) and is stored in the objects folder.

Before exploring the objects folder, let's clarify what a commit is. A commit is a snapshot of the current state of the files in your working folder, but not only that.

In fact, when you commit, Git does one of two things for each file in order to build the snapshot of the working folder:

  1. If the file has not changed, Git simply adds the name (hash) of the existing compressed file to the snapshot.
  2. If the file has changed, Git compresses it, stores it in the objects folder, and adds the name (hash) of the new compressed file to the snapshot.

Of course, this is a somewhat simplified description, but it is enough to understand what is going on.

Once the snapshot has been created, it is also compressed and named with a hash, then placed in the objects folder.

    ├── 4c
    │   └── f44f1e3fe4fb7f8aa42138c324f63f5ac85828 // hash
    ├── 86
    │   └── 550c31847e518e1927f95991c949fc14efc711 // hash
    ├── e6
    │   └── 9de29bb2d1d6434b8b29ae775ad8c2e48c5391 // hash
    ├── info // let's ignore that
    └── pack // let's ignore that too


This is what the objects folder looks like after I created file_1.txt and committed it. Note that if the hash of your file starts with "4cf44f1e...", Git saves it under the name "f44f1e..." in a subfolder named "4c". This way the files are spread across up to 256 subfolders, and no single folder holds too many files.
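The layout described above can be sketched in a few lines of Python. This is only an illustration, not Git's own code: the function names are made up for this example, and it assumes loose (unpacked) objects, which Git stores zlib-compressed.

```python
import os
import tempfile
import zlib


def object_path(objects_dir: str, object_hash: str) -> str:
    """Return the path where a loose object with this hash would live."""
    # The first two hex characters name the subfolder,
    # the remaining 38 characters name the file inside it.
    return os.path.join(objects_dir, object_hash[:2], object_hash[2:])


def write_loose_object(objects_dir: str, object_hash: str, raw: bytes) -> str:
    """Compress the raw object bytes with zlib and save them."""
    path = object_path(objects_dir, object_hash)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return path


def read_loose_object(objects_dir: str, object_hash: str) -> bytes:
    """Read a loose object back and decompress it."""
    with open(object_path(objects_dir, object_hash), "rb") as f:
        return zlib.decompress(f.read())


# Demo in a throwaway folder that stands in for .git/objects.
demo_dir = tempfile.mkdtemp()
demo_hash = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
saved = write_loose_object(demo_dir, demo_hash, b"blob 0\x00")
print(os.path.relpath(saved, demo_dir))
```

On a system that uses "/" as the path separator, the demo prints e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same split you see in the listing.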

As you can see, we have three hashes: one for file_1.txt and a second for the snapshot taken at commit time. What is the third for? It exists because the commit itself is also an object, so it too is compressed and placed in the objects folder.
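To see where such a name comes from, here is a minimal sketch of how Git derives an object's hash, assuming a repository that uses SHA-1 (the default). Git hashes a small header, made of the object type and the content length, followed by the content itself. The function name is invented for this example.

```python
import hashlib


def git_object_hash(object_type: str, content: bytes) -> str:
    """Hash content the way Git names an object of the given type."""
    # Header format: "<type> <size in bytes>" followed by a NUL byte.
    header = f"{object_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The name of an empty blob:
print(git_object_hash("blob", b""))
# prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result is the same e69de29b... name that appears in the objects folder above, which suggests that file_1.txt was empty when it was committed.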

Remember that a commit consists of four things:

  1. The name (hash) of the snapshot of the working directory.
  2. A comment (the commit message).
  3. Information about who made the commit.
  4. The hash of the parent commit.

See for yourself what happens if you decompress the commit file:

    git cat-file -p 4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828

And here is what you will see:

    tree 86550c31847e518e1927f95991c949fc14efc711
    author Pierre De Wulf <test@gmail.com> 1455775173 -0500
    committer Pierre De Wulf <test@gmail.com> 1455775173 -0500

    commit A

As expected, you see the hash of the snapshot, the author, and the commit message. Two things are important here:

  1. The snapshot hash "86550..." is itself an object and can be found in the objects folder.
  2. Since this is the first commit, it does not have a parent commit.
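The four parts of a commit can be pulled out of that text with a small parser. This is a simplified sketch: the function name is hypothetical, and it assumes plain single-line headers followed by a blank line and then the message, which is how Git lays out a simple commit. Multi-line headers such as signatures are not handled.

```python
def parse_commit(text: str) -> dict:
    """Split the output of `git cat-file -p <commit>` into its parts."""
    # Headers and message are separated by the first blank line.
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            # A merge commit has several parents, a first commit has none.
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


sample = (
    "tree 86550c31847e518e1927f95991c949fc14efc711\n"
    "author Pierre De Wulf <test@gmail.com> 1455775173 -0500\n"
    "committer Pierre De Wulf <test@gmail.com> 1455775173 -0500\n"
    "\n"
    "commit A\n"
)
commit = parse_commit(sample)
print(commit["tree"])     # the snapshot hash
print(commit["parents"])  # empty list for the first commit
print(commit["message"])  # the comment
```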

What is really in the snapshot?

    git cat-file -p 86550c31847e518e1927f95991c949fc14efc711
    100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 file_1.txt

Here we see the object that was already in our object store, the one for file_1.txt. It is the only object in our snapshot.
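What cat-file prints is a readable view of the snapshot. On disk, a snapshot (Git calls it a tree) is a sequence of binary entries, each made of a mode, a file name, a NUL byte, and the 20 raw bytes of the entry's hash. The sketch below builds and parses that layout. The helper names are invented for this example, and it assumes SHA-1 hashes and entries already sorted by name.

```python
import hashlib


def build_tree(entries):
    """Serialize (mode, name, hex_hash) tuples into a tree body."""
    body = b""
    for mode, name, hex_hash in entries:
        # "<mode> <name>" + NUL + the hash as 20 raw bytes (not hex text).
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_hash)
    return body


def parse_tree(body: bytes):
    """Turn a tree body back into (mode, name, hex_hash) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        hex_hash = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, hex_hash))
        pos = nul + 21
    return entries


def tree_hash(body: bytes) -> str:
    """Name the tree the same way as any other Git object."""
    header = f"tree {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


body = build_tree(
    [("100644", "file_1.txt", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")]
)
print(parse_tree(body))
print(tree_hash(body))
```

If file_1.txt really is the only entry, the printed hash can be compared against the 86550c31... name from the listing above; this sketch does not verify that.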

Branches, tags, and HEAD are all the same thing


Now you understand that everything in Git can be reached with the right hash. Let's look at HEAD now. What is in there?

    cat HEAD
    ref: refs/heads/master

There is no hash here, and that makes sense, since HEAD is a pointer to the tip of the branch you are working on. If you look at the file refs/heads/master, you will see:

    cat refs/heads/master
    4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828
 

Look familiar? Of course, it is the hash of the first commit! This shows that tags and branches are just pointers to a commit. Knowing this, you can delete all the tags and branches you want, and the commits they pointed to will stay in place, at least until Git garbage-collects objects that nothing references any more. They will just be harder to reach. If you want to know more about this, check out the Git book.
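Following these pointers by hand is easy to sketch. The function below is a hypothetical helper, not part of Git: it reads a ref file and, if the file holds "ref: ..." instead of a hash, follows it. It assumes every ref is stored as a separate file, as in the example above, and ignores refs that Git has packed into a single file. The demo builds a fake .git folder so it can run without a real repository.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow a ref until it yields a commit hash."""
    with open(os.path.join(git_dir, *name.split("/"))) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        # A symbolic ref: it points at another ref, not at a commit.
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


# Recreate the two files from the article in a temporary folder.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828\n")

print(resolve_ref(git_dir))
# prints 4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828
```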

One last note


After reading this, it should be clear that all Git does is compress your working folder and put it in the objects folder along with some additional information. And if you are familiar enough with Git, you know that you have full control over which files are included in a commit and which are not.

So a commit is not really a snapshot of the working folder, but a snapshot of the files you want to commit. And where does Git store the list of files you want to commit? It keeps this list in the index file. We will not go deeper into this here; if you are interested, more details can be found here.

Translated with support from Mail.ru Cloud Solutions.
