Git Guide, Part 2: The Golden Rule and Other Basics of Rebase

Let's see what happens when you run git rebase and why you need to be careful. 

This is the second and third part of the Git guide from Pierre de Wulf's blog, translated by the Mail.ru Cloud Solutions team. The first part can be read here.

The essence of rebase


How exactly does a rebase happen?


You could say that rebasing means detaching the branch you want to move and attaching it to another branch. This definition is true, but let's look a little deeper. Here is how the documentation describes rebase: "Reapply commits on top of another base tip".

The key word here is "reapply", because rebase does not simply copy and paste a branch onto another branch. Rebase takes the commits of the selected branch one at a time and reapplies them on top of the new base.

This behavior leads to two points:

  1. By reapplying commits, Git creates new commits. Even if they contain the same changes, Git treats them as new, independent commits.
  2. Git rebase rewrites commits but does not delete the old ones. This means that after the rebase is complete, your old commits are still stored in the objects subfolder of the .git folder. If you don't fully understand how Git stores and tracks commits, read the first part of this guide.

Here is a more accurate picture of what happens during a rebase:


As you can see, the feature branch contains completely new commits. As mentioned earlier, they hold the same set of changes but are completely new objects from Git's point of view.

It also means that the old commits are not destroyed. They just can no longer be reached directly. If you remember, a branch is just a reference to a commit. Thus, if neither a branch nor a tag leads to a commit, you cannot reach it through any branch or tag name, although it is still present on disk.
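Both points can be checked in a throwaway repository. The sketch below is only an illustration: the temporary directory, the `commit` helper, the file names and the commit messages are all assumptions made for the demo, not part of the original article. It records the hash of a commit before and after a rebase and then asks Git whether the old object still exists.

```shell
# Throwaway repository; all paths, file names and messages are illustrative.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b feature
commit D                               # the commit that will be rebased
git checkout -q master
commit B                               # master moves on in the meantime

old=$(git rev-parse feature)           # hash of D before the rebase
git checkout -q feature
git rebase -q master
new=$(git rev-parse feature)           # hash of D' after the rebase

echo "before: $old"
echo "after:  $new"
git cat-file -t "$old"                 # the old object is still stored: prints "commit"
```

The two hashes differ even though both commits introduce the same change, and the old hash still resolves to a commit object.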

Now let's discuss the Golden Rule.

Golden rebase rule


The golden rule of rebase is: "NEVER rebase a shared branch!" A shared branch is a branch that exists in the remote repository and that people other than you can work with.

This rule is often applied without really understanding it, so let's look at why it exists, especially since doing so helps to understand how Git works.

Let's look at a situation where a developer breaks the golden rule, and what happens in this case.

Suppose Bob and Anna are working together on a project. Below is what Bob's and Anna's repositories and the original repository on GitHub look like:


All users have repositories synchronized with GitHub.

Now Bob, breaking the golden rule, performs a rebase, and at the same time, Anna, working in the feature branch, creates a new commit:


Do you see what will happen?

Bob tries to push and gets a rejection that looks something like this:


The push failed because Git did not know how to combine Bob's feature branch with the feature branch on GitHub.

The only way for Bob to push is to use the --force option, which tells the GitHub repository to discard its own feature branch and accept the one Bob is pushing in its place. After that we get the following situation:
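The rejected push and the forced push can be reproduced locally. This is a minimal sketch under stated assumptions: a bare repository named `hub.git` stands in for GitHub, and the directory names, the `commit` helper and the commit contents are invented for the demo.

```shell
# Local stand-in for the scenario; hub.git plays the role of GitHub.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q --bare hub.git
git init -q bob && cd bob
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/hub.git"

commit A; commit B
git checkout -q -b feature; commit D
git push -q origin master feature        # feature is now a shared branch

git checkout -q master; commit C
git checkout -q feature
git rebase -q master                     # breaks the golden rule: D becomes D'

# A plain push is refused because the remote D is not an ancestor of D'.
if git push -q origin feature 2>/dev/null; then plain=accepted; else plain=rejected; fi
echo "plain push: $plain"

# The forced push replaces the remote branch with Bob's rewritten one.
git push -q --force origin feature
echo "remote feature: $(git --git-dir="$work/hub.git" rev-parse feature)"
echo "bob's feature:  $(git rev-parse feature)"
```

After the forced push the two hashes match, which means the original D is no longer part of the remote feature branch.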


Now Anna wants to push her changes, and here is what happens:


This is expected: Git tells Anna that her copy of the feature branch is out of sync, that is, her version of the branch and the version on GitHub differ. Anna has to pull. Just as Git tries to combine your local branch with the branch in the remote repository when you push, it tries to merge the branch from the remote repository into your local branch when you pull.

Before the pull, the commits in the local and GitHub branches look like this:

A--B--C--D'   origin/feature // GitHub
A--B--D--E    feature        // Anna

When you pull, Git performs a merge to reconcile the two histories. Here is what that leads to:


Commit M is a merge commit. The feature branches of Anna and GitHub are finally merged. Anna breathes a sigh of relief: all conflicts are resolved and she can push.

Bob is pulling, now everything is in sync:


Looking at the resulting mess, you should now be convinced of the importance of the golden rule. Keep in mind that this mess was created by just one developer, on a branch shared between just two people. Imagine a team of ten.

One of the many benefits of Git is that you can go back to any earlier point without trouble. But the more mistakes like this one are made, the harder that becomes.

Also note that duplicate commits appear in the remote repository. In our case these are D and D', which contain the same changes. In fact, there can be as many duplicate commits as there are commits in your rebased branch.
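The duplicates can be observed by replaying the whole Bob and Anna scenario locally. As before, this is a sketch with assumed names: `hub.git` stands in for GitHub, `bob` and `anna` are two local clones, and `commit` is a helper invented for the demo.

```shell
# Local stand-in for the scenario; hub.git plays GitHub, all names are illustrative.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q --bare hub.git
git init -q bob && cd bob
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/hub.git"
commit A; commit B
git checkout -q -b feature; commit D
git push -q origin master feature

# Anna clones while everything is still in sync.
git clone -q -b feature "$work/hub.git" "$work/anna"

# Bob rebases the shared branch onto a new master commit and force-pushes.
git checkout -q master; commit C
git checkout -q feature; git rebase -q master
git push -q --force origin feature

# Anna adds E; her plain push is refused until she pulls (with a merge).
cd "$work/anna"
commit E
if git push -q origin feature 2>/dev/null; then anna_push=accepted; else anna_push=rejected; fi
git pull -q --no-rebase --no-edit origin feature
git push -q origin feature

dups=$(git log --format=%s | grep -c '^D$')    # D and D' carry the same subject
echo "Anna's first push: $anna_push"
echo "commits titled D in feature: $dups"
```

The final history of feature contains two commits titled D, one from Anna's side of the merge and one from Bob's rebased branch, joined by a merge commit.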

If you're still not convinced, let's introduce Emma, the third developer. She was working on the feature branch before Bob made his mistake and now wants to push. Suppose that by the time she pushes, the scenario above has already played out. Here is the result:


Oh, that Bob!

This text might make you think that rebase is only used to move one branch on top of another. That is not required: you can also rebase a branch onto itself.

The beauty of pull rebase


As you saw above, Anna's problems could have been avoided if she had used pull rebase (git pull --rebase). Let's look at this in more detail.

Let's say Bob works on a branch that forked from master; his history may look like this:




Bob decides that it is time to pull, which, as you already understand, will lead to some mess. Since Bob's repository has diverged from GitHub, Git will have to perform a merge, and the result will look like this:


This solution works, and works fine, but it is useful to know that there are other ways to solve the problem. One of them is pull-rebase.

When you pull-rebase, Git tries to figure out which commits exist only in your branch and which are also in the remote repository. Git then places the commits from the remote repository on top of the latest commit that is present in both the local and remote repositories. Finally, it rebases your local commits onto the end of the branch.

It sounds complicated, so let's illustrate:

  1. Git looks only at the commits that are in both your repository and the remote repository:

    It looks like a local clone of the GitHub repository.
  2. Git rebases the local commits:


As you recall, during a rebase Git applies commits one by one; in this case it applies commit E and then commit F to the end of the master branch. The result is a branch rebased onto itself. It looks good, but the question is: why do this?

In my opinion, the biggest problem with merging branches is that it pollutes the commit history. That makes pull-rebase the more elegant solution. I would even go further and say that whenever you need to bring the latest changes into your branch, you should always use pull-rebase. But remember: since rebase applies commits one at a time, rebasing 20 commits may mean resolving 20 conflicts one after another.

As a rule of thumb: for one big change made a long time ago, merge; for a couple of small changes made recently, pull-rebase.
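The pull-rebase flow from this section can be tried out locally with git pull --rebase. The sketch below makes several assumptions for the sake of the demo: `hub.git` is a local stand-in for GitHub, the `seed` clone plays a hypothetical teammate who publishes commits C and D, and `commit` is an invented helper. Bob's local commits E and F end up on top of the remote commits, with no merge commit.

```shell
# Local stand-in; hub.git plays GitHub and "seed" plays a hypothetical teammate.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q --bare hub.git
git init -q seed && cd seed
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/hub.git"
commit A; commit B
git push -q origin master

# Bob clones at B, then the teammate publishes C and D.
git clone -q -b master "$work/hub.git" "$work/bob"
commit C; commit D
git push -q origin master

# Bob commits E and F locally, so his master has diverged from the remote.
cd "$work/bob"
commit E; commit F

# Fetch C and D, then reapply E and F on top of them.
git pull -q --rebase origin master

order=$(git log --reverse --format=%s | paste -sd' ' -)
merges=$(git rev-list --merges --count HEAD)
echo "history: $order"
echo "merge commits: $merges"
```

The resulting history is linear, which is the cleaner log the section argues for. With more local commits, each one is a separate chance for a conflict during the rebase.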

The power of rebase --onto


Suppose your commit history looks like this:




Say you want to rebase the feature2 branch onto the master branch. If you do a regular rebase onto master, you get this:


It makes no sense for commit D to exist in both branches, feature1 and feature2. If you then move the feature1 branch to the end of master, commit D will be applied twice.

Suppose you need a different result:


git rebase --onto is intended for exactly this scenario.

First, read the documentation:

SYNOPSIS
       git rebase [-i | --interactive] [<options>] [--exec <cmd>]
               [--onto <newbase> | --keep-base] [<upstream> [<branch>]]
       git rebase [-i | --interactive] [<options>] [--exec <cmd>] [--onto <newbase>]
               --root [<branch>]
       git rebase (--continue | --skip | --abort | --quit | --edit-todo | --show-current-patch)


We are interested in this:

OPTIONS
       --onto <newbase>
          Starting point at which to create the new commits. If the
          --onto option is not specified, the starting point is
          <upstream>. May be any valid commit, and not just an existing
          branch name.


Use this option to specify the point at which the new commits are created.

If the option is not given, upstream becomes the starting point.

To make this clearer, here is one more picture:

A--B--C        master
    \
     D--E      feature1
         \
          F--G feature2

Here we want to rebase feature2 onto master beginning from feature1
                                       |                      |
                                    newbase                upstream


That is, the master branch is newbase, and the feature1 branch is upstream.

So, to get the result shown in the last figure, run git rebase --onto master feature1 while on the feature2 branch.
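The command can be tried on a repository shaped like the picture above. This is a sketch with assumed details: the temporary directory, the `commit` helper and the one-file-per-commit contents are invented for the demo; the branch names and commit letters follow the diagram.

```shell
# Throwaway repository shaped like the diagram; file contents are illustrative.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b feature1; commit D; commit E    # feature1 forks from B
git checkout -q -b feature2; commit F; commit G    # feature2 forks from E
git checkout -q master; commit C

# Take the commits of feature2 that are not in feature1 (F and G)
# and reapply them on top of master.
git checkout -q feature2
git rebase -q --onto master feature1

f2=$(git log --reverse --format=%s feature2 | paste -sd' ' -)
f1=$(git log --reverse --format=%s feature1 | paste -sd' ' -)
echo "feature2: $f2"
echo "feature1: $f1"
```

After the rebase, feature2 no longer contains D or E, and feature1 is left untouched.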

Good luck

Translated with support from Mail.ru Cloud Solutions.
