Content-based tagging in the werf builder: why and how does it work?

werf is our open source GitOps CLI utility for building and delivering applications to Kubernetes. Release v1.1 adds a new feature to the image builder: tagging images by their content, or content-based tagging. Until now, the typical tagging scheme in werf tagged Docker images by Git tag, Git branch, or Git commit. But all of these schemes have flaws that the new tagging strategy solves completely. The details of how it works and why it is so good follow below.

Rolling out a set of microservices from a single Git repository

It is often the case that an application is split into many more or less independent services. These services can be released independently: one or several services are released at a time, while the rest must keep running unchanged. But from the point of view of code storage and project management, it is more convenient to keep all of the application's services in a single repository.

Sometimes services are truly independent and do not belong to a single application. In that case, they live in separate projects and are released through separate CI/CD processes in each project.

In practice, however, developers often split a single application into several microservices, and giving each one its own repository and project is obvious overkill. This is the situation discussed below: several such microservices live in a single project repository, and releases go through a single CI/CD process.

Tagging by Git branch and Git tag

Suppose the most common tagging strategy, tag-or-branch, is used. For Git branches, images are tagged with the branch name, and at any given time there is only one published image per branch. For Git tags, images are tagged with the tag name.

When a new Git tag is created, for example when a new version is released, a new Docker tag is created in the Docker Registry for every image in the project:

myregistry.org/myproject/frontend:v1.1.10
myregistry.org/myproject/myservice1:v1.1.10
myregistry.org/myproject/myservice2:v1.1.10
myregistry.org/myproject/myservice3:v1.1.10
myregistry.org/myproject/myservice4:v1.1.10
myregistry.org/myproject/myservice5:v1.1.10
myregistry.org/myproject/database:v1.1.10

These new image names are passed through Helm templates into the Kubernetes configuration. When a deployment starts, the werf deploy command updates the image field in the Kubernetes resource manifests and restarts the corresponding resources because the image name has changed.

Problem: if the contents of an image have not actually changed since the previous rollout (Git tag) and only its Docker tag has, the application is restarted unnecessarily, and some downtime is possible, although there was no real reason for the restart.

As a result, with the current tagging scheme you have to maintain several separate Git repositories, and then face the problem of coordinating rollouts across them. Such a setup is overloaded and complex. It is better to combine many services in a single repository and generate Docker tags in a way that avoids unnecessary restarts.
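The effect of tag-based naming on a release can be sketched in a few lines of shell. The registry and service names are taken from the example above; the next release v1.1.11 and the assumption that only one service actually changed in it are hypothetical.

```shell
# Sketch: with tag-or-branch tagging, the image field of every service changes
# on each release, so every workload is restarted.
# The registry and service names come from the example above;
# v1.1.11 is a hypothetical next release in which only frontend changed.
REPO="myregistry.org/myproject"
SERVICES="frontend myservice1 myservice2 myservice3 myservice4 myservice5 database"
OLD_TAG="v1.1.10"
NEW_TAG="v1.1.11"

restarts=0
for service in $SERVICES; do
  old_image="$REPO/$service:$OLD_TAG"
  new_image="$REPO/$service:$NEW_TAG"
  # A changed image field in the manifest triggers a rollout of that workload.
  if [ "$old_image" != "$new_image" ]; then
    echo "restart: $old_image -> $new_image"
    restarts=$((restarts + 1))
  fi
done
echo "workloads restarted: $restarts"
```

All seven image names differ between the two releases, regardless of which images really changed, because the Git tag is the only input to the name.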

Git commit tagging

werf also has a tagging strategy based on Git commits.

A Git commit identifies the contents of the Git repository and depends on the history of file changes in it, so it seems logical to use it to tag images in the Docker Registry.

However, tagging by Git commit has the same drawbacks as tagging by Git branch or Git tag:

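An empty commit that changes no files can be created, yet the Docker tag of the image changes.
A merge commit that changes no files can be created, yet the Docker tag of the image changes.
A commit can change files in Git that are not relevant to the image, and the Docker tag of the image changes again.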
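A Git commit ID changes even when no file does, which is easy to check with plain Git. The sketch below runs in a throwaway repository; the file name, the commit messages, and the run_git helper are made up for the illustration.

```shell
# Sketch: an empty commit gets a new commit ID although the repository contents
# (the tree object) stay the same. Everything happens in a throwaway repository.
demo_repo="$(mktemp -d)"

# run_git: Git in the demo repository with an inline identity (hypothetical helper),
# so the sketch does not depend on the global Git configuration.
run_git() {
  git -C "$demo_repo" -c user.name="demo" -c user.email="demo@example.com" \
    -c commit.gpgsign=false "$@"
}

run_git init -q
echo "hello" > "$demo_repo/app.txt"
run_git add app.txt
run_git commit -q -m "Add app.txt"
commit_before="$(run_git rev-parse HEAD)"
tree_before="$(run_git rev-parse 'HEAD^{tree}')"

# An empty commit: no files change.
run_git commit -q --allow-empty -m "Empty commit"
commit_after="$(run_git rev-parse HEAD)"
tree_after="$(run_git rev-parse 'HEAD^{tree}')"

echo "commit: $commit_before -> $commit_after"
echo "tree:   $tree_before -> $tree_after"
```

The commit ID differs while the tree ID is identical, so an image tagged by commit gets a new Docker tag for exactly the same files.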

Git branch tagging

There is another issue related to the tagging strategy for Git branches.

Tagging by branch name works as long as the commits of the branch are built sequentially, in chronological order.

If, under the current scheme, a user starts a rebuild of an old commit on some branch, werf overwrites the image under the corresponding Docker tag with the newly built image for the old commit. From then on, deployments that use this tag risk pulling a different version of the image when pods restart, so the application loses its link to the CI system and gets out of sync.

In addition, when pushes to one branch follow each other at short intervals, an old commit may be built later than a newer one, and the old version of the image overwrites the new one under the Git branch tag. A CI/CD system can work around such problems (for example, GitLab CI runs the pipeline for the latest commit of a series). However, not all systems support this, and there should be a more reliable way to prevent such a fundamental problem.
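The overwrite can be modeled without a real registry. In this sketch a temporary directory stands in for the registry, with one file per Docker tag; the publish helper and the commit names are hypothetical.

```shell
# Sketch: a branch tag is a mutable pointer, so whichever build finishes last wins.
# A temp directory stands in for the registry: one file per Docker tag,
# whose content records the commit the image was built from (all names hypothetical).
registry="$(mktemp -d)"

# publish TAG COMMIT: overwrite the image stored under TAG.
publish() {
  printf '%s\n' "$2" > "$registry/$1"
}

# Two pushes to master in quick succession: commit-1 (older), then commit-2 (newer).
# The build for commit-2 finishes first, the slower build for commit-1 finishes last.
publish master commit-2
publish master commit-1

published="$(cat "$registry/master")"
echo "master tag now points to an image built from: $published"
```

The tag ends up on the image built from the older commit, and nothing in the tag name reveals that.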

What is content-based tagging?

So what exactly is content-based tagging, that is, tagging images by content?

Docker tags are created not from Git primitives (Git branch, Git tag, ...) but from a checksum associated with:

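the contents of the image: the tag identifies the image's contents and does not change when a new version is built unless the files in the image have changed;
the history of the image's creation in Git: images built by werf from different Git branches with different build histories get different tags.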

The so-called stages signature of the image serves as this identifier tag.

Each image consists of a set of stages: from, before-install, git-archive, install, imports-after-install, before-setup, ..., git-latest-patch, and so on. Each stage has an identifier that reflects its content: the stage signature.

The final image, made up of these stages, is tagged with the signature of the whole set of stages, the stages signature, which summarizes all stages of the image.

Each image in the werf.yaml configuration will, in general, have its own stages signature and, accordingly, its own Docker tag.
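The idea of a signature that summarizes a chain of stages can be illustrated with a minimal sketch. It is not werf's actual algorithm: sha256sum is a stand-in hash, the stages_signature function is hypothetical, and the stage inputs are invented strings representing whatever determines each stage's content.

```shell
# Sketch of a content-based tag: hash each stage together with the previous
# stage's signature, and use the last signature as the Docker tag.
# NOT werf's actual algorithm: sha256sum and the stage inputs are stand-ins.

# stages_signature STAGE_CONTENT...: print the chained signature of all stages.
stages_signature() {
  signature=""
  for stage_content in "$@"; do
    signature="$(printf '%s\n%s' "$signature" "$stage_content" | sha256sum | cut -d ' ' -f 1)"
  done
  printf '%s\n' "$signature"
}

# Hypothetical inputs for three stages: from, install, git-latest-patch.
tag_v1="$(stages_signature 'from: ubuntu:18.04' 'install: npm ci' 'patch: src@1')"
# Rebuild from a new commit that did not touch anything relevant to the image.
tag_v1_again="$(stages_signature 'from: ubuntu:18.04' 'install: npm ci' 'patch: src@1')"
# Rebuild after the source files added to the image changed.
tag_v2="$(stages_signature 'from: ubuntu:18.04' 'install: npm ci' 'patch: src@2')"

echo "unchanged content: $tag_v1 == $tag_v1_again"
echo "changed content:   $tag_v2"
```

Because the tag is derived only from the stage inputs, identical inputs always give the same tag, and a change in any stage changes the tag of the final image.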

The stages signature solves all of these problems:

It is resistant to empty Git commits.
It is resistant to Git commits that change files not relevant to the image.
It does not overwrite the current version of the image when builds are restarted for old commits of a Git branch.

This is now the recommended tagging strategy, and werf uses it by default for all CI systems.

How to enable and use it in werf

The werf publish command now has a corresponding option: --tag-by-stages-signature=true|false

In a CI system, the tagging strategy is set by the werf ci-env command. Previously it was given the parameter werf ci-env --tagging-strategy=tag-or-branch. Now, whether or not you specify werf ci-env --tagging-strategy=stages-signature, werf uses the stages-signature tagging strategy by default. The werf ci-env command automatically sets the necessary flags for werf build-and-publish (or werf publish), so no additional options need to be passed to these commands.

For example, the command:

werf publish --stages-storage :local --images-repo registry.hello.com/web/core/system --tag-by-stages-signature

... can create the following images:

registry.hello.com/web/core/system/backend:4ef339f84ca22247f01fb335bb19f46c4434014d8daa3d5d6f0e386d
registry.hello.com/web/core/system/frontend:f44206457e0a4c8a54655543f749799d10a9fe945896dab1c16996c6

Here 4ef339f84ca22247f01fb335bb19f46c4434014d8daa3d5d6f0e386d is the stages signature of the backend image, and f44206457e0a4c8a54655543f749799d10a9fe945896dab1c16996c6 is the stages signature of the frontend image.

If you use the special functions werf_container_image and werf_container_env in Helm templates, nothing needs to be changed: these functions automatically generate the correct image names.

Example configuration in a CI system:

type multiwerf && source <(multiwerf use 1.1 beta)
type werf && source <(werf ci-env gitlab)
werf build-and-publish|deploy

More configuration information is available in the werf documentation.

Summary

A new option: werf publish --tag-by-stages-signature=true|false.
A new option value: werf ci-env --tagging-strategy=stages-signature|tag-or-branch (if not specified, stages-signature is used by default).
If you previously used the Git commit tagging options (WERF_TAG_GIT_COMMIT or werf publish --tag-git-commit COMMIT), be sure to switch to the stages-signature tagging strategy.
New projects should adopt the new tagging scheme right away.
When migrating to werf 1.1, it is advisable to switch old projects to the new tagging scheme, although the old tag-or-branch strategy is still supported.

Content-based tagging solves all the problems highlighted in the article:

The Docker tag name is stable across empty Git commits.
The Docker tag name is stable across Git commits that change files not relevant to the image.
The current version of the image is not overwritten when builds are restarted for old commits of a Git branch.

Use it! And do not forget to visit us on GitHub to create an issue or find an existing one, give it a thumbs-up, open a PR, or just follow the project's development.
