werf 1.1 release: improvements in the builder today and plans for the future



werf is our open source GitOps CLI utility for building and delivering applications to Kubernetes. As promised, the v1.0 release marked the beginning of adding new features to werf and revising familiar approaches. Now we are pleased to announce v1.1, a big step in the evolution of the werf builder and the groundwork for its future. The version is currently available in the 1.1 ea channel.

At the core of the release are the new stage storage architecture and the optimized performance of both builders (for Stapel and Dockerfile). The new storage architecture opens the way to distributed builds from multiple hosts and to parallel builds on a single host.

The optimization consists of getting rid of unnecessary work when calculating stage signatures and switching to more efficient mechanisms for calculating file checksums. It reduces the average build time of a project using werf. Idle builds, when all stages already exist in the stages-storage cache, are now really fast: in most cases, re-running the build takes less than 1 second! This also applies to the stage verification performed by the werf deploy and werf run commands.

This release also introduces a strategy for tagging images by their content, content-based tagging, which is now enabled by default and is the only recommended one.

Let's take a closer look at the key innovations in werf v1.1, and at the same time talk about plans for the future.

What has changed in werf v1.1?


New stage naming format and a new algorithm for selecting stages from the cache


There is a new rule for generating stage names. Each build of a stage now generates a unique stage name, which consists of two parts: a signature (as in v1.0) plus a unique time-based identifier.

For example, the full name of the stage image may look like this:

werf-stages-storage/myproject:d2c5ad3d2c9fcd9e57b50edd9cb26c32d156165eb355318cebc3412b-1582656767835

... or in general form:

werf-stages-storage/PROJECT:SIGNATURE-TIMESTAMP_MILLISEC

Here:

  • SIGNATURE is the stage signature, an identifier of the stage's content that depends on the history of the Git edits that led to this content;
  • TIMESTAMP_MILLISEC is a guaranteed unique image identifier that is generated at the moment the new image is built.
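
For illustration only, here is a minimal Go sketch (the helper is hypothetical, not werf's actual code) that assembles such a name from a project name, a signature, and a millisecond timestamp:

package main

import (
    "fmt"
    "time"
)

// stageImageName builds a name in the werf-stages-storage/PROJECT:SIGNATURE-TIMESTAMP_MILLISEC format.
func stageImageName(project, signature string) string {
    timestampMillisec := time.Now().UnixNano() / int64(time.Millisecond)
    return fmt.Sprintf("werf-stages-storage/%s:%s-%d", project, signature, timestampMillisec)
}

func main() {
    fmt.Println(stageImageName("myproject", "d2c5ad3d2c9fcd9e57b50edd9cb26c32d156165eb355318cebc3412b"))
}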

The algorithm for selecting stages from the cache is based on checking how Git commits are related:

  1. werf calculates the signature of a given stage.
  2. There can be several stages in stages-storage for a given signature; werf selects all stages matching the signature.
  3. If the current stage is linked to Git (git-archive, a user stage with Git patches: install, beforeSetup, setup, or git-latest-patch), then werf selects only those stages that are associated with a commit that is an ancestor of the current commit (the commit for which the build was triggered).
  4. From the remaining suitable stages, one is selected: the oldest by creation date.

Stages for different Git branches can have the same signature, but werf will prevent the cache associated with one branch from being used for another branch, even if the signatures match.
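
As a rough illustration of this selection logic (not werf's actual code: the stageImage type and the isAncestor helper are hypothetical), a Go sketch could look like this:

package cache

import (
    "sort"
    "time"
)

// stageImage is a hypothetical description of a cached stage in stages-storage.
type stageImage struct {
    Name      string    // full image name: PROJECT:SIGNATURE-TIMESTAMP_MILLISEC
    Commit    string    // the Git commit the stage is associated with ("" if none)
    CreatedAt time.Time // creation time of the image
}

// selectStage picks a suitable cached stage among the candidates with the same
// signature: only stages whose commit is an ancestor of the current commit are
// kept, and the oldest of them is returned (nil means "build a new stage").
func selectStage(candidates []stageImage, currentCommit string,
    isAncestor func(ancestor, descendant string) (bool, error)) (*stageImage, error) {

    var suitable []stageImage
    for _, img := range candidates {
        if img.Commit != "" {
            ok, err := isAncestor(img.Commit, currentCommit)
            if err != nil {
                return nil, err
            }
            if !ok {
                continue // cache from an unrelated branch is not reused
            }
        }
        suitable = append(suitable, img)
    }
    if len(suitable) == 0 {
        return nil, nil
    }
    sort.Slice(suitable, func(i, j int) bool {
        return suitable[i].CreatedAt.Before(suitable[j].CreatedAt)
    })
    return &suitable[0], nil
}

In practice the ancestry check can be answered by Git itself, for example via git merge-base --is-ancestor.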

→ Documentation .

New algorithm for building stages and saving them to the stages storage


If werf does not find a suitable stage while selecting from the cache, it starts building a new stage.

Note that several processes (on one or more hosts) can start building the same stage at roughly the same time. werf uses an optimistic locking algorithm on stages-storage at the moment a freshly built image is saved to it. When the build of the new stage is finished, werf locks stages-storage and saves the freshly built image there only if no suitable image exists there yet (matching by signature and other parameters; see the new algorithm for selecting stages from the cache).

A freshly built image is guaranteed to have a unique identifier thanks to TIMESTAMP_MILLISEC (see the new stage naming format). If a suitable image is found in stages-storage, werf discards the freshly built image and uses the one from the cache.

In other words, the first process to finish building the image (the fastest one) gets the right to save it to stages-storage (and this particular image is then used for all builds). A slow build process never blocks a faster one from saving the result of the current stage and moving on to the next build.
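
A minimal Go sketch of this save procedure, reusing the hypothetical stageImage type from the previous sketch and assuming an equally hypothetical StagesStorage interface (an illustration of the idea, not werf's actual API):

package cache

// StagesStorage is a hypothetical interface over stages-storage.
type StagesStorage interface {
    // Lock takes a lock for the given stage signature and returns an unlock function.
    Lock(signature string) (unlock func(), err error)
    // SelectSuitable returns an already stored suitable stage for the signature, if any.
    SelectSuitable(signature, currentCommit string) (*stageImage, error)
    // Store publishes a freshly built image under the given signature.
    Store(signature string, img stageImage) error
}

// saveFreshStage implements the optimistic scheme: build first, then lock,
// re-check the cache and either publish the fresh image or discard it.
func saveFreshStage(storage StagesStorage, signature, currentCommit string, fresh stageImage) (stageImage, error) {
    unlock, err := storage.Lock(signature)
    if err != nil {
        return stageImage{}, err
    }
    defer unlock()

    existing, err := storage.SelectSuitable(signature, currentCommit)
    if err != nil {
        return stageImage{}, err
    }
    if existing != nil {
        // A faster process has already saved a suitable stage:
        // the freshly built image is discarded and the cached one is used.
        return *existing, nil
    }
    if err := storage.Store(signature, fresh); err != nil {
        return stageImage{}, err
    }
    return fresh, nil
}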

→ Documentation .

Improved Dockerfile builder performance


At the moment, the stage pipeline for an image built from a Dockerfile consists of a single stage, dockerfile. When calculating its signature, the checksum of the context files that will be used during the build is computed. Before this improvement, werf recursively walked all the files and obtained a checksum by summing up the content and mode of each file. Starting with v1.1, werf can use the checksums that are already stored in the Git repository.

The algorithm is based on git ls-tree. It takes the entries in .dockerignore into account and walks the file tree recursively only when necessary. Thus, we got rid of reading the file system, and the algorithm's dependence on the size of the context is no longer significant.

The algorithm also checks untracked files and, if necessary, takes them into account in the checksum.
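
The gist of the approach can be illustrated with a small Go sketch that shells out to git ls-tree and hashes its output (file modes and blob hashes) instead of reading the files themselves; this simplified version ignores .dockerignore and untracked files:

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "log"
    "os/exec"
)

// contextChecksum computes a checksum of the build context from the data that
// Git already stores (modes and blob hashes), without reading file contents.
func contextChecksum(repoDir, commit, contextDir string) (string, error) {
    cmd := exec.Command("git", "-C", repoDir, "ls-tree", "-r", commit, "--", contextDir)
    out, err := cmd.Output()
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(out) // each line looks like "<mode> blob <hash>\t<path>"
    return hex.EncodeToString(sum[:]), nil
}

func main() {
    checksum, err := contextChecksum(".", "HEAD", ".")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(checksum)
}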

Improved performance when importing files


werf v1.1 uses an rsync server when importing files from artifacts and images. Previously, the import was performed in two steps by mounting a directory from the host system.

Import performance on macOS is no longer limited by Docker volumes, and imports take the same time as on Linux and Windows.

Content-based tagging


werf v1.1 supports so-called tagging by image content, content-based tagging: the tags of the resulting Docker images depend on the content of these images.

When you run werf publish --tags-by-stages-signature or werf ci-env --tagging-strategy=stages-signature, the published images are tagged with the so-called image stages signature. Each image is tagged with its own stages signature, which is calculated according to the same rules as the regular signature of each individual stage, but serves as a generalized identifier of the whole image.

The stages signature of an image depends on:

  1. the contents of this image;
  2. Git revision history that led to this content.

Git repositories always contain commits that do not change the contents of the image files: for example, commits that only touch comments, merge commits, or commits that change files that are not imported into the image anyway.

Content-based tagging solves the problem of unnecessary restarts of application pods in Kubernetes caused by image name changes even when the image content has not changed. Incidentally, this problem is one of the things that makes it hard to keep many microservices of one application in a single Git repository.

Content-based tagging is also a more reliable tagging method than tagging by Git branches, because the content of the resulting images does not depend on the order in which the CI system executes pipelines that build several commits of the same branch.

Important: from now on, stages-signature is the only recommended tagging strategy. It is used by default by the werf ci-env command (unless you explicitly specify a different tagging scheme).

→ Documentation . A separate publication will also be dedicated to this feature. UPDATED (April 3): An article with details has been published .

Logging levels


The user can now control the output, set the logging level, and work with debugging information. The --log-quiet, --log-verbose, and --log-debug options have been added.

By default, the output contains a minimum of information.

When using verbose output (--log-verbose), you can trace how werf works.

Debug output (--log-debug), in addition to werf's own debugging information, also contains the logs of the libraries it uses. For example, you can see how the interaction with the Docker Registry happens, and also pinpoint the places where a significant amount of time is spent.



Future plans


Attention! The features described below that are marked v1.1 will become available in this version, many of them in the near future. They will arrive via auto-updates when using multiwerf. These features do not affect the stable part of the v1.1 functionality; their appearance will not require any manual intervention in existing configurations.

Full support for various Docker Registry implementations (NEW)


  • Version: v1.1
  • Dates: March
  • Issue

The goal is for the user to be able to use an arbitrary Docker Registry implementation with werf, without restrictions.

To date, we have identified the following set of solutions for which we are going to guarantee full support:

  • Default (library / registry) *,
  • AWS ECR,
  • Azure *,
  • Docker Hub,
  • GCR *,
  • GitHub Packages,
  • GitLab Registry *,
  • Harbor *,
  • Quay.

An asterisk marks the solutions that werf already fully supports. The rest are supported, but with limitations.

Two main problems can be distinguished:

  • Some solutions do not support removing tags via the Docker Registry API, which prevents users from using werf's automatic cleanup. This applies to AWS ECR, Docker Hub, and GitHub Packages.
  • Some solutions do not support so-called nested repositories (Docker Hub, GitHub Packages, and Quay), or support them but require the user to create them manually via the UI or API (AWS ECR).

We are going to solve these and other problems using the native APIs of these solutions. This task also includes covering the full werf cycle with tests for each of them.

Distributed image builds (↑)



At the moment, werf v1.0 and v1.1 can only be used on a single dedicated host for building and publishing images and deploying applications to Kubernetes.

To enable distributed operation of werf, where building and deploying applications to Kubernetes runs on several arbitrary hosts that do not preserve their state between builds (ephemeral runners), werf must be able to use the Docker Registry as a stage storage.

Previously, when the werf project was still called dapp, it had this capability. However, we ran into a number of problems that must be taken into account when implementing this feature in werf.

Note: this feature does not cover running the builder inside Kubernetes pods, because that requires getting rid of the dependency on the local Docker server (there is no access to the local Docker server in a Kubernetes pod, since the process itself runs inside a container, and werf does not and will not support talking to a Docker server over the network). Support for running in Kubernetes will be implemented separately.

Official GitHub Actions Support (NEW)


  • Version: v1.1
  • Dates: March
  • Issue

This includes werf documentation (reference and guide sections), as well as an official GitHub Action for working with werf.

In addition, it will allow werf to work on ephemeral runners.

The mechanics of user interaction with the CI system will be based on adding labels to pull requests to trigger certain actions for building and rolling out the application.

Local application development and deployment with werf (↓)


  • Version: v1.1
  • Dates: April (originally January-February)
  • Issue

The main goal is to achieve a single unified config for deploying applications both locally and in production, without complex actions, out of the box.

Werf also needs a mode of operation in which it will be convenient to edit the application code and instantly receive feedback from a working application for debugging.

New cleaning algorithm (NEW)


  • Version: v1.1
  • Dates: April
  • Issue

In the current werf v1.1, the cleanup procedure does not handle images tagged with the content-based tagging scheme: these images will accumulate.

Also, the current versions of werf (v1.0 and v1.1) use different cleanup policies for images published under the Git branch, Git tag, and Git commit tagging schemes.

A new image cleanup algorithm, unified for all tagging schemes and based on the history of Git commits, has been devised:

  • Keep no more than N1 images associated with the N2 most recent commits for each git HEAD (branches and tags).
  • Keep no more than N1 image stages associated with the N2 most recent commits for each git HEAD (branches and tags).
  • Keep the images that are currently used in Kubernetes (all kube-contexts of the configuration and all namespaces are scanned).
  • Keep the images that are referenced in deployed Helm releases.
  • Keep an image associated with a git HEAD (for example, a branch HEAD) that has been deployed to Kubernetes via Helm.

Parallel image building (↓)


  • Version: v1.1
  • Dates: postponed *

The current version of werf builds the images and artifacts described in werf.yaml sequentially. The process of building independent stages of images and artifacts needs to be parallelized, and convenient, informative output needs to be provided.

* Note: the deadline was shifted due to the increased priority of implementing distributed builds, which will add more opportunities for horizontal scaling, as well as of using werf with GitHub Actions. Parallel builds are the next optimization step, providing vertical scalability when building a single project.

Switch to Helm 3 (↓)


  • Version: v1.2
  • Dates: May (originally February-March) *

It includes the transition to the new Helm 3 code base and a proven, convenient way to migrate existing installations.

* Note: the transition to Helm 3 will not add significant features to werf, because all the key features of Helm 3 (3-way merge and the absence of Tiller) are already implemented in werf. Moreover, werf has additional features beyond those. However, this transition remains in our plans and will be implemented.

Jsonnet for Kubernetes configuration description (↓)


  • Version: v1.2
  • Dates: April-May (originally January-February)

Werf will support the configuration description for Kubernetes in Jsonnet format. At the same time, werf will remain compatible with Helm and it will be possible to select a description format.

The reason is that Go templates, according to many people, have a high entry barrier, and the readability of template code also suffers.

Other options for implementing Kubernetes configuration description systems (such as Kustomize) are also being considered.

Work inside Kubernetes (↓)


  • Version: v1.2
  • Dates: May-June (originally April-May)

Goal: to enable building images and delivering applications using runners in Kubernetes, i.e. building new images, publishing them, cleaning them up, and deploying them can happen directly from Kubernetes pods.

To implement this feature, the ability to build images in a distributed way is needed first (see the section above).

It also requires support for a build mode that works without the Docker server (i.e., a Kaniko-like build or a build in userspace).

werf will support building in Kubernetes not only from a Dockerfile, but also with its Stapel builder, with incremental rebuilds and Ansible.

A step towards more open development


We love our community ( GitHub , Telegram ) and we want more and more people to help make werf better, understand what direction we are moving in, and participate in the development.

We recently decided to switch to GitHub project boards in order to make our team's workflow open. You can now see our immediate plans, as well as the work currently in progress in the relevant areas.


A lot of work has also been done on the issues:

  • Irrelevant ones have been removed.
  • Existing ones have been brought to a single format with a sufficient level of detail.
  • New issues with ideas and suggestions have been added.

How to enable version v1.1


The version is currently available in the 1.1 ea channel (releases in the stable and rock-solid channels will appear as it stabilizes, but ea itself is already stable enough for use, since it has passed through the alpha and beta channels). It is activated via multiwerf as follows:

source $(multiwerf use 1.1 ea)
werf COMMAND ...

Conclusion


The new stage storage architecture and the optimized performance of the Stapel and Dockerfile builders open up the possibility of implementing distributed and parallel builds in werf. These features will soon appear in the same v1.1 release and will become available automatically through the auto-update mechanism (for multiwerf users).

This release adds the content-based tagging strategy, which has become the default. The log output of the basic commands has also been redesigned: werf build, werf publish, werf deploy, werf dismiss, werf cleanup.

The next significant step will be adding distributed builds. Since v1.0, distributed builds have been given a higher priority than parallel builds, because they add more value to werf: horizontal scaling of builders and support for ephemeral builders in various CI/CD systems, as well as the ability to provide official support for GitHub Actions. That is why the timeline for implementing parallel builds has been shifted. However, we are working to deliver both capabilities as soon as possible.

Follow the news! And do not forget to drop by our GitHub to create an issue, find an existing one and give it a +1, open a PR, or simply watch the development of the project.

PS


Read also in our blog:

