Methods for dealing with legacy code using GitLab as an example

People can argue endlessly about whether GitLab is a good product. It is better to look at the numbers: after the latest investment round GitLab was valued at $2.7 billion, up from $1.1 billion in the previous one. That means rapid growth, and the company will be hiring more and more front-end developers.

This is the story of how the frontend appeared at GitLab.



This is a graph of the number of front-end developers at GitLab, starting in 2016, when there were none at all, and ending in 2019, when there were already a few dozen. But GitLab itself has been around for 7 years. So until 2017 most of the front-end code was written by backend developers and, worse, by Ruby on Rails backend developers (we do not mean to offend anyone and explain below what we are talking about).

Over 7 years any project becomes obsolete, whether you like it or not. At some point refactoring can no longer be put off. And so begins a journey full of adventure whose final destination is never reached. Ilya Klimov explained how this happens at GitLab.


About the speaker: Ilya Klimov (xanf) is a senior frontend engineer at GitLab. Before that he worked in startups and outsourcing and ran a small outsourcing company. Then he realized that he had not yet had a chance to work on a product, and joined GitLab.

The article is based on Ilya's talk at FrontendConf, so it does not so much structure information as reflect the speaker's experience. It may seem overly conversational, but it is no less interesting for anyone working with legacy code.

At GitLab, as in many other projects, there is a gradual migration from old technologies to something more current:

  • CoffeeScript to JavaScript. When the Ruby on Rails developers started the project, they could not help noticing CoffeeScript, which looks very similar to Ruby.
  • jQuery to Vue. GitLab is not an SPA: it relies on server-side rendering and progressive enhancement, with Vue used for individual parts of a page.
  • Karma to Jest.
  • REST to GraphQL and, along with it, Vuex to Apollo. Vuex is the Vue counterpart of Redux, while Apollo brings its own local state.

These replacements are happening in several directions at once, so the project simultaneously contains legacy code at different stages of migration.

Now imagine joining a project that is in the middle of all these migrations. My reference point was this: you open a project for editing, press the save button, and what do you think comes back? If you thought we were so old-school that HTML comes back, no. What comes back is JavaScript that you need to eval in order to display a pop-up window. That is the lowest point in my picture of legacy.

Further up come hand-written jQuery classes, then Vue components and, at the very top, modern features written with Vuex, Apollo, Jest and so on.

This is what my contribution-graph on GitLab looks like.



In it, and this is very important for understanding the story and all my pain, several segments can be distinguished:

  • Onboarding, around April. The “honeymoon”, when I had just started working at GitLab. At this time newcomers are given easier tasks.
  • From the end of April until mid-May there are only a few commits. This is the period of denial: “No, it cannot be that everything is done this way!” I was trying to work out what I was misunderstanding, which is why there are so few commits.
  • The second half of May is anger: “I don't care about any of it, I need to ship to production, deliver features and try to do something about all this.”
  • The beginning of June (zero commits) is depression. This was not a vacation. I looked at the code, felt like giving up and did not know what to do with it.
  • After that came bargaining: I made a deal with myself, decided that I had been hired as a professional and that I could make GitLab better. In June and July I proposed a huge number of improvements. Not all of them found support, for reasons we will talk about.
  • Now I am at the stage of acceptance and clearly understand how, where, why and what to do with all this.

I'll tell you in more detail what I did from August to October. Honestly, in a small outsourcing company or a startup I would have been fired five times over for that level of productivity across three months.

So, in three months I built:

  • A segmented control: three buttons.
  • A search box that stores local history, a slightly more complex component.
  • A spinner. And this component has not been merged yet.



Next we will go through, step by step, why this happened and how to live with it. If you think I am exaggerating, here is a screenshot of some of the tasks assigned to me at GitLab (you can look at GitLab directly, it is open).



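See: missed 12.1, missed 12.2, missed 12.3. A sprint lasts a month, and the segmented control took 3 sprints. The spinner still has not landed, and it will be our main character.

The problem of refactoring, and the philosophy of refactoring, has been facing humanity for a very long time, for millennia. Let me prove it:
«And no one pours new wine into old wineskins; otherwise the new wine will burst the skins, and it will run out, and the skins will be ruined; new wine must be poured into new wineskins; then both are preserved.

And no one, having drunk old wine, immediately wants new, for he says: the old is better» (Luke 5:37-39).

The Bible tells us how to combine old and new functionality. The second part of the quotation is valuable from a management point of view: whatever initiatives you come forward with, you will meet strong resistance.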

In the depression phase I watched a lot of talks on refactoring large projects, but about 70% of them reminded me of a joke.

Two Java developers are talking:
- How do we speed up our Java application?
- Oh, I gave a talk about exactly that! Want me to tell you?
- Talking I can do myself. I need it sped up!

If you still decide to set out on the dangerous and shaky path of refactoring, I have a few simple recipes that I have worked out for myself and that work in conditions close to reality.

1. Isolation


To speed up, improve or refactor anything, you need to cut the elephant into steaks, that is, split the task into parts. GitLab is very large. We have a Slack channel called “Is this known”, where people ask questions like “Is this a bug or a feature, who can explain?”, and an answer is not always found.

A simple example: screenshots of the same place in GitLab, taken one day apart.



I was very upset, because I was working on this button, and all of these are, one way or another, problems with the button.

What happened? It's simple: we are developing a design system, and in Storybook, a separate tool we use to test the design system, we disabled GitLab's global CSS to check how well the components' CSS is isolated from it.

To summarize: CSS is beyond saving, at least in GitLab.

I have been working with JavaScript for 14 years and have never seen a project more than a year or two old that kept its CSS fully manageable. By the way, HTML is beyond saving too (in GitLab for sure).

For a long time GitLab was developed by backend developers. They made the controversial decision to use Bootstrap because it offered a backend-friendly layout system.

But what is Bootstrap in terms of component isolation philosophy? This is about 600-700 global classes (in fact, each CSS class is global) that permeate the entire application. In terms of manageability, nothing good will come of it.

The next step (let's not call it a mistake): GitLab adopted Vue.js. The choice was reasonable, because of the three frameworks Vue is the one that lets you rewrite things most smoothly. You do not have to throw everything out and build a large Single Page Application at once; you can rewrite individual small nodes. This can be done with Angular now, but 3-4 years ago, when Angular 2 appeared, it could not coexist on a page in more than one instance. It is possible with React too, but the magic of not needing a build step and so on tipped the scales towards Vue.

As a result, one evil was combined with another. This is bad, because Bootstrap styles know nothing about the component system, and the Vue components were at first written any old way. So a firm decision was made to create our own design system. Ours is called Pajamas, but nobody has been able to explain to me why.

I see that now there are more and more of our own design systems, and this is nice.

A design system implies isolation, but since GitLab was already written on Bootstrap, roughly 50-60% of our design system is a wrapper over Bootstrap / Vue components that reduces their functionality. This is needed so that the design system does not let you use a component incorrectly. In an abstract library flexibility matters, for example the ability to make any button you like. But if spinners in GitLab may come in four designer-approved sizes, it must be physically impossible to create any other.
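The "narrowing wrapper" idea can be sketched in TypeScript. This is only an illustration under assumptions: the size names, the `SpinnerProps` shape, `isSpinnerSize`, `spinnerClass` and the class names are hypothetical and are not taken from Pajamas. The point is that the wrapper accepts nothing outside the approved list, both at compile time and at run time.

```typescript
// Hypothetical list of designer-approved sizes; the real names in Pajamas may differ.
const SPINNER_SIZES = ["sm", "md", "lg", "xl"] as const;
type SpinnerSize = (typeof SPINNER_SIZES)[number];

interface SpinnerProps {
  size: SpinnerSize; // compile-time guard: any other string is a type error
  inline?: boolean;
}

// Run-time guard for values that arrive from untyped code (templates, legacy JS).
function isSpinnerSize(value: string): value is SpinnerSize {
  return (SPINNER_SIZES as readonly string[]).indexOf(value) !== -1;
}

// The wrapper exposes only the approved options and hides the underlying class names.
function spinnerClass(props: SpinnerProps): string {
  if (!isSpinnerSize(props.size)) {
    throw new Error(`Unsupported spinner size: ${String(props.size)}`);
  }
  const layout = props.inline ? "spinner-inline" : "spinner-block";
  return `spinner spinner-${props.size} ${layout}`;
}
```

With this shape, `spinnerClass({ size: "md" })` compiles and works, while a call with any other size is rejected by the type checker or, for untyped callers, by the run-time guard.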

Someday good will prevail, and we will have an important tool for effectively refactoring front-end projects, provided, of course, that you have given up on supporting IE and Edge: Shadow DOM. Shadow DOM solves the problem of global styles leaking into components. Please do not take Polymer, which even Google has already buried. Use lit-element and lit-html, and you can build isolated styles alongside your favorite framework.

You may say that React has CSS modules and Vue has scoped styles that do the same thing. Be very careful with them: CSS modules do not give 100% isolation because they only work with classes. And with scoped styles in Vue you can hit a very interesting scenario where the parent component's styles land on the root element of a child, and the data attributes used for scoping are slow.

Although I scolded Angular for three years, I now have to admit that at the moment it handles this best. In Angular, to get good style isolation it is enough to switch the encapsulation mode: use Shadow DOM where you need it and the regular emulation otherwise.

Back to the spinner. Of the three months I spent fighting with it, part of the time went on an exciting activity: cleaning up.



The class loading-container is an implementation detail of the spinner, that is, a class used inside the spinner's implementation. Since CSS is beyond saving, we decided to create separate CSS in Pajamas based on Atomic CSS. Personally I do not much like the Atomic CSS concept, but we have what we have.

That is, I was cleaning styles out of the main product's code that were hung on elements which are implementation details. It all looks very simple because, of course, GitLab has tests.

2. Tests


Tests in GitLab cover all the code and provide reliability. As a result, a pipeline takes 98 minutes to complete.



GitLab itself uses 40% of public runner time on GitLab.com, because it runs a pipeline for every merge request.

I was very inspired: at last I had joined a project where everything is covered by tests! Backend code coverage is close to 100%, and front-end coverage at the time of my arrival was 89.3%.

Unfortunately, it turned out that most of this coverage is trash, because a test:

  • breaks when changes are made that have nothing to do with the component;
  • does not break when the component itself is changed.

Let me explain with examples. We adopted Jest because we thought that in certain situations it would let us use snapshots instead of writing assertions. The problem is that unless you configure Jest and add the right serializer, Vue Test Utils simply writes props into the HTML. You then end up with, say, a component named user with a prop named data to which an object was passed, rendered as [object Object]. No change in the format of the passed data makes the snapshot fail.

Ruby developers are used to writing tests that, roughly speaking, cover every method.
When we write tests for Vue or React components, we need to test how the public API behaves.
As a result, we had huge tests for computed properties that were not used in some scenarios, while in others it was physically impossible to reach the state in which the computed property would be evaluated. Special thanks to Vue, where templates are strings, so test coverage of a template cannot be calculated. Vue 3 will bring source maps and a chance to fix this, but that will not be soon.
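The snapshot problem can be reproduced without any test framework. The sketch below is not Jest or Vue Test Utils code: `naiveSerialize` is a hypothetical stand-in for a serializer that writes props into HTML attributes using JavaScript's default string conversion, and `structuralSerialize` is an equally hypothetical alternative that keeps the data visible.

```typescript
// Hypothetical stand-in for a snapshot serializer that dumps props as HTML attributes
// using JavaScript's default string conversion.
function naiveSerialize(tag: string, props: Record<string, unknown>): string {
  const attrs = Object.keys(props)
    .map((key) => `${key}="${String(props[key])}"`)
    .join(" ");
  return `<${tag} ${attrs}></${tag}>`;
}

// Hypothetical serializer that keeps the structure of the data visible in the snapshot.
function structuralSerialize(tag: string, props: Record<string, unknown>): string {
  return `<${tag} props=${JSON.stringify(props)}></${tag}>`;
}

const oldProps = { data: { id: 1, name: "xanf" } };
const newProps = { data: { userId: 1, fullName: "xanf", extra: true } }; // format changed

// Both calls yield <user data="[object Object]"></user>, so a snapshot would still pass.
const naiveUnchanged = naiveSerialize("user", oldProps) === naiveSerialize("user", newProps);

// The structural output differs, so a snapshot built on it would fail, as it should.
const structuralChanged =
  structuralSerialize("user", oldProps) !== structuralSerialize("user", newProps);
```

The data format changed completely between `oldProps` and `newProps`, yet the naive output is identical, which is exactly the kind of test that does not break when it should.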

Fortunately, there is one simple skill that lets you refactor legacy code effectively: the ability to write what the wider testing world calls a pinning test.

Pinning test


This is a test that tries to capture the behavior of what you are refactoring. Note that a pinning test will most likely never be committed to the repository. That is, by whatever means available, for example using the staging environment, you write a test for yourself that describes how your component is rendered. After refactoring, either the pinning test produces the same HTML, which is most likely a good sign, or you need to understand exactly what has changed.

Here is an example from life. A few months ago I reviewed a merge request that refactored a drop-down list. The legacy context is this: previously, to separate branches from one another with a line in the drop-down list, the text string “divider” was simply passed in. So if your branch was called divider, you were out of luck. During the refactoring the author swapped two classes on an HTML node, this went to production and broke it. In fairness, it was not quite production but staging, but still.

As a result, when we started writing such tests we found that, despite the great coverage figure, the tests were written incorrectly. First, we had Karma tests, that is, old ones. Second, almost all the tests made assumptions about the internals of the component. They pretended to be unit tests but essentially worked like end-to-end tests, checking that a specific tag with a specific class was rendered instead of checking that a specific component was rendered with specific props. See the difference: classes versus components?

As a result, my 18 merge requests refactoring tests came to 8-9 thousand lines, and the total diff was about 20 thousand lines, because 11 thousand were removed.
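A pinning test for a case like this drop-down can be very small. The sketch below is an assumption-laden illustration, not GitLab code: `legacyRenderItem`, `refactoredRenderItem` and `swappedRenderItem` are hypothetical renderers, and the class names are invented. The test records the current output for a few representative inputs and then reports every input whose output changed.

```typescript
type Renderer = (label: string, active: boolean) => string;

// Hypothetical legacy renderer whose behavior we want to pin before touching it.
function legacyRenderItem(label: string, active: boolean): string {
  let cls = "dropdown-item";
  if (active) cls = cls + " is-active";
  return '<li class="' + cls + '">' + label + "</li>";
}

// Hypothetical refactored version that is supposed to behave identically.
function refactoredRenderItem(label: string, active: boolean): string {
  const classes = ["dropdown-item", active ? "is-active" : ""].filter(Boolean);
  return `<li class="${classes.join(" ")}">${label}</li>`;
}

// Hypothetical refactoring slip: the two classes are emitted in swapped order.
function swappedRenderItem(label: string, active: boolean): string {
  const classes = [active ? "is-active" : "", "dropdown-item"].filter(Boolean);
  return `<li class="${classes.join(" ")}">${label}</li>`;
}

// Representative inputs, including the awkward branch name from the story above.
const pinnedCases: Array<[string, boolean]> = [
  ["master", true],
  ["feature/spinner", false],
  ["divider", false],
];

// Record what the code produces today.
function pin(render: Renderer): string[] {
  return pinnedCases.map(([label, active]) => render(label, active));
}

// Return the labels for which the candidate no longer matches the pinned output.
function diffAgainstPin(pinnedOutput: string[], candidate: Renderer): string[] {
  return pinnedCases
    .filter(([label, active], i) => candidate(label, active) !== pinnedOutput[i])
    .map(([label]) => label);
}

const pinned = pin(legacyRenderItem);
const cleanRefactoring = diffAgainstPin(pinned, refactoredRenderItem); // nothing changed
const brokenRefactoring = diffAgainstPin(pinned, swappedRenderItem); // the active item changed
```

Whether a swapped class order is actually a bug depends on the CSS and JavaScript that consume the markup; the pinning test only guarantees that the change does not go unnoticed.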



Formally, I reworked all these tests for the sake of one thing: to remove assertions about spinner classes and instead check that a spinner is rendered with the correct props.

At first glance this is thankless work. But rewriting tests for the right architecture was fairly easy to sell to the business. GitLab is a commercially profitable product. Of course, if you tell the product manager that you need three iterations to rewrite 20 tests, guess where you will be sent. It is another thing to say: “I need three iterations to rewrite the tests. This will let us introduce spinners more efficiently and speed up the future rollout of new design system elements.” And here we come to something important.
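The difference between the two kinds of assertion can be shown on a framework-free model of a rendered tree. Everything here is hypothetical (`RenderedNode`, `findComponent`, `hasClass`, the component and class names); in Vue Test Utils the robust variant corresponds to locating the child component and asserting on its `props()`.

```typescript
// Hypothetical model of a rendered tree: the component that produced a node, its
// props, the CSS classes it happened to render with, and its children.
interface RenderedNode {
  component: string;
  props: Record<string, unknown>;
  classes: string[];
  children: RenderedNode[];
}

// Depth-first search for the first node rendered by the given component.
function findComponent(root: RenderedNode, name: string): RenderedNode | undefined {
  if (root.component === name) return root;
  for (const child of root.children) {
    const found = findComponent(child, name);
    if (found) return found;
  }
  return undefined;
}

// Brittle check: depends on a class that is an implementation detail of the spinner.
function hasClass(root: RenderedNode, cls: string): boolean {
  return root.classes.indexOf(cls) !== -1 || root.children.some((c) => hasClass(c, cls));
}

const tree: RenderedNode = {
  component: "MergeRequestWidget",
  props: {},
  classes: ["mr-widget"],
  children: [
    { component: "Spinner", props: { size: "md" }, classes: ["loading-container"], children: [] },
  ],
};

// Robust check: asserts on the contract (which component, which props), not on markup.
const spinner = findComponent(tree, "Spinner");
const spinnerIsCorrect = spinner !== undefined && spinner.props["size"] === "md";
```

If the spinner's internal class is renamed, `hasClass(tree, "loading-container")` starts failing although nothing is broken, while the component-and-props check keeps passing.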

3. Resistance


There is another piece of functionality that has been waiting its turn in the GitLab design system even longer than my spinners: ordinary SVG icons.


We have icons drawn by a designer that are used in the main project, but they are not in the design system, because GitLab had a difficult childhood. For example, in 2019 CSS is built not with Webpack but with a thing called Sprockets, the Ruby asset pipeline, because we need to reuse the same CSS on the backend and the frontend. Because of this, icons have to be wired into different projects in different ways. So someone spent three months refactoring the main code base so that icons from the design system could be connected to related projects.

There is an important point here that you will inevitably run into. Refactoring is a process of continuous improvement. But sooner or later you have to stop.
It is absolutely normal to stop without completing the refactoring, as long as you have achieved concrete, measurable improvements.
But if you work on a legacy project, you will inevitably meet people who keep doing things the old way.

They write in the old way because that is what they are used to. For example, our backend developers say: “I don't want to learn this Jest of yours. I have written Karma tests for three years, I need to add new functionality, and since functionality is not accepted without tests, here is a Karma test.”

Your task is to resist this as much as possible. This is relatively easy to fight, but there is a greater sin. Sometimes during refactoring you hit a problem and are tempted to take a detour altogether.

That is, to prop things up with a new crutch simply because, for whatever reason, the refactoring cannot be carried through to the end. For example, if we have problems integrating icons into the main code base, we can leave a utility class that is pulled from the global application CSS. Formally the business task will be solved, but in practice it is like the story of the Lernaean Hydra: there were 8 bugs, 4 were fixed, 13 remain.

Refactoring is like renovating a house: it cannot be finished, only stopped.
The first 80% of the refactoring takes 20% of the time; the remaining 80% of the refactoring (yes, exactly) takes another 80% of the time.
It is important not to introduce new hacks during refactoring. Believe me, they will appear on their own as development goes on.

4. Tools


Fortunately, even before I arrived, GitLab had set out on the righteous path of adopting good tools: Prettier, Vue Test Utils, Jest. Although Prettier was rolled out clumsily.

Let me explain. While I was working out what had been done and why, 80% of my searches through history ran into a single 37-thousand-line commit that prettified the code. Using the history was almost impossible, and I had to configure a VS Code plug-in to exclude this commit when searching the change history.

Of course tools are important, but they must be chosen carefully. For example, we have Vue, and Vue has a good testing tool, Vue Test Utils. But while Vue 2 was released 2-3 years ago, Vue Test Utils still has not left beta. Moreover, according to insider information, the only developer of Vue Test Utils currently does not write Vue.

When choosing tools you are tossing a coin with fate, so really try to win.

GitLab had a childhood trauma with CoffeeScript. That is why it is impossible to push through even the theoretical idea of writing GitLab in TypeScript. Everything comes down to one simple argument: won't it end up like CoffeeScript, when a language that compiles to JavaScript died?
When choosing tools, try to make sure the tool can be replaced or, in the extreme case, maintained by yourselves.
At GitLab we use a cool thing called Danger.

This is a real screenshot of their website in 2019. Though colleagues said that in 2019 a website can in fact look like anything.

Danger is a bot that sits somewhere between the linter in your CI and written guidelines. The bot is extensible: it comes to a pull request (or, as they are properly called here, a merge request) and leaves comments like:
  • “There is an ESLint disable comment in this file, fix it.”
  • “This file used to be maintained by this person. Perhaps you should ask them for a review.”

In my opinion, this is a very good, important and extensible framework for monitoring the state of the code base.
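The two comments above can be modeled as plain rule functions. This sketch deliberately does not use Danger's own API, and it is not GitLab's configuration: `ChangedFile`, `eslintDisableRule`, `suggestReviewersRule` and `reviewComments` are hypothetical names that only show the shape of "a rule takes the changed files and returns comments".

```typescript
// Hypothetical description of a file changed in a merge request.
interface ChangedFile {
  path: string;
  content: string;
  previousAuthors: string[];
}

// Rule 1: flag files that switch ESLint off.
function eslintDisableRule(files: ChangedFile[]): string[] {
  return files
    .filter((file) => /eslint-disable/.test(file.content))
    .map((file) => `${file.path}: contains an eslint-disable comment, please fix it.`);
}

// Rule 2: suggest people who changed the file before (other than the author).
function suggestReviewersRule(files: ChangedFile[], author: string): string[] {
  return files
    .filter((file) => file.previousAuthors.some((person) => person !== author))
    .map((file) => {
      const others = file.previousAuthors.filter((person) => person !== author);
      return `${file.path}: previously changed by ${others.join(", ")}; consider asking for a review.`;
    });
}

// Combine all rules into the list of comments a bot would post on the merge request.
function reviewComments(files: ChangedFile[], author: string): string[] {
  return eslintDisableRule(files).concat(suggestReviewersRule(files, author));
}
```

Keeping each rule as a small pure function makes the rules themselves easy to test, whatever bot ends up posting the comments.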

5. Abstraction


I'll start with an example. A few months ago I saw the news: “GitHub got rid of jQuery. We have come a long, hard way and no longer use jQuery.” Naturally, I thought we should get rid of jQuery in GitLab too.



A quick search showed that jQuery is used in 300 files. It looks scary, but never mind: the eyes fear, the hands do the work. But no! jQuery is the glue that holds the GitLab code base together, because we have Bootstrap.

Bootstrap was originally written in jQuery. This means that if you need, for example, to catch the dropdown opening event in Bootstrap, this is a jQuery event. You cannot intercept it natively.

This is the first thing you should abstract away when working with legacy code. If you have jQuery that you cannot throw out, write your own Event Emitter that hides the work with jQuery events inside.

When the bright future arrives we will be able to remove jQuery, but for now, sorry, you need to concentrate the bad code. In a typical legacy project it is spread evenly across the whole code base. Gather it into a few narrow places marked “Do not enter without a hazmat suit”.
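A minimal sketch of such an emitter, under stated assumptions: `JQueryLike` is a locally declared subset of the jQuery event API (a real jQuery collection is expected to fit it), `DropdownEvents` and its `open`/`close` names are hypothetical, and the Bootstrap event names `shown.bs.dropdown` and `hidden.bs.dropdown` are the only place where jQuery-specific knowledge lives.

```typescript
// Minimal subset of the jQuery event API that the adapter relies on.
// Declared locally so the sketch stays self-contained.
interface JQueryLike {
  on(event: string, handler: () => void): unknown;
  off(event: string, handler: () => void): unknown;
}

type DropdownEvent = "open" | "close";
type Listener = () => void;

// The only place that knows Bootstrap's dropdown events are jQuery events.
const BOOTSTRAP_EVENTS: Record<DropdownEvent, string> = {
  open: "shown.bs.dropdown",
  close: "hidden.bs.dropdown",
};

class DropdownEvents {
  private listeners: Record<DropdownEvent, Listener[]> = { open: [], close: [] };
  private forwarders: Record<DropdownEvent, Listener>;

  constructor(private $el: JQueryLike) {
    // Forward jQuery events to our own listeners.
    this.forwarders = {
      open: () => this.emit("open"),
      close: () => this.emit("close"),
    };
    $el.on(BOOTSTRAP_EVENTS.open, this.forwarders.open);
    $el.on(BOOTSTRAP_EVENTS.close, this.forwarders.close);
  }

  on(event: DropdownEvent, listener: Listener): void {
    this.listeners[event].push(listener);
  }

  off(event: DropdownEvent, listener: Listener): void {
    this.listeners[event] = this.listeners[event].filter((l) => l !== listener);
  }

  // Detach from jQuery; the rest of the code base never touches jQuery directly.
  destroy(): void {
    this.$el.off(BOOTSTRAP_EVENTS.open, this.forwarders.open);
    this.$el.off(BOOTSTRAP_EVENTS.close, this.forwarders.close);
  }

  private emit(event: DropdownEvent): void {
    this.listeners[event].forEach((listener) => listener());
  }
}
```

Components subscribe with `events.on("open", ...)` and never see a jQuery object, so the day Bootstrap's jQuery dependency goes away only this one class has to change.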

6. Metrics


You should not undertake something whose result cannot be measured. At GitLab we measure everything we do, so that we know objectively that the code is getting better.



For example, here is our chart of the migration from Karma tests (blue) to Jest (green):

You can see there is gradual progress. We have many charts like this. But it is important to understand that things do not always end well.

I will give one more example (the demo in the report starts from this moment).



Here is the usual merge request interface in GitLab production. We can collapse files: click on the header and the file starts to collapse.

What do you think: how long does it take to collapse a 50-line file on a machine with an eighth-generation Core i7 set to maximum performance? And how long does expanding it take?

Collapsing the file takes from 7 to 15 seconds. Expanding is instant. Before the refactoring, both were equally fast.

That is why it is very important to have metrics.

Here is what is going on. This is Vue: its reactivity system tracks a value, and when the value changes, everything that depends on it is notified. Each line is a Vue component made up of many parts, because you can comment on a line, comments can be loaded dynamically from the server, and so on. All of this is subscribed to the store, which is also a Vue component.

When the lines are removed from the page, all of the, say, 20 thousand store subscriptions have to be updated as the store updates. If a line is deleted, it must be removed from the dependencies. And then it is simple math: you have to scan an array of 20 thousand elements to find the ones that need to be removed. Say there are 500 such lines, and each line is several components. Each removal is an O(N) operation, so the total is on the order of 20,000 × 500 = 10,000,000 steps. JavaScript is busy the whole time.

Expanding is instant because adding a dependency is just a push onto an array, that is, an O(1) operation.
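The arithmetic can be checked with a simplified model. This is not Vue's source code: `unsubscribeFromArray` and `unsubscribeFromIndex` are hypothetical helpers that only count steps, and placing the removed entries at the end of the list is an assumption made to show the unfavorable case.

```typescript
// Build n subscriber ids: 0, 1, ..., n - 1.
function makeIds(n: number): number[] {
  const ids: number[] = [];
  for (let i = 0; i < n; i++) ids.push(i);
  return ids;
}

// Array-backed dependency list: subscribing is a push, O(1);
// unsubscribing has to scan for the entry first, O(N).
function unsubscribeFromArray(subscribers: number[], toRemove: number[]): number {
  let steps = 0;
  for (const id of toRemove) {
    for (let i = 0; i < subscribers.length; i++) {
      steps++;
      if (subscribers[i] === id) {
        subscribers.splice(i, 1);
        break;
      }
    }
  }
  return steps;
}

// Keyed dependency list: subscribing and unsubscribing are both O(1) lookups.
function unsubscribeFromIndex(subscribers: Record<number, true>, toRemove: number[]): number {
  let steps = 0;
  for (const id of toRemove) {
    steps++;
    delete subscribers[id];
  }
  return steps;
}

const N = 20000; // subscriptions in the store (figure from the text)
const M = 500; // subscriptions that go away (figure from the text)

const allIds = makeIds(N);
const removedIds = allIds.slice(N - M); // assumption: removed entries sit at the end

const keyed: Record<number, true> = {};
for (const id of allIds) keyed[id] = true;

const arraySteps = unsubscribeFromArray(allIds.slice(), removedIds); // M * (N - M + 1) = 9,750,500
const keyedSteps = unsubscribeFromIndex(keyed, removedIds); // M = 500
```

Roughly ten million comparisons against five hundred lookups is the gap between "collapses for several seconds" and "instant", and it is the kind of regression that only shows up when you measure.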

Sometimes improving the quality of the code degrades performance and other metrics. It is very important to measure them.

In summary: isolate bad code and keep track of metrics.
