What is MagicString, and are these strings really so magical?

MagicString is a little-known library. Even so, it solves a very pressing problem: changing source code based on its structure (its AST, or abstract syntax tree).

In this article we will learn what MagicString is and whether these strings are really “magic”. This will help us understand the next article, in which I will explain how we managed to translate the Angular documentation so quickly, and how the same approach will help with creating a universal translator for Markdown and for files in any other format.





Two weeks ago I released the Russian-language documentation of Angular ( angular24.ru ). Since then, 35 issues with corrections to the text and 2 pull requests have been opened. I sincerely doubted that a system in which you select some text, suggest a translation, and an issue is automatically created on GitHub would work. But crowdsourcing works! :) You can learn more about this from this article .

After the release, one of the most frequently asked questions was: “Why?”. The question is entirely fair, but to answer it, you first need to understand what MagicString is, how it works, and why it is useful.

Suppose we have a simple source code:

const a = 1;

We want to replace const with var . The simplest solution is the usual String.prototype.replace . For this task, that is most likely the right solution. But what if we need to replace const with var only in the global scope and leave the ones inside functions alone? You could, of course, come up with a more complex regular expression or write some tricky code, but there is a more scalable and flexible way.
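To make the limitation concrete, here is a small sketch of what plain string replacement does once the same keyword appears both at the top level and inside a function. The sample source string is made up for illustration.

```javascript
// Two top-level const declarations and one inside a function.
const source = 'const a = 1; function f() { const b = 2; } const c = 3;';

// With a string pattern, only the first occurrence is replaced.
const firstOnly = source.replace('const', 'var');

// With a global regular expression, every occurrence is replaced,
// including the one inside the function body.
const everywhere = source.replace(/\bconst\b/g, 'var');

console.log(firstOnly);  // var a = 1; function f() { const b = 2; } const c = 3;
console.log(everywhere); // var a = 1; function f() { var b = 2; } var c = 3;
```

Neither call produces the result we want, in which only the two top-level declarations change: the text alone carries no information about scope.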

We can use a parser to get the AST (abstract syntax tree). If you want to see what an AST looks like, go to astexplorer.net . In essence, it is a tree that precisely represents the structure of your code.

Furthermore, each node in this AST has start and end indexes indicating the position of that element in the source code. Knowing these coordinates and having the structure of the document at hand, we can make complex replacements and rearrangements while preserving the structure of the document.
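As a rough illustration, the sketch below uses a hand-written, heavily simplified AST in the ESTree style (the shape produced by parsers such as acorn) instead of calling a real parser. The tree and its indexes are assumptions written out by hand for this one source string. Only declarations that are direct children of the program are touched.

```javascript
const source = 'const a = 1; function f() { const b = 2; }';

// Hand-written, simplified ESTree-style AST for the source above.
// A real parser would produce many more node types and fields.
const ast = {
  type: 'Program', start: 0, end: 42,
  body: [
    { type: 'VariableDeclaration', kind: 'const', start: 0, end: 12 },
    {
      type: 'FunctionDeclaration', start: 13, end: 42,
      body: {
        type: 'BlockStatement', start: 26, end: 42,
        body: [{ type: 'VariableDeclaration', kind: 'const', start: 28, end: 40 }],
      },
    },
  ],
};

// Only direct children of Program live in the global scope.
const topLevelConsts = ast.body.filter(
  (node) => node.type === 'VariableDeclaration' && node.kind === 'const'
);

// Replace from the last node to the first so that earlier indexes stay valid.
let result = source;
for (const node of [...topLevelConsts].reverse()) {
  const keywordEnd = node.start + node.kind.length;
  result = result.slice(0, node.start) + 'var' + result.slice(keywordEnd);
}

console.log(result); // var a = 1; function f() { const b = 2; }
```

The declaration inside the function is never visited, so it keeps its const, and every character outside the replaced keyword stays exactly where it was.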



Usually, replacement is done using the visitor design pattern and several helpers, which are typically bundled into a single library that can be called a “transformer API”. Each parser has its own “transformer API”.

Such libraries are very easy to use, but they have several problems. One of them is performance.

Since each (well, almost every) node in the AST contains coordinates, changing one node often means updating the coordinates for the rest of the tree. You might argue that we can get away cheaply: do not update the coordinates at all, and simply render the AST back to text based on its structure. But there is one problem: you immediately lose the formatting of the original text, which contradicts our task of replacing const with var in the existing string. In effect, we get a new string with new formatting. And if that is not a problem for a small string, imagine a 1000-line file whose formatting has completely changed because const was replaced with var . That doesn't sound very good.



And here comes the magic of MagicString. I first learned of it from a Rich Harris project called butternut . Butternut is a JavaScript minifier. It was claimed to be 3 times faster than UglifyJS and 10-15 times faster than Babili . Getting ahead of myself, I will say that the project was abandoned at least 3 years ago. But even back then, I was intrigued by the secret of its performance. It was MagicString.

Let's take a look at working with MagicString:

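// Requires the magic-string package from npm
var MagicString = require( 'magic-string' );
var s = new MagicString( 'const a = 1; const b = 2;' );

// Replace the characters from index 0 up to (but not including) index 5
s.overwrite( 0, 5, 'var' );
s.toString(); // 'var a = 1; const b = 2;'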

The algorithm behind MagicString is very simple: we wrap the original string in an object that does not apply changes to the string directly, but instead stores the coordinates and the requested operation in an array for later. Only when someone asks for the resulting string do we perform the accumulated operations one by one. For example:

  1. We replace const with var, starting at index 0 and ending at index 5
  2. We know that all subsequent replacements must have their indexes reduced by 2 (var is 2 characters shorter than const, so the string became shorter)
  3. We update the coordinates of all remaining operations
  4. We apply the next operation, and so on
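The steps above can be sketched in a few lines. This is a minimal illustration of the idea as described here, not the actual implementation of the magic-string library. The class name DeferredEdits and its fields are hypothetical, and the sketch assumes that edits do not overlap.

```javascript
// Hypothetical helper that records edits and applies them lazily.
class DeferredEdits {
  constructor(original) {
    this.original = original;
    this.edits = []; // each edit: { start, end, content } in ORIGINAL coordinates
  }

  // Record the edit; the original string is not touched here.
  overwrite(start, end, content) {
    this.edits.push({ start, end, content });
    return this;
  }

  // Apply the recorded edits one by one, left to right.
  toString() {
    const ordered = [...this.edits].sort((a, b) => a.start - b.start);
    let result = this.original;
    let shift = 0; // how far later coordinates have moved so far
    for (const { start, end, content } of ordered) {
      result = result.slice(0, start + shift) + content + result.slice(end + shift);
      // 'var' over 'const' gives 3 - 5 = -2, so later indexes move left by 2.
      shift += content.length - (end - start);
    }
    return result;
  }
}

const s = new DeferredEdits('const a = 1; const b = 2;');
s.overwrite(0, 5, 'var').overwrite(13, 18, 'let');
console.log(s.toString()); // var a = 1; let b = 2;
```

Keeping a single running shift replaces the step of rewriting the coordinates of every pending operation, which is possible because the edits are applied in order of their start index.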




Everything looks pretty simple. But why is MagicString faster? The answer is quite simple: the number of edit operations we record is much smaller than the number of nodes in the AST. Not to mention the amount of memory needed for the AST, and the fact that tree traversal is not a free operation but O(n + m).



But what if I am willing to wait an extra half hour? Here comes the second advantage of MagicString. Each parser invents its own API for transformations. And that is the good case, when such an API exists at all: not every parser provides one, and very often we are left with no proper way to change the source text using the AST. MagicString, on the other hand, is a single universal API for changing the source string. It doesn't matter which parser or combination of parsers you used. With MagicString, you can work equally well with any AST.
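A short sketch of why the parser does not matter: any node that exposes a start and an end offset can drive the same edit. The two nodes below are hand-written assumptions. One follows the ESTree style ( start / end , as produced by acorn), the other the unist style ( position.start.offset , as used by Markdown parsers such as remark). The helpers rangeOf and applyEdits are hypothetical stand-ins for illustration; with the real library, the same ranges would simply be passed to overwrite .

```javascript
// Hypothetical helper: apply { start, end, content } edits to a string.
// Edits are applied from the last to the first so earlier offsets stay valid.
function applyEdits(source, edits) {
  return [...edits]
    .sort((a, b) => b.start - a.start)
    .reduce(
      (text, { start, end, content }) => text.slice(0, start) + content + text.slice(end),
      source
    );
}

// Hypothetical adapter: read the range from either node shape.
const rangeOf = (node) =>
  node.position
    ? { start: node.position.start.offset, end: node.position.end.offset }
    : { start: node.start, end: node.end };

// ESTree-style node (JavaScript): the identifier "a".
const jsSource = 'const a = 1;';
const jsNode = { type: 'Identifier', name: 'a', start: 6, end: 7 };

// unist-style node (Markdown): the heading text "Hello".
const mdSource = '# Hello';
const mdNode = {
  type: 'text',
  value: 'Hello',
  position: { start: { offset: 2 }, end: { offset: 7 } },
};

const renamed = applyEdits(jsSource, [{ ...rangeOf(jsNode), content: 'answer' }]);
const translated = applyEdits(mdSource, [{ ...rangeOf(mdNode), content: 'Привет' }]);

console.log(renamed);    // const answer = 1;
console.log(translated); // # Привет
```

The editing code never inspects the node type or the language of the source. It only needs the two offsets.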



I hope you are interested in MagicString. In the next article I will talk about the double MagicString and how to make a universal translator of Markdown documents.

Subscribe to my Telegram channel @obenjiro_notes and Twitter obenjiro so as not to miss the next articles on this topic and many other interesting things.
