Macros for Pythonistas. A Yandex talk

How can you extend Python's syntax and add the features you need? Last summer at PyCon I tried to explore this topic. The talk explains how the pytest, macropy, and patterns libraries work and how they achieve such interesting results. At the end there is an example of code generation using macros in HyLang, a Lisp-like language running on top of Python.


- Hi, everyone. First of all, I want to thank the organizers of PyCon. I am a developer at Yandex. This talk is not about my work at all, but about experimental things. Perhaps it will give some of you the idea that Python lets you do cool things you did not know about before and had never thought about in this direction.

A quick note for those who don't know what macros are: they are a way of generating code in which an expression in the language is expanded into more complex code. What's in it for you? The macro call is concise and expresses some abstraction, yet it does a lot of work under the hood, and you don't have to write all that code by hand.

pytest


Most likely you have come across the pytest test framework; many of you here almost certainly use it. I don't know if you have ever noticed, but under the hood it does some magic too.



Say you have a simple test like this. If you run it without pytest, it simply throws a bare AssertionError.



Unfortunately, my example is a little degenerate: it is immediately obvious that len is taken of a list of three elements. But if some function were called instead, such an AssertionError would never tell you what the function returned. All you would know is that it returned something not equal to a hundred.



However, if you run this under pytest, it displays additional debugging information. How does it do that internally?



This magic works very simply. Pytest installs its own import hook that fires when a test module is loaded. Pytest then parses the Python file itself, and the parsing yields an intermediate representation called an AST (abstract syntax tree). The AST is the basic concept that lets you change Python code on the fly.

Having obtained the tree, pytest applies a transformation that looks for all assert statements. It rewrites them in a certain way, compiles the resulting new AST, and gets a test module that then runs on the regular Python virtual machine.
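The parse, transform, compile, run pipeline described above can be sketched with the standard ast module. This is only an illustration of the idea, not pytest's actual rewriter: the import hook is left out, only `assert a == b` is handled, and the names RewriteAsserts, load_rewritten, _left and _right are made up for the example.

```python
import ast

SOURCE = """
def test_length():
    assert len([1, 2, 3]) == 100
"""

# Template for the rewritten assert; `_left` and `_right` are made-up names.
TEMPLATE = """
_left = None
_right = None
if not _left == _right:
    raise AssertionError("assert %r == %r" % (_left, _right))
"""


class RewriteAsserts(ast.NodeTransformer):
    """Replace `assert a == b` with code that reports both values."""

    def visit_Assert(self, node):
        test = node.test
        simple_eq = (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
        )
        if not simple_eq or node.msg is not None:
            return node  # leave every other assert untouched
        save_left, save_right, check = ast.parse(TEMPLATE).body
        save_left.value = test.left  # each side is evaluated exactly once
        save_right.value = test.comparators[0]
        return [save_left, save_right, check]


def load_rewritten(source, filename="<rewritten>"):
    """Parse the source, rewrite its asserts, compile and execute it."""
    tree = ast.parse(source, filename)
    tree = ast.fix_missing_locations(RewriteAsserts().visit(tree))
    namespace = {}
    exec(compile(tree, filename, "exec"), namespace)
    return namespace


module = load_rewritten(SOURCE)
try:
    module["test_length"]()
except AssertionError as error:
    print(error)  # assert 3 == 100
```

Storing both sides in temporary variables before comparing them means that an expression with side effects is not evaluated a second time just to build the message.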



This is what the original AST looks like before pytest transforms it. The area highlighted in red is our Assert. If you look closely, you will see its left and right parts and the list itself.

When pytest transforms it and generates a new one, the tree begins to look like this.
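Since the slide is missing, here is a way to print such a tree yourself with the standard ast module. The indent argument of ast.dump assumes Python 3.9 or newer.

```python
import ast

tree = ast.parse("assert len([1, 2, 3]) == 100")
assert_node = tree.body[0]

# The Assert node holds a Compare: `left` is the call to len(...),
# and `comparators` holds the right-hand side of the comparison.
print(ast.dump(assert_node, indent=4))
```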



There are about a hundred lines of code that pytest generated for you.



If you convert this AST back to Python, it looks something like this. The areas highlighted in red are where pytest evaluates the left and right parts of the expression, builds an error message, and raises an AssertionError with that message if something went wrong.

Pattern matching


What else can you do with this? You can transform any Python code. There is one wonderful library that I found quite by accident on PyPI, which is an interesting place to dig around. It does pattern matching.



Perhaps this code looks familiar. It computes the factorial recursively. Let's see how it can be written using pattern matching.
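The slide itself is not reproduced, so the following is the usual recursive definition, given as a stand-in for whatever exact code was shown.

```python
def factorial(n):
    # Base case: 0! is 1.
    if n == 0:
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)


print(factorial(5))  # 120
```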



To do this, you just put a decorator on the function. Note that the function body now works differently. Each of these ifs is a pattern matching rule that parses the value passed to the function and transforms it somehow. There are not even explicit returns. That is because the patterns library, when it transforms the function body, first checks that it contains only if statements and then adds implicit returns, thereby changing the semantics of the language. In other words, it creates a new DSL that works a little differently, and thanks to this you can write some things declaratively.
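To make the two steps concrete (check that the body holds only ifs, then add implicit returns), here is a toy sketch built on the standard ast module. It is not the patterns library and does not use its API: the Rules class and the rule syntax are assumptions for illustration, and the only "pattern" it understands is a literal, which is compared with the first argument.

```python
import ast

SOURCE = """
def factorial(n):
    if 0: 1
    if n > 0: n * factorial(n - 1)
"""


class Rules(ast.NodeTransformer):
    """Turn a body made of `if pattern: result` rules into ordinary code."""

    def visit_FunctionDef(self, node):
        param = node.args.args[0].arg
        for rule in node.body:
            if not isinstance(rule, ast.If):
                raise SyntaxError("the body may contain only `if` rules")
            if isinstance(rule.test, ast.Constant):
                # A literal pattern means "the argument equals this value".
                rule.test = ast.Compare(
                    left=ast.Name(id=param, ctx=ast.Load()),
                    ops=[ast.Eq()],
                    comparators=[rule.test],
                )
            # The implicit return: a bare expression becomes `return expr`.
            rule.body = [
                ast.Return(value=stmt.value) if isinstance(stmt, ast.Expr) else stmt
                for stmt in rule.body
            ]
        return node


tree = ast.fix_missing_locations(Rules().visit(ast.parse(SOURCE)))
namespace = {}
exec(compile(tree, "<rules>", "exec"), namespace)
print(namespace["factorial"](5))  # 120
```

Without the transformation the same source would be valid Python that always returns None, which is exactly why this counts as changing the semantics of the language.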


Written this way, the previous function takes just three lines.





The remaining lines add extra functionality that lets you, for example, compute the factorial of each value in a list or pass the result through an arbitrary function.

How to write transformations yourself? macropy!


Now you are probably wondering how to apply this yourself. Doing it the way pytest does is tedious: you have to parse the files by hand and find the code that needs to be transformed. In pytest this is handled by a separate module of a thousand lines or more.

So that we don't have to do this ourselves, some smart people have already written a module for us called macropy.

This module has versions for both Python 2 and Python 3. It was written back in the Python 2 days, when its authors were having fun figuring out what could be done with Python, so the library includes various examples. Let's look at them; they will give you an idea of what this technique can do. The first cool thing described in the tutorial is a macro that brings format strings like those in Python 3 to Python 2.



The expression highlighted in red is the macro call syntax. The letter s is the name of the macro, and the expression it transforms follows in square brackets. As a result, the variables are substituted into the string. This works in Python 2, whereas Python 3 no longer needs such a macro. In the same way you could write your own macro that implements more complex semantics and does more fun things than standard format strings.



When the macro expands, which happens when the module is loaded, it simply turns into this code. Placeholders are inserted into the format string and the substitution is applied to it. From there Python compiles everything in the standard way. No macro expansion happens at runtime; it all happens at module load time. This means you can even use macros for optimizations or calculations that are performed at load time and produce more efficient bytecode.
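The expansion step can be imitated with the standard library. The sketch below is not macropy's implementation: the ExpandS class is an assumed name, and the code relies on the Python 3.9+ AST layout, where the subscript's slice is the expression itself. It rewrites s["..."] into a %-format expression before the code is compiled.

```python
import ast
from string import Formatter


class ExpandS(ast.NodeTransformer):
    """Rewrite s["..."] into a %-format expression."""

    def visit_Subscript(self, node):
        self.generic_visit(node)
        is_macro = (
            isinstance(node.value, ast.Name)
            and node.value.id == "s"
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        )
        if not is_macro:
            return node
        template, values = "", []
        for literal, field, _spec, _conv in Formatter().parse(node.slice.value):
            template += literal.replace("%", "%%")
            if field is not None:
                template += "%s"
                # Each {...} placeholder is parsed as a Python expression.
                values.append(ast.parse(field, mode="eval").body)
        return ast.BinOp(
            left=ast.Constant(value=template),
            op=ast.Mod(),
            right=ast.Tuple(elts=values, ctx=ast.Load()),
        )


source = 'a, b = 1, 2\nresult = s["{a} apple and {b + 1} bananas"]'
tree = ast.fix_missing_locations(ExpandS().visit(ast.parse(source)))
print(ast.unparse(tree))  # the expanded source, without any s[...]

namespace = {}
exec(compile(tree, "<expanded>", "exec"), namespace)
print(namespace["result"])  # 1 apple and 3 bananas
```

Note that the name s never has to exist at runtime: by the time the code is compiled, the macro call is gone.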



The second example is also interesting. It is a shorthand notation for lambdas. The f macro takes an expression and returns a function in its place. Anything written as the macro name f followed by brackets with absolutely any expression inside is converted to a lambda.
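A rough stdlib imitation of such a shorthand is shown below. It assumes that `_` marks each argument position, and the class names and generated argument names are made up; this is a sketch of the technique, not macropy's code, and it relies on the Python 3.9+ AST layout.

```python
import ast


class NumberPlaceholders(ast.NodeTransformer):
    """Give every `_` placeholder its own argument name."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        if node.id != "_":
            return node
        self.names.append(f"_arg{len(self.names)}")
        return ast.copy_location(ast.Name(id=self.names[-1], ctx=ast.Load()), node)


class ExpandF(ast.NodeTransformer):
    """Rewrite f[expr] into a lambda with one argument per placeholder."""

    def visit_Subscript(self, node):
        if not (isinstance(node.value, ast.Name) and node.value.id == "f"):
            return self.generic_visit(node)
        placeholders = NumberPlaceholders()
        body = placeholders.visit(node.slice)
        arguments = ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg=name) for name in placeholders.names],
            vararg=None,
            kwonlyargs=[],
            kw_defaults=[],
            kwarg=None,
            defaults=[],
        )
        return ast.Lambda(args=arguments, body=body)


source = "add = f[_ + _]\ntotal = add(2, 3)"
tree = ast.fix_missing_locations(ExpandF().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<expanded>", "exec"), namespace)
print(namespace["total"])  # 5
```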



In my opinion, this is also cool, especially for those who like to develop and write code in a functional style and use MapReduce.


Here is another familiar example. This function, highlighted in red, computes the factorial. What happens when it is called?



Python will throw an error, because the call runs into the recursion limit and you get this ugly RecursionError.
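The failure is easy to reproduce. The tail-recursive form with an accumulator is an assumption about what the slide showed, and the example relies on the interpreter's default recursion limit being well below 10,000.

```python
def factorial(n, acc=1):
    # Tail-recursive form: the recursive call is the last thing that happens.
    if n == 0:
        return acc
    return factorial(n - 1, n * acc)


try:
    factorial(10_000)
except RecursionError as error:
    print("RecursionError:", error)
```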



How can this be fixed? Using macropy, fixing the problem is very simple.



You put a decorator on it, and the decorator takes the body of the function and transforms it in some magical way. You do not need to change anything in the function itself; macropy does everything for you.



And the function returns a perfectly normal result, however deep the recursion goes.


How does macropy do it?



It replaces all calls to the function itself with a special TailCall object, which is then called in a loop by the TCO decorator.



The scheme looks something like this. The decorator calls the function in a loop until it returns some ordinary result instead of a TailCall. Once it does, the decorator returns that result. And that's all. These are the cool things you can do with macros!

Macropy includes other examples as well. I hope the curious among you will go and look at them yourselves. For instance, there are things that are useful for debugging.
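The loop can be written by hand as a small trampoline. In this sketch the TailCall object is returned explicitly, whereas the macro would generate that replacement for you; the class and decorator here are illustrative stand-ins, not macropy's own code, and the raw attribute is an assumption of the example.

```python
import functools


class TailCall:
    """Marks a call that the trampoline should perform later."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs


def tco(func):
    @functools.wraps(func)
    def trampoline(*args, **kwargs):
        result = func(*args, **kwargs)
        # Keep calling until we get a normal value instead of a TailCall.
        while isinstance(result, TailCall):
            result = result.func(*result.args, **result.kwargs)
        return result

    trampoline.raw = func  # the undecorated function, used for tail calls
    return trampoline


@tco
def factorial(n, acc=1):
    if n == 0:
        return acc
    # Written by hand here; a macro would generate this replacement.
    return TailCall(factorial.raw, n - 1, n * acc)


print(len(str(factorial(5000))))  # number of digits in 5000!, no RecursionError
```

Because every step returns to the loop before the next call is made, the stack depth stays constant no matter how large n is.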



I'll tell you about one more cool thing. One of the examples is a query macro. What does it do? Inside it you write regular Python code, and you can then use the result of the expression like any other value. Internally, though, macropy transforms this code into SQLAlchemy query language code.



It rewrites the code for you and produces this terrifying expression. Written by hand it would be shorter; I tried.



Here is the original expression. After the macro is expanded, it looks something like this.



Perhaps some of you would like to write code that looks more like Python rather than make your developers write queries in the SQLAlchemy DSL.

In the same way, you can generate anything from Python, such as plain SQL or JavaScript, save it somewhere next to the file, and then use it on the frontend.
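As a small illustration of generating plain SQL from Python code, the sketch below walks the AST of a boolean expression and prints a WHERE condition. It is a toy under stated assumptions: the function names are made up, only comparisons, and/or, names and simple literals are supported, and it is not how macropy's query macro works.

```python
import ast

OPERATORS = {
    ast.Eq: "=", ast.NotEq: "<>",
    ast.Lt: "<", ast.LtE: "<=",
    ast.Gt: ">", ast.GtE: ">=",
}


def to_sql(node):
    """Translate a small subset of Python expressions into SQL text."""
    if isinstance(node, ast.BoolOp):
        joiner = " AND " if isinstance(node.op, ast.And) else " OR "
        return "(" + joiner.join(to_sql(value) for value in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        operator = OPERATORS[type(node.ops[0])]
        return f"{to_sql(node.left)} {operator} {to_sql(node.comparators[0])}"
    if isinstance(node, ast.Name):
        return node.id  # treated as a column name
    if isinstance(node, ast.Constant):
        if isinstance(node.value, str):
            return "'" + node.value.replace("'", "''") + "'"
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return str(node.value)
    raise NotImplementedError(ast.dump(node))


def where_clause(expression):
    return to_sql(ast.parse(expression, mode="eval").body)


print(where_clause("age > 18 and name == 'Bob'"))
# (age > 18 AND name = 'Bob')
```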



Now let's see how to make your own macro. With macropy, it is very simple.

A macro is a function that takes an AST as input, transforms it somehow, and returns a new one. Here is an example macro that adds a description containing the source expression to an assert, so that we can understand why the AssertionError occurred.

Here the inner replace_assert function is a helper that takes care of the recursive descent through the tree for you. Each subtree element is passed into replace_assert.



This lets you check its type inside and, if it is an Assert, do something with it. Here I give a simple synthetic example that takes the left part and the right part, builds an error message from them, and writes it to the msg attribute. This is the message that will be reported.







To use it, you attach the macro to a block of code with the with context manager syntax, and all the code inside the block goes through this transformation. You can see below that the error message we built from the len([1, 2, 3]) expression was added to the AssertionError.
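The same idea can be sketched without macropy, using only the standard ast module (Python 3.9+ for ast.unparse). The marker name assert_messages and the helper names are hypothetical, and macropy's own decorators and import machinery are deliberately left out.

```python
import ast


def add_message(node):
    """Attach a message built from both sides of `assert left == right`."""
    test = node.test
    if (
        node.msg is None
        and isinstance(test, ast.Compare)
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Eq)
    ):
        left = ast.unparse(test.left)
        right = ast.unparse(test.comparators[0])
        node.msg = ast.Constant(value=f"{left} is not equal to {right}")


class ExpandBlocks(ast.NodeTransformer):
    """Apply add_message inside `with assert_messages:` blocks only."""

    def visit_With(self, node):
        self.generic_visit(node)
        marker = node.items[0].context_expr
        is_macro = (
            len(node.items) == 1
            and isinstance(marker, ast.Name)
            and marker.id == "assert_messages"
        )
        if not is_macro:
            return node
        for statement in node.body:
            for child in ast.walk(statement):
                if isinstance(child, ast.Assert):
                    add_message(child)
        return node.body  # splice the block's statements in place of `with`


SOURCE = """
with assert_messages:
    assert len([1, 2, 3]) == 100
"""
tree = ast.fix_missing_locations(ExpandBlocks().visit(ast.parse(SOURCE)))
try:
    exec(compile(tree, "<macro>", "exec"), {})
except AssertionError as error:
    print(error)  # len([1, 2, 3]) is not equal to 100
```

Since the with statement is replaced by its body during expansion, the marker name never needs to be defined at runtime.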



However, this method has one limitation that personally makes me sad. As an experiment, I tried to create new constructs for the language. For example, some people like switch, or conditional constructs like unless. Unfortunately, this is not possible: macropy and any other tools that work with the AST come into play only after the source code has already been read and parsed. The code is read by the Python parser, whose grammar is fixed in the interpreter. To change it, you would need to recompile Python. You can do that, of course, but the result would be a fork of Python rather than a library you can publish on PyPI. So such constructs cannot be built with macropy.

HyLang


Fortunately, over my long life I have written in more than just Python and have been interested in various alternative languages. There is a syntax that many people dislike but that is simpler and more flexible: s-expressions.

Fortunately for us, there is an add-on for Python called HyLang. It is somewhat reminiscent of Clojure, except that Clojure runs on top of the JVM and HyLang runs on top of the Python virtual machine. That is, it gives you a new syntax for writing code, yet all the code you write is fully compatible with existing Python libraries and can itself be used from Python.



It looks something like this.



The part on the left is written in Python, the part on the right in HyLang. Below each is the resulting bytecode. You probably noticed that it is exactly the same; only the syntax changes. HyLang uses s-expressions, which many people dislike. Opponents of the "brackets" do not realize that this syntax gives the language tremendous power because it makes its constructs uniform. And uniformity lets you implement any construct with macros.

This works because the first element of every expression is always some action, followed by its arguments.

All the code is made up of nested expressions that are easy to transform and to expand macros in. Thanks to this, you can create absolutely any new construct in HyLang, and in the code it will be indistinguishable from the standard features of the language.



Let's see how a simple macro works in HyLang. To do the same thing we did with Assert using macropy, this code is all you need.

Our HyLang macro receives code as its input. The macro can then easily use any part of this code to build new code. The main difference between macros and functions is that the input is expressions, not values. If we call our macro as (is (= 1 2)), it receives the expression (= 1 2) instead of False.
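The difference between receiving a value and receiving an expression can be shown in Python terms as an analogy. This is not HyLang code, and the helper names are invented for the illustration (ast.unparse needs Python 3.9+).

```python
import ast


def as_function(value):
    # A function sees only the result of evaluating its argument.
    return repr(value)


def as_macro(expression):
    # A macro-like helper sees the code itself, here as an AST node.
    return ast.unparse(expression)


print(as_function(1 == 2))                              # False
print(as_macro(ast.parse("1 == 2", mode="eval").body))  # 1 == 2
```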



So we can generate an error message that something went wrong.



And then we just return the new code. The backtick and tilde syntax means roughly the following. The backtick says: take this expression as is and return it as is. The tilde says: substitute the value of the variable here.



So when we write this, the macro expands into a new expression: an assert with an additional error message.

HyLang is a cool thing. True, we don't use it yet, and maybe we never will. All of these things are experimental. I want you to leave here with the feeling that you can do things in Python that you might never have thought of before. And maybe some of them will find practical application in your day-to-day work.

That's all from me. Here are the links:

  • Patterns,
  • MacroPy,
  • HyLang,
  • The book On Lisp, for an advanced study of what macros can do. This is for those especially interested. True, the book is based on Common Lisp rather than Python, but that makes a deeper study even more interesting.
