Useless REPL. A talk from Yandex

The REPL (read-eval-print loop) is useless in Python, even if it is the magical IPython. Today I will offer one possible solution to this problem. The talk and my extension, TheREPL, will be most useful to those who are interested in faster and more efficient development, and to those who write stateful systems.


- My name is Alexander, and I work as a programmer at Yandex. My team writes in Python; we have not switched to Go yet. In my free time, oddly enough, I also program, and I do it in a very dynamic language, Common Lisp. It is perhaps even more dynamic than Python. What makes it special is that the development process itself is organized somewhat differently. It is more interactive and iterative, because in the Lisp REPL you can do everything: create new modules and delete old ones, add and remove methods and classes, redefine classes, and so on.



In Python, all of this is harder. There is IPython. Of course, IPython improves the REPL in some ways: it adds autocompletion and lets you use various extensions. But it is not a good fit for iterative development. You can load your code in it, test it a little, and that's it. Sometimes you want more interactivity, so that you can really use the REPL in development: switch between modules and change the functions and classes inside them.

This happens to me: you run, say, an IPython REPL in the production environment, start running some commands there and investigating something, and then it turns out that there is an error in a module and you want to fix it quickly. But that does not work, because you need to build a new Docker image, roll it out to production, go into the REPL again, reach the desired state again, and restart everything that failed. Ideally, I would fix the function, run it right away and get the result instantly.

What can be done about this? How can I reload code in IPython? I tried autoreload and did not like it, for several reasons. First of all, when a module is reloaded, it loses the state that was stored in that module's global variables. There may be a cache there with the results of some functions. Or I could, for example, have loaded data over the network into it so that I could work with that data faster later. In short, autoreload loses state.
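To make the state loss concrete, here is a minimal sketch. It uses plain `importlib.reload` rather than the autoreload extension itself, and the module name `statemod` and its `CACHE` global are made up for the demo. The point is only that re-executing a module body rebinds its globals.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module for the demo: it keeps a cache in a module-level global.
source = "CACHE = {}\n\ndef remember(key, value):\n    CACHE[key] = value\n"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "statemod.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import statemod

        statemod.remember("user", "Sasha")
        before = dict(statemod.CACHE)

        # Reloading re-executes the module body, so CACHE is rebound to {}.
        importlib.reload(statemod)
        after = dict(statemod.CACHE)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("statemod", None)

print(before)  # {'user': 'Sasha'}
print(after)   # {}
```

Anything that was expensive to compute or fetch and lived in `CACHE` has to be rebuilt after the reload.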

So, as an experiment, I wrote my own simple extension for IPython and named it TheREPL.

I came to you with this talk to share an idea of what can be done with the REPL in Python. I really hope that you will like this idea, take it away with you and keep coming up with things that make Python even more efficient and convenient.

What is TheREPL? It is an extension that you load, after which IPython gains the concept of a namespace: you can switch to any Python module and see what variables, functions and so on it contains. More importantly, you can type def and a function name right there, redefine a function or a class, and it will change in every module where it was imported. At the same time, the module itself is not reloaded, so its state is preserved. In addition, TheREPL lets you avoid some other artifacts of autoreload, which we will look at now.



So, in autoreload the code is updated only when the file is saved. But on top of that, you have to enter something into the REPL itself, and only then will autoreload pick up the changes. This is problem number one. If you have a background process in a separate thread (for example, a running server), you cannot simply fix the code. Autoreload will not apply the changes until you enter something into the IPython REPL.

With my extension, you press a shortcut right in the editor, and the function under the cursor is applied immediately and starts working. In other words, TheREPL lets you change the code at a finer granularity. You can also type def directly in IPython.



As I said, autoreload does not support switching between modules at all. You can only find the file in the file system, change it and hope that autoreload sorts everything out.



Next. Autoreload loses global variables, while TheREPL keeps them and lets you go on exploring how your application works, change its internal code and so develop it quickly.



Autoreload has one more peculiarity. It applies changes to the reloaded module in a very cunning way. In particular, it does a very interesting trick. If a function in the module has been updated, then in order to change it everywhere it has been imported, autoreload uses the garbage collector to find all instances of that function and replace the code inside them. We will look at examples of how this happens in a moment. Thanks to this, the function's code changes even if the function has been captured in a closure.

Do you know what a closure is? It is a very useful thing. JavaScript developers use closures all the time. You most likely use them too and have simply never paid attention. But since autoreload does what I described above, you may find yourself in a situation where old code uses new code that works differently. For example, a function may now return two values instead of one, a tuple instead of a string, and so on. The old code will break on this.

TheREPL deliberately avoids this trick so that everything stays more consistent. It changes the function or class in the module where it is defined, then finds it in all other modules and changes it there too. After that, everything works the new way.
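The idea of swapping a name in every module that imported it can be sketched as follows. This is not TheREPL's actual code; `rebind_everywhere` and the two throwaway modules `demo_defs` and `demo_user` are hypothetical names used only for illustration.

```python
import sys
import types


def rebind_everywhere(old, new):
    """Point every module-level name bound to `old` at `new` instead."""
    patched = []
    for module in list(sys.modules.values()):
        if not isinstance(module, types.ModuleType):
            continue
        for name, value in list(vars(module).items()):
            if value is old:
                setattr(module, name, new)
                patched.append(f"{module.__name__}.{name}")
    return patched


# demo_defs defines greet; demo_user has imported it by name.
demo_defs = types.ModuleType("demo_defs")
demo_user = types.ModuleType("demo_user")
exec("def greet():\n    return 'hello'\n", demo_defs.__dict__)
demo_user.greet = demo_defs.greet
sys.modules["demo_defs"] = demo_defs
sys.modules["demo_user"] = demo_user


def new_greet():
    return "hi there"


patched = rebind_everywhere(demo_defs.greet, new_greet)
result = demo_user.greet()
print(sorted(patched), result)

# Clean up the throwaway modules.
del sys.modules["demo_defs"], sys.modules["demo_user"]
```

Note what this sketch does not touch: references held in closures, containers or instance attributes keep pointing at the old function, so old and new behaviour stay separate instead of being mixed inside one function object.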



How does the function replacement that autoreload performs work? We have two functions, one and two. Each function has a set of attributes: documentation, code, arguments and so on. The slide shows an example of replacing the attribute in which the bytecode is stored.

After autoreload changes it, the function starts behaving differently when called. This is a synthetic example that I reproduced by hand so that you understand what is happening. The function has one name, but the code inside it is actually different. And if you disassemble it, you can also see that it returns two. What does this lead to?
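The slide itself is not part of this text, so here is a hand-made reconstruction of the same kind of synthetic example, assuming two trivial functions named `one` and `two`. The swap is done manually here; it only imitates the effect described above.

```python
import dis


def one():
    return 1


def two():
    return 2


# Swap the code object by hand: the function object stays the same,
# but the bytecode it runs now comes from two().
one.__code__ = two.__code__

name = one.__name__
value = one()
print(name, value)  # one 2

# The disassembly shows the constant 2 being returned.
dis.dis(one)
```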



Here is an example of a closure. On the second line, we create a closure that captures the function foo. The closure expects the function we passed in to return a string, encodes that string in UTF-8, and everything works.



But suppose you change the module in which foo is defined so that foo returns a number instead of a string, and autoreload picks up the change. Now the closure no longer works correctly: the function inside it has changed, but the closure itself has not and does not expect this. Problems like this can bite you with autoreload in unexpected places.
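A minimal sketch of this failure, with the in-place update simulated by hand through `__code__`. The names `make_encoder` and `new_foo` are made up for the demo; only `foo` comes from the talk.

```python
def foo():
    return "text"


def make_encoder(func):
    # The closure captures func and assumes it returns a str.
    def encoder():
        return func().encode("utf-8")
    return encoder


encode = make_encoder(foo)
first = encode()


def new_foo():
    return 42


# Simulate the in-place update: same function object, new code.
foo.__code__ = new_foo.__code__

try:
    encode()
    error = None
except AttributeError as exc:
    # int has no .encode(), so the untouched closure now fails.
    error = exc

print(first, repr(error))
```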



How does autoreload update classes? Very simply. It updates all methods of the class the same way it updates functions, and it also updates the __class__ attribute of all instances so that method resolution (deciding which method should be called) starts working the new way.
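Reassigning `__class__` on a live instance is plain Python and can be tried by hand. In this sketch `Greeter` and `NewGreeter` are hypothetical; `NewGreeter` stands in for the class object produced by re-running the module.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


obj = Greeter("Sasha")
old_reply = obj.greet()


# Stand-in for the class created when the module is executed again.
class NewGreeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hi, " + self.name + "!"


# The instance keeps its attributes; only method lookup changes.
obj.__class__ = NewGreeter
new_reply = obj.greet()

print(old_reply)  # hello Sasha
print(new_reply)  # hi, Sasha!
```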

In TheREPL everything is a bit more complicated, because when you update a class, it may turn out to have descendants, child classes, which also need to be updated because something has changed in their list of base classes.

To solve this problem, you can rebuild the class. But let's first see what happens with autoreload when it reloads a module.



Here is a good example. There are two modules - a and b. In module a, a parent class is defined, in module b a child class, and we create an instance of the child class. And line 10 shows that yes, this is an instance of the Foo class, the parent.



Next, we simply change module a. For example, we add documentation to the Foo class. Then autoreload picks up these changes. What do you think the same check on Bar will return in this case?



It returns False, because autoreload has changed the Foo class, and it is now a completely different class, not the one that Bar inherits from.



And here is the surprise! In modules a and b, Foo is now two different classes, and Bar inherits from only one of them. Because of gotchas like this, it is very difficult to predict how your code will behave after autoreload fixes something in it.



This is roughly how it updates classes. Let me comment on the picture. Initially, the Foo class is imported into module b, and there it stays. When autoreload reloads module a, a new class appears in a, but in module b it is not updated.
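The picture can be reproduced without any extension. The sketch below only imitates a reload by re-executing the source of a stand-in module `a`; it does not run autoreload, so treat it as an illustration of the mechanism rather than of the extension's exact behaviour.

```python
import types

SOURCE_A = "class Foo:\n    pass\n"

# Module a defines the parent class.
a = types.ModuleType("a")
exec(SOURCE_A, a.__dict__)

# Module b imported Foo by name and subclasses it.
b = types.ModuleType("b")
b.Foo = a.Foo
exec("class Bar(Foo):\n    pass\n\nbar = Bar()\n", b.__dict__)

before = isinstance(b.bar, a.Foo)

# "Reload" a: re-running its (edited) source creates a brand-new Foo object.
exec(SOURCE_A.replace("pass", '"""Now documented."""'), a.__dict__)

after = isinstance(b.bar, a.Foo)
same_class = a.Foo is b.Foo

print(before, after, same_class)  # True False False
```

Module b still holds the old Foo, Bar still inherits from it, and the name a.Foo now refers to a different class.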



TheREPL does it a little differently. It injects the modified class into every module where it was imported, so everything works correctly there. Moreover, if the class had instances, they are preserved.



And this is how TheREPL solves the problem with child classes. When the parent class has changed, it determines the list of base classes through the magic attribute __mro__ (method resolution order). This attribute contains the classes in the order in which Python looks for methods or attributes in them. So each time you call, say, the get_name method on your object, Python first looks for it in the Bar class, then in the Foo class, then in the object class if it has not found it yet. It follows the method resolution order.
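The lookup order is easy to inspect directly. A small sketch with the same class names as in the talk; the body of `get_name` is made up.

```python
class Foo:
    def get_name(self):
        return "foo"


class Bar(Foo):
    pass


# The classes Python will search, in order, when resolving an attribute.
mro_names = [cls.__name__ for cls in Bar.__mro__]
print(mro_names)  # ['Bar', 'Foo', 'object']

# get_name is not defined on Bar, so it is found on Foo.
name = Bar().get_name()
print(name)  # foo
```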

TheREPL uses this feature. It takes the list of base classes and replaces the class you have just changed with the new one. Then, as the second step, it creates a new child type. With the type function you can actually create classes. If you have never used it, try it, it's fun.

You just give it the name of the class, say what its base classes are (in the simplest case, object), and pass a dictionary with the class's methods and attributes. That's it: you have a new class that you can instantiate as usual. TheREPL takes advantage of this. It generates a new child class and repoints all objects of the old Bar class to it.
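The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not TheREPL's implementation: it works with the direct bases in `__bases__`, handles a single known instance, and `NewFoo` stands in for the redefined parent.

```python
class Foo:
    def get_name(self):
        return "old"


class Bar(Foo):
    def __init__(self, user):
        self.user = user


bar = Bar("Sasha")

# Stand-in for the parent class after it has been redefined.
NewFoo = type("Foo", (object,), {"get_name": lambda self: "new"})

# Step 1: swap the changed class in the list of direct bases.
new_bases = tuple(NewFoo if base is Foo else base for base in Bar.__bases__)

# Step 2: build a new child class with type(), reusing the old class body.
namespace = {
    key: value
    for key, value in vars(Bar).items()
    if key not in ("__dict__", "__weakref__")
}
NewBar = type(Bar.__name__, new_bases, namespace)

# Step 3: repoint the existing instance; its attributes are untouched.
bar.__class__ = NewBar

print(bar.get_name(), bar.user)  # new Sasha
```

One caveat of this sketch: methods that use zero-argument super() keep a reference to the old class, so a real tool would need to handle that case separately.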

I also have a demo, so let's take a look at how it works. First, let's look at a simple thing.

First demo

I said that you can change the code inside a module. Suppose we have a server. I'll start it now. At some point, we discover that for some reason it creates temporary directories. Or it has started creating them, although it did not before. Then we can connect to this server and, guessing that it probably creates these directories with the mkdtemp function from the tempfile module, go directly to that Python module.

See, the name of the current module in the corner has changed. Now it says tempfile. I can see what functions it contains, and, importantly, we can redefine them. I have prepared a special wrapper that lets you decorate any function so that every call prints a trace showing where it was called from. Now we will import it and apply it.

That is, I wrap a standard Python function without even having access to the source code of its module. I can simply take it and wrap it. The next time it is called, we will see a traceback and find out where it is called from.

In the same way, these changes can be rolled back so that the output stops spamming us. So we see that this server calls mkdtemp inside worker, on the eighth line, and keeps producing temporary directories for us, cluttering up the file system. This is one application.
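The wrapper used in the demo is not shown in this text, so here is a minimal sketch of what such a decorator could look like. The names `trace_calls` and `worker` are assumptions; only `tempfile.mkdtemp` is a real standard-library function.

```python
import functools
import os
import tempfile
import traceback


def trace_calls(func):
    """Print the call stack every time func is called (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        traceback.print_stack()  # goes to stderr
        return func(*args, **kwargs)
    return wrapper


# Wrap the function in place, keeping the original for the rollback.
original_mkdtemp = tempfile.mkdtemp
tempfile.mkdtemp = trace_calls(tempfile.mkdtemp)


def worker():
    # Looks the name up on the module at call time, so it hits the wrapper.
    return tempfile.mkdtemp()


path = worker()  # prints a stack that includes the call from worker()
os.rmdir(path)

# Roll the change back so the output stops being spammed.
tempfile.mkdtemp = original_mkdtemp
```

This only catches callers that look up tempfile.mkdtemp at call time. Code that did `from tempfile import mkdtemp` holds its own reference and would need to be rebound in its own module.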

Let's look at another example of why autoreload sometimes does not work well at all. I have a Telegram bot prepared:

Second demo

Now we activate autoreload and see how it helps us. That's it, now we can start the bot and talk to it. So that you can see better, let's begin a dialogue and introduce ourselves to the bot. Right. There is some kind of error. A completely different error was planned, but I decided to make changes at the last moment. It doesn't matter. Now we will fix it, and autoreload will help us with that.

We switch to the bot's code. For now I will temporarily comment this out. I save the file. In theory, autoreload should have picked up these changes. Start the bot again. The bot recognized me. Let's talk to it.

Another error. This one was planned. Let's go and fix it. I will leave the bot running in the background, switch to the editor and find the error there. It's just a typo: I forgot that my variable is called user_name. I have saved the file. Autoreload was supposed to pick it up, and now we will see.

But autoreload, as I already mentioned, knows nothing about the file having changed until you enter something into the REPL. With a long-running process like this... it has to be interrupted and restarted. Done. We go back to our bot and write to it. And you see, the bot has forgotten that my name is Sasha. Why? Autoreload recreated it, because it reloads the whole module. Now I need to write to the bot again to restore its state.

And if you are debugging an error that occurs only in a certain state, you cannot afford to lose that state, because otherwise you will spend a lot of time reaching it again. TheREPL helps out in exactly such cases.

Let's see how the bot gets updated when we use TheREPL. To keep the experiment clean, I will restart IPython and we will repeat everything from the beginning.

Now I load TheREPL. It immediately starts listening on a specific port so that you can send code into it. By the way, this works even if IPython is running somewhere on a server and the editor is running locally, which can also help you out in some cases.

We import the bot, start it and write to it again. This part is clear: we restarted Python, so it does not remember who I am. Let's check that the error is there. Yes, it is. Well, let's fix it.

I switch back to the editor and correct the error. We do not even have to save the file. I press Ctrl-C, Ctrl-C; this is the shortcut with which Emacs takes the definition of the function right under the cursor and sends it to the Python process it is connected to. That's all, now we can go and check how our bot responds to my messages. Now it remembers that I am Sasha and honestly replies that it does not know how to do that.

Let's try to add new functionality right there. To do this, we go back to the editor. For example, let's add a help command. For now, let it answer that it knows nothing about help. Again, I press Ctrl-C, Ctrl-C, and the code is applied. We go to the bot and see whether it understands this command. Yes, the command has been applied.

By the way, the bot has one more thing, which we will now use to see how a class changes. It has a state command, a special debugging command for viewing the state of the bot. So, some Oleg has connected. Interesting.

When the bot executes this command, it calls reply to show the representation of the bot. We can go and change this reply to something else. For example, make it output just the names. Like this. We go back to our messenger and run state again. And that's all. Now reply works the new way, but the object is the same and has kept its state, since it remembers all of us: Oleg, Sasha, kek and “DROP TABLE Users, Alex”!

So you can write and debug code on the fly, without going through the cycle where you need to build a package and roll it out somewhere. You can quickly test something, change everything you need, and only then package all these changes properly and deploy them.

Naturally, you should not do this in real production, because this approach has a problem: you may forget that the code you have just run on the server still needs to be saved and then deployed properly. This approach requires discipline. But during development and debugging in some test environment, it is a great thing.

A plugin for PyCharm definitely needs to be made. If there is a volunteer who can help me with Kotlin and the PyCharm plugin, I will be glad to talk. Write to me by email or on Telegram.

* * *

Join the development of TheREPL. There are many more features one could think of. For example, you could come up with a way to update class instances when their class is updated: add new attributes to them or migrate their state somehow, much like we migrate a database. Right now this does not exist.

You could come up with hot code reloading for production, so that when new changes arrive you do not have to restart the server. You could come up with a lot more. This is just an idea, and I want you to take it away with you. We should adapt everything to ourselves and make it convenient. That's all from me.
