Rust. Borrow checker through iterators

Hello, Habr!

I have been studying for about a year and, in my free time, I write on the rast. I like how its authors solved the problem of memory management and dispensed with the garbage collector - through the concept of borrowing. In this article I will approach this idea through iterators.

Lately, scala is my main language, so there will be comparisons with it, but there are not many of them and everything is intuitive, without magic :) The

article is designed for those who heard something about rust, but did not go into details.


photos taken from here and from here

Foreword


In jvm languages, it’s customary to hide work with links, that is, there we almost always work with reference data types, so we decided to hide the ampersand (&).

The rasta has explicit links, for example to integer - `& i32`, the link can be dereferenced via` * `, there can also be a link to the link and then it will need to be dereferenced twice **.

Iterator


When writing code, very often you need to filter the collection by a condition (predicate). In the rock to take even elements would look something like this:

    val vec = Vector(1,2,3,4)
    val result = vec.filter(e => e % 2 == 0)

Let's look at the sorts:

  private[scala] def filterImpl(p: A => Boolean, isFlipped: Boolean): Repr = {
    val b = newBuilder
    for (x <- this)
      if (p(x) != isFlipped) b += x

    b.result
  }

Without going into the details of `newBuilder`, it is clear that a new collection is being created, we iterate over the old one and if the predicate returns true, then add an element. Despite the fact that the collection is new, its elements are actually links to elements from the first collection, and if, suddenly, these elements are mutable, then changing them will be common to both collections.

Now let's try to do the same in the rast. I will immediately give a working example, and then I will consider the differences.

    let v: Vec<i32> = vec![1, 2, 3, 4];
    let result: Vec<&i32> = v.iter().filter(|e| **e % 2 == 0).collect();

Wow, wow what? Double pointer dereferencing? Just to filter the vector? Hard :( But there are reasons for this.

Let us single out how this code differs from the rock:

  1. explicitly get the iterator on the vector (`iter ()`)
  2. in the predicate function, for some reason, we dereference the pointer twice
  3. call `collect ()`
  4. it also resulted in a vector of reference types Vec <& i32>, and not ordinary ints

Borrow checker


Why explicitly call `iter ()` on the collection? It is clear to any rockman that if you call `.filter (...)` then you need to iterate over the collection. Why in a rast explicitly write what can be done implicitly? Because there are three different iterators!



To figure out why three? need to touch on Borrow (borrow, borrow) checker 'a. The very thing due to which the rast works without a GC and without explicit memory allocation / deallocation.

Why is it needed?

  1. To avoid situations when several pointers point to the same memory area, allowing you to change it. That is a race condition.
  2. In order not to deallocate the same memory several times.

How is this achieved?

Due to the concept of ownership.

In general, the concept of ownership is simple - only one can own something (even intuition).

The owner may change, but he is always alone. When we write `let x: i32 = 25`, this means that there was a memory allocation for 32bit int and a certain` x` owns it. The idea of ​​ownership exists only in the compiler’s mind, in borrow checker. When the owner, in this case, `x` leaves the scope (goes out of scope), the memory of which he owns will be cleared.

Here is a code that borrow checker will not miss:


struct X; // 

fn test_borrow_checker () -> X {
    let first = X; //  
    let second = first; //  
    let third = first; //   ,   first   
//    value used here after move

    return third;
}

`struct X` is something like` case class X () `- a borderless structure.

This behavior is super counterintuitive, I think, for everyone. I don’t know other languages ​​in which it would be impossible to “use” the same “variable” twice. It is important to feel this moment. first is not at all a reference to X, it is its owner . Changing the owner, we kind of kill the previous one, borrow checker will not allow its use.

Why did you need to create your own structure, why not use a regular integer?
— (`struct X`), , , integer. , , :


fn test_borrow_checker () -> i32 {
    let first = 32;
    let second = first; 
    let third = first; 

    return third;
}

, borrow checker, , . Copy, . `i32` second , ( ), - third . X Copy, .

. , , «» . Clone, , . copy clone.

Back to the iterators. The concept of "capture" among them is IntoIter . He “swallows” the collection, giving possession of its elements. In the code, this idea will be reflected like this:


let coll_1 = vec![1,2,3];
let coll_2: Vec<i32> = coll_1.into_iter().collect();
//coll_1 doesn't exists anymore

By calling `into_iter ()` at coll_1 we "turned" it into an iterator, absorbed all its elements, as in the previous example, `second` absorbed` first`. After that, any calls to coll_1 will be punished by borrow checker during compilation. Then we collected these elements with the `collect` function, creating a new vector. The `collect` function is needed to collect a collection from an iterator, for this you have to explicitly specify the type of what we want to collect. Therefore, coll_2 clearly indicates the type.

Okay, in general, the above described is enough for a programming language, but it will not be very efficient to copy / clone data structures every time we want to transfer them, and you also need to be able to change something. So we go to the pointers.

Pointers


The owner, as we have found out, can be only one. But you can have any number of links.


#[derive(Debug)]
struct Y; // 

fn test_borrow_checker() -> Y {
    let first = Y; //  
    let second: &Y = &first; //   ,     
    let third = &first; //    

// 
    println!("{:?}", second);
    println!("{:?}", third);

    return first;
}


This code is already valid, because the owner is still one. All ownership logic is checked only at the compilation stage, without affecting memory allocation / moving. Moreover, you can see that the type of second has changed to `& Y`! That is, the semantics of ownership and links are reflected in types, which allows you to check during compilation, for example, the absence of a race condition.

How can I protect against race condition at compile time?

By setting a limit on the number of mutable links!

A mutable link at one moment in time can be one and only one (without immutable). That is, either one / several immutable, or one mutable. The code looks like this:


// 
struct X {
    x: i32,
} 

fn test_borrow_checker() -> X {
    let mut first = X { x: 20 }; //  
    let second: &mut X = &mut first; //   
    let third: &mut X = &mut first; //    .        `second`        - .
//    second.x = 33;  //    ,             ,    
    third.x = 33;

    return first;
}

Let's go over the changes in the relative previous example. First, we added one field to the structure so that there was something to change, because we need mutability. Secondly, `mut` appeared in the declaration of the variable` let mut first = ... `, this is a marker to the compiler about mutability, like` val` & `var` in the rock. Thirdly, all links have changed their type from `& X` to` & mut X` (it looks, of course, monstrous. And this is without life time ...), now we can change the value stored by the link.

But I said that we cannot create several mutable links, they say borrow checker will not give this, but I created two myself! Yes, but the checks there are very tricky, which is why it is sometimes not obvious why the compiler swears. He is trying hard to make sure that your program compiles and if there are absolutely no options to meet the rules, then a mistake, and maybe not the one you are waiting for, but the one that violates his last attempt, the most desperate and not obvious for a beginner: ) For example, you are informed that the structure does not implement the Copy trait, although you did not call any copies anywhere.

In this case, the existence of two mutable links is allowed at the same time because we use only one, that is, the second can be thrown away and nothing will change. Also `second` can be used up tocreate a `third` and then everything will be okay. But, if you uncomment `second.x = 33;`, it turns out that two mutable links exist simultaneously and you can’t get out of here anyway - compile time error.

Iterators


So, we have three types of transmission:

  1. Absorption, borrowing, moving
  2. Link
  3. Mutable link

Each type needs its own iterator.

  1. IntoIter absorbs objects from the original collection
  2. Iter runs on object links
  3. IterMut runs on mutable object references

The question arises - when to use which. There is no silver bullet - you need practice, reading someone else's code, articles. I will give an example demonstrating the idea.

Suppose there is a school, there is a class in it, and students in the class.


#[derive(PartialEq, Eq)]
enum Sex {
    Male,
    Female
}

struct Scholar {
    name: String,
    age: i32,
    sex: Sex
}

let scholars: Vec<Scholar> = ...;

We took the vector of schoolchildren by querying the database, for example. Next, I needed to count the number of girls in the class. If we “swallow” the vector through `into_iter ()`, then after counting we can no longer use this collection to count the boys:


fn bad_idea() {
    let scholars: Vec<Scholar> = Vec::new();
    let girls_c = scholars
        .into_iter()
        .filter(|s| (*s).sex == Sex::Female)
        .count();

    let boys_c = scholars 
        .into_iter()
        .filter(|s| (*s).sex == Sex::Male)
        .count();
}

There will be an error “value used here after move” on the line for counting boys. It is also obvious that the mutable iterator is of no use to us. That's why it is just `iter ()` and working with a double link:


fn good_idea() {
    let scholars: Vec<Scholar> = Vec::new();
    let girls_c = scholars.iter().filter(|s| (**s).sex == Sex::Female).count();
    let boys_c = scholars.iter().filter(|s| (**s).sex == Sex::Male).count();
}

Here, to increase the number of potential recruits in the country, a mutable iterator is already required:


fn very_good_idea() {
    let mut scholars: Vec<Scholar> = Vec::new();
    scholars.iter_mut().for_each(|s| (*s).sex = Sex::Male);
}

Developing the idea, we can make soldiers out of the “guys” and demonstrate the “absorbing” iterator:


impl Scholar {
    fn to_soldier(self) -> Soldier {
        Soldier { forgotten_name: self.name, number: some_random_number_generator() }
    }
}

struct Soldier {
    forgotten_name: String,
    number: i32
}

fn good_bright_future() {
    let mut scholars: Vec<Scholar> = Vec::new();
    scholars.iter_mut().for_each(|s| (*s).sex = Sex::Male);
    let soldiers: Vec<Soldier> = scholars.into_iter().map(|s| s.to_soldier()).collect();
    //   scholars,    
}

On this wonderful note, perhaps that's all.

The last question remains - where did the double dereferencing of the links in `filter` come from. The fact is that a predicate is a function that takes a reference to an argument (so as not to capture it):


    fn filter<P>(self, predicate: P) -> Filter<Self, P> where
        Self: Sized, P: FnMut(&Self::Item) -> bool,

the predicate is FnMut (roughly speaking a function), which takes a reference to its (self) item and returns bool. Since we already had a link from the iterator `.iter ()`, the second appeared in the filter. When absorbed by an iterator (`into_iter`), double dereferencing of the link turned into a regular one.

Continuation


I do not have much experience in writing articles, so I will be glad to criticize.
If interested, I can continue. Options for topics:

  • how and when memory deallocation occurs
  • link lifetime
  • asynchronous programming
  • writing a small web service, you can even offer api

Links


  • rust book
  • Due to the concept of ownership, the implementation of such basic things as, for example, a linked list is no longer trivial. Here are several ways how to implement them.

All Articles