Extending procedural macros with WASM

As part of my ongoing research into procedural macros, I want to share an approach to extending their capabilities. As a reminder, procedural macros add a metaprogramming element to the language and thereby greatly simplify routine tasks such as serialization or query processing. At their core, they are compiler plugins that are compiled before the crate in which they are used. This design has some significant drawbacks:


  • They are hard to support in an IDE. The code analyzer has to learn to compile, load and execute these macros on its own, taking all their peculiarities into account, which is far from trivial.
  • Macros are self-contained and know nothing about each other, so there is no way to compose them, even though that would sometimes be useful.

As for the first problem, there are experiments with compiling all procedural macros into WASM modules. In the future this could make it unnecessary to compile them on the target machine at all, and at the same time solve the problem of IDE support.


As for the second problem, this article describes my approach to solving it. In essence, we need a macro that can use its attributes to load additional macros and combine them into a pipeline. In the simplest case, you can imagine something like this:


Suppose we have a macro TextMessage that derives the ToString and FromStr traits for a given type, using some codec to produce the textual representation. Different message types may use different codecs, the full list of codecs may grow over time, and each codec may have its own unique set of attributes.


#[derive(Debug, Serialize, Deserialize, PartialEq, TextMessage)]
#[text_message(codec = "serde_json", params(pretty))]
struct FooMessage {
    name: String,
    description: String,
    value: u64,
}
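
To make the goal concrete, here is a hand-written sketch of the kind of code such a derive could expand to. It is only an illustration: the line-based `key=value` format is a hypothetical stand-in for a real codec such as serde_json, `ToString` is obtained through `Display`, and values containing newlines are not handled.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct FooMessage {
    name: String,
    description: String,
    value: u64,
}

// Hypothetical stand-in codec: one `key=value` pair per line.
// A real macro would delegate to the codec named in the attribute.
impl fmt::Display for FooMessage {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "name={}\ndescription={}\nvalue={}",
            self.name, self.description, self.value
        )
    }
}

impl FromStr for FooMessage {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut name = None;
        let mut description = None;
        let mut value = None;
        for line in s.lines() {
            // Split each line at the first `=`; anything else is malformed.
            let (key, val) = line
                .split_once('=')
                .ok_or_else(|| format!("malformed line: {}", line))?;
            match key {
                "name" => name = Some(val.to_owned()),
                "description" => description = Some(val.to_owned()),
                "value" => value = Some(val.parse::<u64>().map_err(|e| e.to_string())?),
                other => return Err(format!("unknown field: {}", other)),
            }
        }
        Ok(FooMessage {
            name: name.ok_or("missing field: name")?,
            description: description.ok_or("missing field: description")?,
            value: value.ok_or("missing field: value")?,
        })
    }
}
```

With these two impls in place, `msg.to_string()` and `text.parse::<FooMessage>()` round-trip a message, which is exactly what the derive is meant to provide for each codec.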

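One option is to load the codecs as dynamic libraries via libloading, but that does not play well with IDE support. The codecs themselves are ordinary macro code written with syn and quote.
WASM modules are the alternative explored here.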



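There is already watt, a runtime for executing procedural macros compiled to WASM. However, watt works with its own patched version of proc-macro2, while crates such as darling depend on the regular proc-macro2.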


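So, to keep using the regular proc-macro2, we need to run the WASM modules ourselves. For that I chose wasmtime, a runtime developed by the Bytecode Alliance, whose members include Mozilla, Intel and Red Hat.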



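Disclaimer: wasmtime is under active development, so the API used in this article may have changed since it was written. The WASM-related code below is a demonstration of the idea, not a production-ready solution.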


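Let's get started!


First we need to decide what interface a codec compiled to WASM should expose to the macro that loads it.


Ideally, it would look like this: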


pub fn implement_codec(input: TokenStream) -> TokenStream;

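Unfortunately, a TokenStream cannot be passed across the WASM boundary as is. A TokenStream can, however, be converted to a string and parsed back, so the interface becomes: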


pub fn implement_codec(input: &str) -> String;

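But even this does not work directly: functions exported from a WASM module can only take and return numbers, so a string has to be passed as a pointer and a length into the module's linear memory.


The host also cannot simply write into the module's memory wherever it likes: the buffer has to be allocated by the module itself and released once it is no longer needed. The simplest solution is a pair of exported toy allocator functions: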


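use std::{ffi::c_void, mem};

#[no_mangle]
pub unsafe extern "C" fn toy_alloc(size: i32) -> i32 {
    let size_bytes: [u8; 4] = size.to_le_bytes();
    let mut buf: Vec<u8> = Vec::with_capacity(size as usize + size_bytes.len());
    // The first 4 bytes store the buffer size, so that it can later be
    // recovered from the pointer alone.
    buf.extend(size_bytes.iter());
    to_host_ptr(buf)
}

unsafe fn to_host_ptr(mut buf: Vec<u8>) -> i32 {
    let ptr = buf.as_mut_ptr();
    // Hand the buffer over to the host: "forget" it so that the memory is
    // not freed when `buf` goes out of scope.
    mem::forget(buf);
    ptr as *mut c_void as usize as i32
}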

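#[no_mangle]
pub unsafe extern "C" fn toy_free(ptr: i32) {
    let ptr = ptr as usize as *mut u8;
    let mut size_bytes = [0u8; 4];
    ptr.copy_to(size_bytes.as_mut_ptr(), 4);
    // The first 4 bytes of the buffer hold the payload size written by
    // `toy_alloc`.
    let size = u32::from_le_bytes(size_bytes) as usize;
    // Take back ownership of the memory "forgotten" in `to_host_ptr` and
    // drop it. The capacity has to include the 4-byte header, because that
    // is how much `toy_alloc` requested.
    drop(Vec::from_raw_parts(ptr, 0, size + size_bytes.len()));
}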
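
One detail that deserves care in an allocator like this is that deallocation must describe the allocation with exactly the size it was created with. The following is a minimal sketch of the same length-prefixed scheme that sidesteps capacity bookkeeping by using a boxed slice. It uses native pointers instead of `i32` offsets so that it can be run outside WASM, and the names `prefixed_alloc`, `payload` and `prefixed_free` are hypothetical.

```rust
use std::{ptr, slice};

const HEADER_LEN: usize = 4;

/// Allocates `size` payload bytes preceded by a 4-byte little-endian header
/// that stores `size`. Returns a pointer to the header.
pub fn prefixed_alloc(size: u32) -> *mut u8 {
    let mut buf = vec![0u8; HEADER_LEN + size as usize];
    buf[..HEADER_LEN].copy_from_slice(&size.to_le_bytes());
    // A boxed slice has no spare capacity, so pointer + length fully
    // describe the allocation.
    Box::into_raw(buf.into_boxed_slice()) as *mut u8
}

/// Reads the payload length from the header.
///
/// Safety: `ptr` must come from `prefixed_alloc` and still be live.
unsafe fn payload_len(ptr: *mut u8) -> usize {
    let mut header = [0u8; HEADER_LEN];
    unsafe { ptr.copy_to_nonoverlapping(header.as_mut_ptr(), HEADER_LEN) };
    u32::from_le_bytes(header) as usize
}

/// Returns the payload that follows the header.
///
/// Safety: `ptr` must come from `prefixed_alloc` and still be live, and no
/// other reference to the payload may be in use.
pub unsafe fn payload<'a>(ptr: *mut u8) -> &'a mut [u8] {
    unsafe {
        let len = payload_len(ptr);
        slice::from_raw_parts_mut(ptr.add(HEADER_LEN), len)
    }
}

/// Frees a buffer created by `prefixed_alloc`.
///
/// Safety: `ptr` must come from `prefixed_alloc` and must not be used again.
pub unsafe fn prefixed_free(ptr: *mut u8) {
    unsafe {
        let total = HEADER_LEN + payload_len(ptr);
        // Rebuild the boxed slice with its original length and drop it.
        drop(Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, total)));
    }
}
```

The header makes the pointer self-describing: whoever holds it can find both the payload and the total size to release, without passing the length separately.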

In a real project, it is better to rely on a ready-made tool such as wasm_bindgen rather than write this by hand.


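Now for the WASM module itself. Its exported function is just a thin wrapper around the actual codec implementation.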


#[no_mangle]
pub unsafe extern "C" fn implement_codec(
    item_ptr: i32,
    item_len: i32,
) -> i32 {
    let item = str_from_raw_parts(item_ptr, item_len);
    let item = TokenStream::from_str(&item).expect("Unable to parse item");

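    // Call the actual codec implementation, which has the familiar signature
    // `fn(item: TokenStream) -> TokenStream`.
    let tokens = codec::implement_codec(item);
    let out = tokens.to_string();

    // `to_host_buf` is not shown in this listing; judging by how the host
    // reads the result, it returns a pointer to a length-prefixed buffer.
    to_host_buf(out)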
}

pub unsafe fn str_from_raw_parts<'a>(ptr: i32, len: i32) -> &'a str {
    let slice = std::slice::from_raw_parts(ptr as *const u8, len as usize);
    std::str::from_utf8(slice).unwrap()
}
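
The helper that returns the result to the host is not shown in the listing. Since the host recovers the length from the first 4 bytes of the returned buffer, the output presumably has to be laid out as a little-endian length followed by the bytes of the string. A minimal sketch of that layout, with `length_prefixed` being a hypothetical name:

```rust
use std::convert::TryFrom;

/// Builds the byte layout the host expects: a 4-byte little-endian length
/// followed by the UTF-8 bytes of `out`.
pub fn length_prefixed(out: &str) -> Vec<u8> {
    // WASM pointers are 32 bits wide, so larger outputs cannot be described.
    let len = u32::try_from(out.len()).expect("output does not fit into 32 bits");
    let mut buf = Vec::with_capacity(4 + out.len());
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(out.as_bytes());
    buf
}
```

Inside the module, a vector built this way could then be handed to `to_host_ptr`, which leaks it and returns its address.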

Next comes the host side, which loads the WASM module and calls the function it exports.



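pub struct WasmMacro {
    module: Module,
}

impl WasmMacro {
    // Loads a macro from a WASM file.
    pub fn from_file(file: impl AsRef<Path>) -> anyhow::Result<Self> {
        // Read and compile the WASM module so that it can be instantiated later.
        let store = Store::default();
        let module = Module::from_file(&store, file)?;
        Ok(Self { module })
    }

    // Calls the exported function `fun` of the module, passing it the given
    // TokenStream and returning the resulting TokenStream.
    pub fn proc_macro_derive(
        &self,
        fun: &str,
        item: TokenStream,
    ) -> anyhow::Result<TokenStream> {
        // The TokenStream is passed as a string, since it cannot cross the
        // WASM boundary directly.
        let item = item.to_string();

        // Create a fresh instance of the module for this call.
        let instance = Instance::new(&self.module, &[])?;
        // Look up the exported function by name and check that it has the
        // same signature as `implement_codec`.
        let proc_macro_attribute_fn = instance
            .get_export(fun)
            .ok_or_else(|| anyhow!("Unable to find `{}` method in the export table", fun))?
            .func()
            .ok_or_else(|| anyhow!("export {} is not a function", fun))?
            .get2::<i32, i32, i32>()?;

        // Copy the string into the module's memory; the buffer is allocated
        // by the module itself.
        let item_buf = WasmBuf::from_host_buf(&instance, item);
        // Get the pointer to the buffer and its length to pass as arguments.
        let (item_ptr, item_len) = item_buf.raw_parts();
        // Call the function; it returns a pointer to a buffer holding the
        // string form of the resulting TokenStream.
        let ptr = proc_macro_attribute_fn(item_ptr, item_len).unwrap();
        // Wrap the returned pointer into a buffer and read it as a string.
        let res = WasmBuf::from_raw_ptr(&instance, ptr);
        let res_str = std::str::from_utf8(res.as_ref())?;
        // Parse the string back into a TokenStream and return it.
        TokenStream::from_str(res_str)
            .map_err(|_| anyhow!("Unable to parse token stream"))
    }
}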

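WasmBuf is a helper type: it represents a buffer in the module's memory that was allocated with toy_alloc. It takes care of copying data in and out of the module and of freeing the buffer afterwards.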


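struct WasmBuf<'a> {
    // Offset of the buffer in the module's linear memory, i.e. the pointer
    // returned by the module.
    offset: usize,
    // Length of the payload in bytes.
    len: usize,
    // The instance that owns the buffer; needed to free it later.
    instance: &'a Instance,
    // The linear memory of that instance, in which the buffer lives.
    memory: &'a Memory,
}

const WASM_PTR_LEN: usize = 4;

impl<'a> WasmBuf<'a> {
    // Allocates a new buffer of the given length by calling the module's
    // `toy_alloc`.
    pub fn new(instance: &'a Instance, len: usize) -> Self {
        let memory = Self::get_memory(instance);
        // Ask the module to allocate the memory for us.
        let offset = Self::toy_alloc(instance, len);

        Self {
            offset: offset as usize,
            len,
            instance,
            memory,
        }
    }

    // Allocates a buffer in the module's memory and fills it with the given
    // bytes, so that host data becomes visible to the module.
    pub fn from_host_buf(instance: &'a Instance, bytes: impl AsRef<[u8]>) -> Self {
        let bytes = bytes.as_ref();
        let len = bytes.len();

        let mut wasm_buf = Self::new(instance, len);
        // Copy the bytes into the freshly allocated buffer.
        wasm_buf.as_mut().copy_from_slice(bytes);
        wasm_buf
    }

    // Wraps a pointer returned by the module into a buffer. The length is
    // not passed alongside the pointer, so it is read from the buffer
    // itself: every buffer created through `toy_alloc` stores its length in
    // the first 4 bytes.
    pub fn from_raw_ptr(instance: &'a Instance, offset: i32) -> Self {
        let offset = offset as usize;
        let memory = Self::get_memory(instance);

        let len = unsafe {
            // Get raw access to the module's memory.
            let buf = memory.data_unchecked();

            let mut len_bytes = [0; WASM_PTR_LEN];
            // Read the length header stored at the given offset.
            len_bytes.copy_from_slice(&buf[offset..offset + WASM_PTR_LEN]);
            u32::from_le_bytes(len_bytes)
        };

        Self {
            offset,
            len: len as usize,
            memory,
            instance,
        }
    }

    // Views of the payload as byte slices.
    // The payload starts right after the 4-byte length header.
    pub fn as_ref(&self) -> &[u8] {
        unsafe {
            let begin = self.offset + WASM_PTR_LEN;
            let end = begin + self.len;

            &self.memory.data_unchecked()[begin..end]
        }
    }

    pub fn as_mut(&mut self) -> &mut [u8] {
        unsafe {
            let begin = self.offset + WASM_PTR_LEN;
            let end = begin + self.len;

            &mut self.memory.data_unchecked_mut()[begin..end]
        }
    }
}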
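
Slicing the module's memory by a pointer that the module returned will panic if the pointer or the stored length is wrong. As a minimal sketch of how the same length-prefixed read could be made bounds-checked, the function below works on a plain byte slice standing in for the module's linear memory; `read_prefixed` is a hypothetical name and not part of the code above.

```rust
use std::convert::TryInto;

const WASM_PTR_LEN: usize = 4;

/// Reads a length-prefixed buffer that starts at `offset` in `memory`.
/// Returns `None` if the header or the payload falls outside the memory.
pub fn read_prefixed(memory: &[u8], offset: usize) -> Option<&[u8]> {
    // The payload begins right after the 4-byte length header.
    let begin = offset.checked_add(WASM_PTR_LEN)?;
    let header: [u8; WASM_PTR_LEN] = memory.get(offset..begin)?.try_into().ok()?;
    let len = u32::from_le_bytes(header) as usize;
    let end = begin.checked_add(len)?;
    memory.get(begin..end)
}
```

Returning `Option` lets the caller turn a misbehaving module into an ordinary error instead of a panic during macro expansion.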

Finally, the buffer has to be freed once it is no longer needed.


impl Drop for WasmBuf<'_> {
    fn drop(&mut self) {
        // The module's `toy_free` expects the pointer to the buffer,
        // not its length.
        Self::toy_free(self.instance, self.offset);
    }
}


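Now everything is ready to write the macro itself: it locates the WASM module for the requested codec, loads it and delegates the actual work to it.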


#[proc_macro_derive(TextMessage, attributes(text_message))]
pub fn text_message(input: TokenStream) -> TokenStream {
    let input: DeriveInput = parse_macro_input!(input);

    let attrs = TextMessageAttrs::from_raw(&input.attrs)
        .expect("Unable to parse text message attributes.");

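    // Codec modules are expected in the `codecs` directory next to the manifest
    // of the crate being compiled; the file name is derived from the codec name.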
    let codec_dir = Path::new(&std::env::var("CARGO_MANIFEST_DIR")
        .unwrap())
        .join("codecs");
    let plugin_name = format!("{}_text_codec.wasm", attrs.codec);
    let codec_path = codec_dir.join(plugin_name);

    let wasm_macro = WasmMacro::from_file(codec_path)
        .expect("Unable to load wasm module");

    wasm_macro
        .proc_macro_derive(
            "implement_codec",
            input.into_token_stream().into(),
        )
        .expect("Unable to apply proc_macro_attribute")
}
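
Because the codec name comes straight from an attribute and ends up in a file path, it may be worth validating it before touching the file system. Below is a minimal sketch of the lookup as a standalone function; `codec_path` and the rule that names may only contain ASCII letters, digits and underscores are assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Resolves the WASM module for `codec` inside `<manifest_dir>/codecs`.
/// Returns `None` for names that could escape the `codecs` directory.
pub fn codec_path(manifest_dir: &Path, codec: &str) -> Option<PathBuf> {
    // Assumed naming rule: ASCII letters, digits and underscores only.
    let valid = !codec.is_empty()
        && codec.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !valid {
        return None;
    }
    let file_name = format!("{}_text_codec.wasm", codec);
    Some(manifest_dir.join("codecs").join(file_name))
}
```

For the example from this article, `codec_path(dir, "serde_json")` resolves to `codecs/serde_json_text_codec.wasm` under the manifest directory, while a name such as `../evil` is rejected.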

That's all. Now, provided the WASM module for the codec is in place, the following example compiles and runs.


#[derive(Debug, Serialize, Deserialize, PartialEq, TextMessage)]
// The codec named here is implemented by a WASM module that is loaded at compile time.
#[text_message(codec = "serde_json", params(pretty))]
struct FooMessage {
    name: String,
    description: String,
    value: u64,
}

fn main() {
    let msg = FooMessage {
        name: "Linus Torvalds".to_owned(),
        description: "The Linux founder.".to_owned(),
        value: 1,
    };

    let text = msg.to_string();
    println!("{}", text);
    let msg2 = text.parse().unwrap();

    assert_eq!(msg, msg2);
}


Admittedly, this looks more like a curious contraption than a practical tool, but it is a small yet clear demonstration of the principle. Such macros become open to extension: we no longer need to rewrite the original procedural macro to change or extend its behavior. And with a registry of WASM modules, such modules could be distributed just like cargo crates.

