Improving WebAssembly load times with Zero-Copy deserialization

Explore how Wasmer's 4.2 release uses zero-copy deserialization to improve module load times by up to 50%. Learn about the role of the rkyv library and how we achieved significant performance gains without compromising security.

Arshia Ghafoori

Software Engineer

September 7, 2023

Wasmer is now even faster 🚀

Wasmer's 4.2 release introduces Zero-Copy module deserialization, improving module load times by up to 50%.

What is zero-copy deserialization?

To get any useful data out of a file, most serialization formats require that the file be parsed and the data moved to a different location in memory (possibly after a transformation pass).

For example, to get binary data out of this JSON file:

{
    "myData": "V2FzbWVyIGlzIGJsYXppbmcgZmFzdCE="
}

you first have to parse the entire file to find where the value resides. Then a base64-to-binary pass needs to decode the value itself to turn it into a usable Vec<u8>.
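For contrast, here is a minimal sketch of that copying approach in Rust (assuming the serde_json and base64 crates; extract_data is a hypothetical helper, not code from Wasmer):

use base64::Engine;

fn extract_data(json: &str) -> Vec<u8> {
    // Pass 1: parse the whole file into an owned DOM.
    let value: serde_json::Value = serde_json::from_str(json).unwrap();
    // Pass 2: decode the base64 string into a freshly allocated buffer.
    base64::engine::general_purpose::STANDARD
        .decode(value["myData"].as_str().unwrap())
        .unwrap()
}

Every byte of the payload is touched at least twice before it becomes usable.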

That is not blazing fast.

We can do better.

Enter rkyv

rkyv is a (de)serialization library for Rust that, instead of converting data to a specific interchange format, stores it almost exactly as it is laid out in the application's memory. Loading the data back is then just a matter of reading it into memory and reinterpreting it as a reference to a specific struct type (called an archive, more on that below), which immediately gives you a usable instance.

The approach is quite interesting and you can read up on the details in their docs.

Using rkyv to improve module load times

Wasmer has always used rkyv to store compiled modules. However, a small problem kept us from benefiting fully from rkyv's speed. Consider this struct:

#[derive(rkyv::Archive, rkyv::Serialize, rkyv::Deserialize)]
struct Person {
    name: String,
    age: u8,
}

In rkyv, each struct gets an Archived equivalent generated for it, which would look something like this:

struct ArchivedPerson {
    name: Archived<String>,
    age: Archived<u8>,
}

When you read a byte array into memory (or use a memory-mapped file), you can feed it to rkyv's archived_value and get an ArchivedPerson back almost instantly. Or, if you want to be careful, you can use check_archived_value to validate the structure of the data first. This takes a bit longer, but still far less time than, say, deserializing JSON.
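Here is a minimal sketch of that flow, assuming rkyv 0.7's serializer API (the pos returned by the serializer tells archived_value where the root object lives):

use rkyv::ser::{serializers::AllocSerializer, Serializer};

fn demo() {
    let person = Person { name: "Wasmer".to_string(), age: 4 };

    // Serialize once, remembering where the root value was written.
    let mut serializer = AllocSerializer::<256>::default();
    let pos = serializer.serialize_value(&person).unwrap();
    let bytes = serializer.into_serializer().into_inner();

    // "Deserialize" by reinterpreting the bytes in place; nothing is copied.
    // Safety: we produced these bytes ourselves, so the layout is valid.
    let archived = unsafe { rkyv::archived_value::<Person>(&bytes, pos) };
    assert_eq!(archived.name.as_str(), "Wasmer");
}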

The problem is that to go from an ArchivedPerson to a Person, you have to perform a deserialize step, which copies all of the data anyway and loses most of the benefit of using rkyv. And you will find that you need that step anyway; consider this function:

fn greet(person: &Person) -> String {
    format!("Hello, {}!", person.name)
}

It takes a &Person, not a &ArchivedPerson, even though the two structs have (almost) the same fields with the same names. In a dynamically typed language, passing a &ArchivedPerson to greet may very well have worked, but Rust's strict type system (which I'm infinitely grateful for 99.99% of the time) doesn't let us do that, and that is why Wasmer was doing a deserialization pass after reading the archives anyway.
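That pass looks something like this (the rkyv 0.7 idiom; to_owned_person is a hypothetical helper):

use rkyv::Deserialize;

// The copying step we'd like to avoid: this allocates a fresh Person and
// clones the name out of the archive.
fn to_owned_person(archived: &ArchivedPerson) -> Person {
    archived.deserialize(&mut rkyv::Infallible).unwrap()
}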

One could always implement a separate greet_archive function, but that quickly gets out of hand when the code is moderately complex. Instead, we need a way to make the type system agree that Person and ArchivedPerson are, indeed, very similar, to the point that they can be used interchangeably. And what better way to do that than to use traits:

trait PersonLike {
    fn name(&self) -> &str;
    fn age(&self) -> u8;
}

Now we just have to implement this trait for both structs. It's still a bit of work, but at least you don't have to write all the code twice.

impl PersonLike for Person {
    fn name(&self) -> &str {
        self.name.as_str()
    }
    fn age(&self) -> u8 {
        self.age
    }
}

impl PersonLike for ArchivedPerson {
    fn name(&self) -> &str {
        // The archived representation of a string also has an as_str method
        self.name.as_str()
    }
    fn age(&self) -> u8 {
        // Primitives are archived as themselves
        self.age
    }
}

You'll notice that we can't move anything out of the struct, since the fields aren't the same type after all; not to mention that the data in ArchivedPerson is really just a view into the original byte array and can never be moved out. Also, we're limited to the lowest common denominator of both structs. As it turns out, this is more than enough for our scenario, and it lets us reimplement greet as:

fn greet<T: PersonLike>(person: &T) -> String {
    format!("Hello, {}!", person.name())
}

The good thing about this approach is that there is no additional runtime overhead. Monomorphization eliminates the generic, giving us two implementations of greet that are just as fast as the one we had before.
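For illustration (use_both is a hypothetical call site):

// Monomorphization generates a dedicated greet::<Person> and a dedicated
// greet::<ArchivedPerson>; neither call involves dynamic dispatch.
fn use_both(owned: &Person, archived: &ArchivedPerson) {
    println!("{}", greet(owned));
    println!("{}", greet(archived));
}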

We took the same idea outlined here and applied it to the code that reads and loads artifacts. As it turns out, most of the data contained in a serialized artifact (the files you'll find in your ~/.wasmer/cache/compiled directory, as well as the output of running wasmer compile) is just byte arrays, and those can stay right where they are until they are ready to be loaded into the program's memory as executable code or memory initialization data.
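As a simplified sketch (these are illustrative types, not Wasmer's actual artifact format), the archived form of a byte buffer can be handed out directly as a slice borrowed from the original bytes:

#[derive(rkyv::Archive, rkyv::Serialize, rkyv::Deserialize)]
struct Artifact {
    function_bodies: Vec<u8>,
}

trait ArtifactLike {
    fn function_bodies(&self) -> &[u8];
}

impl ArtifactLike for Artifact {
    fn function_bodies(&self) -> &[u8] {
        &self.function_bodies
    }
}

impl ArtifactLike for ArchivedArtifact {
    fn function_bodies(&self) -> &[u8] {
        // An archived Vec<u8> can be viewed as a plain byte slice that
        // borrows straight from the mapped file: no copy happens until
        // the data is actually needed.
        self.function_bodies.as_slice()
    }
}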

So, how much faster are we?

Good question! We took python, php, and everyone's favorite cowsay, loaded each of them a number of times, and averaged the results. The numbers are below. Times are in milliseconds and speedup is calculated as 1 - (After / Before).

Module Name    Before (ms)    After (ms)    Speedup
cowsay           2.65           1.57        40%
python          43.09          21.53        50%
php            141.05          74.03        47%

We're seeing a 40 to 50 percent speedup in module load times, which is considerable if you ask me. Huge win for monomorphization and Rust and, by extension, Wasmer!

Security Considerations

It is important to consider that when loading a compiled artifact you do not trust, many, many things can go wrong; you could be loading a virus for all you know. Wasm modules, on the other hand, cannot break out of their sandbox when compiled by Wasmer.

While it's true that skipping the deserialization pass means we won't discover errors in module structure that would previously have been caught, you should really only load an artifact if you know where it came from (i.e., you compiled it yourself), so we don't believe this change creates additional security risk.
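If you do want an extra check on input you only partially trust, rkyv can validate an archive's layout before handing out a reference. A sketch, assuming rkyv's validation feature and a CheckBytes derive on the archived type:

#[derive(rkyv::Archive, rkyv::Serialize, rkyv::Deserialize)]
#[archive_attr(derive(bytecheck::CheckBytes))]
struct Person {
    name: String,
    age: u8,
}

// Validates offsets and bounds before handing out a reference: slower than
// the unchecked path, but still far cheaper than a full deserialize. Note
// that this checks the archive's structure, not what an artifact's machine
// code will do once loaded.
fn load_checked(bytes: &[u8], pos: usize) -> Option<&ArchivedPerson> {
    rkyv::check_archived_value::<Person>(bytes, pos).ok()
}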

Conclusion

We're always working to make Wasmer even faster than it is. This change shaved a few more milliseconds off. This may not be a lot, but you'll notice the effects if your application loads as many modules as Wasmer Edge does!

As we continue to enhance Wasmer's performance and functionality, your support and feedback are invaluable to us. To stay updated on our latest innovations and contribute to our projects, please visit our GitHub repository and consider giving us a star ⭐️.

About the Author

Arshia is the lead developer of WinterJS and also works on WASIX and the Wasmer runtime.

Arshia Ghafoori

Software Engineer
