Improving WebAssembly load times with Zero-Copy deserialization
Explore how Wasmer's 4.2 release uses zero-copy deserialization to improve module load times by up to 50%. Learn about the role of the rkyv library and how we achieved significant performance gains without compromising security.
Arshia Ghafoori
Software Engineer
September 7, 2023
Wasmer is now even faster 🚀
Wasmer's 4.2 release introduces Zero-Copy module deserialization, improving module load times by up to 50%.
What is zero-copy deserialization?
To get any useful data out of a file, most serialization formats require that the file be parsed and the data moved to a different location in memory (possibly after a transformation pass).
For example, to get binary data out of this JSON file:
{
"myData": "V2FzbWVyIGlzIGJsYXppbmcgZmFzdCE="
}
you have to first parse the entire file to identify where the value resides. Then a base64-to-binary pass needs to parse the value itself to turn it into a usable `Vec<u8>`.
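To make the cost concrete, here is a minimal sketch of just the base64-to-binary pass (ignoring the JSON parsing that precedes it). The hand-rolled decoder is purely illustrative; a real application would use a library. Note how every byte is inspected, transformed, and copied into a freshly allocated buffer:

```rust
/// Decode standard base64 into bytes, byte by byte (illustrative, not production code).
fn base64_decode(input: &str) -> Vec<u8> {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut acc = 0u32; // bit accumulator
    let mut bits = 0u32; // number of valid bits in the accumulator
    for &b in input.as_bytes() {
        if b == b'=' {
            break; // padding: done
        }
        let v = TABLE.iter().position(|&t| t == b).expect("invalid base64") as u32;
        acc = (acc << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8); // copy each decoded byte into a new Vec
        }
    }
    out
}

fn main() {
    let decoded = base64_decode("V2FzbWVyIGlzIGJsYXppbmcgZmFzdCE=");
    println!("{}", String::from_utf8(decoded).unwrap()); // Wasmer is blazing fast!
}
```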
That is not blazing fast.
We can do better.
Enter rkyv
rkyv is a (de)serialization library for Rust that, instead of converting data to a specific format, stores it almost exactly as it exists in the application's memory. Simply loading the data back into memory and reinterpreting it as a pointer to a specific struct type (called an archive, more on that below) gives you a usable instance.
The approach is quite interesting and you can read up on the details in their docs.
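To build intuition for the idea, here is a toy sketch of reinterpreting a byte buffer in place. This is NOT how rkyv works internally (rkyv additionally handles alignment, endianness, validation, and nested pointers properly); it only illustrates why "loading" can be nearly free when no parsing or copying happens:

```rust
// Force 4-byte alignment so the pointer cast below is valid.
#[repr(C, align(4))]
struct Buffer([u8; 8]);

// A struct with a well-defined, pointer-free memory layout.
#[repr(C)]
struct Header {
    magic: [u8; 4],
    version: u32,
}

fn main() {
    // Bytes as they might sit in a memory-mapped file
    // (version = 2, little-endian, as on x86/ARM).
    let buf = Buffer([b'W', b'A', b'S', b'M', 2, 0, 0, 0]);

    // Zero-copy "deserialization": view the buffer as a Header.
    // Sound only because Header is #[repr(C)], contains no pointers,
    // and the buffer has the right size and alignment.
    let header: &Header = unsafe { &*(buf.0.as_ptr() as *const Header) };

    assert_eq!(&header.magic, b"WASM");
    assert_eq!(header.version, 2);
}
```

No bytes were parsed or moved; the struct is just a typed view into the buffer.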
Using `rkyv` to improve module load times
Wasmer has always used `rkyv` to store compiled modules. However, a small problem kept us from fully benefiting from rkyv's speed. Consider this struct:
#[derive(rkyv::Archive, rkyv::Serialize, rkyv::Deserialize)]
struct Person {
    name: String,
    age: u8,
}
In `rkyv`, each struct gets an `Archived` equivalent generated for it, which would look something like this:
struct ArchivedPerson {
    name: Archived<String>,
    age: Archived<u8>,
}
When you read a byte array into memory (or use a memory-mapped file), you can feed it to rkyv's `archived_value` and get an `ArchivedPerson` back almost instantly. Or, if you want to be careful, you can use `check_archived_value` to validate the structure of the data. This takes a bit longer, but still much less time than, say, deserializing JSON.
The problem is that to go from `ArchivedPerson` to `Person`, you have to perform a deserialization step, which copies all of the data anyway and loses most of the benefit of using `rkyv`. But you will find that you need that step regardless; consider this function:
fn greet(person: &Person) -> String {
    format!("Hello, {}!", person.name)
}
It takes a `&Person`, not an `&ArchivedPerson`, even though the two structs have (almost) the same fields with the same names. In a dynamically typed language, passing an `&ArchivedPerson` to `greet` might well have worked, but Rust's strict type system (which I'm infinitely grateful for 99.99% of the time) doesn't let us do that. That is why Wasmer was doing a deserialization pass after reading the archives anyway.
One could always implement a separate `greet_archive` function, but that quickly gets out of hand once the code is moderately complex. Instead, we need a way to make the type system agree that `Person` and `ArchivedPerson` are indeed similar enough to be used interchangeably. And what better way to do that than with traits:
trait PersonLike {
    fn name(&self) -> &str;
    fn age(&self) -> u8;
}
Now we just have to implement this trait for both structs. It's still a bit of work, but at least you don't have to write all the code twice.
impl PersonLike for Person {
    fn name(&self) -> &str {
        self.name.as_str()
    }
    fn age(&self) -> u8 {
        self.age
    }
}
impl PersonLike for ArchivedPerson {
    fn name(&self) -> &str {
        // The archived representation of a string also has an as_str method
        self.name.as_str()
    }
    fn age(&self) -> u8 {
        // Primitives are archived as themselves
        self.age
    }
}
You'll notice that we can't move anything out of the struct, since the fields don't have the same types after all; not to mention that the data in `ArchivedPerson` is really just a view into the original byte array and can never be moved out. We're also limited to the lowest common denominator of the two structs. As it turns out, this is more than enough for our scenario, and lets us reimplement `greet` as:
fn greet<T: PersonLike>(person: &T) -> String {
    format!("Hello, {}!", person.name())
}
The good thing about this approach is that there is no additional runtime overhead. Monomorphization eliminates the generic, giving us two implementations of `greet` that are just as fast as the one we had before.
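Here is a self-contained version of the pattern above that you can run without rkyv. The `ArchivedPerson` below is a hand-written stand-in for the type rkyv would generate (the real one stores offsets into the original byte buffer rather than a plain `&str`):

```rust
struct Person {
    name: String,
    age: u8,
}

// Stand-in for the rkyv-generated archive: borrows from the "byte buffer".
struct ArchivedPerson<'a> {
    name: &'a str,
    age: u8,
}

trait PersonLike {
    fn name(&self) -> &str;
    fn age(&self) -> u8;
}

impl PersonLike for Person {
    fn name(&self) -> &str {
        &self.name
    }
    fn age(&self) -> u8 {
        self.age
    }
}

impl<'a> PersonLike for ArchivedPerson<'a> {
    fn name(&self) -> &str {
        self.name
    }
    fn age(&self) -> u8 {
        self.age
    }
}

// Monomorphization generates one fully concrete greet per type used.
fn greet<T: PersonLike>(person: &T) -> String {
    format!("Hello, {}!", person.name())
}

fn main() {
    let owned = Person { name: "Wasmer".to_string(), age: 4 };
    let archived = ArchivedPerson { name: "Wasmer", age: 4 };
    // Both go through the same generic function, no copying required.
    assert_eq!(greet(&owned), greet(&archived));
    println!("{}", greet(&owned)); // Hello, Wasmer!
}
```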
We took the same idea outlined here, and applied it to the code that reads and loads artifacts. As it turns out, most of the data contained in a serialized artifact (the files you'll find in your `~/.wasmer/cache/compiled` directory, as well as the output of running `wasmer compile`) is just byte arrays, and those can stay right where they are until they are ready to be loaded into the program's memory as executable code or memory-initialization data.
So, how much faster are we?
Good question! We took python, php, and everyone's favorite cowsay, loaded each a number of times, and averaged the results. The numbers are below. Times are in milliseconds and speedup is calculated as `1 - (After / Before)`.
| Module Name | Before (ms) | After (ms) | Speedup |
|---|---|---|---|
| cowsay | 2.65 | 1.57 | 40% |
| python | 43.09 | 21.53 | 50% |
| php | 141.05 | 74.03 | 47% |
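As a quick sanity check, the speedup column can be reproduced from the formula above (percentages appear truncated rather than rounded in the table):

```rust
/// Speedup as a whole-number percentage, truncated to match the table.
fn speedup_percent(before_ms: f64, after_ms: f64) -> f64 {
    ((1.0 - after_ms / before_ms) * 100.0).floor()
}

fn main() {
    assert_eq!(speedup_percent(2.65, 1.57), 40.0); // cowsay
    assert_eq!(speedup_percent(43.09, 21.53), 50.0); // python
    assert_eq!(speedup_percent(141.05, 74.03), 47.0); // php
    println!("speedups check out");
}
```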
We're seeing a 40 to 50 percent speedup in module load times, which is considerable if you ask me. Huge win for monomorphization and Rust and, by extension, Wasmer!
Security Considerations
It is important to consider that, when loading a compiled artifact you do not trust, many, many things can go wrong; you could be loading a virus for all you know. `wasm` modules, on the other hand, when compiled by Wasmer, cannot break out of their sandbox.
While it's true that skipping the deserialization pass means that we won't discover errors in module structure that would have been caught otherwise, you really only want to load an artifact if you know where it came from anyway (i.e. you compiled it yourself), so we don't believe this change creates additional security risks.
Conclusion
We're always working to make Wasmer even faster than it is. This change shaved a few more milliseconds off. This may not be a lot, but you'll notice the effects if your application loads as many modules as Wasmer Edge does!
As we continue to enhance Wasmer's performance and functionality, your support and feedback are invaluable to us. To stay updated on our latest innovations and contribute to our projects, please visit our GitHub repository and consider giving us a star ⭐️.
About the Author
Arshia is the lead behind WinterJS and also works on WASIX and the Wasmer Runtime.