Encode modules with variable-length integers #2322
Update `Module::{serialize,deserialize}` to use variable-length integers
with `bincode` to make the output artifacts smaller. Locally this
reduces the size of the module from bytecodealliance#2318 from 160 to 110 MB, a roughly 30% decrease!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.
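To illustrate why variable-length integers shrink the artifact, here is a minimal sketch of a generic LEB128-style varint. This is not `bincode`'s actual wire format (its varint scheme differs in detail), and the names `write_varint` and `read_varint` are hypothetical. It only shows the general idea that small values, which dominate module metadata, take one byte instead of a fixed eight.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint: 7 payload bits per
/// byte, with the high bit set on every byte except the last.
/// (Hypothetical helper, not part of bincode or wasmtime.)
pub fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit left clear.
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decode one varint from the front of `input`, returning the value and the
/// number of bytes consumed. Returns `None` for truncated or over-long input.
pub fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None; // more than 10 bytes cannot fit in a u64
        }
        let payload = (byte & 0x7f) as u64;
        if shift == 63 && payload > 1 {
            return None; // the 10th byte may only carry the top bit
        }
        value |= payload << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final byte
}
```

With this scheme a value below 128 costs one byte and `u64::MAX` costs ten, so the saving depends on most encoded integers being small. Decoding has to inspect each byte's continuation bit, which is in line with the slightly slower deserialization noted above.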
cc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"
This commit reduces the size of `InstructionAddressMap` from 24 bytes to 8 bytes by dropping the `code_len` field and reducing `code_offset` to `u32` instead of `usize`. The intention is primarily to make the in-memory version take up less space; the hunch is that `code_len` is largely unnecessary since most entries in this map are adjacent to one another. The `code_len` field is now implied by the `code_offset` field of the next entry in the map. This isn't as big an improvement to serialized module size as bytecodealliance#2321 or bytecodealliance#2322, primarily because of the switch to variable-length encoding. Even so, it shaves about 10MB off the encoded size of the module from bytecodealliance#2318.
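The layout change described in that commit can be sketched roughly as follows. The struct name `AddressMapEntry`, the `srcloc` field, and the helper functions are assumptions for illustration; only `code_offset` as a `u32` and the removal of `code_len` come from the commit message.

```rust
/// An 8-byte address map entry: the length of the code range is no longer
/// stored, it is implied by the next entry's `code_offset`.
/// `srcloc` is a hypothetical stand-in for whatever source position the real
/// entry carries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AddressMapEntry {
    pub srcloc: u32,
    pub code_offset: u32,
}

/// Length of the code range for entry `i`. The last entry runs to `body_len`,
/// the total length of the function body. Assumes `entries` is sorted by
/// `code_offset` and `body_len` is not smaller than the last offset.
pub fn implied_len(entries: &[AddressMapEntry], i: usize, body_len: u32) -> u32 {
    let end = entries
        .get(i + 1)
        .map_or(body_len, |next| next.code_offset);
    end - entries[i].code_offset
}

/// Find the entry whose range contains `pc` by binary search over the sorted
/// offsets. Returns `None` if `pc` lies before the first entry.
pub fn lookup(entries: &[AddressMapEntry], pc: u32) -> Option<&AddressMapEntry> {
    // Index of the first entry that starts after `pc`.
    let idx = entries.partition_point(|e| e.code_offset <= pc);
    idx.checked_sub(1).map(|i| &entries[i])
}
```

Because entries stay fixed-size and sorted, a lookup for a particular pc remains a binary search, which is the random-access property the in-memory representation keeps.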
To be clear, this is just enabling variable-length encoding for integers, right? It is not switching to a delta encoding (which would ensure that essentially all addresses are 1-byte encoded), correct?
Ah yes indeed, a delta encoding might make it even more compact. I've so far shied away from big changes like that though. I think we may want to do it eventually, but it trades lookup speed for a smaller in-memory footprint, since we can no longer quickly do a random lookup for a particular trapping pc. Not that we necessarily need that to be too fast.
I'm only advocating for delta encoding in the serialized format, not the in-memory representation. |
I think it'd be good to try not to have those two deviate too much: ideally we should at some point be able to mmap the on-disk representation entirely or almost entirely instead of having to read and deserialize it. There clearly are size/speed tradeoffs to be had here, so that might not be the only thing we support, but I hope it can become a thing we support :)
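For reference, the serialized-only delta encoding suggested in the discussion could look roughly like this. It is a sketch under assumptions, not something this PR implements: `delta_encode` and `delta_decode` are hypothetical names, and the offsets are assumed to be sorted `u32` values as in the address map.

```rust
/// Replace each offset with its difference from the previous one. For sorted,
/// closely spaced offsets most deltas are small, so a varint encoder can
/// store them in a single byte each.
pub fn delta_encode(offsets: &[u32]) -> Vec<u32> {
    let mut prev = 0u32;
    offsets
        .iter()
        .map(|&offset| {
            // Wrapping arithmetic keeps the round trip exact even if the
            // input turns out not to be sorted.
            let delta = offset.wrapping_sub(prev);
            prev = offset;
            delta
        })
        .collect()
}

/// Rebuild the absolute offsets with a running sum. This happens once at
/// load time, so the in-memory table keeps absolute offsets and stays
/// binary-searchable.
pub fn delta_decode(deltas: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    deltas
        .iter()
        .map(|&delta| {
            acc = acc.wrapping_add(delta);
            acc
        })
        .collect()
}
```

The cost is the point raised in the last comment: the decode step is a full pass over the data, so a delta-encoded file cannot simply be mmapped and used in place.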