How WebAssembly Modules Safely Exchange Data

By Marco Fioretti

The WebAssembly binary format (Wasm) was developed to let software written in any language “compile once, run everywhere”: inside web browsers, or in stand-alone virtual machines (runtimes) available for any platform, almost as fast as code compiled directly for those platforms. Thanks to the WebAssembly System Interface (WASI), Wasm modules can interact in a truly portable way with any host environment in which they run.

That is not enough, though. To be usable without surprises in as many scenarios as possible, Wasm executable files need at least two more things. The first is the capability to interact directly not just with the operating system, but with any other program of the same kind. In Wasm this is called “module linking”, and it will be the topic of the next article in this series. The second feature, which is a prerequisite for module linking to be useful, is the capability to exchange data structures of any kind without misunderstandings or data loss.

What happens when Wasm modules exchange data?

Since it is only a compilation target, the WebAssembly format provides only low-level data types (essentially 32- and 64-bit integers and floating-point numbers) that aim to be as close to the underlying machine as possible. This choice is what makes modules highly portable and fast, while leaving programmers free to write software in whatever language they want. The burden of mapping complex data structures in that language to native Wasm data types is left to software libraries and to the compilers that use them.
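To make that burden concrete, here is a minimal Python sketch of what a compiler or library might do behind the scenes. A `bytearray` stands in for a module's linear memory, and a list of integers is “lowered” into consecutive little-endian 32-bit values, leaving the receiving side with nothing but an offset and a count. The helper names `lower_i32_list` and `lift_i32_list` are hypothetical, chosen only for this illustration; they are not part of any Wasm toolchain.

```python
import struct

PAGE_SIZE = 65536  # one WebAssembly memory page is 64 KiB
memory = bytearray(PAGE_SIZE)  # stand-in for a module's linear memory


def lower_i32_list(mem: bytearray, offset: int, values: list[int]) -> tuple[int, int]:
    """Store values as consecutive 32-bit integers and return (offset, count)."""
    # Wasm linear memory is little-endian, hence the "<" prefix in the format.
    struct.pack_into(f"<{len(values)}i", mem, offset, *values)
    return offset, len(values)


def lift_i32_list(mem: bytearray, offset: int, count: int) -> list[int]:
    """Rebuild a list from count 32-bit integers stored at offset."""
    return list(struct.unpack_from(f"<{count}i", mem, offset))


ptr, count = lower_i32_list(memory, 1024, [3, -1, 42])
print(ptr, count)                          # 1024 3
print(lift_i32_list(memory, ptr, count))   # [3, -1, 42]
print(memory[1024:1028].hex())             # 03000000
```

The receiver sees only two integers. Whether they describe a list of numbers, a string or something else entirely is a convention that both sides must agree on in advance.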

The problem is that, for the sake of efficiency, the first generation of the Wasm format and of WASI does not natively support strings and other equally basic data types. Therefore, there is no intrinsic guarantee that, for example, a Wasm module compiled from Python sources and another compiled from Rust sources will share exactly the same concept of “string” in every circumstance where strings may be used.

The consequence is that, if Wasm modules compiled from different languages want to exchange more complex data structures, something important may be, so to speak, “lost in translation” every time some data goes from one module to another. Concretely, this prevents both direct embedding of Wasm modules into generic applications and direct calls from Wasm modules to external software.
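One quick way to see that “string” is not a single concept is to measure the same text the way different languages do. The Python sketch below counts code points (what Python's `len()` returns), UTF-8 bytes (what Rust's `str::len()` returns) and UTF-16 code units (what JavaScript's `.length` returns). The function name `string_lengths` and the two sample words are arbitrary choices for illustration.

```python
def string_lengths(text: str) -> dict[str, int]:
    """Measure the same text in three units that different languages use."""
    return {
        "code_points": len(text),                   # Python's len()
        "utf8_bytes": len(text.encode("utf-8")),    # Rust's str::len()
        # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" would prepend
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # JavaScript's .length
    }


for word in ("na\u00efve", "\U0001F980"):  # "naïve" and the crab emoji
    print(repr(word), string_lengths(word))
```

For the crab emoji the three answers are 1, 4 and 2. A module that sends a “length” in one of these units to a module expecting another will silently truncate or overrun the text.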

In order to understand the nature of the problem, it is useful to look at how such data are passed around in first-generation Wasm and WASI modules.

The original way for WebAssembly to communicate with JavaScript and C programs is to simulate things like strings by manually managing chunks of memory.

For example, the WASI function path_open receives its path string as a pair of 32-bit integers (i32) representing, respectively, the offset and the length of that string in the linear memory reserved for a Wasm module. This is already problematic in the simplest and most frequent cases, such as when different character encodings or garbage collection (GC) are involved. To make things worse, WASI modules that exchange strings this way would be forced to access each other's memory, which is far from optimal for both performance and security.
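The following Python sketch mimics that convention. A hypothetical module writes a file name into its linear memory and hands over only an (offset, length) pair, much as path_open receives its path. The reading side must then assume an encoding, because the pair itself carries none. The helpers `write_string` and `read_string`, and the UTF-8 versus UTF-16 mismatch, are illustrative assumptions and do not come from WASI itself.

```python
memory = bytearray(65536)  # one 64 KiB page standing in for linear memory


def write_string(mem: bytearray, offset: int, text: str, encoding: str) -> tuple[int, int]:
    """Store text at offset and return the (offset, length-in-bytes) pair."""
    data = text.encode(encoding)
    mem[offset:offset + len(data)] = data
    return offset, len(data)


def read_string(mem: bytearray, offset: int, length: int, encoding: str) -> str:
    """Decode length bytes at offset; the caller has to guess the encoding."""
    return bytes(mem[offset:offset + length]).decode(encoding, errors="replace")


ptr, length = write_string(memory, 2048, "donn\u00e9es.txt", "utf-8")
print(ptr, length)                                    # 2048 12
print(read_string(memory, ptr, length, "utf-8"))      # données.txt
print(read_string(memory, ptr, length, "utf-16-le"))  # garbled, not the original name
```

Note also that `read_string` needs direct access to the writer's memory, which is exactly the sharing the paragraph above describes as undesirable.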

In theory, Wasm modules that want to exchange data could also use traditional, JavaScript-compatible mechanisms like Web IDL, the interface definition language used to describe the components of Web application programming interfaces (APIs), including of course their data types.

In practice, however, this would not solve anything. First, because Web IDL functions can accept, and pass back to the Wasm module that called them, higher-level constructs than WebAssembly understands. Second, because using Web IDL from WebAssembly means exchanging data not directly, but through the ECMAScript bindings, which have their own complexities and performance penalties. In short, certain tricks work today, but not in all cases, and they are by no means future-proof.

The solution: Interface and Reference Types

The real solution to all the problems mentioned above is to extend both the Wasm binary format and WASI in ways that:

directly support more complex data structures like strings or lists
allow Wasm modules to statically type-check the corresponding variables and exchange them directly, without having to share their internal linear memory.
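Those two goals can be sketched in Python as an adapter between two modules that each own a private linear memory. The interfaces are checked once, before any data moves. At call time the caller's bytes are “lifted” into an abstract string and then “lowered” as a fresh copy into the callee's own memory, so neither side touches the other's bytes. The `Module` class, `check_types` and `pass_string` are hypothetical simplifications loosely inspired by the lifting and lowering vocabulary used around Interface Types, not the proposal's actual encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical module: a private linear memory plus a trivial bump allocator."""
    name: str
    memory: bytearray = field(default_factory=lambda: bytearray(65536))
    next_free: int = 0

    def alloc(self, size: int) -> int:
        if self.next_free + size > len(self.memory):
            raise MemoryError(f"{self.name}: out of linear memory")
        ptr = self.next_free
        self.next_free += size
        return ptr

    def store(self, text: str) -> tuple[int, int]:
        """Copy a string into this module's own memory and return (ptr, len)."""
        data = text.encode("utf-8")
        ptr = self.alloc(len(data))
        self.memory[ptr:ptr + len(data)] = data
        return ptr, len(data)


def check_types(export_sig: tuple[str, ...], import_sig: tuple[str, ...]) -> None:
    """Link-time check: reject mismatched interfaces before any data moves."""
    if export_sig != import_sig:
        raise TypeError(f"interface mismatch: {export_sig} vs {import_sig}")


def pass_string(caller: Module, callee: Module, ptr: int, length: int) -> tuple[int, int]:
    """Lift bytes out of the caller's memory, then lower a copy into the callee's."""
    value = bytes(caller.memory[ptr:ptr + length]).decode("utf-8")  # lift
    return callee.store(value)                                      # lower


a, b = Module("a"), Module("b")
check_types(("string",), ("string",))  # ("string",) vs ("list<u8>",) would raise
src = a.store("hello")
dst = pass_string(a, b, *src)
print(bytes(b.memory[dst[0]:dst[0] + dst[1]]).decode("utf-8"))  # hello
```

The price of this design is one copy per call. In exchange, each module keeps full control of its own memory, and a type mismatch surfaces as an error at link time rather than as corrupted data.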

Two specifications are being developed for exactly this purpose: the main one is simply called Interface Types, and its companion is Reference Types.

Both specifications rely on lower-level features already added to the original Wasm core, namely multi-value and multi-memory support. The first extension allows Wasm functions to return an arbitrary number of values, instead of at most one as before, and lets Wasm instruction sequences (blocks) handle an arbitrary number of stack values. The second lets a Wasm module, or single functions within it, use multiple memories at the same time, which is useful for many reasons besides exchanging variables.
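Multi-value matters for exactly the kind of data discussed in this article. Without it, a function that wants to hand back a string as an (offset, length) pair can return only one of the two numbers and must write the other into memory, at an address supplied by the caller. With multi-value, both i32 results can simply be left on the stack. The Python sketch below models both styles with an explicit operand stack. The function names, the fixed string location and the stack model are simplifications for illustration, not real Wasm instructions.

```python
import struct

memory = bytearray(65536)          # stand-in for linear memory
STRING_PTR, STRING_LEN = 4096, 11  # hypothetical location of a string in memory


def get_string_single_value(stack: list[int]) -> None:
    """Pre-multi-value style: one result at most, so the rest goes through memory."""
    out_ptr = stack.pop()                                # caller-supplied return area
    struct.pack_into("<i", memory, out_ptr, STRING_LEN)  # spill the length to memory
    stack.append(STRING_PTR)                             # the single return value


def get_string_multi_value(stack: list[int]) -> None:
    """Multi-value style: both i32 results are pushed onto the operand stack."""
    stack.extend([STRING_PTR, STRING_LEN])


# Single-value caller: pass a return-area address, then reload the length from memory.
stack = [8192]
get_string_single_value(stack)
ptr = stack.pop()
(length,) = struct.unpack_from("<i", memory, 8192)

# Multi-value caller: pop both results directly (the last one pushed comes off first).
stack2: list[int] = []
get_string_multi_value(stack2)
length2 = stack2.pop()
ptr2 = stack2.pop()

print(ptr, length, ptr2, length2)  # 4096 11 4096 11
```

Both callers end up with the same pair. The multi-value version, however, needs no scratch memory and no agreement about where the return area lives.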

Building on these features, Interface Types define strings and other “high-level” data structures in ways that any unmodified Wasm runtime can use. Reference Types complete the picture by specifying how Wasm applications must actually exchange those data structures with external applications.
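The core idea behind reference types can be modeled roughly in Python. The host wraps one of its own objects in an opaque reference. The module can store that reference in a table and pass it back, but it never copies the object into linear memory or looks inside it. `ExternRef`, `Table` and the `host_*` functions are hypothetical names, loosely modeled on the `externref` type and the `table.get`/`table.set` instructions. Python cannot truly hide the wrapped object, so the leading underscore only marks the convention.

```python
from typing import Optional


class ExternRef:
    """Opaque wrapper: module code may store and forward it, never look inside."""
    __slots__ = ("_host_value",)

    def __init__(self, host_value: object) -> None:
        self._host_value = host_value


class Table:
    """Fixed-size table of references, loosely like a Wasm table of externref."""

    def __init__(self, size: int) -> None:
        self._slots: list = [None] * size

    def set(self, index: int, ref: Optional[ExternRef]) -> None:  # cf. table.set
        self._slots[index] = ref

    def get(self, index: int) -> Optional[ExternRef]:  # cf. table.get
        return self._slots[index]


# Host side: only the host wraps and unwraps its own objects.
def host_make_ref(value: object) -> ExternRef:
    return ExternRef(value)


def host_describe(ref: ExternRef) -> str:
    return f"host received {ref._host_value!r}"


# Module side: keep the reference in a table and hand it back, nothing more.
table = Table(4)
table.set(0, host_make_ref({"user": "alice", "roles": ["admin"]}))
print(host_describe(table.get(0)))
```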

The specifications are not complete yet: Interface Types can exchange values, but not handles to resources and buffers, which would be required, for example, to “read a file and write directly into a buffer”.
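The difference between passing values and passing a buffer is easy to see in plain Python. Here a `bytearray` stands in for linear memory and `io.BytesIO` stands in for a file. When data is passed by value, it arrives as a new object that must then be copied into memory. A writable buffer instead lets the reader fill the destination in place, without an intermediate object. This sketch only illustrates the distinction the specifications aim to address. It is not how any Wasm runtime implements it, and the file contents are made up.

```python
import io

memory = bytearray(65536)         # stand-in for a module's linear memory
file_contents = b"hello, wasm\n"  # hypothetical file data

# By value: read() returns a new bytes object, which is then copied into memory.
f = io.BytesIO(file_contents)
data = f.read()
memory[0:len(data)] = data

# By buffer: readinto() fills a writable view of linear memory in place.
f = io.BytesIO(file_contents)
view = memoryview(memory)[1024:1024 + len(file_contents)]
n = f.readinto(view)
print(n, bytes(memory[1024:1024 + n]))  # 12 b'hello, wasm\n'
```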

Working together, however, all the features described here already enable Wasm modules and WASI interfaces to handle and exchange most complex data structures efficiently and without corrupting them, regardless of the language the modules were written in before being compiled to Wasm.

The post How WebAssembly Modules Safely Exchange Data appeared first on Linux Foundation – Training.