How and Why to Link WebAssembly Modules

By Marco Fioretti

WebAssembly, or Wasm for brevity, is a Web-optimized executable software format, designed to give programmers the greatest possible flexibility. Wasm binary modules can be compiled once, and then safely run anywhere, alone or embedded in other applications. In practice, Wasm needs at least three key components to keep that promise. Two of them, already presented in this series, are the WebAssembly System Interface (WASI), and Wasm Interface Types.

WASI gives Wasm modules standard, language-independent ways to interact with any host environment in which they may land. Interface Types play the same role for data: they are standardized definitions of the values and data structures that modules exchange. By using them, Wasm modules can pass complex data structures to each other without the risk of corrupting them, even if they were written by independent programmers in very different source languages. The other main piece of this puzzle is called module linking. This is what allows distinct binary files to interact directly, for example by using each other’s functions, as if they had been written as different sections of the same source code and then compiled together.

The pros and cons of linking Wasm modules

The first reason to link Wasm modules is valid in every area of programming: reuse. If a library of, say, mathematical or networking functions can be written (and maintained!) once, but in a way that allows thousands of programmers to use it with little or no effort, everybody wins.

The other reason is even simpler, but particularly important for a format like Wasm: speed. At least for the foreseeable future, most Wasm modules will be downloaded from some remote server, possibly over a slow mobile link, to be executed on the fly. In all such cases, every extra second spent downloading and preparing code can make a difference. If a large Wasm application is split into separate, interlinkable modules, its host can download only the ones its own users need, and only when they actually need them. To further reduce downloads, frequently requested modules can even be cached locally.
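The "load only when needed, then cache" idea can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: LazyModuleCache, fakeServer and bytesRequested are hypothetical names, and the "download" is simulated with an in-memory table holding a tiny hand-assembled module that exports add(i32, i32). A real host would fetch the bytes over the network instead.

```typescript
// Hypothetical helper: compiles each named module at most once, on first use.
class LazyModuleCache {
  private readonly compiled = new Map<string, WebAssembly.Module>();
  private readonly loadBytes: (name: string) => readonly number[];

  constructor(loadBytes: (name: string) => readonly number[]) {
    this.loadBytes = loadBytes;
  }

  get(name: string): WebAssembly.Module {
    let mod = this.compiled.get(name);
    if (mod === undefined) {
      // First request: obtain the bytes and compile them.
      mod = new WebAssembly.Module(new Uint8Array(this.loadBytes(name)));
      this.compiled.set(name, mod);
    }
    return mod; // Later requests reuse the compiled module.
  }
}

// Stand-in for a remote server (an assumption of this sketch).
// The bytes encode a minimal module exporting add(i32, i32) -> i32.
const fakeServer = new Map<string, readonly number[]>([
  ["math", [
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
    0x03, 0x02, 0x01, 0x00,                               // one function of type 0
    0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
    0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
  ]],
]);

let bytesRequested = 0; // counts simulated downloads
const moduleCache = new LazyModuleCache((name) => {
  const bytes = fakeServer.get(name);
  if (bytes === undefined) throw new Error(`unknown module: ${name}`);
  bytesRequested += 1;
  return bytes;
});

const mathModule = moduleCache.get("math");      // triggers the simulated download
const mathModuleAgain = moduleCache.get("math"); // served from the cache
```

Modules that are never requested are never loaded or compiled, which is the saving the paragraph above describes.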

Of course, there can be too much of a good thing. Using many modules, especially from many independent sources, speeds up software development, but can make its maintenance more complex in the long run. At the same time, on any stable network, downloading and linking several modules takes, almost by definition, more time than getting just one blob of code that does exactly the same thing. In addition, function calls between linked Wasm modules “can be slower than function calls within one module”. Overall, all these factors may lead to a real-world performance hit of a few percentage points. In many cases, this will be a very reasonable price to pay.

The Wasm way to link modules

Independently developed Wasm modules can always be “linked” the same way countless software applications are: at compile time. This produces one executable file that has all the desired features and is ready to run inside any Wasm/WASI-compliant virtual machine. However, static linking depends on the specific languages and toolchains used to generate each executable file. It is also almost the opposite of the desired result: a “portable, host- and language-independent ecosystem” of WebAssembly modules that are composable as needed after, not before, downloading them, and regardless of where they came from.

A first, if small, step in this direction consists of using the already mentioned Wasm Interface Types: they can, in fact, let different Wasm modules exchange copies of their data structures, without actually sharing them but as if they were parts of the same program. This limited form of cooperation among modules is called “Shared-Nothing Linking”.
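The core idea of exchanging copies rather than sharing memory can be illustrated from the host side with the standard WebAssembly.Memory object. This is a rough sketch, not how Interface Types are implemented: copyBetweenMemories and the two memories are hypothetical names, and plain bytes stand in for a real data structure.

```typescript
// Hypothetical helper: copies `length` bytes from one linear memory to another.
function copyBetweenMemories(
  src: WebAssembly.Memory, srcOffset: number,
  dst: WebAssembly.Memory, dstOffset: number,
  length: number,
): void {
  const source = new Uint8Array(src.buffer, srcOffset, length);
  new Uint8Array(dst.buffer).set(source, dstOffset);
}

// Each "module" owns a private linear memory of one 64 KiB page.
const producerMemory = new WebAssembly.Memory({ initial: 1 });
const consumerMemory = new WebAssembly.Memory({ initial: 1 });

// The producer writes four bytes at offset 0 of its own memory.
new Uint8Array(producerMemory.buffer).set([10, 20, 30, 40], 0);

// The consumer receives a copy at offset 128 of its own memory.
copyBetweenMemories(producerMemory, 0, consumerMemory, 128, 4);

// Changing the copy leaves the producer's original untouched.
const consumerView = new Uint8Array(consumerMemory.buffer, 128, 4);
consumerView[0] = 99;
const producerFirstByte = new Uint8Array(producerMemory.buffer)[0];
```

Because neither side can reach into the other's memory, a bug in one module cannot corrupt the other's data. The price is the extra copy.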

The kind of linking that is really consistent with the core Wasm philosophy, however, is the one that happens only when and where it is really needed, does not waste resources and, above all, doesn’t put unnecessary constraints on the providers of Wasm modules. In other words, linking should happen on the host that actually needs it, without requiring any special preparation of the linked modules or any application-specific customization by the programmers who wrote them.

This means that Wasm binaries should use some virtualization technique to declare and import the other modules (or parts of them) that they want to link. This mechanism, called “link-time virtualization”, would eventually allow so-called “Shared-Everything Dynamic Linking”, in which all the linked modules could directly share their memory and data tables, without duplication. The low-level, gory details of this approach are described in the corresponding section of the official Explainer for Wasm module linking. Here, we only mention two of the general properties, or constraints, that every “pure-Wasm” linking solution should include.
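The memory-sharing half of this idea can already be approximated with today's JavaScript API, by handing the same WebAssembly.Memory to more than one instance. The sketch below is only that, an approximation: it is not the mechanism proposed in the Explainer, and the module bytes, the env.memory import name and the store/load exports are assumptions made for this example.

```typescript
// Hand-assembled module (an assumption of this sketch). It imports a memory
// as env.memory and exports store(addr, value) and load(addr) -> i32.
const cellModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version
  // type section: (i32, i32) -> () and (i32) -> i32
  0x01, 0x0b, 0x02, 0x60, 0x02, 0x7f, 0x7f, 0x00, 0x60, 0x01, 0x7f, 0x01, 0x7f,
  // import section: "env" "memory", a memory with at least one page
  0x02, 0x0f, 0x01, 0x03, 0x65, 0x6e, 0x76,
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, 0x01,
  // function section: two functions, of type 0 and type 1
  0x03, 0x03, 0x02, 0x00, 0x01,
  // export section: "store" (function 0) and "load" (function 1)
  0x07, 0x10, 0x02, 0x05, 0x73, 0x74, 0x6f, 0x72, 0x65, 0x00, 0x00,
  0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x01,
  // code section: store = i32.store(addr, value), load = i32.load(addr)
  0x0a, 0x13, 0x02,
  0x09, 0x00, 0x20, 0x00, 0x20, 0x01, 0x36, 0x02, 0x00, 0x0b,
  0x07, 0x00, 0x20, 0x00, 0x28, 0x02, 0x00, 0x0b,
]);

const cellModule = new WebAssembly.Module(cellModuleBytes);

// One memory, handed to two separate instances.
const sharedMemory = new WebAssembly.Memory({ initial: 1 });
const writerInstance = new WebAssembly.Instance(cellModule, { env: { memory: sharedMemory } });
const readerInstance = new WebAssembly.Instance(cellModule, { env: { memory: sharedMemory } });

const storeValue = writerInstance.exports.store as (addr: number, value: number) => void;
const loadValue = readerInstance.exports.load as (addr: number) => number;

storeValue(16, 1234);             // the writer instance stores a value...
const seenByReader = loadValue(16); // ...and the reader instance sees it, with no copy
```

Here nothing is duplicated, but the two instances must agree on the memory layout, which is why this style of linking demands much more coordination than the shared-nothing approach.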

The first one is the “Principle of Least Authority”, by which every Wasm module must always expose to its host, or demand from it, only the smallest possible subset of capabilities that it needs to do its job. The other is, to put it simply, that linking modules should not make Garbage Collection in Wasm more complicated than it already is.

In practice: Emscripten, JavaScript APIs and WAPM

The most basic tools for creating and using linked Wasm modules from scratch are functions like dlopen() in C or C++ or, in JavaScript, the equivalent WebAssembly APIs. With the proper settings, the Emscripten toolchain can add a dynamicLibraries array, which lists all the modules that should be downloaded and then linked, to the glue logic that makes JavaScript virtual machines load and run Wasm code. To study or use complete linkable modules instead, check out the official registry of the WebAssembly Package Manager (WAPM), an open source tool whose purpose is exactly to facilitate the publication and installation of such modules.
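As a minimal sketch of the JavaScript route, the host can link two modules by passing one instance's exports as the other's imports. Everything specific here is an assumption made for the example: both modules are hand-assembled byte arrays rather than compiler output, and the names math, add and addTen are hypothetical. Only WebAssembly.Module and WebAssembly.Instance are the standard API.

```typescript
// Provider module (assumed for this sketch): exports add(i32, i32) -> i32.
const providerBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // a + b
]);

// Consumer module (assumed): imports math.add and exports
// addTen(x) = add(x, 10).
const consumerBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version
  // type section: (i32, i32) -> i32 and (i32) -> i32
  0x01, 0x0c, 0x02, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, 0x60, 0x01, 0x7f, 0x01, 0x7f,
  // import section: "math" "add", a function of type 0
  0x02, 0x0c, 0x01, 0x04, 0x6d, 0x61, 0x74, 0x68, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // function section: one local function of type 1
  0x03, 0x02, 0x01, 0x01,
  // export section: "addTen" is function index 1 (index 0 is the import)
  0x07, 0x0a, 0x01, 0x06, 0x61, 0x64, 0x64, 0x54, 0x65, 0x6e, 0x00, 0x01,
  // code section: local.get 0, i32.const 10, call 0, end
  0x0a, 0x0a, 0x01, 0x08, 0x00, 0x20, 0x00, 0x41, 0x0a, 0x10, 0x00, 0x0b,
]);

// Step 1: instantiate the provider, which needs no imports.
const providerInstance = new WebAssembly.Instance(new WebAssembly.Module(providerBytes));
const add = providerInstance.exports.add as (a: number, b: number) => number;

// Step 2: instantiate the consumer, granting it only the one function it
// imports and nothing else.
const consumerInstance = new WebAssembly.Instance(
  new WebAssembly.Module(consumerBytes),
  { math: { add } },
);

const addTen = consumerInstance.exports.addTen as (x: number) => number;
const linkedResult = addTen(5); // 15: the call crosses from consumer into provider
```

The import object is the only way the consumer can reach anything outside itself, so the host decides exactly which capabilities each module receives. That fits the least-authority constraint mentioned earlier.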

The post How and Why to Link WebAssembly Modules appeared first on Linux Foundation – Training.