WebAssembly
WebAssembly is a relatively new standard for architecture-independent executables, primarily used on Web pages. But it is much more than that, and the way it has been specified makes it uniquely suited as a language-, OS- and machine-independent executable format.
Rationale
What does the Web have to do with OS kernels, you may ask? The truth is, the name WebAssembly (or WASM for short) is quite misleading: it is a very well specified bytecode format with a minimal interpreting environment, and it does not require any Web-related technologies at all.
A kernel should already have an executable format loader and parser (such as an ELF parser), but that can only load native CPU instructions. Adding a small bytecode interpreter alongside it would allow any hobby OS to execute machine- and OS-independent binaries, which is a great benefit. This is similar to how Android devices run APK applications on a bytecode virtual machine regardless of the CPU they have.
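To give a feel for how small the core of such an interpreter can be, here is a minimal sketch in C. It assumes a straight-line expression that uses only four MVP opcodes (i32.const, i32.add, i32.mul and end, with their Core Specification encodings). The names eval_expr and read_sleb32 and the fixed-size value stack are hypothetical; a real interpreter also needs control flow, locals, linear memory, traps and validation.

```c
#include <stddef.h>
#include <stdint.h>

/* A few opcodes from the WebAssembly Core Specification (MVP). */
#define OP_END       0x0B
#define OP_I32_CONST 0x41
#define OP_I32_ADD   0x6A
#define OP_I32_MUL   0x6C

#define STACK_SIZE 64 /* hypothetical fixed value stack depth */

/* Decode a signed LEB128 immediate (at most 32 bits); returns 0 on success. */
static int read_sleb32(const uint8_t *code, size_t len, size_t *pos, int32_t *out)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        if (*pos >= len || shift >= 35)
            return -1; /* truncated or overlong encoding */
        byte = code[(*pos)++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 32 && (byte & 0x40))
        result |= ~0u << shift; /* sign-extend from the last byte's bit 6 */
    *out = (int32_t)result;
    return 0;
}

/* Evaluate a straight-line i32 expression; returns 0 and the result in *result. */
static int eval_expr(const uint8_t *code, size_t len, int32_t *result)
{
    int32_t stack[STACK_SIZE];
    size_t sp = 0, pos = 0;

    while (pos < len) {
        uint8_t op = code[pos++];
        switch (op) {
        case OP_I32_CONST:
            if (sp == STACK_SIZE || read_sleb32(code, len, &pos, &stack[sp]) != 0)
                return -1;
            sp++;
            break;
        case OP_I32_ADD:
        case OP_I32_MUL: {
            if (sp < 2)
                return -1; /* stack underflow */
            uint32_t b = (uint32_t)stack[--sp];
            uint32_t a = (uint32_t)stack[--sp];
            /* i32 arithmetic wraps modulo 2^32, so compute on unsigned values */
            uint32_t r = (op == OP_I32_ADD) ? a + b : a * b;
            stack[sp++] = (int32_t)r;
            break;
        }
        case OP_END:
            if (sp != 1)
                return -1;
            *result = stack[0];
            return 0;
        default:
            return -1; /* opcode not handled by this sketch */
        }
    }
    return -1; /* missing end opcode */
}
```

For example, the bytes 41 02 41 03 6A 41 04 6C 0B encode (2 + 3) * 4 and evaluate to 20. The arithmetic is done on uint32_t because WebAssembly i32 operations wrap around, whereas signed overflow in C is undefined behaviour.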
Unlike Java and Dalvik, interpreting WebAssembly binaries is much easier, has no legal concerns, and there are many existing libraries to do so in many languages (most notably C and Rust).
Comparison to Other Bytecode Formats
Lua
Lua is a very popular scripting language. However, its compiler, interpreter and execution environment are mixed into a single library. That is good for integrating with user space applications, but makes it difficult to embed into kernels (which would only need the bytecode interpreter part). The bytecode format is not really standardized, although it seems quite stable.
Python
As strange as it seems, there is no such thing as a Python bytecode standard. Therefore it is practically impossible to embed a Python interpreter into a kernel. There are interpreter libraries which you could link in, but just like the bytecode format, they are constantly changing and are tied to a specific version of Python.
Java
Java has a very well defined bytecode format, but unfortunately there is no simple bytecode interpreter library. Both the official JVM and its open source counterparts are huge, bloated software, not really designed for embedding into kernels. Even if there were a simple interpreter library, Java is licensed by Oracle and its use is subject to certain legal terms. To clarify, Android does not use Java bytecode at all; it interprets Dalvik bytecode, which is compiled from Java source.
WebAssembly
Unlike the rest, WebAssembly is not tied to any programming language; instead, it is an open specification made by the W3C. Therefore it is possible to compile C, C++, Rust, Pascal or even Basic into WASM bytecode. The execution environment is also separate and very well defined, so there are many implementations you can choose from.
One of the best-known C/C++ compilers that produces WASM bytecode is EmScripten, which is built on top of the LLVM compiler architecture. There is also a reference bytecode interpreter provided by the W3C, which is written in OCaml and aimed at simplicity (not speed).
The WebAssembly documentation and specification do not restrict which languages or interpreters may be used, and there are no licensing concerns involved. You are free to use any WASM compiler for any language and to implement your own interpreter in your kernel.
WebAssembly Interpreter Parlance
- WASM: a file which contains binary WebAssembly bytecode, also the name of the reference interpreter.
- WAT: WebAssembly Text format, a plain text representation of the bytecode (see the Binaryen wasm-as tool).
- WABT: WebAssembly Binary Toolkit, equivalent of binutils. Includes assembler, objdump and other tools (see WABT).
- MVP: Minimum Viable Product, the smallest WASM interpreter (nothing Web-related required or included). Defined in the WebAssembly Core Specification.
- WASI: WebAssembly System Interface, this defines an ABI and a set of functions how the bytecode interacts with the OS and non-WASM libraries.
- EmScripten: an OS interface module providing functions such as memory allocation, i.e. everything that the EmScripten compiler may generate calls to. Very simply speaking, it is WebAssembly's current libc.
- Web API: a standardized way to integrate WebAssembly into webpages; not our concern right now. Note that the Web API is optional for WebAssembly.
Linking
WebAssembly has a very clean module linking interface, WASI, which is also designed for easy, OS-independent, non-Web integration. There are official POSIX and SDL modules (more like library wrappers) for WebAssembly, and the number of unofficial, third-party libraries is growing day by day.
Bindings for JavaScript, C/C++, Rust, Python and many other languages are now available and well tested. For kernels, you will mostly be interested in the C or Rust bindings. You can find an incomplete and growing list here.
Right now the EmScripten module is used as the OS interface, but this is going to change pretty soon. Big effort is being put into WASI to provide a fully featured libc interface for WASM bytecode, using musl as a reference. Once the WASI specification is frozen, all you will need to implement or include is a WASI interpreter.
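Whichever interface is used, each entry in a module's import section names a module and a field, and the kernel has to bind every one of them to a native function before the module can run. The sketch below assumes the names have already been copied out of the binary (where they are length-prefixed, not NUL-terminated) into C strings. The module name "wasi_snapshot_preview1" and the functions fd_write and proc_exit come from the WASI preview1 interface; host_fn, struct host_import, the stubs and resolve_import are hypothetical, and the sketch skips checking that the import's declared signature matches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native calling convention: arguments and result as raw 64-bit values. */
typedef uint64_t (*host_fn)(const uint64_t *args, size_t nargs);

struct host_import {
    const char *module; /* import module name */
    const char *field;  /* import field (function) name */
    host_fn     fn;     /* native implementation provided by the kernel */
};

/* Placeholder stubs standing in for the kernel's real implementations. */
static uint64_t stub_fd_write(const uint64_t *args, size_t nargs)
{
    (void)args; (void)nargs;
    return 0; /* a real version would copy the iovecs out of linear memory */
}

static uint64_t stub_proc_exit(const uint64_t *args, size_t nargs)
{
    (void)args; (void)nargs;
    return 0; /* a real version would terminate the task and never return */
}

static const struct host_import wasi_imports[] = {
    { "wasi_snapshot_preview1", "fd_write",  stub_fd_write  },
    { "wasi_snapshot_preview1", "proc_exit", stub_proc_exit },
};

/* Resolve one import; NULL means the module cannot be instantiated. */
static host_fn resolve_import(const char *module, const char *field)
{
    for (size_t i = 0; i < sizeof wasi_imports / sizeof wasi_imports[0]; i++)
        if (strcmp(wasi_imports[i].module, module) == 0 &&
            strcmp(wasi_imports[i].field, field) == 0)
            return wasi_imports[i].fn;
    return NULL;
}
```

A linear table is enough for a handful of functions; a kernel exposing a full libc-sized interface might prefer a hash table or a sorted array with binary search.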
Binary Format
A WASM file starts with the four-byte magic "\0asm" (bytes 00 61 73 6D), followed by a four-byte little-endian format version, which is currently 1. As such, it is pretty easy to identify.
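A kernel loader can therefore choose between its native ELF path and a WASM interpreter just by peeking at the first bytes of an image. A minimal sketch, assuming the whole image is already in memory; detect_format, read_le32 and enum exec_format are hypothetical names, and the caller is expected to reject any WASM version other than 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum exec_format { FMT_UNKNOWN, FMT_ELF, FMT_WASM };

/* Read a little-endian 32-bit value, independent of the host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decide which loader should handle an image; for WASM, also report the version field. */
static enum exec_format detect_format(const uint8_t *img, size_t len, uint32_t *wasm_version)
{
    static const uint8_t elf_magic[4]  = { 0x7F, 'E', 'L', 'F' };
    static const uint8_t wasm_magic[4] = { 0x00, 'a', 's', 'm' };

    if (len >= 4 && memcmp(img, elf_magic, 4) == 0)
        return FMT_ELF;
    if (len >= 8 && memcmp(img, wasm_magic, 4) == 0) {
        *wasm_version = read_le32(img + 4); /* 1 for current modules */
        return FMT_WASM;
    }
    return FMT_UNKNOWN;
}
```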
After the header come several sections, one of which (the code section) contains the bytecode. WABT, the binutils equivalent, includes a tool called wasm-objdump to dump these and disassemble the bytecode. The details and encodings are specified in the Binary Format chapter of the WebAssembly Core Specification.
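Every section is encoded as a one-byte section id followed by its payload size as an unsigned LEB128 number, so a loader can skip sections it does not understand. The sketch below locates the code section (id 10 in the Core Specification). It assumes the 8-byte header has already been validated; find_code_section and read_uleb32 are hypothetical helper names, and the sketch neither checks section ordering nor decodes the function bodies themselves.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTION_CODE 10 /* code section id per the Core Specification */

/* Decode an unsigned LEB128 value of at most 32 bits; returns 0 on success. */
static int read_uleb32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        if (*pos >= len || shift >= 35)
            return -1; /* truncated or overlong encoding */
        byte = buf[(*pos)++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    *out = result;
    return 0;
}

/* Walk the sections and report where the code section's payload lives. */
static int find_code_section(const uint8_t *img, size_t len, size_t *off, uint32_t *size)
{
    size_t pos = 8; /* skip magic and version */
    while (pos < len) {
        uint8_t id = img[pos++];
        uint32_t sz;
        if (read_uleb32(img, len, &pos, &sz) != 0 || sz > len - pos)
            return -1; /* malformed section header or payload past end of file */
        if (id == SECTION_CODE) {
            *off = pos;
            *size = sz;
            return 0;
        }
        pos += sz; /* skip sections we are not interested in */
    }
    return -1; /* no code section present */
}
```

The bounds check on sz matters in a kernel: the size comes straight from an untrusted file, and skipping past the end of the buffer would otherwise read arbitrary memory.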
See Also
Specifications
- WebAssembly Core Specification
- WASI, ABI specification to run WASM outside of Web browsers
- WASI overview
- CommonWA, a minimal specification of the standard API for non-Web usermode environments
- WebAssembly on Wikipedia
- WebAssembly on Mozilla Developer Network
- Rust and WebAssembly
Compilers
- EmScripten, a reference C/C++ compiler that produces WASM bytecode
- LLVM, a good description of how to use LLVM's default wasm target without the EmScripten bloat
- Asterius, a Haskell compiler that produces WASM bytecode
- wasm-pack, a Rust tool that builds Rust code into WASM bytecode
- JWebAssembly, a Java bytecode to WebAssembly bytecode converter
- Binaryen, low-level tools for WebAssembly, provided by W3C
- Cranelift, a WASM to native code compiler in Rust
Interpreters
- Reference Interpreter in OCaml, provided by W3C
- libwasmint, an interpreter in C++; a library designed for embedding, provided by W3C
- wasm-interp, an interpreter in C++, part of WABT; more complete than libwasmint
- wasmtime, yet another WASM interpreter, this time in Rust with WASI support (the reference implementation for WASI)
- wac, wax, wace, interpreters in C providing MVP, WASI and EmScripten compatibility; probably the easiest to embed in a kernel
- wasm3, an interpreter in C, aiming for high performance and WASI compatibility
- wasmi, an interpreter in Rust
- pywasm, an interpreter in Python
- life, an interpreter in Go
- wasmjit, a Linux kernel module in C to run WASM bytecode
- cervus, a WebAssembly subsystem for Linux in C
- PWASM, an easy-to-integrate WASM execution library (uses DynASM JIT) written in C
Examples
- kwast, a kernel written in Rust which runs WASM bytecode in userspace (under heavy development as of early 2020)
- wasm-mandelbrot, an example of how to use EmScripten, wasm-toolchain, clang + Binaryen, etc.
- WebAssembly examples by Mozilla
- Tutorial on Rust to WASM by Mozilla