Emulator Performance: WebAssembly vs. JavaScript#

The 8bitworkshop IDE integrates several different kinds of emulators that run in the web browser, each kind with different characteristics. Ideally, our emulators would be:

  • Performant – They should load quickly, and run at native speeds in modern browsers.

  • Debuggable – Users should be able to single-step, set breakpoints, rewind/replay state, inspect memory, etc.

  • Traceable – 8bitworkshop can display cycle-level maps of the emulator’s execution state, which requires writing a “probe log” during execution. This in turn requires hooking into the CPU’s memory bus and instruction execution.

  • Accurate – We’d like our emulators to closely emulate the real hardware, though we don’t demand 100% accuracy in all cases.

Let’s go over the various flavors of emulator in the IDE, and then test their performance in the browser.

JavaScript Emulators#

Platform API#

8bitworkshop can use existing open-source emulators written in JavaScript – examples are Javatari.js, JSNES, and libv86.js. They usually handle graphics, sound, and control input on their own.

To integrate with the IDE, they are wrapped in 8bitworkshop’s Platform API, which exposes simple functions like start() and pause(). In practice, they have to be modified or runtime-patched to enable debugging and tracing.
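As a rough illustration of wrapping and runtime-patching, here is a minimal sketch. ThirdPartyEmulator, WrappedPlatform, readByte() and readLog are hypothetical names invented for this example; they are not the API of Javatari.js, JSNES, or 8bitworkshop. Only start() and pause() come from the text above.

```javascript
// Hypothetical stand-in for an existing open-source emulator.
class ThirdPartyEmulator {
  constructor() {
    this.ram = new Uint8Array(0x10000);
    this.running = false;
  }
  readByte(addr) { return this.ram[addr & 0xffff]; }
  run() { this.running = true; }
  stop() { this.running = false; }
}

// Hypothetical wrapper exposing start()/pause() to the IDE.
class WrappedPlatform {
  constructor(emu) {
    this.emu = emu;
    this.readLog = [];
    // Runtime patch: replace readByte so every memory read is recorded.
    const originalRead = emu.readByte.bind(emu);
    emu.readByte = (addr) => {
      const value = originalRead(addr);
      this.readLog.push({ addr, value });
      return value;
    };
  }
  start() { this.emu.run(); }
  pause() { this.emu.stop(); }
  isRunning() { return this.emu.running; }
}

const emu = new ThirdPartyEmulator();
emu.ram[0x1234] = 0x42;
const platform = new WrappedPlatform(emu);
platform.start();
const byteRead = emu.readByte(0x1234); // goes through the patched function
platform.pause();
console.log(byteRead, platform.readLog.length, platform.isRunning());
```

The patch leaves the emulator's own code untouched, which is convenient, but it depends on the emulator routing all reads through one replaceable method.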

Machine API#

When writing a JavaScript emulator from scratch, we prefer to use the Machine API. This gives us a “headless” API, which separates the machine logic from the user interface. It also makes it easier to add new machines that automatically have debugging and tracing support. We just have to:

  • Set the clock frequency and clocks per scanline/per frame.

  • Set the video/audio parameters.

  • Implement the CPU (or reuse the 6502/Z80/etc emulators)

  • Implement the memory map (read/write hooks)

  • Draw scanlines into a pixel buffer.

  • Render audio into a sample buffer.

  • Implement the keyboard/control mapping.
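The steps above could look something like the following sketch. All names here (ToyMachine, cpuCyclesPerLine, probe, advanceFrame, and so on) and all the numbers are assumptions made for illustration, not the actual Machine API, and the "CPU" is a stand-in that just increments one byte of RAM.

```javascript
// Hypothetical headless machine; names and values are illustrative only.
class ToyMachine {
  constructor() {
    this.cpuFrequency = 1000000;   // clock frequency in Hz (assumed)
    this.cpuCyclesPerLine = 64;    // clocks per scanline (assumed)
    this.numScanlines = 262;       // scanlines per frame (assumed)
    this.canvasWidth = 160;        // video parameter (assumed)
    this.ram = new Uint8Array(0x10000);
    this.pixels = new Uint32Array(this.canvasWidth * this.numScanlines);
    this.probe = null;             // optional tracing callback
  }
  // Memory map: read/write hooks are where tracing can attach.
  read(addr) {
    const value = this.ram[addr & 0xffff];
    if (this.probe) this.probe('read', addr, value);
    return value;
  }
  write(addr, value) {
    this.ram[addr & 0xffff] = value & 0xff;
    if (this.probe) this.probe('write', addr, value & 0xff);
  }
  // Stand-in CPU: one "instruction" increments address 0 and takes 2 cycles.
  stepCPU() {
    this.write(0, this.read(0) + 1);
    return 2;
  }
  // Draw one scanline into the pixel buffer (a solid color derived from RAM).
  drawScanline(y) {
    const color = (0xff000000 | this.ram[0]) >>> 0;
    this.pixels.fill(color, y * this.canvasWidth, (y + 1) * this.canvasWidth);
  }
  // Run the CPU for one frame, one scanline at a time.
  advanceFrame() {
    let cycles = 0;
    for (let y = 0; y < this.numScanlines; y++) {
      let lineCycles = 0;
      while (lineCycles < this.cpuCyclesPerLine) lineCycles += this.stepCPU();
      cycles += lineCycles;
      this.drawScanline(y);
    }
    return cycles;
  }
}

const machine = new ToyMachine();
let writes = 0;
machine.probe = (kind) => { if (kind === 'write') writes++; };
const frameCycles = machine.advanceFrame();
console.log(frameCycles, writes, machine.ram[0]);
```

Because the machine never touches the DOM, the same object can be driven by a UI, a debugger, or a test harness. Audio and input are left out to keep the sketch short.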

Verilog#

The Verilog platform is the odd one out, since it has no fixed CPU as such. Verilog modules are translated into JavaScript, then simulated clock-by-clock. They are usually much slower than the equivalent hand-coded CPU emulator.
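To give a feel for what "translated into JavaScript, then simulated clock-by-clock" can mean, here is a hand-written sketch of a 4-bit counter. The output of the real translator is not shown in this post, so the shape of makeCounter() and tick() is an assumption.

```javascript
// Verilog being modeled (for reference):
//   module counter(input clk, input reset, output reg [3:0] q);
//     always @(posedge clk) q <= reset ? 0 : q + 1;
//   endmodule

// Hypothetical hand translation: state lives in a plain object, and
// tick() evaluates the always-block once per rising clock edge.
function makeCounter() {
  const state = { reset: 0, q: 0 };
  function tick() {
    const next = state.reset ? 0 : (state.q + 1) & 0xf; // keep q at 4 bits
    state.q = next; // non-blocking assignment takes effect at the clock edge
  }
  return { state, tick };
}

const counter = makeCounter();
counter.state.reset = 1;
counter.tick();              // reset for one clock
counter.state.reset = 0;
for (let i = 0; i < 20; i++) counter.tick();
console.log(counter.state.q); // 20 clocks on a 4-bit counter wraps to 4
```

Even in this tiny case, every register is recomputed on every clock. A design with thousands of signals repeats that work millions of times per second, which is one plausible reason these simulations are slower than a hand-coded emulator.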

We’ll get more into Verilog later in this post.

WebAssembly Emulators#

MAME#

Some emulators (e.g. Atari 800) use MAME, compiled to WebAssembly via Emscripten. The disadvantages are that debugging support is pretty spotty (we don’t take full advantage of MAME’s native debugger) and it’s slow to start and reload. But the effort required is minimal:

  • Compile MAME for a given machine (our patched version, that is)

  • Subclass BaseMAMEPlatform and provide configuration

  • Implement the ROM-loading logic

MAME’s performance lags the lightweight JS and WASM emulators, but for some machines it’s the only option available. It would be nice to have full debugging and tracing support, though this may require further patches to the MAME source code.

Native WebAssembly#

The C-64 and ZX Spectrum emulators use Andre Weissflog’s chips C emulator library. These are “headless” emulators that don’t depend on any specific user interface library. Each emulator is wrapped in a C version of the Machine API, then compiled to WebAssembly. They have full debugging and tracing abilities, just like the native JavaScript emulators.

Performance#

Here we’ll compare the performance of the various emulator flavors. I tested with Firefox and Chrome’s dev tools on an old MacBook Pro, recording the approximate percentage of CPU used in the main loop.

| platform | tech | browser | cpu % | notes |
|----------|------|---------|-------|-------|
| zx       | wasm | firefox | 5%    |       |
| zx       | wasm | chrome  | 8%    |       |

The native-WASM ZX Spectrum emulator was the most efficient in both browsers. This may be explained by its relative simplicity.

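| platform  | tech | browser | cpu % | notes |
|-----------|------|---------|-------|-------|
| coleco    | js   | firefox | 14%   |       |
| c64       | js   | chrome  | 18%   |       |
| c64       | js   | firefox | 20%   |       |
| nes       | js   | firefox | 25%   |       |
| atari7800 | js   | firefox | 25%   |       |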

The runners-up were a handful of native-JavaScript emulators, and the open-source JSNES emulator. The native-JS C64 emulator here is not fully implemented, which may explain its low CPU usage.

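| platform     | tech       | browser | cpu % | notes |
|--------------|------------|---------|-------|-------|
| c64          | wasm       | firefox | 34%   |       |
| vcs          | js         | firefox | 36%   |       |
| c64          | wasm       | chrome  | 36%   |       |
| test pattern | js verilog | firefox | 37%   |       |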

Next in line is the C64 WASM emulator (in both browsers), which ties with the Javatari (vcs) emulator. Both are fairly complex cycle-level emulators, which may explain the similar performance.

One surprising result is that the C64 (WASM) and Atari 2600 (JS) emulators have similar performance. They have the same CPU speed and similar emulation complexity, but you’d expect WASM to be several times faster than JS.

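| platform             | tech       | browser | cpu % | notes |
|----------------------|------------|---------|-------|-------|
| nes                  | mame       | firefox | 48%   |       |
| char display         | js verilog | firefox | 50%   |       |
| vcs                  | mame       | firefox | 56%   |       |
| racing game discrete | js verilog | firefox | 68%   |       |
| coleco               | mame       | firefox | 70%   |       |
| racing game w/ cpu   | js verilog | firefox | 72%   |       |
| atari5200            | mame       | firefox | 83%   |       |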

The MAME-WASM emulators used more CPU than every other JS or WASM emulator tested, and are on par with the simpler Verilog simulations.

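| platform    | tech       | browser | cpu % | notes  |
|-------------|------------|---------|-------|--------|
| 16-bit game | js verilog | chrome  | 125%  | 48 fps |
| brick game  | js verilog | firefox | 128%  | 47 fps |
| 16-bit game | js verilog | firefox | 143%  | 42 fps |
| mango one   | js verilog | firefox | 200%  | 30 fps |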

Last place went to the complex Verilog simulators, which couldn’t even sustain 60 FPS on my machine.
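The frame rates in the notes column are consistent with a simple model: if the main loop needs more than 100% of a 60 Hz frame budget, the achievable frame rate is roughly 60 × 100 / (cpu %). Both the model and the estimatedFps() helper below are my own reading of the numbers, not something the measurements were stated to be derived from.

```javascript
// Estimated frame rate when the main loop uses cpuPercent of the frame budget.
// Assumes a 60 Hz target; the rate is capped at the target when under 100%.
function estimatedFps(cpuPercent, targetFps = 60) {
  return Math.min(targetFps, (targetFps * 100) / cpuPercent);
}

// [cpu %, measured fps] pairs taken from the last table.
const measured = [[125, 48], [128, 47], [143, 42], [200, 30]];
for (const [cpuPercent, fps] of measured) {
  console.log(cpuPercent, fps, Math.round(estimatedFps(cpuPercent)));
}
```

Rounded to whole frames, the estimate matches all four measured rows.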

Verilog WebAssembly Compiler#

I’ve been working on an experimental Verilog runtime that emits WebAssembly, generated from the AST nodes of the Verilog compiler.

Here are the test results with the new Verilog engine (again, on an old MacBook Pro):

  • All browsers performed equally well on WebAssembly.

  • Safari’s JavaScript performance was fastest, followed by Chrome, then Firefox.

  • The best WebAssembly benchmark was only twice as fast as the slowest JavaScript.

If we had the time to maintain two implementations of the Verilog runtime, we could keep both. They can be tested against each other, and we could switch between them depending on the browser and the workload.

But apps may require architectural changes to support both engines. For example, WebAssembly <-> JavaScript interactions incur a large performance penalty. Because of this penalty, 8bitworkshop has to buffer video data for each scanline, rather than call into the emulator once per cycle. Also, marshalling data between the JS and WASM worlds takes effort.

So it’s still not clear! Browsers have a lot of WebAssembly features on the roadmap, so we might look forward to improvements. But other transpiling projects like TeaVM still stick with JavaScript.
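The scanline-buffering idea can be sketched without a real WASM module. In the example below, makeCore() is a plain-JavaScript stand-in for a WebAssembly core, the calls counter stands in for boundary crossings, and the 228 × 262 frame geometry is an assumed value. None of this is 8bitworkshop's actual code.

```javascript
const CYCLES_PER_LINE = 228; // assumed
const LINES = 262;           // assumed

// Stand-in for a WASM core. In a real build, stepCycle/stepLine would be
// exported functions and lineBuffer would live in WebAssembly memory.
function makeCore() {
  const lineBuffer = new Uint8Array(CYCLES_PER_LINE);
  let cycle = 0;
  const core = {
    calls: 0, // counts simulated JS <-> WASM crossings
    lineBuffer,
    // One crossing per cycle, returning a single pixel value.
    stepCycle() { core.calls++; return (cycle++) & 0xff; },
    // One crossing per scanline, filling the shared buffer.
    stepLine() {
      core.calls++;
      for (let x = 0; x < CYCLES_PER_LINE; x++) lineBuffer[x] = (cycle++) & 0xff;
    },
  };
  return core;
}

function renderPerCycle(core) {
  const frame = new Uint8Array(CYCLES_PER_LINE * LINES);
  for (let i = 0; i < frame.length; i++) frame[i] = core.stepCycle();
  return frame;
}

function renderPerLine(core) {
  const frame = new Uint8Array(CYCLES_PER_LINE * LINES);
  for (let y = 0; y < LINES; y++) {
    core.stepLine();
    frame.set(core.lineBuffer, y * CYCLES_PER_LINE); // copy one buffered line
  }
  return frame;
}

const perCycleCore = makeCore();
const perLineCore = makeCore();
const frameA = renderPerCycle(perCycleCore);
const frameB = renderPerLine(perLineCore);
const sameFrame = frameA.every((value, i) => value === frameB[i]);
console.log(perCycleCore.calls, perLineCore.calls, sameFrame);
```

Both paths produce the same frame, but the buffered path crosses the boundary once per scanline instead of once per cycle. The cost is that JavaScript can no longer observe or intervene in the middle of a scanline.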

It would be great if Verilog simulations ran about 10 times faster – then we could replace all of the emulators with Verilog modules! One possibility is to write a specialized HDL (Hardware Description Language) that compiles to efficient JavaScript or WASM code, as well as to Verilog.

Conclusions#

JavaScript emulators have some usability advantages over WASM. They are easy to debug – exceptions always throw full stack traces, and symbols are easily accessible.

It’s also trivial to implement just-in-time optimizations in JavaScript – for example, the ARM emulator in this Gameboy Advance emulator caches a new JS function for each decoded instruction, and can run at 8 MHz and beyond in a browser.
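A minimal sketch of the caching idea, using a made-up two-instruction CPU rather than ARM: each distinct instruction word is decoded once into a closure, and later executions reuse the cached function. The instruction format, the compile() helper, and the use of closures instead of generated source code are all assumptions for illustration; the Gameboy Advance emulator mentioned above may work quite differently.

```javascript
// Toy instruction format (hypothetical): high byte = opcode, low byte = operand.
const OP_LOAD = 1; // acc = operand
const OP_ADD = 2;  // acc = (acc + operand) & 0xff

const compiledCache = new Map(); // instruction word -> compiled function
let decodeCount = 0;

// Decode an instruction once and return a specialized function for it.
function compile(instr) {
  decodeCount++;
  const opcode = instr >> 8;
  const operand = instr & 0xff;
  switch (opcode) {
    case OP_LOAD: return (cpu) => { cpu.acc = operand; };
    case OP_ADD: return (cpu) => { cpu.acc = (cpu.acc + operand) & 0xff; };
    default: throw new Error('unknown opcode ' + opcode);
  }
}

function execute(cpu, program) {
  for (const instr of program) {
    let fn = compiledCache.get(instr);
    if (!fn) {
      fn = compile(instr);            // slow path: decode and cache
      compiledCache.set(instr, fn);
    }
    fn(cpu);                          // fast path: call the cached function
  }
}

const cpu = { acc: 0 };
const program = [0x0105, 0x0203, 0x0203, 0x0203]; // LOAD 5, then ADD 3 three times
execute(cpu, program);
execute(cpu, program);
console.log(cpu.acc, decodeCount);
```

The program runs eight instructions in total, but only two decodes happen, since the repeated ADD and the second pass hit the cache.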

The downside is that JS optimization is opaque and browser-dependent, and may even change while the code is running – “deoptimization” happens when assumptions made by the compiler don’t hold up over time.

WebAssembly performance is more stable, although it also varies between browsers, and even slows down whenever the Dev Toolbar is showing.

JavaScript can only efficiently emulate 32-bit hardware, though. WASM emulators can take advantage of 64-bit and 128-bit operations. They can also (eventually) use shared memory to speed up multi-CPU emulators and audio engines, perhaps better than JS worker threads.
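One way to see the 32-bit limitation: JavaScript's bitwise operators work on 32-bit integers, so wrapping 32-bit arithmetic is cheap, while exact 64-bit arithmetic needs BigInt. The helper names add32 and add64 below are made up for this sketch.

```javascript
// 32-bit wrapping add: ">>> 0" truncates the result to an unsigned 32-bit value.
function add32(a, b) {
  return (a + b) >>> 0;
}

// 64-bit wrapping add: needs BigInt, since a Number is a double and only
// holds integers exactly up to 2^53.
function add64(a, b) {
  return BigInt.asUintN(64, a + b);
}

const wrapped32 = add32(0xffffffff, 1);
const wrapped64 = add64(0xffffffffffffffffn, 1n);
const precisionLost = 2 ** 53 + 1 === 2 ** 53; // doubles cannot tell these apart
console.log(wrapped32, wrapped64, precisionLost);
```

WASM has native i64 operations, so the equivalent add compiles to a single instruction there, whereas BigInt values in JavaScript are arbitrary-precision objects.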

WebAssembly is very useful for running C/C++ apps in the browser – 8bitworkshop takes advantage of this for most of its compilers.

In conclusion, JavaScript and WebAssembly performance is a land of contrast. Either technology is appropriate for running an emulator in a web browser, and there are pros and cons to each.

UPDATE#

An earlier post described performance problems on Firefox and Safari, which were related to the Proxy object. This feature is used to read values from WebAssembly into JavaScript.

After changing to Object.defineProperty(), all browsers are equally fast on WebAssembly, beating JavaScript in all cases.
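The two approaches can be compared in a small sketch. The state layout (pc, a, x at fixed offsets) is hypothetical, and the memory here is written from JavaScript rather than by a real WASM module; only Proxy, Object.defineProperty(), DataView and WebAssembly.Memory are real APIs.

```javascript
// One 64 KiB page of WebAssembly linear memory, viewed from JavaScript.
// A real implementation would need a fresh view if the memory ever grows.
const memory = new WebAssembly.Memory({ initial: 1 });
const view = new DataView(memory.buffer);

// Hypothetical layout of emulator state in linear memory (byte offsets).
const layout = { pc: 0, a: 2, x: 3 };
view.setUint16(layout.pc, 0xc000, true); // little-endian, as WASM memory is
view.setUint8(layout.a, 0x42);

// Approach 1: a Proxy, where every property access runs the get trap.
const viaProxy = new Proxy({}, {
  get(_target, name) {
    if (name === 'pc') return view.getUint16(layout.pc, true);
    if (name === 'a' || name === 'x') return view.getUint8(layout[name]);
    return undefined;
  },
});

// Approach 2: ordinary getters installed once with Object.defineProperty().
const viaProps = {};
Object.defineProperty(viaProps, 'pc', {
  get: () => view.getUint16(layout.pc, true),
  enumerable: true,
});
for (const name of ['a', 'x']) {
  Object.defineProperty(viaProps, name, {
    get: () => view.getUint8(layout[name]),
    enumerable: true,
  });
}

console.log(viaProxy.pc, viaProps.pc, viaProxy.a, viaProps.a);
```

Both objects always reflect the current contents of memory. The difference is only in how the engine dispatches each property read, which is where the browsers diverged.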


Note: The illustration was generated by an AI notebook from @advadnoun