Fuzzing the Z-Machine#

Playing text adventure games is sort of fun, but takes a lot of brain power. Nowadays we have all this spare CPU capacity going to waste.

What if we let the computer solve the game, and we just sit back and watch? We don’t even need any fancy neural network, we just need good old brute force.

We’re going to throw tons of semi-random input at a text adventure and see what happens. In the security world, this is called fuzzing.

We’re going to target the Z-Machine, a virtual machine interpreter that was developed by Joel Berez and Marc Blank in 1979, and powered Infocom’s titles. It’s the perfect target for our fuzzing adventures, since it’s well-documented and there are plenty of tools and libraries available.

Zork I running on an Atari 800XL (Sebastian Grünwald, CC 3.0)

Mini-Zork#

The game we’re fuzzing is titled MINI-ZORK I: The Great Underground Empire. This was a demo version of Infocom’s Zork I, intended to run from cassette rather than floppy disk. It was essentially an advertisement, given away in an issue of the British Commodore users’ magazine “Zzap! 64” in 1990.

For those who haven’t played Zork, here’s the first thing you see when you boot the game:

MINI-ZORK I: The Great Underground Empire
Copyright (c) 1988 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Release 34 / Serial number 871124

West of House
You are standing in an open field west of a white house, with a boarded front door. You could circle the house to the north or south.

There is a small mailbox here.

>

The > prompt invites users to type commands such as OPEN MAILBOX or GO NORTH to advance the game. The goal is to “find the treasures of the Great Underground Empire and put them in your trophy case,” solving puzzles and defeating enemies along the way.

Let’s Play Hunt the Verb (and the Noun)#

The user manual included with Zork listed examples of potential commands, like OPEN THE WOODEN DOOR and WARLOCK, TAKE THE SPELL SCROLL THEN FOLLOW ME. However, users were largely on their own in figuring out how to solve a particular puzzle.

Verbs like GET and DROP were obvious, as were the standard eight compass directions and UP and DOWN (as well as IN and OUT). But users would also have to ATTACK, INFLATE, and PRAY, as well as speak a few magic words that weren’t in the user manual. When the game didn’t give enough hints, players derisively called this process “hunt the verb”.

To generate commands, the fuzzer needs to know the list of words accepted by the game, called its vocabulary. The Z-Machine defines this list as the game dictionary, and it’s at a standard location in every game file.

(This is kind of cheating, yes! But there’s really no other way to tell the computer what words to use, since some verbs aren’t listed anywhere in the text.)

The simplest way to generate commands is to pick one or two words at random from the dictionary. We don’t know which words are verbs and which are nouns, so we end up generating lots of commands like SEE OOPS and DRIVER BELOW.

This is obviously pretty inefficient, since we have to go through N * N combinations (where N is the vocabulary size) to find that special command like KILL TROLL.

However, we can stack the deck a bit. We scan all the words output by the game, and pick the ones that match our dictionary. We occasionally pick a word from that list instead of the entire dictionary. For example, in the opening text we’d see NORTH, WEST, HOUSE, and MAILBOX and be more likely to generate those words.
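A minimal sketch of this weighted generator in Python (the function names, the punctuation stripping, and the 50% bias toward seen words are assumptions for illustration, not the actual fuzzer’s code):

```python
import random

def observe_output(text, vocabulary, seen_words):
    """Record output words that match the game dictionary (truncated to
    six characters, since that's all a Z-Machine dictionary word holds)."""
    for word in text.lower().split():
        token = word.strip('.,!?">')[:6]
        if token in vocabulary and token not in seen_words:
            seen_words.append(token)

def generate_command(vocabulary, seen_words, seen_bias=0.5):
    """Build a one- or two-word command, occasionally drawing from words
    the game has recently printed instead of the whole dictionary."""
    def pick():
        if seen_words and random.random() < seen_bias:
            return random.choice(seen_words)
        return random.choice(vocabulary)
    return " ".join(pick() for _ in range(random.randint(1, 2)))

vocab = ["north", "west", "open", "house", "mailbo", "leathe"]
seen = []
observe_output("West of House. There is a small mailbox here.", vocab, seen)
print(seen)                        # ['west', 'house', 'mailbo']
print(generate_command(vocab, seen))
```

The bias means that right after the opening text, commands like OPEN MAILBOX become far more likely than random pairs like DRIVER BELOW.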

The Search for Novel Tokens#

By just issuing random commands, we get a lot of nonsense, which the parser usually rejects:

>about painti
[There was no verb in that sentence!]

>leathe guideb
[You used the word "leathe" in a way that I don't understand.]

(Dictionary words are no more than six characters in the Z-Machine, which is why we generate words like “leathe”.)
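As a quick illustration, the truncation to a Version 3 dictionary word (six Z-characters, which for plain lowercase text works out to roughly six letters) might look like:

```python
def to_dictionary_word(word):
    # A V3 Z-Machine dictionary entry stores only six Z-characters of
    # text, so longer words are cut off before lookup.
    return word.lower()[:6]

print(to_dictionary_word("leather"))    # leathe
print(to_dictionary_word("guidebook"))  # guideb
```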

Flailing around like this will take an eternity, though. How do we know which paths are more promising than others? We look for novel tokens.

The Z-Machine has a PRINT instruction, which outputs text to the console. These are often fragments like “West of House” and “The bottle shatters.” We record each of these as a token.

Whenever we see a new token, we save the current playthrough – the list of commands we executed in the current game. We associate this list with the current token, so that we can (hopefully) regenerate the text by replaying the commands.

Each run of the game chooses a specific goal token, and thus an associated playthrough. The search algorithm chooses newer tokens more often than older tokens.

We don’t replay the commands exactly during each game; we throw in some random commands and shuffle the order a bit. When we see the goal token, we increment a “success” variable, which makes us vary the command list less often. When this variable gets high enough, we mark the token “stable”, since we now have a predictable playthrough that generates it.
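The bookkeeping described above could be sketched like this (the class, field names, and stability threshold are hypothetical, not the fuzzer’s actual data structures):

```python
STABLE_THRESHOLD = 5  # assumed: successes needed before a token is "stable"

class TokenTracker:
    def __init__(self):
        self.playthroughs = {}  # token -> command list that produced it
        self.successes = {}     # token -> times a replay reproduced it
        self.stable = set()

    def record(self, token, commands):
        # First sighting: save the playthrough that generated this token.
        if token not in self.playthroughs:
            self.playthroughs[token] = list(commands)
            self.successes[token] = 0

    def mark_success(self, token):
        # A replay reproduced the goal token; trust this playthrough more.
        self.successes[token] += 1
        if self.successes[token] >= STABLE_THRESHOLD:
            self.stable.add(token)

    def mutation_rate(self, token):
        # Vary the command list less as the token proves repeatable.
        return 1.0 / (1 + self.successes.get(token, 0))
```

Each run then picks a goal token, replays its stored playthrough with mutations scaled by `mutation_rate`, and calls `mark_success` whenever the token shows up again.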

Finding a Shorter Path#

The paths we take through the game often aren’t efficient. Here’s one list of commands we used to generate the token “Wheeeeeeeeee!!!!!”:

curse, art, body gate, incant count, the, the egg, repent, from the, the consum, what, leathe, trap- see, breath here, what intnum, about here, leathe guideb, about, about here, pot, here, see, here about, about, self, here about, mangle, see, rug, the, reply, elvish, say, stilet beetle, say toss, pray, gate about, what bolt, guideb, wooden, say knock, say sit, trail and, here, pray leathe, intnum, one, pray one, jump

All we really had to do was type the last command, JUMP (or DIVE). But the search algorithm doesn’t know which of the preceding commands are required to output “Wheeeeeeeeee!!!!!”

So we reduce the playthroughs – make them as short as possible. Whenever we see a token, we’ll replace the associated playthrough with a shorter list of commands, if possible. This gets us to the goal token faster, giving us more turns to experiment after reaching the goal.

A lot of tokens like “Wheeeeeeeeee!!!!!” aren’t interesting, in that we can generate them in a single turn at the start of the game. By reducing their command lists, we can eventually prove this to be true, and thus remove them from the list of potential goal tokens.
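One simple way to reduce a playthrough is to greedily drop one command at a time and replay, keeping the shorter list whenever the goal token still appears. A sketch under that assumption (the `reproduces` callback stands in for an actual replay of the game):

```python
def reduce_playthrough(commands, reproduces):
    """Greedily remove commands while reproduces(candidate) -- a replay
    that checks whether the goal token still appears -- keeps succeeding."""
    commands = list(commands)
    i = 0
    while i < len(commands):
        candidate = commands[:i] + commands[i + 1:]
        if reproduces(candidate):
            commands = candidate   # command was unnecessary; drop it
        else:
            i += 1                 # command is needed; keep it
    return commands

# Toy replay: the token only requires the JUMP command.
shortest = reduce_playthrough(["curse", "pray", "jump"],
                              lambda cmds: "jump" in cmds)
print(shortest)  # ['jump']
```

A token whose playthrough reduces to a single command at the start of the game is exactly the “not interesting” case described above.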

More Than Words#

Because we have access to the Z-Machine internal state, we can use things other than the textual output to guide our search. For example, we can record when an object moves from room to room, or when an object’s property flag changes. We call these VM tokens, and we record them along with textual tokens:

@mv_30_15      Player (#30) goes east to room #15
@f_176_10_1    Player sets open flag (10) on window (#176)

We need these because the textual output doesn’t tell us the whole story. For example, picking up a sword and a lamp both generate the token “Taken.” The VM token tells the search algorithm when a novel VM state is reached, for example when the player moves to a new room, or an object is picked up or discarded.
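Movement tokens like these can be derived by diffing snapshots of object locations between turns. A small sketch (the snapshot format, a dict of object ID to room ID, is an assumption):

```python
def vm_tokens(before, after):
    """Diff two snapshots of object locations ({obj_id: room_id}) and
    emit one movement token per object that changed rooms."""
    tokens = []
    for obj, room in after.items():
        if before.get(obj) != room:
            tokens.append(f"@mv_{obj}_{room}")
    return tokens

# Player (#30) walked east into room #15:
print(vm_tokens({30: 12}, {30: 15}))  # ['@mv_30_15']
```

Flag tokens like `@f_176_10_1` fall out the same way, by diffing object attribute bits instead of locations.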

Hacking the VM#

Exploring the game state is a pretty slow process. One of the first tasks in the game is to kill a troll who blocks your path. However, the player has to have previously picked up a sword in the house above.

To short-circuit the search process, we randomly hack the Z-Machine, changing game state in ways we’ve seen previously. For example, we might randomly move the sword to the player’s hand, allowing the STAB command to succeed. (ATTACK TROLL doesn’t work unless we add WITH SWORD, but STAB implies a sharp object and succeeds.)

We only hack using stable tokens: if we can reliably repeat the player picking up the sword, we allow hacking the sword into the player’s hands. Then we merge the commands used to pick up the sword into the commands used to descend into the dungeon, hopefully figuring out that we need to attack the troll.
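Merging two command lists can be as simple as a random interleave that preserves each list’s internal order. A hypothetical sketch (not the fuzzer’s actual merge logic):

```python
import random

def merge_playthroughs(a, b):
    """Randomly interleave two command lists, keeping each list's own
    order intact -- e.g. merging 'pick up the sword' into 'descend
    into the dungeon'."""
    merged, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        take_a = j >= len(b) or (i < len(a) and random.random() < 0.5)
        if take_a:
            merged.append(a[i]); i += 1
        else:
            merged.append(b[j]); j += 1
    return merged
```

Because order within each list is preserved, prerequisites like “open the window before climbing through it” survive the merge; only the relative interleaving of the two playthroughs is randomized.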

The troll example is particularly diabolical, because it usually takes multiple blows to finish off the troll, each attack generating a random result. Since our algorithm prefers shorter playthroughs, it’s more likely to be optimistic about our fighting abilities.

After 53,000 playthroughs and 10,600,000 commands (200 commands per game), we finally figure out how to attack the troll:

north, east, open window, into, west, light, lift trap, small hi, get, west, light, tug large, lift trap, down, north, stab

We’ve got a few superfluous commands, and we haven’t yet figured out that we have to hit him multiple times, but we’ll get there.

A Fatal Fascination#

The search algorithm doesn’t know the difference between picking up objects, dropping objects, and moving the player from room to room. The only way it determines forward progress is by seeing novel tokens.

This leads the search algorithm to quickly develop a taste for … murder! Killing the player, specifically, because it’s so easy – you just type ATTACK:

>attack
[brave adventurer]
Poof, you're dead!

    ****  You have died  **** 

Well, you probably deserve another chance. I can't quite fix you up completely, but you can't have everything.

Forest Edge
Paths lead into the forest to the west and northwest. Also, a well-marked path extends east.

In Mini-Zork, the first death doesn’t end the game; instead, the player is teleported to another location, and their possessions are scattered. To the search algorithm, death is just an object moving from one room to another, generating novel tokens along the way. This fascination leads to the exposure of other funny bugs in the game, like the player being able to throw their own hands into a river.

The game keeps a score of 0 to 350 points, based on solving puzzles and collecting treasure. When the player dies, the score goes down by 10 points. We can use score as a heuristic, but this might overly dissuade risky behavior – like going into a dark location, or fighting trolls.

The search algorithm also gets interested in stuff that the player doesn’t see, like NPCs moving between rooms. For example, the token @mv_112_37 represents the thief moving to a specific room. The search algorithm manages to reproduce this token by issuing the Z or WAIT command repeatedly, waiting for the thief to reach the target room.

It also loves to pick up and drop objects in different locations, because each object movement is a new token. Who knows? Maybe dropping that leaflet in the Forest Path will win the game. (Narrator: It won’t.)

Fuzzing invariably exposes bugs in a program, and this game is no different, although it’s fairly resilient. The fuzzer figured out how to generate the garbage word “Clrthatrqdc” at the start of the game:

>tie up
[brave adventurer]
With a   Clrthatrqdc!?!

This is probably an uninitialized variable pointing to non-textual data. The Z-Machine’s compressed text encoding favors alphabetic letters, which is why you don’t see as much random garbage as when trying to print a binary file as ASCII. (This word only returns two hits on Google as I’m writing this.)

Solving the Game#

To win the game, we have to haul our loot back to the trophy case and place each item inside. It’ll take a long time for our simple search algorithm to stumble upon this behavior, especially given its tendency to waste time moving objects from room to room.

Making the algorithm more complicated takes time away from the random exploration phase, so we have to be judicious when adding features. And we want to avoid a priori knowledge of the game – in other words, we only want to cheat a little bit.

If you want to experiment, check out the source code on GitHub, which uses Daniel Lehenbauer’s JSZM interpreter. Plenty of games are available (only games up to Version 3 are supported).

Also see The Z-Machine Standards Document by Graham Nelson, who’s been working with the Z-Machine for a couple of decades.

And should 8bitworkshop support the Z-Machine one day? Let me know!