Your 8-bit System is a Weird PDP-11

../../_images/pdp11.jpg

The C programming language is the de facto portable, low-level language. It was written with the PDP-11 minicomputer in mind.

The PDP-11 informed the x86 instruction set, and so in a way, we’re still using its design today. Modern computers are so complex that they have to bend over backwards to emulate the simple architecture for which C was designed. This leads some to ask whether C is a good fit for modern systems.

Likewise, the PDP-11 doesn’t have a lot in common with ’80s arcade games. Game programmers in the 8-bit era wrote code predominantly in assembly language. The C language was around, but mostly on UNIX systems. Programmers who wanted something higher-level were more likely to use a homespun language like Zgrass or even FORTH.

It’s worth asking: is C a good fit for 8-bit systems?

We’ll discuss writing software for 8-bit systems with the most popular and robust C compiler for 6502 systems: CC65.

To truly appreciate the challenges, let’s turn our attention to one of the most demanding 8-bit platforms: the Atari 2600/VCS.

It’s weird how little RAM it has

../../_images/atari1.jpg

The Atari 2600 has only 128 bytes of RAM, embedded in the 6532 RIOT chip (which Atari’s documentation calls the PIA). This meager memory creates many hurdles for C developers.

The C language assumes a stack for passing parameters and for local variables. The 6502 maintains a stack, but typical VCS programs use it sparingly to conserve precious RAM.

You can add extra RAM to Atari carts via mappers inside the cartridge, and several games of the era did this. But the read/write signal is not routed to the Atari 2600 cartridge port, so often one address line is designated for this purpose, separating the RAM into “read” and “write” areas: each byte is written through one address and read back through another. You’d have to use special functions or macros to access this RAM. Unfortunately, this prevents the use of extra RAM for normal C variables or the stack.
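To give a feel for the “special functions or macros” involved, here is a minimal sketch. The port addresses ($F000 for writes, $F080 for reads), the 128-byte size, and the names `xram_write`, `xram_read` and `score_add` are all assumptions made up for this example rather than the layout of any particular mapper. On a hosted compiler the macros fall back to a plain array so the code can be tried out.

```c
#include <stdint.h>

#define XRAM_SIZE 128u

#ifdef __CC65__
/* Hypothetical mapper layout: the same RAM cell is written through one
   address window and read back through another. */
#define XRAM_WRITE_PORT ((volatile uint8_t *)0xF000)
#define XRAM_READ_PORT  ((volatile const uint8_t *)0xF080)
#define xram_write(i, v) (XRAM_WRITE_PORT[(i)] = (uint8_t)(v))
#define xram_read(i)     (XRAM_READ_PORT[(i)])
#else
/* Host fallback: one backing array stands in for the cartridge RAM. */
static uint8_t xram_backing[XRAM_SIZE];
#define xram_write(i, v) (xram_backing[(i)] = (uint8_t)(v))
#define xram_read(i)     (xram_backing[(i)])
#endif

/* Slot used for a score counter (hypothetical). */
#define SCORE_SLOT 0u

/* Every access goes through the macros: an ordinary `++counter` would
   read and write the same address, which this kind of RAM cannot do. */
static void score_add(uint8_t points)
{
    xram_write(SCORE_SLOT, (uint8_t)(xram_read(SCORE_SLOT) + points));
}
```

Even a simple increment has to be spelled out as a read from one window and a write to another, which is why the compiler can’t place its own variables there.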

Bank-switching is weird

../../_images/atari2.jpg

The PDP-11 could only address 64 KB, but had a memory management subsystem that supported segmentation and virtual addressing. It could even split code and data between different segments (x86 programmers may recall the CS and DS registers performing this function).

The C language assumes a flat address space, and requires extensions to understand anything else. For example, on x86, pointers may be extended with the “far” attribute to reference a particular segment.

The VCS has a cost-reduced version of the 6502 (the 6507) that can only address 8 KB, and only 4 KB of that is connected to the cartridge. To go beyond 4 KB of ROM, the cartridge must implement a bank-switching mapper.

Each bank-switching scheme comes with its own set of tradeoffs:

  • Do you switch the entire 4 KB bank? Then you have to duplicate shared functions between banks, which is tricky, especially in CC65.

  • Do you only switch 2 KB, leaving the other 2 KB for fixed ROM or RAM?

  • Do you need more granular switching, like four 1 KB banks?

CC65 has a wrapped-call pragma, which can be used to switch banks automatically when calling functions. This depends on the assembler .bank directive, which tags symbols with an 8-bit bank index.

But it’s an incomplete solution – for example, it does not prevent you from accessing data from a bank that is not mapped into memory space. It does not optimize calls within the same bank. And it also does not let you move library functions to different banks without changing the source code.
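To make the idea of a wrapped call concrete, here is a small host-side simulation of what such a trampoline has to do: remember the current bank, switch, call the function, and switch back. This is not CC65’s actual mechanism. The names `select_bank`, `call_banked` and `scaled_in_bank1` are hypothetical, and the bank register is just a variable.

```c
#include <stdint.h>

typedef uint8_t (*banked_fn)(uint8_t arg);

static uint8_t current_bank = 0;
static uint8_t bank_switch_count = 0;

/* On real hardware this would touch a mapper hotspot (for example,
   accessing $1FF8 or $1FF9 on an F8 cartridge). Here we only record it. */
static void select_bank(uint8_t bank)
{
    current_bank = bank;
    ++bank_switch_count;
}

/* Trampoline: must live in ROM that stays visible across the switch. */
static uint8_t call_banked(uint8_t bank, banked_fn fn, uint8_t arg)
{
    uint8_t saved = current_bank;
    uint8_t result;

    select_bank(bank);
    result = fn(arg);
    select_bank(saved);   /* always switches back, even if bank == saved */
    return result;
}

/* A function we pretend lives in bank 1. Adding current_bank to the
   result lets the caller see which bank was active during the call. */
static uint8_t scaled_in_bank1(uint8_t x)
{
    return (uint8_t)(x * 2u + current_bank);
}
```

Calling `call_banked(1, scaled_in_bank1, 10)` from bank 0 costs two switches. It would cost the same two switches if the caller were already in bank 1, which is the missing same-bank optimization mentioned above.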

NESFab also has support for bank-switching, but performs static analysis to figure out the best way to map code and data into banks. (But note that NESFab is a custom language, not a C compiler.)

Display kernels are weird

../../_images/atari3.jpg

The Atari 2600 famously “races the beam,” meaning that the CPU sets up each scanline right before the CRT paints it. This requires some fancy programming.

The most critical routines in the VCS are display kernels, which take up about 75% of the CPU’s time. These routines must fit each scanline’s computation into no more than 76 cycles, and sometimes exactly 76 cycles. For this reason, they are almost always written in assembly language.
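Here is where those numbers come from, as a back-of-the-envelope sketch. The NTSC figures used (228 color clocks per line, 3 color clocks per CPU cycle, 262 lines per frame, 192 visible lines) are typical values assumed for this example, and the helper names are made up.

```c
#include <stdint.h>

/* Typical NTSC figures, assumed for this sketch. */
#define COLOR_CLOCKS_PER_LINE      228u
#define COLOR_CLOCKS_PER_CPU_CYCLE 3u
#define LINES_PER_FRAME            262u
#define VISIBLE_LINES              192u

#define CYCLES_PER_LINE  (COLOR_CLOCKS_PER_LINE / COLOR_CLOCKS_PER_CPU_CYCLE)
#define CYCLES_PER_FRAME ((uint32_t)CYCLES_PER_LINE * LINES_PER_FRAME)

/* Share of each frame spent drawing visible scanlines, in percent. */
static uint8_t kernel_share_percent(void)
{
    return (uint8_t)((100u * VISIBLE_LINES) / LINES_PER_FRAME);
}

/* Does a scanline routine of the given length fit the budget? */
static int fits_in_scanline(uint8_t cycles_used)
{
    return cycles_used <= CYCLES_PER_LINE;
}
```

With these assumptions the kernel share works out to 73%, in line with the rough 75% figure, and the per-line budget is the 76 cycles quoted above.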

But can we write a C compiler so smart that we don’t need assembly anymore?

Even if the compiler were infinitely good at optimization, we don’t write assembly just to make code faster or smaller. We write it to control exactly how many cycles each instruction sequence takes.

We would have to somehow tell the compiler the exact constraints – which may be more complicated than just writing the assembly routine the way we want it!

Page boundaries are weird

Some of the 6502’s instructions take longer to execute when a 256-byte page boundary is crossed. This is critical for things like graphics routines, which must execute with an exact number of cycles on every scanline.
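As a small illustration, the check below models the page-crossing penalty for one instruction, `LDA absolute,X`, which takes 4 cycles plus 1 when the indexed address falls in a different page than the base address. The function names are made up for this sketch.

```c
#include <stdint.h>

/* True when base + index lands in a different 256-byte page than base. */
static int crosses_page(uint16_t base, uint8_t index)
{
    return ((base & 0xFFu) + index) > 0xFFu;
}

/* Cycle count of LDA absolute,X: 4 cycles, plus 1 on a page crossing. */
static uint8_t lda_abs_x_cycles(uint16_t base, uint8_t index)
{
    return (uint8_t)(4u + (crosses_page(base, index) ? 1u : 0u));
}
```

A table starting at $F0F0 reads in 4 cycles for indexes up to $0F and in 5 cycles from $10 onward, so the same loop can take a different number of cycles depending on where the linker happened to place the data.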

Assemblers typically have a single tool to deal with this: the .align directive. This just forces the address of the next object to start at a multiple of a given integer. For example, a value of 256 forces it to start at the next page boundary. If a C compiler supports alignment, it usually works the same way.
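The arithmetic behind alignment is simple enough to sketch. The helper names `align_up` and `padding_for` are hypothetical, and the sketch ignores wrap-around at the top of the 16-bit address space.

```c
#include <stdint.h>

/* Round addr up to the next multiple of alignment (alignment > 0). */
static uint16_t align_up(uint16_t addr, uint16_t alignment)
{
    uint32_t a = alignment;
    return (uint16_t)((((uint32_t)addr + a - 1u) / a) * a);
}

/* Number of filler bytes the alignment inserts before the object. */
static uint16_t padding_for(uint16_t addr, uint16_t alignment)
{
    return (uint16_t)(align_up(addr, alignment) - addr);
}
```

An object that would have started at $F001 gets pushed to $F100, at a cost of 255 bytes of padding. On a 4 KB cartridge, that is over 6% of the ROM for a single alignment.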

../../_images/atari4.jpg

This is an incomplete solution, because it can leave large gaps in the ROM. A better method would be to use a bin-packing algorithm to fit code and data fragments into pages.
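One simple way to approach this is the first-fit decreasing heuristic: sort the fragments from largest to smallest, then drop each one into the first page that still has room. The sketch below is only an illustration of that heuristic, not something CC65 or its linker does. The names `pack_fragments`, `ROM_PAGE_SIZE` and `MAX_PAGES` are made up, and it assumes no fragment is larger than a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ROM_PAGE_SIZE 256u
#define MAX_PAGES     16u

/* qsort comparator: larger sizes first. */
static int by_size_descending(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a;
    uint16_t y = *(const uint16_t *)b;
    return (x < y) - (x > y);
}

/* Sorts sizes[] in place (largest first), then assigns each fragment to
   the first page with enough free bytes. page_of[i] receives the page
   chosen for the i-th fragment of the sorted array. Returns the number
   of pages used. */
static size_t pack_fragments(uint16_t *sizes, size_t count, uint8_t *page_of)
{
    uint16_t free_bytes[MAX_PAGES];
    size_t pages_used = 0;
    size_t i, p;

    qsort(sizes, count, sizeof sizes[0], by_size_descending);

    for (i = 0; i < count; ++i) {
        assert(sizes[i] <= ROM_PAGE_SIZE);
        for (p = 0; p < pages_used; ++p) {
            if (free_bytes[p] >= sizes[i]) {
                break;
            }
        }
        if (p == pages_used) {          /* nothing fits: open a new page */
            assert(pages_used < MAX_PAGES);
            free_bytes[pages_used++] = ROM_PAGE_SIZE;
        }
        free_bytes[p] = (uint16_t)(free_bytes[p] - sizes[i]);
        page_of[i] = (uint8_t)p;
    }
    return pages_used;
}
```

For fragments of 120, 130, 120 and 130 bytes, this pairs each 130-byte fragment with a 120-byte one and fills two pages with only 6 bytes wasted in each. First-fit decreasing is a heuristic, so it won’t always find the optimal packing, but no fragment ever straddles a page boundary.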

Array layout can be weird

The 6502 really likes arrays of bytes less than 256 bytes long. Its indexed addressing modes read and write these arrays directly, using the 8-bit X or Y register as the index, and an indexed access doesn’t take many more cycles than a plain absolute one.

So let’s say you want an array of 16-bit values, for example an array of pointers. You’d want to put the lower 8 bits in one array, and the upper 8 bits in another array.

Unfortunately, no C compilers help you do this. You’d have to define each array by hand, extracting the lower and upper bytes of every value yourself. Or you could take the hit of indexing into a 16-bit array.
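Done by hand, the split looks something like this. The table contents and the names `LO`, `HI`, `table_lo`, `table_hi` and `table_get` are invented for the sketch. It uses plain 16-bit values rather than real pointers so that it also runs on a host machine.

```c
#include <stdint.h>

/* Extract the low and high byte of a 16-bit constant. */
#define LO(x) ((uint8_t)((x) & 0xFFu))
#define HI(x) ((uint8_t)(((x) >> 8) & 0xFFu))

/* One logical table of 16-bit values, stored as two byte arrays.
   Each value has to be written out twice, once per array. */
static const uint8_t table_lo[] = { LO(0x1234), LO(0xF080), LO(0x00FF) };
static const uint8_t table_hi[] = { HI(0x1234), HI(0xF080), HI(0x00FF) };

/* Reassemble entry i: two byte-indexed loads with the same 8-bit index,
   and no need to double the index first. */
static uint16_t table_get(uint8_t i)
{
    return (uint16_t)(table_lo[i] | ((uint16_t)table_hi[i] << 8));
}
```

The duplication in the initializers is the price: every value appears in both arrays, and nothing stops the two lists from drifting out of sync.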

The problem is worse if you decide to use an array of structs, because the compiler must use expensive (probably 16-bit) multiplication to find the offset of the struct in an array.
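The usual workaround is the same trick again: turn the array of structs into one byte array per field. The sprite fields and function names below are hypothetical, chosen only to show the two layouts side by side.

```c
#include <stdint.h>

#define NUM_SPRITES 8u

/* Array of structs: element i lives at base + i * sizeof(struct sprite),
   so every access starts with a multiplication by the struct size. */
struct sprite {
    uint8_t x;
    uint8_t y;
    uint8_t color;
};
static struct sprite sprites[NUM_SPRITES];

/* Struct of arrays: one byte array per field, indexed directly by i. */
static uint8_t sprite_x[NUM_SPRITES];
static uint8_t sprite_y[NUM_SPRITES];
static uint8_t sprite_color[NUM_SPRITES];

/* Byte offset the compiler has to compute for the struct layout. */
static uint16_t aos_offset(uint8_t i)
{
    return (uint16_t)(i * sizeof(struct sprite));
}

static void move_right_aos(uint8_t i) { ++sprites[i].x; }
static void move_right_soa(uint8_t i) { ++sprite_x[i]; }
```

Both `move_right` functions do the same job, but the second maps onto a single indexed instruction, while the first needs the offset from `aos_offset` before it can touch the field.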

Stack access is weird

../../_images/atari5.jpg

The 6502 and Z80 lack neat instructions for accessing variables placed on the stack via an offset. This means that stack accesses are expensive, and should be avoided where possible.

CC65 chooses to maintain a dedicated sp stack pointer in zeropage memory, and uses the (zp),y addressing mode to access the data stack. This “data” stack is in addition to the 6502’s native stack.

The VCS only has 128 bytes of “normal” RAM, so CC65 stacks can eat up a lot of memory. Let’s say the two stacks are 16 bytes each – this takes up 25% of your precious RAM. It’s also easy to overflow either stack and overwrite variables accidentally.
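Here is a toy model of such a data stack, to show how the budget and the overflow risk play out. It is a host-side simulation with made-up names (`push_byte`, `local_at`, `drop_bytes`, `data_stack_free`), and it assumes a 16-byte stack that grows downward from the top of the 128 bytes of RAM. It is not CC65’s runtime code.

```c
#include <stdint.h>

#define RAM_SIZE        128u
#define DATA_STACK_SIZE 16u

static uint8_t ram[RAM_SIZE];
static uint8_t sp = RAM_SIZE;   /* empty stack: one past the last byte */

/* Bytes still available before the stack runs into ordinary variables. */
static uint8_t data_stack_free(void)
{
    return (uint8_t)(sp - (RAM_SIZE - DATA_STACK_SIZE));
}

/* Push one byte; returns 0 instead of overwriting variables when full. */
static int push_byte(uint8_t value)
{
    if (data_stack_free() == 0) {
        return 0;
    }
    ram[--sp] = value;
    return 1;
}

/* Read the byte at offset y from the top, like `lda (sp),y`. */
static uint8_t local_at(uint8_t y)
{
    return ram[sp + y];
}

/* Pop n bytes when a function returns. */
static void drop_bytes(uint8_t n)
{
    sp = (uint8_t)(sp + n);
}
```

Real compiled code has no equivalent of the check in `push_byte`. Once the stack pointer passes the reserved region, pushes land on top of whatever variables live below it.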

Other compilers, like SDCC-6502, put variables on the native stack. They can then transfer the S (stack) register to the X register, and access them like this: lda $100,x. It may be faster, and also you don’t have to worry about overflowing two separate stacks.

Some compilers statically analyze the program so they can allocate local variables and parameters in static memory, rather than on the stack. (Function pointers and/or recursion can derail this entire process, however.)
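To see why recursion is a problem, compare a function with an ordinary local to the same function with its local placed in static memory. This is a contrived sketch with made-up names. The `static` keyword stands in for what a statically allocating compiler would do behind the scenes.

```c
#include <stdint.h>

/* Ordinary (automatic) local: each call gets its own copy of `mine`. */
static uint8_t outermost_auto(uint8_t n)
{
    uint8_t mine = n;
    if (n > 0) {
        (void)outermost_auto((uint8_t)(n - 1));
    }
    return mine;    /* still the value this call started with */
}

/* Statically allocated local: all calls share one memory location. */
static uint8_t outermost_static(uint8_t n)
{
    static uint8_t mine;
    mine = n;
    if (n > 0) {
        (void)outermost_static((uint8_t)(n - 1));
    }
    return mine;    /* overwritten by the nested calls */
}
```

`outermost_auto(3)` returns 3, but `outermost_static(3)` returns 0, because the innermost call was the last one to write the shared variable. A compiler can only make this substitution safely if it can prove the function is never re-entered.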

Combining C and assembly is awkward

../../_images/atari7.jpg

Let’s say you want to write a module which has a C portion and an assembly portion, without using inline assembly. This means you need to create at least three files: a .c file, a .h header, and an .asm file, plus possibly an .inc file to interface with other assembly files. You also have to keep the symbols in each of these files in sync so they’ll link together. It’s awkward, requires lots of duplication, and is not too much fun.

It’s tempting to take the easy way out: inline assembly, which lets you inject assembly code directly into C functions. But this is often awkward, requiring compiler-specific syntax and asm(...) directives scattered everywhere. And you may not be able to use all of the features of the assembler, such as macros, interrupts, or anything that doesn’t fit neatly into a C function context.
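A small example shows what the compiler-specific part looks like in practice. The sketch below wraps one instruction, a store to the TIA’s WSYNC register at $02, in a macro using CC65’s `__asm__` syntax, with a host fallback so the same file still builds elsewhere. The names `wait_for_scanline`, `skip_scanlines` and `scanlines_waited` are hypothetical.

```c
#include <stdint.h>

#ifdef __CC65__
/* CC65-specific inline assembly: any store to WSYNC ($02) halts the CPU
   until the start of the next scanline. */
#define wait_for_scanline() __asm__("sta $02")
#else
/* Host fallback: just count how often we would have waited. */
static uint16_t scanlines_waited;
#define wait_for_scanline() ((void)++scanlines_waited)
#endif

/* Burn a number of scanlines, for example during vertical blank. */
static void skip_scanlines(uint8_t count)
{
    while (count > 0) {
        wait_for_scanline();
        --count;
    }
}
```

Even this one-liner needs an `#ifdef` and a macro to stay portable, and it says nothing about how many cycles the surrounding C loop takes.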

C compilers will usually ignore the contents of your inline assembler, passing it straight through to the assembler. Some (like CC65) will perform a little bit of error-checking and optimization – in some cases, this is unwelcome!

CC65 also lacks CA65’s .incbin directive that imports binary files, which makes it even more tempting to use assembly.

C alternatives are even weirder

../../_images/atari6.jpg

There have been many attempts to create alternative languages for 8-bit systems that solve some of these problems, with varied success.

However, C is a popular language, and CC65 is a robust toolchain.

So we’re going to forge ahead with C, and see just how messy it gets.

We can then figure out our pain points, and whether we need a brand new language or just changes to the C toolchain.

In the next blog post, we’ll introduce VCSLib, a C programming library for the Atari 2600 with graphics, sound, music, and bank-switching support.