# Retargeting a C Compiler to 6502 I've written before about [cc65](cc65-optimization.md.html) and some of its optimization challenges. It's a stable compiler with a robust toolchain, but its code generator doesn't full advantage of the 6502 when performing 8-bit operations. It places function parameters and many local variables on a separate stack, which requires many calls to helper functions. I've done a [survey of 6502 high-level languages](higher-level-6502-programming.md.html), and while many are intriguing, none have the stability and C compatiblity of cc65. So what's so hard about the 6502? What's keeping us from just retargeting a modern C compiler, bringing its powerful optimization routines to 6502? Well, it's complicated. Like, really complicated. ## LLVM LLVM is the dominant compiler backend library, and there's [documentation](https://llvm.org/docs/WritingAnLLVMBackend.html#register-set-and-register-classes) on how to write a LLVM backend. However, many of the instructions read like "draw two circles, then draw the rest of the damn owl". LLVM is a freakin' huge library, and backends tend to be [complex](https://jonathan2251.github.io/lbd/). It wasn't designed for 8-bit systems like the 6502 where stack access is cumbersome and registers are scarce. ## SDCC But what if there were something already in the 8bitworkshop pipeline, right under our nose? Yes, it's SDCC, the Small Device C Compiler, the same one we use for Z80! What if we could retarget it to the 6502? The benefit of SDCC is that it already supports a handful of microcontrollers more constrained than the Z80. The Motorola/Freescale [HC08](http://hc08web.de/usb08/files/cpu08r2.pdf) is similar to the 6502 in some ways. It has an 8-bit accumulator and two 8-bit index registers. It also differentiates between direct (0-255, or zero page) and extended (0-65535, or absolute) addressing modes. Might this be a good start for our own 6502 backend? Let's find out. ## The Test Program We'll use this simple C program to compare compilers: ```c #include #include bool get_pixel(uint_fast8_t x, uint_fast8_t y); void set_pixel(uint_fast8_t x, uint_fast8_t y); void fill_line_left(uint_fast8_t x, const uint_fast8_t y) { for(;; x--) { if(get_pixel(x, y)) return; set_pixel(x, y); } } ``` Caveat: This test program is from a [research paper](https://link.springer.com/content/pdf/10.1007%2F978-3-642-37051-9_1.pdf) about register allocation written by SDCC's maintainer, so we might be cherry-picking a bit. ## The CC65 Version Let's see how CC65 compiles this: ```armasm .proc _fill_line_left: near jsr pusha ldy #$01 L000E: lda (sp),y jsr pusha ldy #$01 lda (sp),y jsr _get_pixel tax bne L0003 ldy #$01 lda (sp),y jsr pusha ldy #$01 lda (sp),y jsr _set_pixel ldy #$01 lda (sp),y sec sbc #$01 sta (sp),y jmp L000E L0003: jmp incsp2 ``` CC65 stores its parameters on the stack, requiring an expensive indirect `(sp),y` instruction. Note the `jsr pusha` subroutine calls, which take about 40 cycles each, used to push parameters on the stack for `get_pixel` and `set_pixel` function calls. ## The HC08 Target HC08 is the backend on which we'll base the 6502 backend, so let's see how it does: ```armasm _fill_line_left: stx _fill_line_left_y_1_3 00104$: psha ;fill.c:9: if(get_pixel(x, y)) ldx _fill_line_left_y_1_3 jsr _get_pixel tax pula tstx beq 00102$ rts ;fill.c:10: return; 00102$: psha ;fill.c:11: set_pixel(x, y); ldx _fill_line_left_y_1_3 jsr _set_pixel pula ;fill.c:8: for(;; x--) { deca bra 00104$ rts ``` It's a lot smaller than the CC65 version, and has no indirect loads or parameter pushes. This backend passes two 8-bit parameters in the A and X registers -- additional parameters are generally passed in static locations. It also looks pretty similar to 6502 code, doesn't it? One might be tempted to convert this directly to 6502 without mucking about in the backend. ## The Peephole Assembly Code Translator SDCC has a peephole optimizer that we can configure to translate to 6502 code. For example, HC08's TST instruction can be converted to 6502's BIT instruction: ```armasm tst %0 = BIT %0 ``` Some HC08 instructions aren't directly mappable. For example, indirect load of X requires a zero-page pointer: ```armasm lda ,x = STX ptr STY ptr+1 LDY #0 LDA (ptr),Y PHP LDY ptr+1 PLP ``` Going down the rabbit hole, we'll discover that this approach isn't workable. HC08 isn't exactly like 6502: * Not little-endian * 16-bit index register (HX) vs two 8-bit index registers (X and Y) * Different flag behavior * Additional instructions and addressing modes So now we'll go mucking around in the compiler source. Hacking the HC08 backend to produce 6502 code is easy, right? * Figure out what it is doing * Make it do the other thing ## Really Doing a Compiler Backend Turns out there is a lot involved in "making it do the other thing", e.g. emitting working 6502 code: * Change big-endian to little-endian. * Modify the register allocator to allocate both X and Y properly. * Make sure to pop/push/transfer registers without clobbering other registers. * Various optimizations: * Use the absolute X/Y indexing modes when possible. * Prefer 16-bit values and pointers to be in zero-page addresses. * Implement bit-shifting logic when multiplying/dividing by powers of two. * Create assembler and linker tools. * Port the standard SDCC library to 6502. * Create a 6502 simulator and test suite. * Fix the broken tests. After all that and more, we finally get output from the experimental SDCC 6502 backend: ```armasm _fill_line_left: stx _fill_line_left_y_65536_3 pha ldx _fill_line_left_y_65536_3 jsr _get_pixel tax pla cpx #0x00 beq 00102$ rts 00102$: pha ldx _fill_line_left_y_65536_3 jsr _set_pixel pla sec sbc #0x01 jmp 00104$ rts ``` ## Yikes What kind of obstacles did we run into? * The code base is 20 years old, and the HC08 backend was hacked up by someone other than the author. The maintainer later came back and shoehorned a new, fancy register allocator in C++. The whole thing is hard for mere mortals to understand. * There are lots of special cases to fix -- multibyte operations especially. * There's a test suite, we had to fix many of the examples, as well as bring up a 6502 command-line simulator. * Registers can be free, in use, or dead (free after the current operation). * Pointers required rearranging much of the code, since the HC08 uses a register for indirect access, and the 6502 uses zero-page. * Had to transfer the S -> X register for stack access. * Had to make special case for absolute indexing (`aaaa,x` or `aaaa,y`) * Some instructions have addressing limitations, like `BIT` and shift instructions. * `PLA` modifies the flags in 6502 land, but not the HC08. * Register allocation is a big complex thing that I'm afraid to touch, and when I do ASSERTs are thrown. It actually does a "dry run" compiling multiple times to figure out the optimal register allocation. * Some registers had to be "pushed" onto a temporary zero-page stack, since vanilla 6502 can't push/pop X or Y. * Needed to keep a pointer to the stack frame in zero page when there's stuff on the stack. * Memory allocation (calloc of asmops, random freeing of objs) * Regression tests only expose one bug on average, so it goes very slow. There's one test case per 14,000 lines of code. Had to use my own 6502 simulator to find some bugs. * Longlong routines yuk (found bug in longlong shift functions) * AST operations take lots of arguments (e.g. a == b << c) so tends to be a lot of logic to test. * Endless bugs from transferring registers in the wrong order Most of the regression tests pass, but even after heroic efforts fixing things, many of them still fail. ("Debugging something into existence" is a phrase that keeps popping into my head.) If I started from scratch, I'd think about writing a 6502 code generation layer before touching the SDCC backend, that would: * Track register liveness * Track bitmasks/ranges of values * Track equivalences (register == constant, etc) * Choose stack vs. memory for temporaries * Handle multiple register transfers (e.g. X -> A -> Y) * Convert pointers to absolute indexing * Handle 6502 vs 65C02 instructions * Convert long branches to JMP ## Try it Out You can download the experimental fork of SDCC with 6502 support here: https://github.com/sehugg/sdcc-m6502 Look at the [manual](http://sdcc.sourceforge.net/doc/sdccman.pdf) (PDF) for build instructions.