The Three-Stream Versatile Kernel

In my previous article on the Versatile Playfield Kernel, we explored how to draw complex backgrounds by choosing which register to update on each scanline. But that version didn’t specifically address sprites.

Because the playfield logic only took up to nine cycles per scanline, there was plenty of time left over to draw sprites using one of the typical routines. (You know the one – subtract position and height, test for boundaries, load bitmap and color, store in registers. Boring.)

Now we’re taking this concept further: what if we could control not just the playfield, but also two sprites using the same versatile approach?

The Problem: Managing Multiple Data Streams

The playfield is a static background, but sprites move up and down and across the screen. We’ll need to:

Update sprite bitmaps (GRP0, GRP1) as they move vertically
Update sprite colors (COLUP0, COLUP1)
Reposition sprites horizontally using HMOVE (if drawing more than two)
Maintain our playfield updates

All within our 76-cycle scanline budget! Sounds impossible, right?

The Solution: Display Lists

Instead of hardcoding what each data stream does, we use a display list system. Think of it as a playlist for your register writes – each entry tells the kernel where to start reading data, and when the data runs out it moves on to the next track.

The kernel operates with three parallel streams:

Stream #1 - Controlled by dlist1 (sprite 0, or HMOVE sequences)
Stream #2 - Controlled by dlist2 (sprite 1, or HMOVE sequences)
Stream #3 - Fixed playfield data (PF registers and colors)

Each stream follows the same (register, value) pair pattern from the original versatile kernel. But stream #1 and #2 are indirectly indexed via display lists.

Display List Format

A display list entry is simply an index into the data tables. For each stream, the kernel reads sequentially from the data tables, one entry per scanline. When it encounters a zero, it fetches the next entry in the display list and resets the stream’s table index to that value. For example:

; Example display list
dlist1:
    .byte DL1_SPRITE      ; Draw sprite
    .byte DL_HMOVE_R8     ; Move right 8 pixels
    .byte DL_HMOVE_0      ; Stop movement
    .byte DL1_SPRITE      ; Draw sprite again
    .byte DL_HMOVE_L7     ; Move left 7 pixels
    ...

The display list indices point to pre-defined sequences in the data tables. For example:

DL1_SPRITE points to 8 bytes of GRP0 bitmap data
DL_HMOVE_R8 points to a sequence that sets HMP0/HMP1/HMM0/HMM1/HMBL to $80 (move right)
DL1_BGROUND points to 6 bytes of COLUBK color gradient data

Sprite Repositioning via HMOVE

We can change the horizontal position of a sprite in the middle of the screen, but it’s not easy. Fine horizontal positioning on the 2600 requires a specific timing dance:

Set HMPx register to coarse movement value (+8 pixels)
Wait as many scanlines as needed
Set HMPx register to fine adjustment (+1 or -1 pixels)
Wait for the next scanline
Set HMPx register to 0

By encoding these register writes in the display list, we can command horizontal motion for any player or missile object:

; HMOVE sequence for moving right 8 pixels
DL_HMOVE_R8:
    ; Set coarse movement
    .byte HMP0,$80, HMP1,$80, HMM0,$80, HMM1,$80, HMBL,$80
    
DL_HMOVE_R1:
    ; Fine adjustment
    .byte HMP0,$F0, HMP1,$F0, HMM0,$F0, HMM1,$F0, HMBL,$F0
    
DL_HMOVE_0:
    ; Stop movement
    .byte HMP0,$00, HMP1,$00, HMM0,$00, HMM1,$00, HMBL,$00

The display list sequences these register writes across multiple scanlines, so we don’t have to do a horizontal repositioning operation (using RESPx) which would stall the other streams.

The drawback is that orchestrating these register writes precisely to end up at a given position is non-trivial, and if the positions are far apart it takes many scanlines to get there.
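To get a feel for how many scanlines a move costs, here is a rough Python sketch (a planning aid, not part of the kernel) that splits a displacement into per-scanline HMxx values. It assumes HMOVE is strobed once per scanline and that each strobe moves the object by whatever is currently in its HMxx register. The helper names `hm_value` and `plan_moves` are made up for illustration. The encoding itself is the TIA's: the upper nibble is a signed 4-bit value where positive means left, so $80 is 8 pixels right and $70 is 7 pixels left.

```python
def hm_value(dx):
    """Encode a per-scanline move of dx pixels (positive = right) as an HMxx byte."""
    if not -7 <= dx <= 8:
        raise ValueError("one HMOVE can move at most 7 pixels left or 8 pixels right")
    # The TIA treats the upper nibble as two's complement, with positive = left.
    return ((-dx) & 0x0F) << 4


def plan_moves(total_dx):
    """Split a displacement into one HMxx byte per scanline, ending with $00 (stop)."""
    plan = []
    remaining = total_dx
    while remaining != 0:
        step = max(-7, min(8, remaining))  # clamp to what one HMOVE can do
        plan.append(hm_value(step))
        remaining -= step
    plan.append(0x00)  # stop, or the object keeps drifting on every scanline
    return plan


# 17 pixels right takes three moving scanlines plus the stop
print([f"${b:02X}" for b in plan_moves(17)])
```

Consecutive identical values don't need to be rewritten on every line; the register holds its value, so those lines could be delay entries instead.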

The Kernel Loop

The core kernel loop is beautifully simple (well, for Atari 2600 code) despite managing three streams:

loop:
    ; Stream #3 (playfield - always active)
    ldy idx3
    ldx tabloc3,y
    inc idx3
    lda tabval3,y
    sta $00,x
    
    ; Stream #1 (check for display list reset)
    ldy idx1
    ldx tabloc1,y
    bne noredo1
        ; Hit zero - get next index from dlist1
        ldx dlidx1
        lda dlist1,x
        sta idx1
        inc dlidx1
        ; ...handle stream #2...
    noredo1:
    inc idx1
    lda tabval1,y
    sta $00,x
    
    ; Stream #2 (similar logic)
    ; ...
    
    sta HMOVE    ; Trigger any horizontal movement
    dec nlines
    bne loop

Each scanline updates up to three TIA registers - one from each stream. When a stream’s register index hits zero, the kernel consults that stream’s display list to jump to the next sequence. On the scanline where a stream advances to its next display list entry, we don’t write any TIA register for that stream.
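If the indirection is hard to follow in assembly, this small Python model of a single display-list stream may help. It is a sketch of the behavior described above, not a translation of the real kernel: `run_stream` is a hypothetical name, the tables are toy data, and it assumes the stream starts the frame already pointing at the first display list entry.

```python
GRP0, HMP0 = 0x1B, 0x20  # TIA write addresses for player 0 graphics and motion


def run_stream(tabloc, tabval, dlist, nlines):
    """Return one entry per scanline: a (register, value) write, or None."""
    writes = []
    idx = dlist[0]  # assumption: stream starts on the first display list entry
    dlidx = 1       # next display list entry to consume
    for _ in range(nlines):
        reg = tabloc[idx]
        if reg == 0:
            # Hit the terminator: fetch the next index, skip the write this line
            idx = dlist[dlidx]
            dlidx += 1
            writes.append(None)
        else:
            writes.append((reg, tabval[idx]))
            idx += 1
    return writes


# Toy tables: a 2-line sprite at index 0, a one-entry HMOVE sequence at index 3
tabloc = [GRP0, GRP0, 0, HMP0, 0]
tabval = [0x3C, 0x7E, 0, 0x80, 0]
dlist = [0, 3, 0]  # sprite, move right 8, sprite again

for line, write in enumerate(run_stream(tabloc, tabval, dlist, 6)):
    print(line, write)
```

Note the two `None` lines in the output: each hop to a new display list entry costs that stream one scanline.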

Delay Padding

You might notice the display lists include delay entries (DL_DELAY). These are filled with NOOP register writes ($7F) that the kernel executes but which don’t affect the display. This lets us control timing - for example, holding a sprite graphic for several scanlines or spacing out HMOVE sequences.

The delay padding uses 31 NOOP entries:

; Delay section (31 scanlines of nothing)
repeat 31
    .byte NOOP    ; register $7F (doesn't exist)
    .byte 0       ; value (ignored)
repend

By pointing your display list somewhere inside the NOOP entries, you can implement anywhere from 1 to 31 lines of delay. (Nothing special about the number 31, but you use fewer display list entries in RAM for larger delay sections.)
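The index arithmetic is simple enough to do by hand, but here it is as a Python sketch. `delay_entry` and `delay_base` are hypothetical names, and the sketch assumes the 31 NOOP entries sit contiguously in the table starting at `delay_base`, followed directly by a zero terminator.

```python
DELAY_LEN = 31  # number of NOOP entries in the delay section


def delay_entry(delay_base, lines):
    """Return the table index that leaves `lines` NOOP entries before the terminator."""
    if not 1 <= lines <= DELAY_LEN:
        raise ValueError("delay must be between 1 and 31 lines")
    # Skip the NOOPs we don't want; the stream runs through the rest.
    return delay_base + (DELAY_LEN - lines)


# With the delay section at table index 100:
print(delay_entry(100, 31))  # start of the section, full delay
print(delay_entry(100, 1))   # last NOOP before the terminator
```

Keep in mind that the scanline spent on the terminating zero also writes nothing for that stream, so depending on how you count, the visible gap is one line longer than the number of NOOPs.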

What Can You Do With This?

The three-stream kernel enables some interesting behavior:

Multiplexed sprites: Draw the same sprite multiple times at different positions by alternating between sprite graphics and HMOVE sequences, without affecting the rest of the display.
HMOVE effects: You can move sprites horizontally by single pixels while they’re being drawn to render lines and slanted shapes.
NUSIZ effects: In the screenshot, we tweak NUSIZ halfway through so that superdude’s head is 2x width and the body is 3x width. We use HMOVE/HMP0 to align the two sections properly.
Stacked sprites: You can chain display lists to draw multiple components of a sprite vertically. For example, you could have different shapes for head, torso, and legs.
Crossing the streams: Write registers of the playfield or those of a different sprite from another display list for weird effects.
Write to anything: Unleash chaos on your program by writing to random RAM locations, or poke around the TIA registers. We won’t tell anybody.

Limitations

Of course, nothing is free on the 2600. You still only update 3 registers per scanline (one per stream), so for sprites you can change the bitmap or the color, but not both on the same scanline. Our previous kernel let you change 5 registers per scanline. However, the playfield logic runs on each scanline (both register and data lookup) so you get single-line resolution for backgrounds.

Most kernels arrange things so that they only write registers in the hidden part of the scanline. We don’t have spare cycles to do that here. The implementor has to decide between sprites that “tear” as they move horizontally, or a crooked background.

Also, display lists consume precious RAM - 32 bytes per list in this implementation, so two lists take half of the 2600’s 128 bytes. Not to mention the cycles required to set up the display lists every frame.

Creating Data Tables and Display Lists

How do you design these data tables? You only have 256 bytes per table, so you have to carefully spend your budget, balancing bitmap and color data with HMOVE and other utility tables. This is another case where hand-coding works for simple demos but a custom tool or parser becomes essential for complex games. You could also use bank switching to access multiple sprite sheets.
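As a starting point for such a tool, here is a minimal Python sketch of a table packer. It is an assumption about how you might lay things out, not the tool used for this demo: `pack_sequences` is a hypothetical name, and it assumes each sequence is a list of (register, value) pairs that gets a zero terminator appended, with the sequence's starting index becoming its display list constant.

```python
def pack_sequences(sequences):
    """Pack named (register, value) sequences into parallel 256-byte tables.

    Returns (tabloc, tabval, offsets), where offsets maps each name to the
    index you would store in a display list.
    """
    tabloc, tabval, offsets = [], [], {}
    for name, pairs in sequences.items():
        offsets[name] = len(tabloc)
        for reg, val in pairs:
            if reg == 0:
                raise ValueError("register 0 is reserved as the end-of-sequence marker")
            tabloc.append(reg)
            tabval.append(val & 0xFF)
        # Zero terminator tells the kernel to fetch the next display list entry
        tabloc.append(0)
        tabval.append(0)
    if len(tabloc) > 256:
        raise ValueError(f"table overflow: {len(tabloc)} entries, limit is 256")
    return tabloc, tabval, offsets


GRP0, HMP0 = 0x1B, 0x20  # TIA write addresses

tabloc, tabval, offsets = pack_sequences({
    "DL1_SPRITE": [(GRP0, 0x3C), (GRP0, 0x7E)],
    "DL_HMOVE_R8": [(HMP0, 0x80)],
})
print(offsets)
```

From here the tool could emit `.byte` lines and one constant per sequence for the assembler, and the overflow check tells you early when the budget is blown.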

The real trick is building these two display lists each and every frame within the cycle budget. It’d be like programming the Atari 7800 hardware from inside of a Rubik’s Cube. We’ve left that part as an exercise for the reader.

This scheme is somewhat baroque – or perhaps “harebrained” – but it demonstrates that the boundaries of the original 1977 design are still undiscovered.

13 October 2025