The Three-Stream Versatile Kernel
In my previous article on the Versatile Playfield Kernel, we explored how to draw complex backgrounds by choosing which register to update on each scanline. But that kernel didn't specifically address sprites.
Because the playfield logic took at most nine cycles per scanline, there was plenty of time left over to draw sprites using one of the typical routines. (You know the one: subtract position and height, test for boundaries, load bitmap and color, store in registers. Boring.)
Now we’re taking this concept further: what if we could control not just the playfield, but also two sprites using the same versatile approach?
The Problem: Managing Multiple Data Streams
The playfield is a static background, but sprites move up and down and across the screen. We’ll need to:
Update sprite bitmaps (GRP0, GRP1) as they move vertically
Update sprite colors (COLUP0, COLUP1)
Reposition sprites horizontally using HMOVE (if we want to draw more than two)
Maintain our playfield updates
All within our 76-cycle scanline budget! Sounds impossible, right?
The Solution: Display Lists
Instead of hardcoding what each data stream does, we use a display list system. Think of it as a playlist for your register writes – each entry tells the kernel where to start reading data, and when the data runs out it moves on to the next track.
The kernel operates with three parallel streams:
Stream #1 - Controlled by dlist1 (sprite 0, backgrounds, or HMOVE sequences)
Stream #2 - Controlled by dlist2 (sprite 1, playfield colors, or HMOVE sequences)
Stream #3 - Fixed playfield data (PF registers and colors)
Each stream follows the same (register, value) pair pattern as the original versatile kernel, but streams #1 and #2 are indexed indirectly through display lists.
Display List Format
A display list entry is simply an index into the data tables. For each stream, the kernel reads sequentially through the data tables, one register/value entry per scanline. When it reads a register number of zero, it fetches the next entry from the display list and resets the stream's table index to that value. For example:
; Example display list
dlist1:
        .byte DL1_SPRITE  ; Draw sprite
        .byte DL_HMOVE_R8 ; Move right 8 pixels
        .byte DL_HMOVE_0  ; Stop movement
        .byte DL1_SPRITE  ; Draw sprite again
        .byte DL_HMOVE_L7 ; Move left 7 pixels
        ...
The display list indices point to pre-defined sequences in the data tables. For example:
DL1_SPRITE points to 8 bytes of GRP0 bitmap data
DL_HMOVE_R8 points to a sequence that sets HMP0/HMP1/HMM0/HMM1/HMBL to $80 (move right)
DL1_BGROUND points to 6 bytes of COLUBK color gradient data
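To make the stream mechanics concrete, here is a small Python model of one display-list-driven stream. It is a sketch under assumptions: `tabloc` and `tabval` are parallel lists indexed by the same value (as the kernel loop reads them), a register number of 0 ends a sequence, the first display list entry is loaded before the kernel starts, and the scanline on which the next entry is fetched performs no write. The function `run_stream` and the sample tables are hypothetical.

```python
GRP0 = 0x1B  # TIA register address for player 0 graphics

def run_stream(tabloc, tabval, dlist, nlines):
    """Simulate one display-list-driven stream for nlines scanlines.

    Returns one item per scanline: a (register, value) write, or None
    on a scanline where the stream hit a 0 terminator and fetched its
    next display list entry instead of writing.
    """
    idx = dlist[0]  # assumption: first entry is loaded before the kernel runs
    dlidx = 1
    writes = []
    for _ in range(nlines):
        reg = tabloc[idx]
        if reg == 0:
            # End of sequence: restart at the next display list entry, no write
            idx = dlist[dlidx]
            dlidx += 1
            writes.append(None)
        else:
            writes.append((reg, tabval[idx]))
            idx += 1
    return writes

# Hypothetical data: a 3-line sprite at index 1, terminated by register 0
tabloc = [0, GRP0, GRP0, GRP0, 0]
tabval = [0, 0x18, 0x3C, 0x18, 0]
dlist1 = [1, 1]  # draw the sprite twice

lines = run_stream(tabloc, tabval, dlist1, nlines=7)
```

On the fourth scanline the stream is busy fetching its next display list entry, so GRP0 simply keeps its previous value for that line.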
Sprite Repositioning via HMOVE
We can change the horizontal position of a sprite in the middle of the screen, but it's not easy. Moving objects with HMOVE mid-screen takes a specific timing dance:
Set the motion registers (HMP0, HMP1, HMM0, HMM1, HMBL) to the coarse movement value (8 pixels)
Wait for the next scanline
Set the motion registers to the fine adjustment (+1 or -1 pixel)
Wait for the next scanline
Set the motion registers to 0
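The dance above relies on how the motion registers are encoded: the upper nybble of HMP0, HMP1, HMM0, HMM1 or HMBL is a signed 4-bit value, and on the 2600 negative values move an object right, so $80 means 8 pixels right, $F0 means 1 pixel right and $70 means 7 pixels left. This Python sketch (function names hypothetical) decodes that encoding and totals the movement of one object, assuming each value is in effect for exactly one HMOVE strobe.

```python
def hm_pixels(value):
    """Decode an HMxx motion register byte into a shift in pixels.

    Only the upper nybble is used. It is a signed 4-bit number in which
    negative values move the object right, so this returns a positive
    number for rightward motion and a negative number for leftward motion.
    """
    nybble = (value >> 4) & 0x0F
    signed = nybble - 16 if nybble >= 8 else nybble
    return -signed

def net_shift(per_line_values):
    """Total shift if HMOVE is strobed once per scanline with these HMxx values."""
    return sum(hm_pixels(v) for v in per_line_values)

# Coarse ($80), then fine ($F0), then stop ($00), one value per strobe
dance = [0x80, 0xF0, 0x00]
total = net_shift(dance)  # pixels moved to the right
```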
By encoding this as data sequences, we can move sprites smoothly while the display list automatically sequences through the timing:
; HMOVE sequence for moving right 8 pixels
DL_HMOVE_R8:
        ; Set coarse movement ($80 = 8 pixels right)
        .byte HMP0,$80, HMP1,$80, HMM0,$80, HMM1,$80, HMBL,$80
DL_HMOVE_R1:
        ; Fine adjustment ($F0 = 1 pixel right)
        .byte HMP0,$F0, HMP1,$F0, HMM0,$F0, HMM1,$F0, HMBL,$F0
DL_HMOVE_0:
        ; Stop movement
        .byte HMP0,$00, HMP1,$00, HMM0,$00, HMM1,$00, HMBL,$00
The display list sequences these register writes across multiple scanlines, so we don't need a conventional horizontal positioning routine, which would stall the other streams.
The drawback is that orchestrating these register writes precisely to end up at a given position is non-trivial, and if the positions are far apart it takes many scanlines to get there.
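One way to take some of the pain out of the orchestration is to plan the motion offline. The greedy sketch below is our own illustration, not part of the kernel: for a target offset `dx` (positive = right) it emits full 8-pixel right moves or 7-pixel left moves, then any remainder, then a stop value, assuming one HMxx value per HMOVE strobe. It ignores the fact that each stream can set only one motion register per scanline.

```python
def hm_byte(pixels):
    """Encode a shift of -7..+8 pixels (positive = right) as an HMxx byte."""
    if not -7 <= pixels <= 8:
        raise ValueError("one HMOVE strobe moves at most 8 right or 7 left")
    # Negative nybbles move right, so store the negated shift in the high nybble
    return ((-pixels) & 0x0F) << 4

def plan_moves(dx):
    """Greedy per-scanline HMxx values that shift one object by dx pixels."""
    step = 8 if dx > 0 else -7
    values = []
    remaining = dx
    while remaining != 0:
        move = step if abs(remaining) >= abs(step) else remaining
        values.append(hm_byte(move))
        remaining -= move
    values.append(0x00)  # stop movement
    return values
```

The plan grows by about one scanline per 7 or 8 pixels of distance, which is why far-apart positions take many lines to reach.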
The Kernel Loop
The core kernel loop is beautifully simple (well, for Atari 2600 code) despite managing three streams:
loop:
        ; Stream #3 (playfield - always active)
        ldy idx3
        ldx tabloc3,y   ; register number
        inc idx3
        lda tabval3,y   ; value
        sta $00,x       ; write value to register X
        ; Stream #1 (check for display list reset)
        ldy idx1
        ldx tabloc1,y
        bne noredo1     ; nonzero register: go write it
        ; Hit zero - get next index from dlist1
        ldx dlidx1
        lda dlist1,x
        sta idx1
        inc dlidx1
        ; ...no stream #1 write this line; handle stream #2...
noredo1:
        inc idx1
        lda tabval1,y
        sta $00,x
        ; Stream #2 (similar logic)
        ; ...
        sta HMOVE       ; Trigger any horizontal movement
        dec nlines
        bne loop
Each scanline updates up to three TIA registers, one from each stream. When a stream reads a register number of zero from its table, the kernel consults that stream's display list to jump to the next sequence. On that scanline, the stream doesn't write any TIA register.
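As a rough check against the 76-cycle budget, here is a tally of the visible stream #3 and stream #1 paths using standard NMOS 6502 instruction timings. It assumes idx3, idx1, dlidx1 and nlines live in zero page, the tables sit at absolute addresses that don't cross pages, and branches stay on the same page; it leaves out the elided stream #2 code and any line synchronization, so treat the totals as partial.

```python
# 6502 cycle counts for the instruction forms used in the loop,
# assuming no page crossings (each crossing would add a cycle).
CYCLES = {
    "ldy zp": 3, "ldx zp": 3,
    "ldx abs,y": 4, "lda abs,y": 4,
    "lda abs,x": 4,  # also 4 if dlist1 is in zero page (lda zp,x)
    "inc zp": 5, "dec zp": 5,
    "sta zp": 3, "sta zp,x": 4,
    "bne taken": 3, "bne not taken": 2,
}

STREAM3 = ["ldy zp", "ldx abs,y", "inc zp", "lda abs,y", "sta zp,x"]
STREAM1_WRITE = ["ldy zp", "ldx abs,y", "bne taken",
                 "inc zp", "lda abs,y", "sta zp,x"]
STREAM1_REDO = ["ldy zp", "ldx abs,y", "bne not taken",
                "ldx zp", "lda abs,x", "sta zp", "inc zp"]
LOOP_TAIL = ["sta zp", "dec zp", "bne taken"]  # sta HMOVE / dec nlines / bne loop

def cost(path):
    """Sum the cycle counts of a list of instruction forms."""
    return sum(CYCLES[op] for op in path)
```

Under these assumptions, the visible fragments on the write path come to 20 + 23 + 11 = 54 cycles.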
Delay Padding
Display lists can also include delay entries (DL_DELAY). These point to NOOP register writes ($7F) that the kernel executes but which don't affect the display. This lets us control timing, for example holding a sprite graphic for several scanlines or spacing out HMOVE sequences.
The delay padding uses 31 NOOP entries:
; Delay section (31 scanlines of nothing)
        repeat 31
        .byte NOOP  ; register $7F (doesn't exist)
        .byte 0     ; value (ignored)
        repend
By pointing a display list entry somewhere inside the NOOP entries, you can implement anywhere from 1 to 31 lines of delay. (There's nothing special about the number 31, but a larger delay section lets a single display list entry cover a longer delay, so long pauses use fewer display list bytes in RAM.)
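Assuming a single 0 terminator follows the 31 NOOP entries, the display list value for a given delay is just an offset into that block. The hypothetical helpers below compute the entry for an n-line delay and how a longer delay would be split into chunks of at most 31 lines, one display list entry each. They count only NOOP lines, not the extra write-free scanline the kernel spends each time it fetches the next display list entry.

```python
DELAY_LEN = 31  # number of NOOP entries in the delay section

def delay_entry(delay_base, n):
    """Display list value that yields n NOOP scanlines (1 <= n <= DELAY_LEN).

    delay_base is the table index of the first NOOP entry. Starting
    (DELAY_LEN - n) entries into the block leaves exactly n NOOPs
    before the terminator.
    """
    if not 1 <= n <= DELAY_LEN:
        raise ValueError("delay must be 1..%d lines" % DELAY_LEN)
    return delay_base + (DELAY_LEN - n)

def delay_chunks(n):
    """Split an n-line delay into chunk sizes, one display list entry per chunk."""
    chunks = []
    while n > 0:
        take = min(n, DELAY_LEN)
        chunks.append(take)
        n -= take
    return chunks
```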
What Can You Do With This?
The three-stream kernel enables some interesting behavior:
Multiplexed sprites: Draw the same sprite multiple times at different positions by alternating between sprite graphics and HMOVE sequences, without affecting the rest of the display.
Complex sprites: You can chain display list entries to draw multiple components of a sprite vertically, or even modify NUSIZ to get different widths or repeated sprites. For example, you could have different shapes for head, torso, and legs.
HMOVE effects: You can move sprites horizontally by single pixels while they’re being drawn to render lines and slanted shapes.
Crossing the streams: Use one stream's display list to write the playfield registers, or the other sprite's registers, for weird effects.
Write to anything: Unleash chaos on your program by writing to random RAM locations, or poke around the TIA registers. We won’t tell anybody.
Limitations
Of course, nothing is free on the 2600:
You still only update 3 registers per scanline (one per stream), so for a sprite you can change the bitmap or the color, but not both on the same scanline.
Display lists consume precious RAM - 32 bytes per list in this implementation.
The kernel is cycle-sensitive, so tables must not cross page boundaries: an indexed load such as lda tabval1,y takes an extra cycle when it crosses a page.
Register writes occur at different positions along the scanline, so sprites may “tear” as they move horizontally.
However, the playfield stream runs on every scanline (both register and data lookup), so you get single-line resolution for backgrounds.
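Both the page-boundary rule and the RAM budget are easy to check at build time. A minimal sketch with hypothetical names: a table of `length` bytes starting at `addr` stays within one page when its first and last bytes share the same high address byte, and a display list must fit the 32 bytes reserved for it.

```python
DLIST_BYTES = 32  # RAM reserved per display list in this implementation

def crosses_page(addr, length):
    """True if a table of `length` bytes at `addr` spans two 256-byte pages."""
    if length == 0:
        return False
    return (addr >> 8) != ((addr + length - 1) >> 8)

def check_dlist(entries):
    """Raise ValueError if a display list won't fit its RAM allocation."""
    if len(entries) > DLIST_BYTES:
        raise ValueError("display list has %d entries, max %d"
                         % (len(entries), DLIST_BYTES))
    if any(not 0 <= e <= 0xFF for e in entries):
        raise ValueError("display list entries must be single bytes")
```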
Creating Data Tables and Display Lists
How do you design these data tables? Each table holds at most 256 bytes (the reach of an 8-bit index), so you have to spend that budget carefully, balancing bitmap and color data against HMOVE and other utility sequences. This is another case where hand-coding works for simple demos, but a custom tool or parser becomes essential for complex games. You could also use bank switching to access multiple sprite sheets.
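Here is a tiny sketch of what such a tool might do: pack named (register, value) sequences into parallel tabloc/tabval tables, terminate each with a 0 register, and report each sequence's starting index for use as a display list constant. The parallel-table layout and all names here are assumptions for illustration.

```python
NOOP = 0x7F  # unused TIA register address, as in the delay section
GRP0 = 0x1B  # TIA register address for player 0 graphics

def pack_tables(sequences):
    """Pack {name: [(reg, val), ...]} into parallel tabloc/tabval lists.

    Each sequence is followed by a 0 terminator. Returns
    (tabloc, tabval, offsets), where offsets[name] is the index a
    display list would use to start that sequence.
    """
    tabloc, tabval, offsets = [], [], {}
    for name, pairs in sequences.items():
        if any(reg == 0 for reg, _ in pairs):
            raise ValueError("%s: register 0 is reserved as the terminator" % name)
        offsets[name] = len(tabloc)
        for reg, val in pairs:
            tabloc.append(reg)
            tabval.append(val)
        tabloc.append(0)  # end-of-sequence marker
        tabval.append(0)
    if len(tabloc) > 256:
        raise ValueError("tables need %d bytes; an 8-bit index reaches 256"
                         % len(tabloc))
    return tabloc, tabval, offsets

tabloc, tabval, offsets = pack_tables({
    "DL1_SPRITE": [(GRP0, 0x18), (GRP0, 0x3C), (GRP0, 0x18)],
    "DL_DELAY": [(NOOP, 0)] * 3,
})
```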
The real trick is building these two display lists each and every frame within the cycle budget. It’d be like programming the Atari 7800 hardware from inside of a Rubik’s Cube. We’ve left that part as an exercise for the reader – you may notice this article has no screenshot.
This scheme is somewhat baroque (or perhaps “harebrained”), but it demonstrates that the limits of the original 1977 design still haven't been fully explored.