Getting to grips with the 6502 CPU is hard, because it’s so alien to a modern programmer. There’s just so little there. All sorts of trade-offs you’d normally make about branch prediction and cache coherency simply don’t matter because there isn’t any branch prediction and there isn’t any cache.
There also isn’t any multiplication or division, which makes all kinds of things tricky.
See also this really good guide to the 6502 instruction set
The accumulator is a bit of a bottleneck, with zero page memory kind of taking the place of registers, so it’s worth considering which instructions can mutate memory directly: the shifts and rotates (ASL, LSR, ROL, ROR) plus INC and DEC.
That’s not a lot of options … the other logical operations (AND, EOR, ORA) store their result in the accumulator instead of in memory.
So for those you’re stuck doing something like LDA; EOR imm; STA.
One of the most frustrating “missing instructions” is BIT imm … there’s CMP imm and AND imm but no BIT imm, only BIT zeropage and BIT absolute.
So if you want to non-destructively check for an arbitrary bit being set in the accumulator, you can’t just do BIT #$40 … instead you have to either store that $40 in a zero page location and BIT against that, or do AND #$40 and then restore A from somewhere.
(This is fixed in the 65C02 as used in the Enhanced Apple IIe and Apple IIc, by the way, along with a bunch more instructions and addressing modes)
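To make the difference concrete, here’s a rough Python model of the two workarounds. It only models the flags, not real 6502 code, and the function names are made up for illustration. BIT sets Z from A AND M and copies bits 7 and 6 of the memory operand into N and V, leaving A alone; AND #imm gives you the same Z flag but overwrites A, so you need a saved copy.

```python
def bit_test(a, m):
    """Model the flags produced by BIT with memory operand m; A is untouched."""
    return {
        "Z": (a & m) == 0,     # Z reflects A AND M
        "N": bool(m & 0x80),   # N is copied from bit 7 of the operand
        "V": bool(m & 0x40),   # V is copied from bit 6 of the operand
    }


def and_then_restore(a, mask):
    """Model the AND #imm workaround: A is clobbered, so keep a copy."""
    saved = a                  # stands in for a store to a scratch location
    z = (a & mask) == 0        # AND #mask sets Z from the result
    return saved, z            # A restored from the copy, plus the Z flag


flags = bit_test(0x41, 0x40)   # bit 6 of A is set, so Z comes back clear
```

Either way it costs a zero page byte: one to hold the constant mask, or one to hold the saved accumulator.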
ADC, AND, CMP, EOR, LDA, ORA, SBC and STA all support “zeropage,X” and “indirect” addressing for their source, or in the case of STA their destination.
Zeropage,X addressing takes a zeropage address as an operand, adds X and then uses that zeropage address as the address to work on.
(Indirect, X) addressing does the same thing and then loads the 16-bit address at that address and that address + 1 and uses THAT as the address to work on.
… and (Indirect), Y works completely differently. Note the different punctuation. It takes a zero page address, then looks up the 16-bit address at that address and that address + 1, and then adds Y to that.
STA $1234 ; A -> M[ $1234 ]
STA $56, X ; A -> M[ $56 + X ]
STA ($78,X) ; A -> M[ MM[$78+X] ]
STA ($9A),Y ; A -> M[ MM[$9A] + Y ]
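The same four effective-address calculations can be sketched in Python, which makes the difference between the two indirect modes easier to see. This is just a model with invented function names; the one detail it adds is that zero page arithmetic wraps around within the zero page.

```python
def read16(mem, zp):
    """Read a little-endian pointer from zero page, wrapping within the page."""
    return mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)


def ea_absolute(addr):
    return addr                            # STA $1234


def ea_zeropage_x(zp, x):
    return (zp + x) & 0xFF                 # STA $56,X : stays in zero page


def ea_indirect_x(mem, zp, x):
    return read16(mem, zp + x)             # STA ($78,X) : add X, THEN look up


def ea_indirect_y(mem, zp, y):
    return (read16(mem, zp) + y) & 0xFFFF  # STA ($9A),Y : look up, THEN add Y


mem = bytearray(65536)
mem[0x7A], mem[0x7B] = 0x00, 0x20          # pointer $2000 stored at $7A/$7B
mem[0x9A], mem[0x9B] = 0xF0, 0x30          # pointer $30F0 stored at $9A/$9B
```

With X = 2, `ea_indirect_x(mem, 0x78, 2)` picks the pointer at $7A and gives $2000, whereas with Y = $20, `ea_indirect_y(mem, 0x9A, 0x20)` takes the pointer at $9A and offsets it to $3110.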
Because registers are only eight bits wide, if you want to operate anywhere in memory you have to use a zero page pair as a pointer to the start and then Y as an offset within that. Y is only 8 bits though, so to cover more than 256 bytes you also need to change the high byte of the pointer, something like:
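memcpy16
!zone {
    LDY #$00                ; Y = 0, so the first DEY wraps to $FF for a full page
    LDX zp_length+1         ; X = number of whole 256-byte pages left
.loop.x
    BNE .loop.y             ; pages remaining: copy a whole page
    LDY zp_length           ; last pass: Y = leftover byte count
    BEQ .done.y             ; no leftover bytes
.loop.y
    DEY
    LDA (zp_src_addr), Y
    STA (zp_dst_addr), Y
    CPY #0                  ; LDA changed the flags, so re-test Y
    BNE .loop.y
.done.y
    CPX #0
    BEQ .done.x             ; that was the final, partial page
    INC zp_src_addr+1       ; advance both pointers by one page
    INC zp_dst_addr+1
    DEX                     ; sets Z for the BNE at .loop.x
    JMP .loop.x
.done.x
    RTS
}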
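If the control flow is hard to follow, this is roughly what the routine does, written as a Python model rather than as anything you’d run on the Apple II. The `mem`, `src`, `dst` and `length` parameters are stand-ins for memory and the zero page variables.

```python
def memcpy16(mem, src, dst, length):
    """Copy whole 256-byte pages first, then the leftover bytes.

    Each chunk is walked from its highest offset down to zero, like the
    DEY-driven inner loop in the assembly.
    """
    pages, remainder = divmod(length, 256)   # X = high byte, low byte = leftover
    for _ in range(pages):
        for y in range(255, -1, -1):         # DEY first, so Y runs $FF..$00
            mem[(dst + y) & 0xFFFF] = mem[(src + y) & 0xFFFF]
        src += 0x100                         # INC zp_src_addr+1
        dst += 0x100                         # INC zp_dst_addr+1
    for y in range(remainder - 1, -1, -1):   # final partial page
        mem[(dst + y) & 0xFFFF] = mem[(src + y) & 0xFFFF]


mem = bytearray(65536)
mem[0x1000:0x1000 + 600] = bytes(i % 251 for i in range(600))
memcpy16(mem, 0x1000, 0x4000, 600)           # two full pages plus 88 bytes
```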
But what good is the very weird (indirect,X) mode? Assembly Cookbook for the Apple II/IIe (p79) describes it as “an oddball” and not much use other than with X = 0, and a quick look at the Apple II DOS Source Code suggests the same … it is used, but only with X = 0.
Okay, time to actually make some progress on this game. At the moment, there’s exactly one sprite, the goose. Let’s have a list of sprites instead.
Handling all the sprites as separate image files seems painful, so I might use a “sprite sheet” approach instead. The sprite sheet is always a fixed width, so starting from an image address you always add a constant to get to the next row. This potentially wastes a bit of space but makes the maths easier (and sprite drawing is very much an inner loop, so the faster we make it the better).
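As a sketch of that row arithmetic in Python: the 40-byte sheet width here is a made-up number, not a decision I’ve actually taken. The point is that each row costs one 16-bit add of a constant, with no multiply needed.

```python
SHEET_WIDTH_BYTES = 40  # hypothetical width of one sprite sheet row, in bytes


def sprite_row_addresses(image_addr, height, stride=SHEET_WIDTH_BYTES):
    """Return the start address of each row of a sprite within the sheet."""
    addresses = []
    addr = image_addr
    for _ in range(height):
        addresses.append(addr)
        addr = (addr + stride) & 0xFFFF   # one constant add per row
    return addresses


rows = sprite_row_addresses(0x6000, 3)
```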
We want to write these from back to front so they occlude in the correct way. We could do this by keeping the sprite list ordered, but for now let’s just run through it once per row.
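Something like this, as a Python sketch. I’m assuming each sprite records which row it stands on and that row 0 is furthest back; the sprite names are invented.

```python
def draw_order(sprites, num_rows):
    """Return sprite names back to front without sorting the list.

    sprites is a list of (name, row) pairs in any order; the whole list is
    scanned once for every row, starting from the back row.
    """
    order = []
    for row in range(num_rows):
        for name, sprite_row in sprites:
            if sprite_row == row:
                order.append(name)
    return order


order = draw_order([("goose", 2), ("tree", 0), ("rock", 1)], 3)
```

That’s rows × sprites comparisons per frame, which is fine while the list is short, and it avoids re-sorting every time something moves.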
OK, so I’ve gotten very bogged down at this point with too much other work and too much other stuff going on, but I’ll just mention a couple of little side-tracks before I hit publish and come back to it later.
cc65 is a freeware C compiler for the 6502 and friends.
By default it tries to build an AppleSingle binary, but I don’t have the right tools for that so we turn it off with -D __EXEHDR__=0.
I use a2tools to write the compiled code to disk as a binary file called PROGRAM which loads at the correct address.
MAKEFLAGS += r

TARGET = apple2
TARGET_LIB = ${TARGET}.lib
# Load address, in hex (comment kept on its own line so the value has no trailing space)
START_ADDR = 2000
LIBRARIES = library.o

%.s: %.c
	cc65 -Oi -o $@ -t ${TARGET} $<

%.o: %.s
	ca65 -o $@ $<

%.bin: %.o ${LIBRARIES}
	ld65 -o $@ -t ${TARGET} -D __EXEHDR__=0 -S 0x${START_ADDR} $^ ${TARGET_LIB}

%.dsk: %.bin
	cp dos33_loader.dsk $@
	a2in B.${START_ADDR} $@ PROGRAM $<

run_%: %.dsk
	mame apple2p -volume -24 -uimodekey DEL -flop1 $<
The dos33_loader.dsk is a bootable DOS 3.3 disk which automatically runs a tiny Applesoft program called HELLO which loads the binary file PROGRAM.
Printing a CHR$(4) followed by a DOS command runs that command; I don’t know why.
10 PRINT CHR$(13) CHR$(4) "BRUN PROGRAM"
The compiled code isn’t spectacularly efficient, because it does stuff like:
lda _io_gr_graphics+1
sta ptr1+1
lda _io_gr_graphics
sta ptr1
lda #$00
tay
sta (ptr1),y
… rather than the somewhat simpler sta io_gr_graphics (it hasn’t noticed it is a char * const, so it could just use that address in “absolute” mode) but it’s pretty easy to reverse engineer the calling convention and write a library.h and library.s with the weirder and more performance-critical parts in assembly.
I’d rather dispense with DOS and use our tiny loader from before, with a boot sector loader to display the splash screen immediately and then load the compiled C code at $0C00, etc, but this is a good start.
This turns out to be very easy. Just create a tiny little startup.s file:
.setcpu "6502"
.segment "STARTUP"
_startup:
.byte $10
and then target the code at address $0800 … the startup segment comes first, so it’ll put that $10 byte at the start of the binary, causing BOOT0 to load 16 sectors and jump to $0801, where the rest of the code is.
(You still have to do the DOS 3.3 sector shuffle, and this only loads one track, but thankfully our little printf hello world program fits into 12 sectors)
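For the curious, the shuffle looks roughly like this in Python. This is a hypothetical stand-in for what a script like bin2dsk.py has to do, not a copy of it: `make_track0` is a name I’ve invented, and I’m assuming a DOS-order .dsk image, where sectors are stored in DOS 3.3 logical order. BOOT0 reads physical sectors 0, 1, 2 … in sequence, so each 256-byte page of the binary has to be placed at the logical slot matching the physical sector it will be read from.

```python
SECTOR_SIZE = 256
SECTORS_PER_TRACK = 16

# Physical sector number -> DOS 3.3 logical sector number.
PHYSICAL_TO_DOS = [0x0, 0x7, 0xE, 0x6, 0xD, 0x5, 0xC, 0x4,
                   0xB, 0x3, 0xA, 0x2, 0x9, 0x1, 0x8, 0xF]


def make_track0(binary):
    """Lay out a binary of at most one track for a DOS-order disk image."""
    track_size = SECTOR_SIZE * SECTORS_PER_TRACK
    if len(binary) > track_size:
        raise ValueError("binary does not fit in one track")
    padded = binary.ljust(track_size, b"\x00")
    track = bytearray(track_size)
    for physical in range(SECTORS_PER_TRACK):
        logical = PHYSICAL_TO_DOS[physical]
        page = padded[physical * SECTOR_SIZE:(physical + 1) * SECTOR_SIZE]
        track[logical * SECTOR_SIZE:(logical + 1) * SECTOR_SIZE] = page
    return bytes(track)


# Example: first page starts with the $10 sector count, second page is all $AA.
example = bytes([0x10]) + bytes(255) + b"\xAA" * 256
track0 = make_track0(example)
```

The second page of the binary ends up in logical sector 7 of the image, because that’s where physical sector 1 lives.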
MAKEFLAGS += r

TARGET = apple2
TARGET_LIB = ${TARGET}.lib

.PHONY: run_%
.PRECIOUS: %.dsk

%.s: %.c
	cc65 -Oi -o $@ -t ${TARGET} $<

%.o: %.s
	ca65 -o $@ $<

%.bin: startup.o %.o
	ld65 -o $@ -t ${TARGET} -D __EXEHDR__=0 -S 0x0800 $^ ${TARGET_LIB}

%.dsk: %.bin
	../bin2dsk.py $< $@

run_%: %.dsk
	mame apple2p -volume -24 -uimodekey DEL -flop1 $<
More practically, startup.s could provide a proper boot loader modelled on BOOT1 which would handle splash screens, multiple tracks, DOS sector shuffling, etc, and then the main C code could be targeted at $0C00.
Of course, shifting to C isn’t necessarily going far enough, so the temptation is still there to try to compile a minimal Forth or minimal LISP or similar.
I also have a bit of an ongoing obsession with the concept of a self-hosting Python compiler which would use the built-in Python disassembler library to retrieve bytecodes for a function and then compile those to target code. Perhaps this could be used to compile a limited subset of Python code to 6502, without the stress of having to write an actual parser or compiler.
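The first half of that idea is only a few lines with the standard `dis` module. This sketch just lists the bytecodes for a function; `bytecode_listing` and `add_one` are names made up for the example, and the exact opcode names vary between Python versions, so a real translator would have to pin one.

```python
import dis


def bytecode_listing(func):
    """Return (opname, argval) pairs for func, as reported by dis."""
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(func)]


def add_one(x):
    return x + 1


listing = bytecode_listing(add_one)
for opname, argval in listing:
    print(opname, argval)
```

Turning each of those opcodes into 6502 instructions is the hard part, and nothing here attempts it.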