Let's say you want to display a nice looking rainbow on the screen of you TV, like this one: Because a static picture is a little boring, you also want to animate it, so that the rainbow moves up (or down) the screen. The question is: what hardware do you need for this and what will be the software requirements?
What if I tell you that all you need is a computer with 1.8 MHz CPU, 48 kilobytes of RAM and the program that takes 29 (twenty nine) bytes? Yes, bytes. Less than the number of letters in this sentence. If you don't believe me, you can download a zip archive containing the software from this location. Inside you will find a file rainbow.xex which you can run on any Atari 800/800XL emulator, or even on the real hardware if you still have one (you can use a device like SIO2SD to transfer files to the Atari). I have run it on a real Atari 800XL connected to a plasma TV through a composite video lead, and it works perfectly.
How is that possible? The secret lies in talking directly to hardware. 8-bit computers have very limited resources, so any software layer which is not absolutely necessary is just a waste of memory and CPU cycles. A simple example is a boolean value, which can have one of two states: true or false. Any computer in the world, no matter how simple or complicated, needs only one bit to understand boolean values: 1 for true and 0 for false. Since a byte consists of 8 bits, you can store 8 different boolean values in one byte. Yet, in today's software world, the Java boolean primitive takes entire byte, and the simplest Boolean class:
public final class Boolean
{
public static final Boolean TRUE = new Boolean(true);
public static final Boolean FALSE = new Boolean(false);
private final boolean value;
public Boolean(boolean value) {
this.value = value;
}
public boolean booleanValue() {
return value;
}
}
compiles in Java 7 to 451 bytes of bytecode, not to mention it requires a few hundred megabytes runtime environment to load. This is the cost of having programmer friendly, enterprise grade, programming tools and languages.Going back to the example, the algorithm to make Atari display the rainbow is quite simple:
1. Initialize graphic mode.
2. Read current scanline from Atari display chip - Antic. Antic creates image on a TV screen drawing it line by line, from left to right, 50 times (PAL) or 60 times (NTSC) per second. The horizontal resolution of an Atari screen is 240 pixels (including borders), which means that the scanline number can have values from 0 (first line) to 239 (last line).
3. If the line number is 0, increase the counter, which is used to calculate the colour palette shift. Increasing the counter on every new frame adds the animation effect to the rainbow picture.
3. Wait for Antic to finish drawing the current line on the screen (we don't want to change color in the middle of the line).
4. Change background colour to the value obtained from Antic (Atari has a total palette of 256 colors) summed up with the current shift value.
5. Repeat from point 2.
This algorithm can be easily translated into a C program:
#include <atari.h>
#define SDMCTL (*((unsigned char*)0x22F))
int main() {
unsigned char line, counter = 0;
SDMCTL = 0; // disable display in BASIC screen area
while (1) {
line = ANTIC.vcount; // read current screen line
if (!line) counter++; // increase color shift on new frame
ANTIC.wsync = line; // block CPU until vertical sync
GTIA_WRITE.colbk = line + counter; //change background color
}
return 0;
}
To compile this program on a PC, you can use a cross-platform cc65 compiler:
cl65 -t atari -o rainbow.xex rainbow.cHowever, the resulting executable is 1104 bytes long. Let's make the compiler to output the assembly code to see the CPU instruction list it produced:
cc65 -t atari rainbow.cThe assembly source code in the resulting rainbow.s file looks like this:
jsr decsp1
lda #$00
jsr pusha
ldx #$00
lda #$00
sta $022F
L0008: ldx #$00
lda $D40B
ldy #$01
sta (sp),y
ldy #$01
ldx #$00
lda (sp),y
jsr bnega
jeq L000E
ldy #$00
ldx #$00
lda (sp),y
pha
clc
adc #$01
ldy #$00
sta (sp),y
pla
L000E: ldy #$01
ldx #$00
lda (sp),y
sta $D40A
ldy #$01
ldx #$00
lda (sp),y
jsr pushax
ldy #$02
ldx #$00
lda (sp),y
jsr tosaddax
sta $D01A
jmp L0008
ldx #$00
lda #$00
jmp L0002
L0002: jsr incsp2
rts
Quite long, isn't it? If you understand the assembly language, you will probably notice a few problems here. First, the program jumps a few times to some strange subroutines (like pusha, bnega, tosaddax, etc.), which does not seem necessary. Second, it uses a lot of reads and writes to the stack (although the algorithm uses only two local variables (line and counter). Third, some instructions are completely unnecessary (like LDX, which loads data to never used CPU index register X).Fortunately, the cc65 compiler provides some optimizations: you can use "‑Oi" and "‑Os" command line switches to inline all subroutines and get rid of jsr instructions, you can also make local variables to be stored under fixed memory address instead of the stack through "‑Cl". Also, using "‑Or" switch enables register variables, which is particularly interesting, because it does not make CPU use physical registers (since MOS 6502 processor used in the 8-bit Atari has only one general purpose register, called accumulator), but it makes variables to be stored in first 256 bytes of physical memory, called zero page. The advantage of using zero page is using less CPU cycles to access it: 8-bit memory addresses (from 0 to 255) are decoded faster by 8-bit processing unit than 16-bit addresses (from 0 to 65535 - the memory limit in Atari 800XL is 64KB).
Let's see what happens if we turn the optimizations on:
cc65 -Cl -Osir -t atari rainbow.cThe resulting assembly code is now much smaller:
lda #$00
sta L0004
sta $022F
L000A: lda $D40B
sta L0003
lda L0003
bne L0010
lda L0004
clc
adc #$01
sta L0004
L0010: lda L0003
sta $D40A
lda L0003
clc
adc L0004
sta $D01A
jmp L000A
To produce a working executable, we have to assemble and link it:
ca65 rainbow.s && cl65 -t atari -o rainbow.xex rainbow.oStill, the resulting code is 630 bytes long. This is because the compiler still includes some Atari specific code from the standard library. We can work around this using an assembler (like xasm or MADS). But first, we need to clean up the assembly code a bit:
org $2000 ; start address
lda #$00 ; load accumulator with 0
sta $cb ; store it in memory
sta $022f ; disable display in BASIC screen area
loop
lda $d40b ; read current screen line into accumulator
sta $cc ; store it in zero page
lda $cc ; load current line number from memory
bne skip ; if it's not zero jump to code at 'skip' label
lda $cb ; load counter from memory into accumulator
clc ; clear carry flag
adc #$01 ; add 1 to accumulator
sta $cb ; store counter in memory
skip
lda $cc ; load current line number from memory
sta $d40a ; block CPU until vertical sync
lda $cc ; load current line number from memory
clc ; clear carry flag
adc $cb ; add counter to the line number
sta $d01a ; change background colour
jmp loop ; repeat
I added some comments to make clear what's going on in the code. So, let's assemble it:
xasm rainbow.s /o:rainbow.xexNow rainbow.xex is only 45 bytes long! Two first bytes are the executable file header ($FFFF) and the next two are memory address the program should be loaded into ($2000). The rest is pure CPU instruction list.
But we can still make it smaller, using some knowledge the compiler does not have. First, we don't need to init the counter stored at $cb memory address, because we don't care about it's initial value (it gets increased infinitely anyway), so we can get rid of "sta $cb" instruction. Second, we don't need "lda $cc" right after "sta $cc" and "sta $d40a", because the "sta" instruction doesn't change the accumulator state, so there's no need to reload it. We can also replace the whole procedure of loading counter from memory into accumulator, adding 1 and storing it back in memory, with only one instruction - "inc $cb" - which increases the memory value by 1. Finally, we don't need "clc" to clear the carry flag (the flag indicating that a mathematical operation resulted in value which does not fit in 8 bits) before adding line number and counter, because Atari colour palette can have only 256 values, so it doesn't matter if the addition result fits in 8 bits or not.
After making the changes, the code now looks as follows:
org $2000 ; start address
lda #$00 ; load accumulator with 0
sta $022f ; disable display in BASIC screen area
loop
lda $d40b ; read current screen line into accumulator
sta $cc ; store it in zero page
bne skip ; if it's not zero jump to code at 'skip' label
inc $cb ; increase counter in memory by 1
skip
lda $cc ; load current line number from memory
sta $d40a ; block CPU until vertical sync
adc $cb ; add counter to the line number
sta $d01a ; change background colour
jmp loop ; repeat
After assembling it with xasm, the resulting executable is now only 33 bytes long. However, we can still cut the instruction list down. You may notice that there is no need to load and store accumulator at $cc memory address, because its state doesn't change between "sta $cc" and "lda $cc" - so we can safely remove those instructions:
org $2000 ; start address
lda #$00 ; load accumulator with 0
sta $022f ; disable display in BASIC screen area
loop
lda $d40b ; read current screen line into accumulator
bne skip ; if it's not zero jump to code at 'skip' label
inc $cc ; increase counter in memory by 1
skip
sta $d40a ; block CPU until vertical sync
adc $cc ; add counter to the line number
sta $d01a ; change background colour
jmp loop ; repeat
As we got rid of another couple of bytes, rainbow.xex is now exactly 29 bytes after assembling.Lessons learned:
1. Compilation with default compiler options sucks.
2. A smart compiler can do a really great job optimizing code.
3. There is always a way for a programmer to optimize it even more.
4. Programming old computers is fun.
