Let's say you want to display a nice looking rainbow on the screen of you TV, like this one: Because a static picture is a little boring, you also want to animate it, so that the rainbow moves up (or down) the screen. The question is: what hardware do you need for this and what will be the software requirements?
What if I tell you that all you need is a computer with 1.8 MHz CPU, 48 kilobytes of RAM and the program that takes 29 (twenty nine) bytes? Yes, bytes. Less than the number of letters in this sentence. If you don't believe me, you can download a zip archive containing the software from this location. Inside you will find a file rainbow.xex which you can run on any Atari 800/800XL emulator, or even on the real hardware if you still have one (you can use a device like SIO2SD to transfer files to the Atari). I have run it on a real Atari 800XL connected to a plasma TV through a composite video lead, and it works perfectly.
How is that possible? The secret lies in talking directly to hardware. 8-bit computers have very limited resources, so any software layer which is not absolutely necessary is just a waste of memory and CPU cycles. A simple example is a boolean value, which can have one of two states: true or false. Any computer in the world, no matter how simple or complicated, needs only one bit to understand boolean values: 1 for true and 0 for false. Since a byte consists of 8 bits, you can store 8 different boolean values in one byte. Yet, in today's software world, the Java boolean primitive takes entire byte, and the simplest Boolean class:
public final class Boolean { public static final Boolean TRUE = new Boolean(true); public static final Boolean FALSE = new Boolean(false); private final boolean value; public Boolean(boolean value) { this.value = value; } public boolean booleanValue() { return value; } }compiles in Java 7 to 451 bytes of bytecode, not to mention it requires a few hundred megabytes runtime environment to load. This is the cost of having programmer friendly, enterprise grade, programming tools and languages.
Going back to the example, the algorithm to make Atari display the rainbow is quite simple:
1. Initialize graphic mode.
2. Read current scanline from Atari display chip - Antic. Antic creates image on a TV screen drawing it line by line, from left to right, 50 times (PAL) or 60 times (NTSC) per second. The horizontal resolution of an Atari screen is 240 pixels (including borders), which means that the scanline number can have values from 0 (first line) to 239 (last line).
3. If the line number is 0, increase the counter, which is used to calculate the colour palette shift. Increasing the counter on every new frame adds the animation effect to the rainbow picture.
3. Wait for Antic to finish drawing the current line on the screen (we don't want to change color in the middle of the line).
4. Change background colour to the value obtained from Antic (Atari has a total palette of 256 colors) summed up with the current shift value.
5. Repeat from point 2.
This algorithm can be easily translated into a C program:
#include <atari.h> #define SDMCTL (*((unsigned char*)0x22F)) int main() { unsigned char line, counter = 0; SDMCTL = 0; // disable display in BASIC screen area while (1) { line = ANTIC.vcount; // read current screen line if (!line) counter++; // increase color shift on new frame ANTIC.wsync = line; // block CPU until vertical sync GTIA_WRITE.colbk = line + counter; //change background color } return 0; }To compile this program on a PC, you can use a cross-platform cc65 compiler:
cl65 -t atari -o rainbow.xex rainbow.cHowever, the resulting executable is 1104 bytes long. Let's make the compiler to output the assembly code to see the CPU instruction list it produced:
cc65 -t atari rainbow.cThe assembly source code in the resulting rainbow.s file looks like this:
jsr decsp1 lda #$00 jsr pusha ldx #$00 lda #$00 sta $022F L0008: ldx #$00 lda $D40B ldy #$01 sta (sp),y ldy #$01 ldx #$00 lda (sp),y jsr bnega jeq L000E ldy #$00 ldx #$00 lda (sp),y pha clc adc #$01 ldy #$00 sta (sp),y pla L000E: ldy #$01 ldx #$00 lda (sp),y sta $D40A ldy #$01 ldx #$00 lda (sp),y jsr pushax ldy #$02 ldx #$00 lda (sp),y jsr tosaddax sta $D01A jmp L0008 ldx #$00 lda #$00 jmp L0002 L0002: jsr incsp2 rtsQuite long, isn't it? If you understand the assembly language, you will probably notice a few problems here. First, the program jumps a few times to some strange subroutines (like pusha, bnega, tosaddax, etc.), which does not seem necessary. Second, it uses a lot of reads and writes to the stack (although the algorithm uses only two local variables (line and counter). Third, some instructions are completely unnecessary (like LDX, which loads data to never used CPU index register X).
Fortunately, the cc65 compiler provides some optimizations: you can use "‑Oi" and "‑Os" command line switches to inline all subroutines and get rid of jsr instructions, you can also make local variables to be stored under fixed memory address instead of the stack through "‑Cl". Also, using "‑Or" switch enables register variables, which is particularly interesting, because it does not make CPU use physical registers (since MOS 6502 processor used in the 8-bit Atari has only one general purpose register, called accumulator), but it makes variables to be stored in first 256 bytes of physical memory, called zero page. The advantage of using zero page is using less CPU cycles to access it: 8-bit memory addresses (from 0 to 255) are decoded faster by 8-bit processing unit than 16-bit addresses (from 0 to 65535 - the memory limit in Atari 800XL is 64KB).
Let's see what happens if we turn the optimizations on:
cc65 -Cl -Osir -t atari rainbow.cThe resulting assembly code is now much smaller:
lda #$00 sta L0004 sta $022F L000A: lda $D40B sta L0003 lda L0003 bne L0010 lda L0004 clc adc #$01 sta L0004 L0010: lda L0003 sta $D40A lda L0003 clc adc L0004 sta $D01A jmp L000ATo produce a working executable, we have to assemble and link it:
ca65 rainbow.s && cl65 -t atari -o rainbow.xex rainbow.oStill, the resulting code is 630 bytes long. This is because the compiler still includes some Atari specific code from the standard library. We can work around this using an assembler (like xasm or MADS). But first, we need to clean up the assembly code a bit:
org $2000 ; start address lda #$00 ; load accumulator with 0 sta $cb ; store it in memory sta $022f ; disable display in BASIC screen area loop lda $d40b ; read current screen line into accumulator sta $cc ; store it in zero page lda $cc ; load current line number from memory bne skip ; if it's not zero jump to code at 'skip' label lda $cb ; load counter from memory into accumulator clc ; clear carry flag adc #$01 ; add 1 to accumulator sta $cb ; store counter in memory skip lda $cc ; load current line number from memory sta $d40a ; block CPU until vertical sync lda $cc ; load current line number from memory clc ; clear carry flag adc $cb ; add counter to the line number sta $d01a ; change background colour jmp loop ; repeatI added some comments to make clear what's going on in the code. So, let's assemble it:
xasm rainbow.s /o:rainbow.xexNow rainbow.xex is only 45 bytes long! Two first bytes are the executable file header ($FFFF) and the next two are memory address the program should be loaded into ($2000). The rest is pure CPU instruction list.
But we can still make it smaller, using some knowledge the compiler does not have. First, we don't need to init the counter stored at $cb memory address, because we don't care about it's initial value (it gets increased infinitely anyway), so we can get rid of "sta $cb" instruction. Second, we don't need "lda $cc" right after "sta $cc" and "sta $d40a", because the "sta" instruction doesn't change the accumulator state, so there's no need to reload it. We can also replace the whole procedure of loading counter from memory into accumulator, adding 1 and storing it back in memory, with only one instruction - "inc $cb" - which increases the memory value by 1. Finally, we don't need "clc" to clear the carry flag (the flag indicating that a mathematical operation resulted in value which does not fit in 8 bits) before adding line number and counter, because Atari colour palette can have only 256 values, so it doesn't matter if the addition result fits in 8 bits or not.
After making the changes, the code now looks as follows:
org $2000 ; start address lda #$00 ; load accumulator with 0 sta $022f ; disable display in BASIC screen area loop lda $d40b ; read current screen line into accumulator sta $cc ; store it in zero page bne skip ; if it's not zero jump to code at 'skip' label inc $cb ; increase counter in memory by 1 skip lda $cc ; load current line number from memory sta $d40a ; block CPU until vertical sync adc $cb ; add counter to the line number sta $d01a ; change background colour jmp loop ; repeatAfter assembling it with xasm, the resulting executable is now only 33 bytes long. However, we can still cut the instruction list down. You may notice that there is no need to load and store accumulator at $cc memory address, because its state doesn't change between "sta $cc" and "lda $cc" - so we can safely remove those instructions:
org $2000 ; start address lda #$00 ; load accumulator with 0 sta $022f ; disable display in BASIC screen area loop lda $d40b ; read current screen line into accumulator bne skip ; if it's not zero jump to code at 'skip' label inc $cc ; increase counter in memory by 1 skip sta $d40a ; block CPU until vertical sync adc $cc ; add counter to the line number sta $d01a ; change background colour jmp loop ; repeatAs we got rid of another couple of bytes, rainbow.xex is now exactly 29 bytes after assembling.
Lessons learned:
1. Compilation with default compiler options sucks.
2. A smart compiler can do a really great job optimizing code.
3. There is always a way for a programmer to optimize it even more.
4. Programming old computers is fun.
No comments:
Post a Comment