Monday, April 18, 2022

Embedding data in the executable with llvm-mos

In my post about creating a SID player for Commodore 64 with llvm-mos I described a method of loading data from disk using assembly code and ROM routines. However, many programs for 8-bit computers are single-file executables that hold all the necessary code and data. If you want to create a small demo or game whose assets fit inside 64 kB of memory, merging them with the code into one file can be a good idea. Besides being easier to copy (e.g. to tape), such executables can be easily compressed with tools like Exomizer, which makes them faster to load. So let's build a single-file SID player with llvm-mos.

First, we need to embed a SID tune in C source code. In my previous post I briefly described the structure of a SID file: a file header, which is 124 bytes long, followed by binary data, which begins with a two-byte load address. This means that to get the tune itself we need to skip the first 126 bytes of the original file:
dd if=ginger.sid of=ginger.bin bs=1 skip=126
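The 126-byte figure follows from the header layout, so it can also be computed instead of hard-coded. Below is a rough sketch of that calculation. The helper name sid_payload_offset is made up for this example, and the field positions (dataOffset at bytes 6-7, loadAddress at bytes 8-9, both big-endian) are my understanding of the PSID v2 header, not something taken from this particular tune:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian value (assumption: PSID header fields are big-endian). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: returns the offset of the first byte of the tune's
 * machine code inside a SID file image, or 0 if the buffer does not look
 * valid. If load_addr is not NULL, it receives the C64 load address.
 */
static size_t sid_payload_offset(const uint8_t *file, size_t size,
                                 uint16_t *load_addr)
{
    if (size < 10)
        return 0;

    /* The file must start with the magic "PSID" or "RSID". */
    if ((file[0] != 'P' && file[0] != 'R') ||
        file[1] != 'S' || file[2] != 'I' || file[3] != 'D')
        return 0;

    size_t offset = read_be16(file + 6);  /* dataOffset, 124 in our case */
    uint16_t addr = read_be16(file + 8);  /* loadAddress, 0 = stored in the data */

    if (addr == 0) {
        /* The data begins with a two-byte little-endian load address. */
        if (offset + 2 > size)
            return 0;
        addr = (uint16_t)(file[offset] | (file[offset + 1] << 8));
        offset += 2;
    }

    if (offset >= size)
        return 0;

    if (load_addr != NULL)
        *load_addr = addr;
    return offset;
}
```

For a file with a 124-byte header and the load address stored in the data, this returns 126, which is the value passed to dd above.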
Then we must transform it into some form of a C data structure. I used a simple program called bin2c for this purpose:
bin2c ginger.bin sidplaymem.c sid
which produced a file sidplaymem.c containing a nice char array:
const char sid[2504] = {
	0x4c, 0xd4, 0x14, 0xea, 0xea, 0xea, 0x20, 0x7a, 0x14, 0xce, 0x3d,
	0x03, 0x10, 0x06, 0xad, 0x3a, 0x03, 0x8d, 0x3d, 0x03, 0xa2, 0x02,
    ...
	0x20, 0x45, 0x59, 0x45, 0x27, 0x39, 0x37
};
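If you don't have bin2c at hand, the conversion itself is simple. The function below is only a sketch of the idea, not the actual bin2c implementation. The name emit_c_array and the choice of 11 values per row are assumptions made to mimic the listing above:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for bin2c: prints len bytes from data to out
 * as a C array definition called name, 11 values per row.
 */
static void emit_c_array(FILE *out, const char *name,
                         const unsigned char *data, size_t len)
{
    fprintf(out, "const char %s[%zu] = {\n", name, len);
    for (size_t i = 0; i < len; i++) {
        if (i % 11 == 0)
            fputc('\t', out);            /* indent the start of each row */
        fprintf(out, "0x%02x", data[i]);
        if (i + 1 < len)
            fputc(',', out);             /* no comma after the last value */
        if (i % 11 == 10 || i + 1 == len)
            fputc('\n', out);            /* end of row or end of data */
        else
            fputc(' ', out);
    }
    fputs("};\n", out);
}
```

To use it, read the whole ginger.bin file into a buffer and call emit_c_array(stdout, "sid", buffer, size).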
Now we have to instruct the C toolchain where the data should be put in memory. As you probably remember, a SID tune is in fact machine code. The LLVM C compiler, clang, also produces machine code and puts it into specific sections (.text for code, .data for initialized global variables, .bss for uninitialized data, etc.). Then the linker puts the sections together, adds additional data (like external libraries, the symbol table, the executable header, and so on) and creates an executable file. It also decides where the sections will be placed in memory when the program is loaded from disk. So in order to embed our own data in the final executable, we must tell the compiler to put our data into its own section, and then instruct the linker that this section must be placed at a specific memory location.

Let's start with creating our own section, ".sid". With llvm we can use a variable attribute called section, so that:
const char sid[2504]
becomes:
char sid[2504] __attribute__ ((section (".sid")))
Because clang is an optimizing compiler, we also need to add the volatile keyword and make sure that the sid array is referenced somewhere in the code, otherwise it will be removed during compilation. GCC has an attribute called used, which tells the compiler that a variable should not be removed even if there is no reference to it in the code, but with llvm it's more difficult. Clang does not provide such a construct, so we must make it think that the variable is in use. We can do that either by writing some meaningful code (like displaying some tune info on the screen) or by using the variable as an input operand of an inline assembly statement, which the compiler does not optimize. I used the second option, because I already had an assembly snippet which initializes the SID tune:
asm("LDA #$00\n\t"
    "TAX\n\t"
    "TAY\n\t"
    "JSR $1000");
So I added an input operand which loads the first byte of the sid array into the X register:
asm("LDA #$00\n\t"
    "TAX\n\t"
    "TAY\n\t"
    "JSR $1000" :: "x"(sid[0]));
Because the compiler does not analyze the contents of the asm() statement and just passes it on to the assembler as plain text, it doesn't know that register X is overwritten by the TAX instruction and that initializing it with sid[0] has no effect, so it leaves the sid array alone. The complete source code of the player now looks like this (I removed most of the SID data for clarity):
#include <string.h>

#define rasterline (*((volatile unsigned char*)0xD012))

volatile char sid[2504] __attribute__ ((section (".sid"))) = {
	0x4c, 0xd4, 0x14, 0xea, 0xea, 0xea, 0x20, 0x7a, 0x14, 0xce, 0x3d,
	0x03, 0x10, 0x06, 0xad, 0x3a, 0x03, 0x8d, 0x3d, 0x03, 0xa2, 0x02,
    ...
	0x20, 0x45, 0x59, 0x45, 0x27, 0x39, 0x37
};

int main() {
    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000" :: "x"(sid[0]));

    // Call SID refresh routine
    while (1) {
        if (rasterline == 255) {
            asm ("JSR $1003");
        }
    }

    return 0;
}
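The trick with the input operand is not specific to the 6502. Here is a stripped-down sketch that should build with GCC or Clang on a desktop machine. It assumes the GNU extended asm syntax, uses an empty assembly template and the generic "r" constraint instead of the X register, and the names tune and keep_tune_alive are made up for this example:

```c
/*
 * Stand-in for the embedded tune: just the first three bytes of the data.
 * volatile matters here: without it the compiler could replace tune[0]
 * with the constant 0x4c and still drop the array.
 */
static volatile unsigned char tune[] = { 0x4c, 0xd4, 0x14 };

/*
 * Hypothetical helper: passes the first byte of the array to an empty
 * asm statement as an input operand. The compiler must load the value
 * into a register, so it cannot treat the array as unused.
 */
static void keep_tune_alive(void)
{
    __asm__ volatile("" : : "r"(tune[0]));
}
```

The asm statement emits no instructions of its own. Its only job is to make the read of tune[0] look necessary to the optimizer.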
Now it's time to tell the linker what memory layout we want to use. Because the tune I've used has the load address $1000, we tell the linker to put the ".sid" section at that address:
mos-c64-clang -Os -Wl,--section-start=.sid=0x1000 -o sidplaymem.prg sidplaymem.c
You may notice that the resulting executable sidplaymem.prg is over 4.5kB, which is more than the SID data and the player code combined. That's because the Commodore 64 executable format is very primitive: it consists of a two-byte load address and a single block of data which is loaded starting at that address. Because the default load address for C64 programs is $0801, all free space between the end of the player code and $1000 is simply filled with zeros so that the SID data lands at the right address. But if you compress it with Exomizer you will get a file which is only 2kB:
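The file size can be estimated with simple arithmetic. This is only a sketch: the helper name prg_file_size is made up, and it assumes that the ".sid" section is the last thing in the file image:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of a .prg file which loads at
 * load_addr and ends with a section of section_size bytes placed at
 * section_addr. Assumes section_addr >= load_addr.
 */
static size_t prg_file_size(uint16_t load_addr, uint16_t section_addr,
                            size_t section_size)
{
    /* two-byte load address + everything from load_addr up to the
       section (code and zero padding) + the section itself */
    return 2 + (size_t)(section_addr - load_addr) + section_size;
}
```

With the values from this post, prg_file_size(0x0801, 0x1000, 2504) gives 2 + 2047 + 2504 = 4553 bytes, which matches the size mentioned above.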
exomizer sfx basic -o crunched.prg sidplaymem.prg
As with the original llvm-mos SID player, you can find the complete source code on GitHub.
