Monday, April 18, 2022

Embedding data in the executable with llvm-mos

In my post about creating a SID player for Commodore 64 with llvm-mos I described a method of loading data from disk using assembly code and ROM routines. However, many programs for 8-bit computers are single file executables that hold all necessary data and code inside them. If you want to create a small demo or game that uses a lot of assets which could fit inside 64kB memory, merging them together with code into one file can be a good idea. As an additional benefit to being easier to copy (e.g. to tape), such executables can be easily compressed with tools like Exomizer, making them faster to load. So let's build a single file SID player with llvm-mos.

First, we need to embed a SID tune in C source code. In my previous post I briefly described the structure of a SID file: a file header, which is 124 bytes long, and binary data, which begin with a two-byte load address. It means that to get the tune itself we need to skip the first 126 bytes of the original file:
dd if=ginger.sid of=ginger.bin bs=1 skip=126
Then we must transform it into some form of a C data structure. I used a simple program called bin2c for this purpose:
bin2c ginger.bin sidplaymem.c sid
which produced a file sidplaymem.c containing a nice char array:
const char sid[2504] = {
	0x4c, 0xd4, 0x14, 0xea, 0xea, 0xea, 0x20, 0x7a, 0x14, 0xce, 0x3d,
	0x03, 0x10, 0x06, 0xad, 0x3a, 0x03, 0x8d, 0x3d, 0x03, 0xa2, 0x02,
    ...
	0x20, 0x45, 0x59, 0x45, 0x27, 0x39, 0x37
};
Now we have to instruct the C toolchain where the data should be put in memory. As you probably remember, a SID tune is in fact machine code. The llvm C compiler, clang, also produces machine code and puts it into specific segments (.text for code, .data for global variables, .bss for uninitialized data, etc). Then a linker puts the segments together, adds additional data (like external libraries, symbol table, executable header, and so on) and creates an executable file. It also decides where the segments will be placed in memory when the program is loaded from disk. So in order to embed our own data in the final executable, we must tell the compiler to put our data into its own segment, and then instruct the linker that the data must be put into a specific memory location.

Let's start with creating our own code segment ".sid". With llvm we can use a special variable attribute called section, so that:
const char sid[2504]
becomes:
char sid[2504] __attribute__ ((section (".sid")))
Because clang is an optimizing compiler, we also need to add the volatile keyword and make sure that the sid array is somehow referenced in the code, otherwise it will be removed during compilation. GCC has an attribute called used, which tells the compiler that a variable should not be removed even if there is no reference to it in the code, but with llvm it's more difficult. Clang accepts __attribute__((used)) as well, but on ELF targets it only keeps the compiler from dropping the variable; a linker that garbage-collects unreferenced sections may still discard it. So we must force the toolchain into thinking that the variable is in use. We can do it either by writing some meaningful code (like displaying some tune info on the screen) or by passing it as an input operand to an inline assembly block, which the compiler does not look into. I used the second option, because I already had a function which initializes the SID tune:
asm("LDA #$00\n\t"
    "TAX\n\t"
    "TAY\n\t"
    "JSR $1000");
So I added an input operand which loads register X with the first byte of the sid array:
asm("LDA #$00\n\t"
    "TAX\n\t"
    "TAY\n\t"
    "JSR $1000" :: "x"(sid[0]));
Because the compiler does not analyze the contents of the asm() statement and just forwards it to the assembler as plain text, it doesn't know that register X is overwritten by the TAX instruction and that initializing it with sid[0] does nothing, so it leaves the sid array alone. The complete source code of the player now looks like this (I removed most of the SID data for clarity):
#include <string.h>

#define rasterline (*((volatile unsigned char*)0xD012))

volatile char sid[2504] __attribute__ ((section (".sid"))) = {
	0x4c, 0xd4, 0x14, 0xea, 0xea, 0xea, 0x20, 0x7a, 0x14, 0xce, 0x3d,
	0x03, 0x10, 0x06, 0xad, 0x3a, 0x03, 0x8d, 0x3d, 0x03, 0xa2, 0x02,
    ...
	0x20, 0x45, 0x59, 0x45, 0x27, 0x39, 0x37
};

int main() {
    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000" :: "x"(sid[0]));

    // Call SID refresh routine
    while (1) {
        if (rasterline == 255) {
            asm ("JSR $1003");
        }
    }

    return 0;
}
Now it's time to tell the linker what memory layout we want to use. Because the tune I've used has a load address $1000, we tell the linker to put section ".sid" into that address:
mos-c64-clang -Os -Wl,--section-start=.sid=0x1000 -o sidplaymem.prg sidplaymem.c
You may notice that the resulting executable sidplaymem.prg is over 4.5kB, which is more than the SID data and the player code combined. It's because the Commodore 64 executable format is very primitive: it consists of a two-byte load address and a single block of data which is loaded starting at that address. Because the default load address for C64 programs is $0801, all free space between the end of the code and $1000 is just filled with zeros so that the tune lands at the right address. But if you compress it with Exomizer you will get a file which is only 2kB:
exomizer sfx basic -o crunched.prg sidplaymem.prg
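The size of the uncompressed file can be checked by hand. Below is a minimal sketch in C, assuming that the image is one contiguous block from $0801 up to the end of the .sid section and that the array is 2504 bytes long, as in the listing above. The function names are made up for this illustration and are not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a PRG file whose memory image runs from `load` up to,
   but not including, `end`: two bytes for the load address plus the image. */
uint32_t prg_file_size(uint16_t load, uint32_t end)
{
    assert(end >= load && end <= 0x10000u);
    return 2u + (end - load);
}

/* Number of filler bytes between the end of the code and the start of
   a section that the linker was told to place at a fixed address. */
uint32_t padding_before(uint32_t code_end, uint16_t section_start)
{
    assert(section_start >= code_end);
    return section_start - code_end;
}
```

With these assumptions prg_file_size(0x0801, 0x1000 + 2504) gives 4553 bytes, which is in line with the "over 4.5kB" mentioned above. Most of the difference between that number and the size of the tune is the zero-filled gap below $1000.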
As with the original llvm-mos SID player, you can find the complete source code on Github.

Tuesday, April 12, 2022

Using a cheap oscilloscope to decode serial communication

The serial interface is one of the oldest and simplest communication methods used in computing. Back in the day it was used to connect a terminal to a mainframe, and it's still a great way of connecting computers that have no other common way of talking to each other, like a Commodore 64 and a PC, or a TRS-80 and an Amiga. Nowadays serial interfaces are widely used to communicate with microcontrollers. If you have ever played around with an Arduino or the like, and tried to do more advanced projects than blinking LEDs, you have probably talked to a microcontroller via some sort of serial connection.

But no matter if you just used some wires, a null modem cable, or a USB converter, you have probably run into a situation where the communication didn't work. It's hard to find the source of the problem if you don't even know if it's in hardware or in software. The best way to determine if the hardware part is working correctly is to use an oscilloscope with a built-in serial decoder. But such oscilloscopes are very expensive and if you are not a professional electronics technician chances are that you don't own one. Or maybe you only have a basic model, without a serial decoder? Don't worry, it will do. I have been using a simple DSO 150 clone with success, and I paid less than 30 dollars for it.

Software setup

To demonstrate how to use a digital oscilloscope to decode serial communication we will use a very basic setup, consisting of an Arduino Uno, a couple of wires, and a simple program. Let's start with the program first:
void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.print("a");
  delay(1000);
}
Use Arduino IDE to compile and upload the program. Now you can open a serial monitor in the tools menu to verify if it works. You should see a letter "a" printed out to the screen once a second - it means that the transmission is working correctly.

Hardware setup

However, your computer is not connected to the serial line of the ATmega328 microcontroller directly. The signals are translated by a separate chip and sent through USB. But you can still hook up to the serial interface itself, using only three wires. Look for pins RX, TX and GND on the Arduino board. RX stands for "receive" and TX for "transmit". The voltage level on those pins depends on the board, but in the case of Arduino Uno it's 5V. Because you have current flowing through those pins, you also need a ground pin (GND). If you have a serial-to-USB converter, you can connect the Arduino to the converter using the following wiring:
RX  <---> TX
TX  <---> RX
GND <---> GND
It's the simplest serial cable you can build. There are other pins as well, but they are not necessary for the connection to work. Now you can use a terminal program connected to the serial port used by the converter to watch the transmission. When you power up the Arduino (through USB or PSU) you should see the same result as you've seen with the serial monitor: letter "a" printed out once a second.

Connecting oscilloscope

To debug a serial connection you need to use an oscilloscope. Connect a probe to the TX pin, and a ground to the GND pin:
Now set the vertical scale to 2V per division and the horizontal one to 0.2ms per division. Switch the oscilloscope to single mode, so that it holds a waveform when it detects data on the transmission pin. After approximately one second, you should see the following picture:
Decoding serial data

After you managed to capture the data, it's time to decode it. As you can see, the default serial line state is high (logical 1), and a single pulse (which represents one bit) takes around one half of a square field on the oscilloscope screen. Standard parameters for the serial transmission are 8N1, which means that one data packet consists of 10 bits: a start bit (which is always 0), 8 data bits (from lowest to highest), and a stop bit (which is always 1). The bit pattern shown by the oscilloscope should now be clear:
When you remove the start and the stop bits, and rearrange the data bits from highest to lowest, you are left with binary number 01100001, which is decimal 97. Now take a look at the ASCII table and search for code 97. It stands for the letter "a".
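The same decoding can be written down as a few lines of C. This is only a sketch of the procedure described above, assuming you have already read the ten line levels off the oscilloscope screen by hand, one per bit time. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one 8N1 frame given as 10 sampled line levels (0 or 1) in the
   order they appear on the wire: start bit, 8 data bits (lowest first),
   stop bit. Returns the byte, or -1 if the start or stop bit is wrong. */
int decode_8n1(const uint8_t bits[10])
{
    assert(bits != NULL);
    if (bits[0] != 0 || bits[9] != 1)
        return -1;
    int value = 0;
    for (int i = 0; i < 8; i++) {
        if (bits[1 + i])
            value |= 1 << i;  /* the i-th data bit on the wire is bit i of the byte */
    }
    return value;
}

/* Duration of a single bit in microseconds for a given baud rate. */
double bit_time_us(unsigned baud)
{
    assert(baud > 0);
    return 1000000.0 / baud;
}
```

For the frame 0, 1,0,0,0,0,1,1,0, 1 the function returns 97, the code of the letter "a". At 9600 baud one bit lasts about 104 microseconds, which is why a single pulse takes roughly half of a 0.2ms square on the screen.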

Summary

Debugging serial communication can be a tedious task, although sometimes necessary. But, as it turns out, it can be done with very simple tools, without an expensive serial decoder.

Friday, April 1, 2022

llvm-mos SID player for Commodore 64

In my previous post I described llvm-mos, a new llvm backend which can generate code for 6502 CPU, and I presented a simple program for an 8-bit Atari. Today I want to show you how you can use assembly code in your C program by using a more complicated example. This time it will be a music player for Commodore 64.

The most popular C64 chiptune format is SID. It's basically 6502 machine code which consists of two main routines: an init routine which sets up the sound chip, and a player routine which must be called once per screen frame. The biggest resource of SID tunes is the High Voltage SID Collection. There are many players which can play SID music files on modern devices, like ZXTune or VLC.

Preparing a SID file

In order to play a SID tune on a real Commodore 64 we must determine where it should be located in memory, and what the addresses of the init and the player routines are. This information can be found in the SID file header. Bytes 8 and 9 tell if the data are in the original C64 binary file format, i.e. the first two bytes of the data contain the little-endian load address (low byte, high byte); bytes 10 and 11 hold the init address, and bytes 12 and 13 contain the play address. Each tune can use different addresses, so it's necessary to check them carefully. For this project I'll be using one of my favourite C64 tunes, "Ginger" by Kristian Røstøen. When you analyze the header of this particular tune you'll notice that the init address is $1000, the play address is $1003, and the binary data, which begin at offset 124, already contain the load address $1000. We can use this information to extract the machine code from the SID file:
dd if=ginger.sid of=ginger.prg bs=1 skip=124
The resulting file ginger.prg can be put on a floppy disk and loaded on a Commodore 64 with the following command:
LOAD "GINGER",8,1
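If you prefer not to read the header in a hex editor, the fields mentioned above can be pulled out with a few lines of C. This is a minimal sketch, not a full SID parser. It assumes that the header fields are stored high byte first, which is how I understand the PSID format and is not stated in this post, and all the names in it are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three header fields named in the text; the comments give their
   byte positions in the SID file. */
struct sid_addresses {
    uint16_t load;  /* bytes 8-9: 0 means the data start with their own load address */
    uint16_t init;  /* bytes 10-11 */
    uint16_t play;  /* bytes 12-13 */
};

/* Reads a 16-bit value stored high byte first (assumed header byte order). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Extracts the addresses from a SID header of at least 14 bytes. */
struct sid_addresses sid_read_addresses(const uint8_t *header, size_t size)
{
    assert(header != NULL && size >= 14);
    struct sid_addresses a;
    a.load = read_be16(header + 8);
    a.init = read_be16(header + 10);
    a.play = read_be16(header + 12);
    return a;
}

/* When the header's load field is 0, the real load address is taken from
   the first two bytes of the data block, stored low byte first. */
uint16_t sid_effective_load(const struct sid_addresses *a, const uint8_t *data)
{
    assert(a != NULL && data != NULL);
    if (a->load != 0)
        return a->load;
    return (uint16_t)(data[0] | (data[1] << 8));
}
```

For a tune like the one used here you would expect init to come out as $1000, play as $1003, and the effective load address as $1000.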

Loading a SID file

Instead of loading the file into memory by hand, we can write some code to do it for us. Unfortunately, llvm-mos does not have any library functions to do it at the moment, but fortunately Commodore 64 ROM does. The easiest way to use the ROM functions is through assembly, so let's try to write some assembly code in llvm:
char *fname = "GINGER";
unsigned char fnamelo = (unsigned int)fname & 0xFF;
unsigned char fnamehi = (unsigned int)fname >> 8;
unsigned char fnamelen = strlen(fname);

asm("JSR $FFBD" :: "a"(fnamelen), "x"(fnamelo), "y"(fnamehi));
The C part should be pretty much self-explanatory. We create variables which contain the low and the high byte of the file name pointer, and the file name length. You may think now that using the strlen() function is overkill. Well, in cc65 it probably is, but the llvm-mos optimizer can translate the whole expression into a single CPU instruction:
LDA #$06
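Splitting a 16-bit address into two 8-bit halves comes up again and again on the 6502, so here is the same operation as a small stand-alone C sketch. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits a 16-bit address into its low and high byte. */
void split_address(uint16_t address, uint8_t *lo, uint8_t *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = (uint8_t)(address & 0xFF);  /* keep only the lowest 8 bits */
    *hi = (uint8_t)(address >> 8);    /* move the upper 8 bits down */
}

/* Puts the two halves back together. */
uint16_t join_address(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | (hi << 8));
}
```

For example, the address $FFBD splits into the low byte $BD and the high byte $FF.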
The assembly part of the code needs more explanation. Here we call the SETNAM function located in ROM at address $FFBD. Before we call it, we need to put the address of the string containing the file name in two 8-bit CPU registers, X and Y, and the string length in register A. Pure assembly code would look like this:
LDA fnamelen
LDX fnamelo
LDY fnamehi
JSR $FFBD
However, in llvm-mos we cannot use C expressions in assembly code directly. But we can use so-called input and output operands to transfer C variables to CPU registers and vice versa. In the gcc syntax for inline assembly, which llvm also uses, operands are separated from assembly expressions by colons. You can read more about using inline assembly and operands here.

After setting up the file name we need to provide some information about the device from which data should be loaded, and whether it should be loaded to an address pointed by the file header:
asm("LDA #$01\n\t"     // logical file number
    "LDX #$08\n\t"     // device number
    "LDY #$01\n\t"     // load to address found in file header
    "JSR $FFBA\n\t");  // SETLFS
Remember the LOAD "GINGER",8,1 command? This is the ",8,1" part, but written in assembly.

Now all that's left is to load the file into memory:
asm ("LDA #$00\n\t"     // load to memory
     "JSR $FFD5\n\t");  // LOAD

Playing a SID tune

It's time to play the music, then. Let's start with the init routine:
asm ("LDA #$00\n\t"
     "TAX\n\t"
     "TAY\n\t"
     "JSR $1000");
Now we have two ways of calling the play routine. The first one is fairly simple. We need to wait in a loop until the raster reaches a certain line on the screen, and when it happens, call the play routine. The line number can be any number from 0 to 262 (for NTSC machines) or 311 (for PAL machines), but because the raster line register $D012 is 8-bit, it can only hold values from 0 to 255, so values from 0 to 6 (NTSC) and 0 to 55 (PAL) appear in $D012 twice per frame. A solution to this problem is to check bit 7 (the highest bit) of register $D011, which is 0 for lines from 0 to 255 and 1 for lines > 255, or to use values bigger than 55, which never appear in register $D012 twice during one frame. In my example, I'm using 255:
#define rasterline (*((volatile unsigned char*)0xD012))

while (1) {
    if (rasterline == 255) {
        asm ("JSR $1003");
    }
}
Putting all the code snippets together, a complete program looks like this:
#include <string.h>

#define rasterline (*((volatile unsigned char*)0xD012))

int main() {
    char *fname = "GINGER";
    unsigned char fnamelo = (unsigned int)fname & 0xFF;
    unsigned char fnamehi = (unsigned int)fname >> 8;
    unsigned char fnamelen = strlen(fname);

    // Load SID file into memory
    asm("JSR $FFBD\n\t"  // SETNAM (A = fname length, XY = fname address)
        "LDA #$01\n\t"   // logical file number
        "LDX #$08\n\t"   // device number
        "LDY #$01\n\t"   // load to address found in file header
        "JSR $FFBA\n\t"  // SETLFS
        "LDA #$00\n\t"   // load to memory
        "JSR $FFD5\n\t"  // LOAD
        :: "a"(fnamelen), "x"(fnamelo), "y"(fnamehi));

    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000");

    // Call SID refresh routine
    while (1) {
        if (rasterline == 255) {
            asm ("JSR $1003");
        }
    }

    return 0;
}

Playing a SID tune using interrupts

The second way of calling the play routine is to use raster interrupts. The principle is the same, we call the play routine every frame, but this time we don't have to wait in a loop for a given raster line. We only need to write a short function which will be called when an interrupt occurs, and tell the computer to generate the interrupt at a certain raster line. The addresses of the interrupt routines are stored at the beginning of the memory in a structure called vector table. We need to modify this table so that a raster interrupt vector now points to our function instead of the original one.

Let's write the function first:
void play() {
    asm ("ASL $D019\n\t"  // acknowledge interrupt (reset bit 0)
         "JSR $1003\n\t"  // call SID refresh routine
         "JMP $EA31");    // jump to default interrupt handler routine
}
When writing a raster interrupt function, we need to do two additional things. First, we need to let the computer know that the interrupt was handled properly. This is done by writing a 1 back to the raster bit (bit 0) of register $D019. The shortest way to do it is a read-modify-write instruction such as Arithmetic Shift Left (ASL): the 6502 reads the register and first writes the unmodified value back to it, which acknowledges the pending interrupt. Second, because we hijacked the original interrupt vector, we want to call the original ROM routine when we finish executing our own code.
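Why a shift instruction acknowledges an interrupt is not obvious, so here is a toy model of it in C. It is only a sketch under two assumptions that come from general C64 lore rather than from this post: a write to $D019 clears every pending flag whose bit is 1 in the written value, and a read-modify-write instruction like ASL writes the value it has just read back to the register before it writes the shifted result. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the $D019 latch: a write clears every pending flag whose
   bit is 1 in the written value. Only the four lowest bits are modelled. */
uint8_t irq_latch_write(uint8_t pending, uint8_t written)
{
    assert(pending <= 0x0F);
    return (uint8_t)(pending & (uint8_t)~written);
}

/* Toy model of ASL on the latch: the CPU reads the register, writes the
   same value back, and then writes the shifted value. */
uint8_t asl_on_latch(uint8_t pending)
{
    assert(pending <= 0x0F);
    uint8_t read_value = pending;
    pending = irq_latch_write(pending, read_value);                  /* unmodified write */
    pending = irq_latch_write(pending, (uint8_t)(read_value << 1));  /* shifted write */
    return pending;
}
```

In this model a pending raster interrupt (value 1) is cleared by the first of the two writes. The shifted value alone would not clear it, because its bit 0 is always zero.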

Now it's time to modify the system interrupts:
#define intctrl (*((volatile unsigned char*)0xDC0D))
#define screenctrl (*((volatile unsigned char*)0xD011))
#define rasterline (*((volatile unsigned char*)0xD012))
#define rasterintctrl (*((volatile unsigned char*)0xD01A))
#define rasterintlo (*((volatile unsigned char*)0x314))
#define rasterinthi (*((volatile unsigned char*)0x315))

void (* fun)(void) = &play;
unsigned char funlo = (unsigned int)fun & 0xFF;
unsigned char funhi = (unsigned int)fun >> 8;

asm("SEI");           // switch off interrupts
intctrl = 0x7F;       // disable CIA interrupts
rasterintctrl = 1;    // enable raster interrupts
rasterintlo = funlo;  // low byte of raster interrupt routine
rasterinthi = funhi;  // high byte of raster interrupt routine
rasterline = 127;     // trigger interrupt at raster line 127
screenctrl = screenctrl & 0x7F;
asm("CLI");           // switch on interrupts
First, we need to disable all interrupts with the CPU instruction SEI. We have to do it because there is a chance that an interrupt occurs after we modified some registers, but before we modified others, and it will cause the CPU to jump to a wrong place in memory and crash. Next, we ask the system to disable the CIA timer (clock) interrupts by writing $7F to $DC0D (a zero in the highest bit means that every interrupt source selected by the remaining bits gets switched off), and enable screen raster interrupts instead. Then we change addresses $314 and $315 in the vector table to point to the play() function. Finally, we indicate a raster line which will trigger the interrupt. After we finish modifying the system interrupts we can re-enable them with CLI.
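The way a write to $DC0D is interpreted is a bit unusual, so here is a small C model of it. It is a sketch based on my understanding of the CIA chip, not something taken from its data sheet word for word: the highest bit of the written value decides whether the sources selected by the low five bits are switched on or off. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the new interrupt mask of a CIA after `value` is written to its
   interrupt control register. Only the five source bits are modelled. */
uint8_t cia_mask_after_write(uint8_t mask, uint8_t value)
{
    assert(mask <= 0x1F);
    uint8_t sources = value & 0x1F;
    if (value & 0x80)
        return (uint8_t)(mask | sources);         /* bit 7 set: enable selected sources */
    return (uint8_t)(mask & (uint8_t)~sources);   /* bit 7 clear: disable selected sources */
}
```

Writing $7F therefore switches all sources off, while writing $81 would switch the first one (timer A) back on.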

Remember that rasterline register can hold only 8 bits, so you also need to set the highest bit of screenctrl register accordingly. For example, to generate an interrupt when line number 260 is drawn on the screen, you need to do the following:
screenctrl = screenctrl | 0x80;  // highest bit equals 1 for lines > 255
rasterline = 4;                  // 260 - 256 = 4
You might look at it as if the rasterline register was 9-bit, with address $D012 holding the lowest 8 bits, and $D011 holding the highest bit in its own highest bit.
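The split between the two registers can be expressed as a pair of small C helpers. This is just a sketch of the arithmetic described above, which you can compile and run on any machine. The function names are made up, and nothing here touches real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits a raster line number (0-511) into the value for $D012 and the
   new value for $D011, keeping the other seven bits of $D011 unchanged. */
void raster_compare_values(uint16_t line, uint8_t old_d011,
                           uint8_t *d012, uint8_t *d011)
{
    assert(line < 512 && d012 != NULL && d011 != NULL);
    *d012 = (uint8_t)(line & 0xFF);
    if (line & 0x100)
        *d011 = (uint8_t)(old_d011 | 0x80);  /* lines above 255: set the highest bit */
    else
        *d011 = (uint8_t)(old_d011 & 0x7F);  /* lines 0-255: clear the highest bit */
}

/* Rebuilds the 9-bit line number from the two register values. */
uint16_t raster_line_from(uint8_t d011, uint8_t d012)
{
    return (uint16_t)(((d011 & 0x80) << 1) | d012);
}
```

For line 260 the first function yields 4 for $D012 and sets the highest bit of $D011, exactly as in the two lines of code above.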

Now let's get back to our example. A complete program now looks like this:
#include <string.h>

#define intctrl (*((volatile unsigned char*)0xDC0D))
#define screenctrl (*((volatile unsigned char*)0xD011))
#define rasterline (*((volatile unsigned char*)0xD012))
#define rasterintctrl (*((volatile unsigned char*)0xD01A))
#define rasterintlo (*((volatile unsigned char*)0x314))
#define rasterinthi (*((volatile unsigned char*)0x315))

// SID refresh function
void play() {
    asm("ASL $D019\n\t"  // acknowledge interrupt (reset bit 0)
        "JSR $1003\n\t"  // call SID refresh routine
        "JMP $EA31");    // jump to default interrupt handler routine
}

int main() {
    char *fname = "GINGER";
    unsigned char fnamelo = (unsigned int)fname & 0xFF;
    unsigned char fnamehi = (unsigned int)fname >> 8;
    unsigned char fnamelen = strlen(fname);
    void (* fun)(void) = &play;
    unsigned char funlo = (unsigned int)fun & 0xFF;
    unsigned char funhi = (unsigned int)fun >> 8;

    // Load SID file into memory
    asm("JSR $FFBD\n\t"  // SETNAM (A = fname length, XY = fname address)
        "LDA #$01\n\t"   // logical file number
        "LDX #$08\n\t"   // device number
        "LDY #$01\n\t"   // load to address found in file header
        "JSR $FFBA\n\t"  // SETLFS
        "LDA #$00\n\t"   // load to memory
        "JSR $FFD5\n\t"  // LOAD
        :: "a"(fnamelen), "x"(fnamelo), "y"(fnamehi));

    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000");

    // Set up raster interrupt
    asm("SEI");           // switch off interrupts
    intctrl = 0x7F;       // disable CIA interrupts
    rasterintctrl = 1;    // enable raster interrupts
    rasterintlo = funlo;  // low byte of raster interrupt routine
    rasterinthi = funhi;  // high byte of raster interrupt routine
    rasterline = 127;     // trigger interrupt at raster line 127
    screenctrl = screenctrl & 0x7F;
    asm("CLI");           // switch on interrupts

    return 0;
}
It's a bit longer than the first version, but a side effect of using interrupts instead of a loop is that we can still type on the screen and execute simple Basic commands while the music plays in the background.
If you want to try the program yourself, you can find the complete code on Github.

Sunday, March 27, 2022

llvm-mos compiler for 8-bit 6502 machines

A while ago I posted an article about programming for an 8-bit Atari. I wrote a simple program in C, compiled it with cc65, and then optimized the assembly output. The best result I could get with built-in compiler optimizations was 630 bytes, which I then reduced to 29 bytes using manual machine code refactoring and an alternative assembler. It proved that you can program an 8-bit machine in C, but the result is far from perfect and hand-written assembly is still better.

A few months ago everything changed. The llvm-mos project has started and has achieved some really impressive results so far. Llvm is a highly optimizing compiler toolchain, which can be used as a back end for the front ends of many modern programming languages. Clang, an llvm front end for C, C++, and Objective-C, is the default compiler for Mac OS X, where it replaced gcc in 2011. Llvm can also produce machine code for any CPU architecture when given a suitable back end. Llvm-mos provides such a back end for the 6502 CPU, which means that you can write code for Commodore 64 or Atari 400/800/XL/XE even in Rust!

I decided to compile my original program with llvm-mos and compare it with the latest cc65. The nicest thing was that I didn't have to touch the source code at all, I only had to provide a custom atari.h header, which I simply copied from cc65 include directory, with some minor changes (more on this later). The code looks like this:
#include "atari.h"

#define SDMCTL (*((unsigned char*)0x22F))

int main() {
    unsigned char line, counter = 0;

    SDMCTL = 0; // disable display in BASIC screen area
    while (1) {
        line = ANTIC.vcount;  // read current screen line
        if (!line) counter++; // increase colour shift on new frame
        ANTIC.wsync = line;   // block CPU until horizontal sync
        GTIA_WRITE.colbk = line + counter; // change background colour
    }
    return 0;
}
What it does is that it displays a moving rainbow animation on the screen.
When I compiled my program with the latest cc65 version I got a 557-byte executable. Llvm-mos produced 58 bytes of code. A hand-written assembly version is 29 bytes long. I didn't do any speed comparisons, but the results I found show that llvm-mos creates machine code which is faster than the code produced by cc65 by a long shot.

Now a word about compatibility between cc65 and llvm-mos. I haven't had many problems so far, and I could even use cc65 include files, but with two caveats. First, you have to get rid of cc65-specific extensions like __fastcall__ or __asm__. Second, you need to remember to use the volatile keyword every time you access a variable which is mapped to a memory address or an input / output port. For example, you have to change this:
struct __antic {
    unsigned char   dmactl; /* (W) direct memory access control */
    unsigned char   chactl; /* (W) character mode control */
    unsigned char   dlistl; /* display list pointer low-byte */
    unsigned char   dlisth; /* display list pointer high-byte */
    unsigned char   hscrol; /* (W) horizontal scroll enable */
    unsigned char   vscrol; /* (W) vertical scroll enable */
    unsigned char   unuse0; /* unused */
    unsigned char   pmbase; /* (W) msb of p/m base address */
    unsigned char   unuse1; /* unused */
    unsigned char   chbase; /* (W) msb of character set base address */
    unsigned char   wsync;  /* (W) wait for horizontal synchronization */
    unsigned char   vcount; /* (R) vertical line counter */
    unsigned char   penh;   /* (R) light pen horizontal position */
    unsigned char   penv;   /* (R) light pen vertical position */
    unsigned char   nmien;  /* (W) non-maskable interrupt enable */
};
into this:
struct __antic {
    volatile unsigned char   dmactl; /* (W) direct memory access control */
    volatile unsigned char   chactl; /* (W) character mode control */
    volatile unsigned char   dlistl; /* display list pointer low-byte */
    volatile unsigned char   dlisth; /* display list pointer high-byte */
    volatile unsigned char   hscrol; /* (W) horizontal scroll enable */
    volatile unsigned char   vscrol; /* (W) vertical scroll enable */
    volatile unsigned char   unuse0; /* unused */
    volatile unsigned char   pmbase; /* (W) msb of p/m base address */
    volatile unsigned char   unuse1; /* unused */
    volatile unsigned char   chbase; /* (W) msb of character set base address */
    volatile unsigned char   wsync;  /* (W) wait for horizontal synchronization */
    volatile unsigned char   vcount; /* (R) vertical line counter */
    volatile unsigned char   penh;   /* (R) light pen horizontal position */
    volatile unsigned char   penv;   /* (R) light pen vertical position */
    volatile unsigned char   nmien;  /* (W) non-maskable interrupt enable */
};
otherwise the llvm optimizer may decide that some parts of your program do nothing meaningful, and remove the code that operates on those variables.
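When you edit such a header by hand it's easy to shift a field by accident, so it may be worth letting the compiler check the layout. Below is a minimal sketch that can be compiled on any machine. The struct is renamed to antic_regs for the example, and the base address $D400 is my assumption about where ANTIC lives on the Atari; it does not come from the header above.

```c
#include <assert.h>
#include <stddef.h>

/* Same field order as the struct above, with every member volatile. */
struct antic_regs {
    volatile unsigned char dmactl;
    volatile unsigned char chactl;
    volatile unsigned char dlistl;
    volatile unsigned char dlisth;
    volatile unsigned char hscrol;
    volatile unsigned char vscrol;
    volatile unsigned char unuse0;
    volatile unsigned char pmbase;
    volatile unsigned char unuse1;
    volatile unsigned char chbase;
    volatile unsigned char wsync;
    volatile unsigned char vcount;
    volatile unsigned char penh;
    volatile unsigned char penv;
    volatile unsigned char nmien;
};

/* Compile-time checks: the build fails if a field ends up at a wrong offset. */
_Static_assert(offsetof(struct antic_regs, wsync) == 0x0A, "wsync offset");
_Static_assert(offsetof(struct antic_regs, vcount) == 0x0B, "vcount offset");

/* Address of a register, given its offset and the assumed base $D400. */
unsigned antic_register_address(size_t offset)
{
    assert(offset < sizeof(struct antic_regs));
    return 0xD400u + (unsigned)offset;
}
```

The volatile qualifier does not change the layout; it only forbids the compiler to drop or reorder the reads and writes.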

Llvm-mos is a huge step in 8-bit development. It provides a 6502 back end for many programming languages, and produces highly optimized machine code, comparable to hand-written assembly. If you always wanted to write a game or demo for an Atari or Commodore computer, but never felt like doing all the work in assembly, then now is your chance!

If you're interested in the source code used in this article, you can download it here.