Monday, April 18, 2022

Embedding data in the executable with llvm-mos

In my post about creating a SID player for Commodore 64 with llvm-mos I described a method of loading data from disk using assembly code and ROM routines. However, many programs for 8-bit computers are single-file executables that hold all necessary data and code inside them. If you want to create a small demo or game whose assets fit together with the code inside 64kB of memory, merging everything into one file can be a good idea. Besides being easier to copy (e.g. to tape), such executables can be easily compressed with tools like Exomizer, making them faster to load. So let's build a single-file SID player with llvm-mos.

First, we need to embed a SID tune in C source code. In my previous post I briefly described the structure of a SID file: a file header, which is 124 bytes long, and binary data, which begin with a two-byte load address. It means that to get the tune itself we need to skip the first 126 bytes of the original file:
dd if=ginger.sid of=ginger.bin bs=1 skip=126
Then we must transform it into some form of a C data structure. I used a simple program called bin2c for this purpose:
bin2c ginger.bin sidplaymem.c sid
which produced a file sidplaymem.c containing a nice char array:
const char sid[2504] = {
	0x4c, 0xd4, 0x14, 0xea, 0xea, 0xea, 0x20, 0x7a, 0x14, 0xce, 0x3d,
	0x03, 0x10, 0x06, 0xad, 0x3a, 0x03, 0x8d, 0x3d, 0x03, 0xa2, 0x02,
    ...
	0x20, 0x45, 0x59, 0x45, 0x27, 0x39, 0x37
};
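If you don't have bin2c at hand, the conversion is easy to reproduce. The following is only a minimal sketch of the idea, not the actual bin2c source: format_c_array is a hypothetical helper of mine, and the layout of 11 values per line is simply copied from the listing above.

```c
#include <stdio.h>
#include <stddef.h>

/* Format `len` bytes as a C array definition called `name` and store the
 * text in `dst`. Returns the number of characters written (without the
 * terminating zero), or -1 if `dst` is too small.
 * Hypothetical helper for illustration, not part of bin2c. */
int format_c_array(char *dst, size_t cap, const char *name,
                   const unsigned char *data, size_t len)
{
    size_t pos;
    int n = snprintf(dst, cap, "const char %s[%zu] = {\n", name, len);
    if (n < 0 || (size_t)n >= cap) return -1;
    pos = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        /* A tab starts every row of 11 values. */
        const char *indent = (i % 11 == 0) ? "\t" : "";
        /* The last value ends the row, the 11th value wraps the line. */
        const char *sep = (i + 1 == len) ? "\n"
                        : (i % 11 == 10) ? ",\n" : ", ";
        n = snprintf(dst + pos, cap - pos, "%s0x%02x%s",
                     indent, (unsigned)data[i], sep);
        if (n < 0 || (size_t)n >= cap - pos) return -1;
        pos += (size_t)n;
    }

    n = snprintf(dst + pos, cap - pos, "};\n");
    if (n < 0 || (size_t)n >= cap - pos) return -1;
    return (int)(pos + (size_t)n);
}
```

Read the contents of ginger.bin into a buffer, pass it to the function together with the name "sid", and write the resulting text to sidplaymem.c.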
Now we have to instruct the C toolchain where the data should be put in memory. As you probably remember, a SID tune is in fact machine code. The llvm C compiler, clang, also produces machine code and puts it into specific segments (.text for code, .data for initialized global variables, .bss for uninitialized data, etc). Then a linker puts the segments together, adds additional data (like external libraries, symbol table, executable header, and so on) and creates an executable file. It also decides where the segments will be placed in memory when the program is loaded from disk. So in order to embed our own data in the final executable, we must tell the compiler to put our data into its own segment, and then instruct the linker that the data must be put into a specific memory location.

Let's start with creating our own code segment ".sid". With llvm we can use a special variable attribute called section, so that:
const char sid[2504]
becomes:
char sid[2504] __attribute__ ((section (".sid")))
Because clang is an optimizing compiler we also need to add the volatile keyword and make sure that the sid array is somehow referenced in the code, otherwise it will be removed during compilation. GCC has an attribute called ((used)) which tells the compiler that a variable should not be removed even if there is no reference to it in the code, but with llvm it's more difficult. Clang does not provide such a construct so we must force it into thinking that the variable is in use. We can do it either by writing some meaningful code (like displaying some tune info on the screen) or by using it as an input parameter to an assembly block, which by definition is never optimized. I used the second option, because I already had a block of code which initializes the SID tune:
asm("LDA #$00\n\t"
    "TAX\n\t"
    "TAY\n\t"
    "JSR $1000");
So I added a register variable x containing the first byte of the sid array as an input operand:
asm("LDA #$00\n\t"
    "TAX\n\t"
    "TAY\n\t"
    "JSR $1000" :: "x"(sid[0]));
Because the compiler does not analyze the contents of the asm() statement and just forwards it to the assembler as plain text, it doesn't know that register X is overwritten by the TAX instruction and that initializing it with sid[0] does nothing, so it leaves the sid array alone. The complete source code of the player now looks like this (I removed most of the SID data for clarity):
#include <string.h>

#define rasterline (*((volatile unsigned char*)0xD012))

volatile char sid[2504] __attribute__ ((section (".sid"))) = {
	0x4c, 0xd4, 0x14, 0xea, 0xea, 0xea, 0x20, 0x7a, 0x14, 0xce, 0x3d,
	0x03, 0x10, 0x06, 0xad, 0x3a, 0x03, 0x8d, 0x3d, 0x03, 0xa2, 0x02,
    ...
	0x20, 0x45, 0x59, 0x45, 0x27, 0x39, 0x37
};

int main() {
    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000" :: "x"(sid[0]));

    // Call SID refresh routine
    while (1) {
        if (rasterline == 255) {
            asm ("JSR $1003");
        }
    }

    return 0;
}
Now it's time to tell the linker what memory layout we want to use. Because the tune I've used has a load address $1000, we tell the linker to put section ".sid" into that address:
mos-c64-clang -Os -Wl,--section-start=.sid=0x1000 -o sidplaymem.prg sidplaymem.c
You may notice that the resulting executable sidplaymem.prg is over 4.5kB, which is more than the SID data and the player code combined. It's because the Commodore 64 executable format is very primitive: it consists of a two-byte load address and a single block of data which starts at that address. Because the default load address for C64 programs is $0801, all free space between the end of the player code and $1000 is just filled with zeros so that the tune lands at the right address. But if you compress it with Exomizer you will get a file which is only 2kB:
exomizer sfx basic -o crunched.prg sidplaymem.prg
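The size of the uncompressed file can be estimated with simple arithmetic. The sketch below assumes that the .sid section is the last thing in the image and that everything else fits between $0801 and $1000. The function name prg_file_size is mine; it is not part of any toolchain.

```c
#include <stddef.h>

/* Estimated size in bytes of a C64 .prg file: a two-byte load address
 * followed by one contiguous block which runs from load_addr to the end
 * of the last section. Assumes the last section is placed above load_addr. */
size_t prg_file_size(unsigned load_addr, unsigned last_section_start,
                     size_t last_section_len)
{
    return 2 + (size_t)(last_section_start - load_addr) + last_section_len;
}
```

For a load address of $0801, a section start of $1000 and 2504 bytes of SID data this gives 2 + 2047 + 2504 = 4553 bytes, which agrees with the size mentioned above.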
As with the original llvm-mos SID player, you can find the complete source code on GitHub.

Tuesday, April 12, 2022

Using a cheap oscilloscope to decode serial communication

Serial interface is one of the oldest and simplest communication methods used in computing. Back in the day it was used to connect a terminal to a mainframe, and it's still a great way of connecting computers that have no common way of talking to each other, like Commodore 64 and PC or TRS-80 and Amiga. Nowadays serial interfaces are widely used to communicate with microcontrollers. If you ever played around with Arduino or alike, and tried to do more advanced projects than blinking LEDs, you probably talked to a microcontroller via some sort of serial connection.

But no matter if you just used some wires, a null modem cable, or a USB adapter, you probably ran into a situation where the communication didn't work. It's hard to find the source of the problem if you don't even know if it's in hardware or in software. The best way to determine if the hardware part is working correctly is to use an oscilloscope with a built-in serial decoder. But such oscilloscopes are very expensive and if you are not a professional electronics technician chances are that you don't own one. Or maybe you only have a basic model, without a serial decoder? Don't worry, it will do. I have been using a simple DSO 150 clone with success, and I paid less than 30 dollars for it.

Software setup

To demonstrate how to use a digital oscilloscope to decode serial communication we will use a very basic setup, consisting of an Arduino Uno, a couple of wires, and a simple program. Let's start with the program first:
void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.print("a");
  delay(1000);
}
Use Arduino IDE to compile and upload the program. Now you can open a serial monitor in the tools menu to verify if it works. You should see a letter "a" printed out to the screen once a second - it means that the transmission is working correctly.

Hardware setup

However, your computer is not connected to the serial line of the ATmega328 microcontroller directly. The signals are being translated by a separate chip and sent through USB. But you can still hook up to the serial interface itself, using only three wires. Look for pins RX, TX and GND on the Arduino board. RX stands for "receive" and TX for "transmit". The voltage level on those pins depends on the board, but in case of Arduino Uno it's 5V. Because you have current flowing through those pins, you also need a ground pin (GND). If you have a serial-to-USB converter, you can connect the Arduino to the converter using the following wiring:
RX  <---> TX
TX  <---> RX
GND <---> GND
It's the simplest serial cable you can build. There are other pins as well, but they are not necessary for the connection to work. Now you can use a terminal program connected to the serial port used by the converter to watch the transmission. When you power up the Arduino (through USB or PSU) you should see the same result as you've seen with the serial monitor: letter "a" printed out once a second.

Connecting oscilloscope

To debug a serial connection you need to use an oscilloscope. Connect a probe to the TX pin, and a ground to the GND pin:
Now set the vertical resolution to 2V per division and the horizontal one to 0.2ms per division. Switch the oscilloscope to single mode, so that it holds a waveform when it detects data on the transmission pin. After approximately one second, you should see the following picture:

Decoding serial data

After you managed to capture the data, it's time to decode it. As you can see, the default serial line state is high (logical 1), and a single pulse (which represents one bit) takes around one half of a square field on the oscilloscope screen. Standard parameters for the serial transmission are 8N1, which means that one data packet consists of 10 bits: a start bit (which is always 0), 8 data bits (from lowest to highest), and a stop bit (which is always 1). The bit pattern shown by the oscilloscope should now be clear:
When you remove the start and the stop bits, and rearrange the data bits from highest to lowest, you are left with binary number 01100001, which is decimal 97. Now take a look at the ASCII table and search for code 97. It stands for the letter "a".
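The same decoding can be written down as a few lines of C. This is just a sketch of the procedure described above: the function names and the idea of passing the ten line levels as an array are my own assumptions, and you would have to read the levels off the oscilloscope screen by hand.

```c
#include <stdint.h>

/* Duration of a single bit in microseconds for a given baud rate. */
double bit_time_us(unsigned baud)
{
    return 1e6 / (double)baud;
}

/* Decode one 8N1 frame.
 * levels[0]    - start bit, must be 0
 * levels[1..8] - data bits, lowest bit first
 * levels[9]    - stop bit, must be 1
 * Returns the decoded byte (0-255) or -1 if the framing is wrong. */
int decode_8n1(const uint8_t levels[10])
{
    int value = 0;

    if (levels[0] != 0 || levels[9] != 1)
        return -1;

    for (int bit = 0; bit < 8; bit++) {
        if (levels[1 + bit])
            value |= 1 << bit;
    }
    return value;
}
```

At 9600 baud one bit lasts about 104 microseconds, which is roughly half of a 0.2ms division. Feeding the function with the levels 0, 1, 0, 0, 0, 0, 1, 1, 0, 1 gives 97, the code of the letter "a".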

Summary

Debugging serial communication can be a tedious task, although sometimes necessary. But, as it turns out, it can be done with very simple tools, without an expensive serial decoder.

Friday, April 1, 2022

llvm-mos SID player for Commodore 64

In my previous post I described llvm-mos, a new llvm backend which can generate code for 6502 CPU, and I presented a simple program for an 8-bit Atari. Today I want to show you how you can use assembly code in your C program by using a more complicated example. This time it will be a music player for Commodore 64.

The most popular C64 chiptune format is SID. It's basically 6502 machine code which consists of two main routines: an init routine which sets up the sound chip, and a player routine which must be called once per screen frame. The biggest resource of SID tunes is the High Voltage SID Collection. There are many players which can play SID music files on modern devices, like ZXTune or VLC.

Preparing a SID file

In order to play a SID tune on a real Commodore 64 we must determine where it should be located in memory, and what the addresses of the init and the player routines are. This information can be found in the SID file header. Bytes 8 and 9 tell if the data are in the original C64 binary file format, i.e. the first two bytes of the data contain the little-endian load address (low byte, high byte), bytes 10 and 11 hold the init address, and bytes 12 and 13 contain the play address. Each tune can use different addresses, so it's necessary to check them carefully. For this project I'll be using one of my favourite C64 tunes, "Ginger" by Kristian Røstøen. When you analyze the header of this particular tune you'll notice that the init address is $1000, the play address is $1003, and the binary data, which begin at offset 124, already contain the load address $1000. We can use this information to extract the machine code from the SID file:
dd if=ginger.sid of=ginger.prg bs=1 skip=124
The resulting file ginger.prg can be put on a floppy disk and loaded on a Commodore 64 with the following command:
LOAD "GINGER",8,1
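If you don't feel like reading the header in a hex editor, a few lines of C running on your PC can do it for you. The sketch below relies on two things taken from the PSID file format description rather than from this post: the header fields are stored big-endian, and bytes 6 and 7 hold the offset of the binary data. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct sid_header {
    uint16_t data_offset; /* where the binary data begin in the file */
    uint16_t load_addr;   /* where the data should be put in memory */
    uint16_t init_addr;   /* address of the init routine */
    uint16_t play_addr;   /* address of the play routine */
};

/* Header fields are big-endian (high byte first). */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the buffer does not look like a SID file. */
int parse_sid_header(const uint8_t *file, size_t len, struct sid_header *out)
{
    if (len < 14)
        return -1;
    /* The magic is "PSID" or "RSID". */
    if ((file[0] != 'P' && file[0] != 'R') ||
        file[1] != 'S' || file[2] != 'I' || file[3] != 'D')
        return -1;

    out->data_offset = be16(file + 6);
    out->load_addr   = be16(file + 8);
    out->init_addr   = be16(file + 10);
    out->play_addr   = be16(file + 12);

    /* Zero means: the data start with a little-endian load address. */
    if (out->load_addr == 0) {
        size_t off = out->data_offset;
        if (len < off + 2)
            return -1;
        out->load_addr = (uint16_t)(file[off] | (file[off + 1] << 8));
    }
    return 0;
}
```

For a tune like the one used here you would expect a data offset of 124, a load and init address of $1000 and a play address of $1003.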

Loading a SID file

Instead of loading the file into memory by hand, we can write some code to do it for us. Unfortunately, llvm-mos does not have any library functions to do it at the moment, but fortunately Commodore 64 ROM does. The easiest way to use the ROM functions is through assembly, so let's try to write some assembly code in llvm:
char *fname = "GINGER";
unsigned char fnamelo = (unsigned int)fname & 0xFF;
unsigned char fnamehi = (unsigned int)fname >> 8;
unsigned char fnamelen = strlen(fname);

asm("JSR $FFBD" :: "a"(fnamelen), "x"(fnamelo), "y"(fnamehi));
The C part should be pretty much self-explanatory. We create variables which contain the lower and the higher byte of the file name pointer, and the file name length. You may think now that using the strlen() function is overkill. Well, in cc65 it probably is, but the llvm-mos optimizer can translate the whole expression into a single CPU instruction:
LDA #$06
The assembly part of the code needs more explanation. Here we call the SETNAM function located in ROM at address $FFBD. Before we call it, we need to put the address of the string containing the file name in two 8-bit CPU registers, X (low byte) and Y (high byte), and the string length in register A. Pure assembly code would look like this:
LDA fnamelen
LDX fnamelo
LDY fnamehi
JSR $FFBD
However, in llvm-mos we cannot use C expressions in assembly code directly. But we can use so-called input and output operands to transfer C variables to CPU registers and vice versa. In gcc syntax for inline assembly, which llvm also uses, operands are separated from assembly expressions by colons. You can read more about using inline assembly and operands here.
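Splitting a 16-bit address into two bytes for the X and Y registers is plain C, so it can be tried out on any machine. Here is a small sketch with hypothetical helper names of my own (lo_byte, hi_byte, join_bytes):

```c
#include <stdint.h>

/* Lower 8 bits of a 16-bit address (goes to register X for SETNAM). */
uint8_t lo_byte(uint16_t addr)
{
    return (uint8_t)(addr & 0xFF);
}

/* Upper 8 bits of a 16-bit address (goes to register Y for SETNAM). */
uint8_t hi_byte(uint16_t addr)
{
    return (uint8_t)(addr >> 8);
}

/* The reverse operation: build an address from its two halves. */
uint16_t join_bytes(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | (hi << 8));
}
```

For example, an address of $0823 splits into $23 (low) and $08 (high).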

After setting up the file name we need to provide some information about the device from which data should be loaded, and whether it should be loaded to an address pointed by the file header:
asm("LDA #$01\n\t"     // logical file number
    "LDX #$08\n\t"     // device number
    "LDY #$01\n\t"     // load to address found in file header
    "JSR $FFBA\n\t");  // SETLFS
Remember the LOAD "GINGER",8,1 command? This is the ",8,1" part, but written in assembly.

Now all that's left is to load the file into memory:
asm ("LDA #$00\n\t"     // load to memory
     "JSR $FFD5\n\t");  // LOAD

Playing a SID tune

It's time to play the music, then. Let's start with the init routine:
asm ("LDA #$00\n\t"
     "TAX\n\t"
     "TAY\n\t"
     "JSR $1000");
Now we have two ways of calling the play routine. The first one is fairly simple. We need to wait in a loop until the raster reaches a certain line on the screen, and when it happens, call the play routine. The line number can be any number from 0 to 262 (for NTSC machines) or 311 (for PAL machines), but because the raster line register $D012 is 8-bit, it can only hold values from 0 to 255, so values from 0 to 6 (NTSC) and 0 to 55 (PAL) appear in $D012 more than once per frame. A solution to this problem is to check bit 7 (the highest bit) of register $D011, which is 0 for lines from 0 to 255 and 1 for lines > 255, or use values bigger than 55, which never appear in register $D012 twice during one frame. In my example, I'm using 255:
#define rasterline (*((volatile unsigned char*)0xD012))

while (1) {
    if (rasterline == 255) {
        asm ("JSR $1003");
    }
}
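To check whether a given value is safe to compare with $D012 you can use a tiny helper like the one below. It is only a sketch: the function name is mine, and it assumes 312 raster lines per frame on PAL machines and 263 on NTSC machines.

```c
/* Returns 1 if `value` shows up in the 8-bit register $D012 more than
 * once per frame on a machine with `total_lines` raster lines,
 * and 0 otherwise. */
int d012_value_repeats(unsigned value, unsigned total_lines)
{
    /* Lines 256 and above wrap around and reuse the lowest values. */
    return total_lines > 256 && value < total_lines - 256;
}
```

With these assumptions the value 255 used in the loop above is unique on both PAL and NTSC machines.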
Putting all the code snippets together, a complete program looks like this:
#include <string.h>

#define rasterline (*((volatile unsigned char*)0xD012))

int main() {
    char *fname = "GINGER";
    unsigned char fnamelo = (unsigned int)fname & 0xFF;
    unsigned char fnamehi = (unsigned int)fname >> 8;
    unsigned char fnamelen = strlen(fname);

    // Load SID file into memory
    asm("JSR $FFBD\n\t"  // SETNAM (A = fname length, XY = fname address)
        "LDA #$01\n\t"   // logical file number
        "LDX #$08\n\t"   // device number
        "LDY #$01\n\t"   // load to address found in file header
        "JSR $FFBA\n\t"  // SETLFS
        "LDA #$00\n\t"   // load to memory
        "JSR $FFD5\n\t"  // LOAD
        :: "a"(fnamelen), "x"(fnamelo), "y"(fnamehi));

    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000");

    // Call SID refresh routine
    while (1) {
        if (rasterline == 255) {
            asm ("JSR $1003");
        }
    }

    return 0;
}

Playing a SID tune using interrupts

The second way of calling the play routine is to use raster interrupts. The principle is the same, we call the play routine every frame, but this time we don't have to wait in a loop for a given raster line. We only need to write a short function which will be called when an interrupt occurs, and tell the computer to generate the interrupt at a certain raster line. The addresses of the interrupt routines are stored at the beginning of the memory in a structure called vector table. We need to modify this table so that a raster interrupt vector now points to our function instead of the original one.

Let's write the function first:
void play() {
    asm ("ASL $D019\n\t"  // acknowledge interrupt (reset bit 0)
         "JSR $1003\n\t"  // call SID refresh routine
         "JMP $EA31");    // jump to default interrupt handler routine
}
When writing a raster interrupt function, we need to do two additional things. First, we need to let the computer know that the interrupt was handled properly. This is done by reading register $D019, zeroing its first bit (bit 0), and writing back to it. The easiest and fastest way to do it is by using the Arithmetic Shift Left (ASL) instruction, which reads the register and writes it back in a single step. Second, because we hijacked the original interrupt vector, we want to call the original ROM routine when we finish executing our own code.

Now it's time to modify the system interrupts:
#define intctrl (*((volatile unsigned char*)0xDC0D))
#define screenctrl (*((volatile unsigned char*)0xD011))
#define rasterline (*((volatile unsigned char*)0xD012))
#define rasterintctrl (*((volatile unsigned char*)0xD01A))
#define rasterintlo (*((volatile unsigned char*)0x314))
#define rasterinthi (*((volatile unsigned char*)0x315))

void (* fun)(void) = &play;
unsigned char funlo = (unsigned int)fun & 0xFF;
unsigned char funhi = (unsigned int)fun >> 8;

asm("SEI");           // switch off interrupts
intctrl = 0x7F;       // disable CIA interrupts
rasterintctrl = 1;    // enable raster interrupts
rasterintlo = funlo;  // low byte of raster interrupt routine
rasterinthi = funhi;  // high byte of raster interrupt routine
rasterline = 127;     // trigger interrupt at raster line 127
screenctrl = screenctrl & 0x7F;
asm("CLI");           // switch on interrupts
First, we need to disable all interrupts with the CPU instruction SEI. We have to do it because there is a chance that an interrupt occurs after we modified some registers, but before we modified others, and it will cause the CPU to jump to a wrong place in memory and crash. Next, we ask the system to disable the CIA timer interrupts by writing $7F to $DC0D (the highest bit is zero, which tells the chip to switch off every interrupt source selected by the remaining bits), and enable screen raster interrupts instead. Then we change addresses $314 and $315 in the vector table to point to the play() function. Finally, we indicate a raster line which will trigger the interrupt. After we finish modifying the system interrupts we can re-enable them with CLI.

Remember that rasterline register can hold only 8 bits, so you also need to set the highest bit of screenctrl register accordingly. For example, to generate an interrupt when line number 260 is drawn on the screen, you need to do the following:
screenctrl = screenctrl | 0x80;  // last bit equals 1 for lines > 255
rasterline = 4;                  // 260 - 256 = 4
You might look at it as if the rasterline register was 9-bit, with address $D012 holding the lowest 8 bits, and $D011 holding the highest bit in its own highest bit.
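This 9-bit view can be expressed as a pair of small conversion functions. The sketch below runs on any machine and only does the arithmetic. The helper names are my own, and the code does not touch the real registers.

```c
#include <stdint.h>

/* Value to store in $D012 for a given raster line: the lowest 8 bits. */
uint8_t d012_for_line(unsigned line)
{
    return (uint8_t)(line & 0xFF);
}

/* New value of $D011 for a given raster line: bit 7 carries the 9th bit
 * of the line number, the remaining bits are left untouched. */
uint8_t d011_for_line(uint8_t d011, unsigned line)
{
    return (uint8_t)((d011 & 0x7F) | ((line & 0x100) ? 0x80 : 0x00));
}

/* The reverse operation: rebuild the line number from both registers. */
unsigned raster_line(uint8_t d011, uint8_t d012)
{
    return ((unsigned)(d011 & 0x80) << 1) | d012;
}
```

For line 260 the functions produce 4 for $D012 and set the highest bit of $D011, just like the two assignments above.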

Now let's get back to our example. A complete program now looks like this:
#include <string.h>

#define intctrl (*((volatile unsigned char*)0xDC0D))
#define screenctrl (*((volatile unsigned char*)0xD011))
#define rasterline (*((volatile unsigned char*)0xD012))
#define rasterintctrl (*((volatile unsigned char*)0xD01A))
#define rasterintlo (*((volatile unsigned char*)0x314))
#define rasterinthi (*((volatile unsigned char*)0x315))

// SID refresh function
void play() {
    asm("ASL $D019\n\t"  // acknowledge interrupt (reset bit 0)
        "JSR $1003\n\t"  // call SID refresh routine
        "JMP $EA31");    // jump to default interrupt handler routine
}

int main() {
    char *fname = "GINGER";
    unsigned char fnamelo = (unsigned int)fname & 0xFF;
    unsigned char fnamehi = (unsigned int)fname >> 8;
    unsigned char fnamelen = strlen(fname);
    void (* fun)(void) = &play;
    unsigned char funlo = (unsigned int)fun & 0xFF;
    unsigned char funhi = (unsigned int)fun >> 8;

    // Load SID file into memory
    asm("JSR $FFBD\n\t"  // SETNAM (A = fname length, XY = fname address)
        "LDA #$01\n\t"   // logical file number
        "LDX #$08\n\t"   // device number
        "LDY #$01\n\t"   // load to address found in file header
        "JSR $FFBA\n\t"  // SETLFS
        "LDA #$00\n\t"   // load to memory
        "JSR $FFD5\n\t"  // LOAD
        :: "a"(fnamelen), "x"(fnamelo), "y"(fnamehi));

    // Init SID player routine
    asm("LDA #$00\n\t"
        "TAX\n\t"
        "TAY\n\t"
        "JSR $1000");

    // Set up raster interrupt
    asm("SEI");           // switch off interrupts
    intctrl = 0x7F;       // disable CIA interrupts
    rasterintctrl = 1;    // enable raster interrupts
    rasterintlo = funlo;  // low byte of raster interrupt routine
    rasterinthi = funhi;  // high byte of raster interrupt routine
    rasterline = 127;     // trigger interrupt at raster line 127
    screenctrl = screenctrl & 0x7F;
    asm("CLI");           // switch on interrupts

    return 0;
}
It's a bit longer than the first version, but a side effect of using interrupts instead of a loop is that we can still type on the screen and execute simple Basic commands while the music plays in the background.
If you want to try the program yourself, you can find the complete code on Github.

Sunday, March 27, 2022

llvm-mos compiler for 8-bit 6502 machines

A while ago I posted an article about programming for an 8-bit Atari. I wrote a simple program in C, compiled it with cc65, and then optimized the assembly output. The best result I could get with built-in compiler optimizations was 630 bytes, which I then reduced to 29 bytes using manual machine code refactoring and an alternative assembler. It proved that you can program an 8-bit machine in C, but the result is far from perfect and hand-written assembly is still better.

A few months ago everything changed. The llvm-mos project was started and has achieved some really impressive results so far. Llvm is a highly optimizing compiler toolchain, which can be used as a back end for any modern programming language. Clang, an llvm frontend for C, C++, and Objective-C, is the default compiler for Mac OS X, where it replaced gcc in 2011. Llvm can also produce machine code for any CPU architecture when given a suitable backend. Llvm-mos provides such a backend for the 6502 CPU, which means that you can write code for Commodore 64 or Atari 400/800/XL/XE even in Rust!

I decided to compile my original program with llvm-mos and compare it with the latest cc65. The nicest thing was that I didn't have to touch the source code at all, I only had to provide a custom atari.h header, which I simply copied from cc65 include directory, with some minor changes (more on this later). The code looks like this:
#include "atari.h"

#define SDMCTL (*((unsigned char*)0x22F))

int main() {
    unsigned char line, counter = 0;

    SDMCTL = 0; // disable display in BASIC screen area
    while (1) {
        line = ANTIC.vcount;  // read current screen line
        if (!line) counter++; // increase colour shift on new frame
        ANTIC.wsync = line;   // block CPU until horizontal sync
        GTIA_WRITE.colbk = line + counter; // change background colour
    }
    return 0;
}
It displays a moving rainbow animation on the screen.
When I compiled my program with the latest cc65 version I got a 557-byte executable. Llvm-mos produced 58 bytes of code. A hand-written assembly version is 29 bytes long. I didn't do any speed comparisons, but the results I found show that llvm-mos creates machine code which is faster than the code produced by cc65 by a long shot.

Now a word about compatibility between cc65 and llvm-mos. I didn't have many problems so far, and I could even use cc65 include files, but with two caveats. First, you have to get rid of cc65-specific extensions like __fastcall__ or __asm__. Second, you need to remember to use the volatile keyword every time you access a variable which is mapped to a memory address or an input / output port. For example, you have to change this:
struct __antic {
    unsigned char   dmactl; /* (W) direct memory access control */
    unsigned char   chactl; /* (W) character mode control */
    unsigned char   dlistl; /* display list pointer low-byte */
    unsigned char   dlisth; /* display list pointer high-byte */
    unsigned char   hscrol; /* (W) horizontal scroll enable */
    unsigned char   vscrol; /* (W) vertical scroll enable */
    unsigned char   unuse0; /* unused */
    unsigned char   pmbase; /* (W) msb of p/m base address */
    unsigned char   unuse1; /* unused */
    unsigned char   chbase; /* (W) msb of character set base address */
    unsigned char   wsync;  /* (W) wait for horizontal synchronization */
    unsigned char   vcount; /* (R) vertical line counter */
    unsigned char   penh;   /* (R) light pen horizontal position */
    unsigned char   penv;   /* (R) light pen vertical position */
    unsigned char   nmien;  /* (W) non-maskable interrupt enable */
};
into this:
struct __antic {
    volatile unsigned char   dmactl; /* (W) direct memory access control */
    volatile unsigned char   chactl; /* (W) character mode control */
    volatile unsigned char   dlistl; /* display list pointer low-byte */
    volatile unsigned char   dlisth; /* display list pointer high-byte */
    volatile unsigned char   hscrol; /* (W) horizontal scroll enable */
    volatile unsigned char   vscrol; /* (W) vertical scroll enable */
    volatile unsigned char   unuse0; /* unused */
    volatile unsigned char   pmbase; /* (W) msb of p/m base address */
    volatile unsigned char   unuse1; /* unused */
    volatile unsigned char   chbase; /* (W) msb of character set base address */
    volatile unsigned char   wsync;  /* (W) wait for horizontal synchronization */
    volatile unsigned char   vcount; /* (R) vertical line counter */
    volatile unsigned char   penh;   /* (R) light pen horizontal position */
    volatile unsigned char   penv;   /* (R) light pen vertical position */
    volatile unsigned char   nmien;  /* (W) non-maskable interrupt enable */
};
otherwise llvm optimizer may decide that some parts of your program do nothing meaningful, and remove the code that operates on those variables.
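When you edit such a header by hand it is easy to lose or duplicate a field, which silently shifts all the registers that follow. A quick way to guard against it is to check the field offsets at compile time. The sketch below is my own addition: the struct is renamed to antic_regs to avoid a reserved identifier, and it assumes the register block is mapped at $D400 with WSYNC at $D40A and VCOUNT at $D40B, as on a standard 8-bit Atari.

```c
#include <stddef.h>

/* Same layout as the struct above, with every field marked volatile. */
struct antic_regs {
    volatile unsigned char dmactl;
    volatile unsigned char chactl;
    volatile unsigned char dlistl;
    volatile unsigned char dlisth;
    volatile unsigned char hscrol;
    volatile unsigned char vscrol;
    volatile unsigned char unuse0;
    volatile unsigned char pmbase;
    volatile unsigned char unuse1;
    volatile unsigned char chbase;
    volatile unsigned char wsync;
    volatile unsigned char vcount;
    volatile unsigned char penh;
    volatile unsigned char penv;
    volatile unsigned char nmien;
};

/* Assumed base address of the ANTIC register block. */
#define ANTIC_BASE 0xD400u

/* Compile-time checks: the build fails if a field has moved. */
_Static_assert(offsetof(struct antic_regs, wsync) == 0x0A, "wsync moved");
_Static_assert(offsetof(struct antic_regs, vcount) == 0x0B, "vcount moved");

/* Absolute address of a register given its offset in the struct. */
unsigned antic_reg_address(size_t offset)
{
    return ANTIC_BASE + (unsigned)offset;
}
```

The volatile qualifier doesn't change the layout, so the same checks work for both versions of the struct.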

Llvm-mos is a huge step in 8-bit development. It provides a 6502 backend for many programming languages, and produces highly optimized machine code, comparable to hand-written assembly. If you always wanted to write a game or demo for an Atari or Commodore computer, but never felt like doing all the work in assembly, then now is your chance!

If you're interested in the source code used in this article, you can download it here.

Saturday, May 26, 2018

What makes programming language popular

Everybody knows that there is no "best" programming language, just as there is no best car or best kind of cheese. For many years, since I got my first computer, I have been looking for a language that would suit me best and satisfy all my needs. I have one or two favourites, which I use most frequently when creating utilities for my own, so I guess those are my personal best. But in general, programming languages are just tools. They may be better or worse suited for the job, but the language itself is not your aim. What really matters is what you can create with it. When you compare iOS to Android, do you compare Objective C to Java? I know quite a few programmers who feel angry when they hear such opinions, and who spend whole days polishing their code. But, as Jamie Zawinski said: "It’s great to rewrite your code and make it cleaner and by the third time it’ll actually be pretty. But that’s not the point — you’re not here to write code; you’re here to ship products."

Speaking of products. If you have ever wondered what makes a programming language popular, the answer is simple: money. There are many great essays about language popularity (like this one by Paul Graham), but they are true only in theory. In practice, programming languages are products, just like any other software. Let me give you an example.

A few weeks ago a developer published on his company's blog an article describing his team's negative experience with Kotlin. It was just a list of problems they encountered and reasons why they decided not to use the language for their project. Within hours the post reached over a hundred comments. A large number of them were posted by one of the Kotlin developers, working for JetBrains, the company behind Kotlin. It's quite understandable he tried to defend it, but he didn't stop at providing his arguments. He commented on every post that wasn't positive about Kotlin, even if it wasn't negative about it. It didn't seem a problem for the company that one of its employees spends a whole day writing tons of elaborate comments. Soon, other Kotlin advocates appeared. One of them suggested that the article's author is a moron, and then provided a link to his blog, where he boasts about giving talks at Kotlin conferences. Guess who sponsors those conferences.

All popular modern programming languages have powerful patrons. Swift is backed by Apple, Golang by Google, and Rust by Mozilla. JetBrains didn't create Kotlin just for fun, they admitted on their blog: "we expect Kotlin to drive the sales of IntelliJ IDEA". Social marketing is becoming the main driving force for many companies, so they put a lot of effort and money into creating a positive picture of their products and reacting to anything that may damage it. New languages struggle to get attention, and it takes time before they gain wide adoption. Until this happens, they need an army of salesmen and supporters hoping to jump on the train (and get a well paid job) when the language reaches its critical mass. So if you are thoughtless enough to criticize the product, you will instantly become a target for aggressive marketing counterstrikes.

About a month later, another post appeared on the same blog. This time another developer described his own way from C# to Java. This time, the post received two comments. Conclusion? Nobody fights for old languages, because nobody has to. Oracle estimates that Java runs on 3 billion devices - losing that market, if possible, would take ages. C# is now the primary programming language for Windows, and Microsoft still owns about 90% of desktop market share. ANSI C has been used as a core of all modern operating systems and device drivers. It is also the primary programming language for micro-controllers, for the same reason it was initially developed: weak CPU and low memory (try to run software written in any modern language on a chip with 2kB RAM - and for your information, Forth is not a modern language). PHP is popular because of dirt cheap LAMP hosting, and you can have your neighbour's kid set up a website with Wordpress, or online store with osCommerce, for peanuts.

So, how do you make programming language popular? First, you need a decent design. It doesn't have to be great, it just needs to be solid enough, so you can convince other people that it's worth using. Even COBOL was a decent language for its time. Second, you have to be a good salesman and sell your idea to a company which will agree to support it with its money. For this to happen, your language must generate some kind of profit - it can either introduce some clever language constructs that reduce development time, or produce efficient executables which use less server resources. Kotlin's selling points are null pointer safety and reduced boilerplate code. Your selling point may be, for example, a reduced resource usage in the cloud, so you will pay lower bills for Amazon Web Services, Microsoft Azure or Google Cloud Platform. It will not hurt if you know people who make important decisions within your company.

When all this happens, you have a good chance to succeed. Strong marketing among engineers - conferences, tutorials, free tools - backed by company funds, will help make the language popular. An army of fanboys will prove to everyone who disagrees that they are just too stupid to understand how awesome it is.

Because nowadays a programming language is a product. Remember that the next time you pick up a shiny new one.

Sunday, May 20, 2018

Writing Google Cloud Functions in C

Serverless is a new hot topic in the IT world. As with every new technology, the idea was obvious in theory (easy way of migrating software without live migration of whole virtual machines), but very few people knew how to implement it in practice. Today we have AWS Lambda, Google Cloud Functions and Azure Functions, and we will probably see many others in the near future.

Google Cloud is a technology I currently work with, and Google Cloud Functions are a very handy tool. If you just need to create a simple RESTful service, you don't have to write a backend application and set up a Kubernetes cluster to run it. All you need to do is to write a piece of code which handles a request and sends a response, and the Cloud Functions will take care of all the dirty work. They free you from the burden of deploying, monitoring, scaling and load balancing your application. You can focus entirely on writing software.

As always, the comfort comes at a price. Currently the only programming language supported by Cloud Functions is JavaScript. Does it mean that you cannot use a library written in another language? If you have a nice piece of C code and want to use it, are you doomed to manually build an infrastructure based on Compute Engine instances with Linux and Apache? Fortunately, the answer is no.

A few years ago in my post about JavaScript I described Emscripten, a C/C++ to JavaScript compiler. It has matured since, and it can produce very good and stable code. Let me show you an example of how it can be used to create a Cloud Function from ANSI C code.

Let's start with a simple program to display current time and date:
#include <stdio.h>
#include <time.h>

const char *now() {
  time_t t = time(NULL);
  return ctime(&t);
}

int main() {
  printf("%s\n", now());
  return 0;
}
Now let's convert it to JavaScript using Emscripten:
emcc -O2 -o main.js main.c
and run it with Node.js:
node main.js
To make our program run with Cloud Functions we need to make a separate file (I called it extern.c) containing only the now() function:
#include <stdio.h>
#include <time.h>
#include "emscripten.h"

EMSCRIPTEN_KEEPALIVE
const char *now() {
  time_t t = time(NULL);
  return ctime(&t);
}
As you can see, there are some changes to the original source code:
- there is no main() function
- there is an additional #include "emscripten.h" directive
- the now() function is prefixed with EMSCRIPTEN_KEEPALIVE. It tells the compiler that the code is in use and should not be removed during compilation (the optimizer tries to identify and remove dead code).

Let's build an external library from our code:
emcc -O1 -s 'EXTRA_EXPORTED_RUNTIME_METHODS=["ccall", "cwrap"]' -o extern.js extern.c
This creates a module extern.js, which can be included into a Node.js application:
var module = require('./extern.js');
var now = module.cwrap('now', 'string', []);

console.log(now());
Module.cwrap returns a function wrapper, which calls our code exported from C. The first argument is the name of the C function, the second one is the type returned by the function, and the third one contains the list of the argument types passed to it.

A word of warning: when using Node.js, you can use ccall instead of cwrap, as described in the Emscripten documentation. The difference is that ccall doesn't return a function; it calls the C function immediately and returns its result, so you would log it with console.log(now) instead of console.log(now()). However, ccall will not work correctly with the current version of Google Cloud Functions - module.ccall('now', 'string', []) will always return the time when the Cloud Function was created instead of the current one. Also, don't use optimization levels higher than -O1 when compiling external modules with Emscripten.
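One possible explanation for the stale timestamp (an assumption on my part, not something verified against Cloud Functions) is that a ccall placed at the top level of index.js runs only once, when the module is loaded, whereas a cwrap wrapper is invoked on every request. The sketch below imitates that difference with plain JavaScript stand-ins; fakeCcall, fakeCwrap and cNow are hypothetical names, not Emscripten APIs.

```javascript
// Hypothetical stand-ins for the two Emscripten helpers (not the real API):
// fakeCcall runs the function right away, fakeCwrap returns a callable wrapper.
let clock = 0;                          // a fake clock that ticks on every read
const cNow = () => `tick ${++clock}`;   // stand-in for the exported C function

const fakeCcall = (fn) => fn();         // call immediately, return the value
const fakeCwrap = (fn) => () => fn();   // return a function to call later

// Evaluated once, when the module is loaded (the "cold start"):
const frozen = fakeCcall(cNow);
// Evaluated each time the wrapper is invoked:
const now = fakeCwrap(cNow);

// Two request handlers, one per approach
const handlerWithCcall = () => frozen;
const handlerWithCwrap = () => now();

console.log(handlerWithCcall(), handlerWithCcall()); // the same value twice
console.log(handlerWithCwrap(), handlerWithCwrap()); // a fresh value each time
```

If this is indeed the cause, calling ccall inside the request handler rather than at the top of the file should behave like the cwrap version, but I have not tested that.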

Because Google Cloud Functions are powered by Node.js, it is very easy to convert our Node.js application into a Cloud Function:
var module = require('./extern.js');
var now = module.cwrap('now', 'string', []);

exports.time = (req, res) => {
  res.send(now());
}
Save the code above as index.js and put it into a zip file together with extern.js (you can find an example file here). Now go to the Google Cloud Platform panel, choose "Cloud Functions" and click "Create function". Change the default function name to "time". Leave the memory allocation settings unchanged. 256MB may seem like overkill, but changing it to a lower value may actually add a performance penalty to your application. Underneath the Cloud Functions there are still virtual instances running, and low-memory virtual machines get less CPU time. Set the trigger type to "HTTP trigger" - it means that the function will be run on demand when an HTTP request comes in. Pay attention to the URL field - it's the address you can use to call your function. Choose "ZIP upload" as the source of the code. You will need a Bucket as temporary upload storage - you can create it through the "Storage" menu in the Google Cloud Platform panel. Finally, when asked about the function to execute, enter "time".
Creating a function takes a while. When it's ready, you can go to the URL which was displayed when the function was created (for example https://us-central1-myproject.cloudfunctions.net/time) and you should see the current time in response.

You can call Cloud Functions from a website with AJAX. For example, you can create a page displaying a simple clock that refreshes every second (remember to change the URL https://us-central1-myproject.cloudfunctions.net/time to yours):
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> 
<script src="jquery.min.js"></script>
<script>
$(document).ready(function() {
  setInterval(function() {
    $.ajax({
      url: 'https://us-central1-myproject.cloudfunctions.net/time',
      method: 'GET',
      crossDomain: true,
      success: function(data) {
        $('#time').text(data);
      }
    });
  }, 1000);
});
</script>
</head>
<body>
<p id="time"></p>
</body>
</html>
This alone, however, will not work because of cross-origin protection mechanisms. In short, a web browser will not let your page read the response to an AJAX request sent to an endpoint in a different domain unless that endpoint explicitly allows it. To fix this, you need to add the Access-Control-Allow-Origin and Access-Control-Allow-Methods response headers in index.js:
var module = require('./extern.js');
var now = module.cwrap('now', 'string', []);

exports.time = (req, res) => {
  res.set('Access-Control-Allow-Origin', 'http://www.mypage.com');
  res.set('Access-Control-Allow-Methods', 'GET');
  res.send(now());
}
Replace http://www.mypage.com with the origin (scheme and domain name) of the website where the clock page is located.
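If the page ever sends more than a simple GET (custom request headers, for instance), the browser first issues an OPTIONS preflight request, and the function has to answer that too. Here is a minimal sketch of such a handler. The withCors helper, the mock response object and the stand-in now() are my own illustrative assumptions; res.set, res.status and res.send are the Express-style methods that Cloud Functions pass to HTTP handlers.

```javascript
// Hypothetical helper: wraps a handler so every response carries CORS headers
// and OPTIONS preflight requests are answered without running the handler.
function withCors(origin, handler) {
  return (req, res) => {
    res.set('Access-Control-Allow-Origin', origin);
    res.set('Access-Control-Allow-Methods', 'GET');
    if (req.method === 'OPTIONS') {
      res.status(204).send('');   // preflight: headers only, empty body
      return;
    }
    handler(req, res);
  };
}

// Stand-in for the cwrap'ed C function, so the sketch runs without extern.js.
const now = () => new Date().toString();

const time = withCors('http://www.mypage.com', (req, res) => {
  res.send(now());
});

// Minimal mock of the Express-style response object, for local testing only.
function mockResponse() {
  return {
    headers: {},
    statusCode: 200,
    body: undefined,
    set(name, value) { this.headers[name] = value; return this; },
    status(code) { this.statusCode = code; return this; },
    send(body) { this.body = body; return this; },
  };
}

// Simulate a preflight request locally
const res = mockResponse();
time({ method: 'OPTIONS' }, res);
console.log(res.statusCode, res.headers);
```

In a real index.js you would drop the mock and the stand-in, keep the cwrap line from the earlier listing, and export the wrapped handler with exports.time = time.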

Sunday, April 8, 2018

Using Raspberry Pi 3 as wireless client router

The title may seem confusing, but it covers a specific use case: you have a machine with an ethernet port only, and you want to connect it to a wifi network. For whatever reason, you can't, or don't want to, use a wireless card with your computer. As for me, I have an Amiga 500 with Plipbox, and a Raspberry Pi B+ with RISC OS 5, and both can be connected to the Internet - but using a network cable a few meters long, lying on the floor, to reach a cable TV router is far from what I'd call a comfortable solution.

The Internet is full of recipes for using a Raspberry Pi 3 as a wireless access point, but I wanted to use it the other way round. I wanted it to be a wireless client, and at the same time act as a router, so I could connect my ethernet-only hardware to the Internet through it. Sadly, I couldn't find any sensible description, so I finally figured it out myself. It's not an ideal solution, and it uses static addresses instead of DHCP, but it's fairly simple, and it works for me. If you have a better one, feel free to point to it in the comments.
My recipe works with Debian 9 (a.k.a. Raspbian Stretch) with desktop. Take note that it may not work with other Linux versions. I assume that you know how to install it on an SD card and how to run it, so I'll go straight to the network setup.

First, you need to configure the Raspberry Pi to connect to the wifi network. Turn it on and wait for the system to load. Now click on the wifi icon located on the dock and choose your network. Enter the network password and wait until the connection is established. You can check if everything works correctly by pinging google.com or some other address.

Now you need to edit a few system files. They are only writeable by the user "root" so you can either use a console editor, like vim or nano, with the "sudo" command, or run "gksu gedit" from the desktop. I prefer the first method, so I'll use this one.

First, you need to disable DHCP for the ethernet interface. This is because the Raspberry Pi is meant to work as a client, so it will try to reconfigure the ethernet port every time it discovers that a network cable has been attached to it. So do:
sudo nano /etc/dhcpcd.conf
and add the following line at the end of the file:
denyinterfaces eth0
Next, you need to enable packet forwarding between the network interfaces. To do this, uncomment the following line in /etc/sysctl.conf:
net.ipv4.ip_forward=1
Finally, you need to configure routing. I use 10.0.0.0/24 for my network, but you can use any private (non-routable) address range for this. So, add the following lines to /etc/rc.local before "exit 0":
ifconfig eth0 10.0.0.1 netmask 255.255.255.0 up
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
iptables -A FORWARD -s 10.0.0.0/24 -j ACCEPT
The first line sets up the IP address and netmask for the ethernet port. The second one enables network address translation (masquerading), which means that packets from the 10.0.0.0/24 network will be able to reach the outside world. The last rule allows those packets to pass through the built-in firewall.

Now reboot the Raspberry Pi. After a while it will connect to the wifi network (you can tell it has connected when the yellow LED on the board stops blinking), and it will allow clients to use its ethernet port as a gateway.

I will not describe client configuration in detail, because it varies widely between operating systems. I'll just give you the parameters you need to enter to make the client work:
1) IP address: anything between 10.0.0.2 and 10.0.0.254. If you use more than one device (for example connected through a switch) remember that each one of them needs its own unique IP address.
2) Netmask: 255.255.255.0
3) Gateway: 10.0.0.1
4) DNS servers: use 8.8.8.8 as primary and 8.8.4.4 as secondary. Alternatively, you can use your home router's IP address (e.g. 192.168.1.1) or your ISP's DNS servers - if you know what they are.
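
If you connect several devices, the addressing rules above can be checked mechanically before you type them into each client. The sketch below is only an illustration: checkClients, ipToInt and the sample addresses are hypothetical, and it assumes the 10.0.0.0/24 network with the gateway at 10.0.0.1 used in this post.

```javascript
// Convert dotted-quad IPv4 text to an unsigned 32-bit number.
function ipToInt(ip) {
  const parts = ip.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Hypothetical helper: returns a list of problems found in the client addresses.
function checkClients(clients, gateway = '10.0.0.1', netmask = '255.255.255.0') {
  const mask = ipToInt(netmask);
  const gw = ipToInt(gateway);
  const network = (gw & mask) >>> 0;                  // e.g. 10.0.0.0
  const broadcast = (network | (~mask >>> 0)) >>> 0;  // e.g. 10.0.0.255
  const seen = new Set();
  const problems = [];
  for (const ip of clients) {
    const n = ipToInt(ip);
    if (((n & mask) >>> 0) !== network) {
      problems.push(`${ip}: outside the network`);
    } else if (n === network || n === broadcast) {
      problems.push(`${ip}: reserved address`);
    } else if (n === gw) {
      problems.push(`${ip}: already used by the gateway`);
    }
    if (seen.has(n)) {
      problems.push(`${ip}: duplicate`);
    }
    seen.add(n);
  }
  return problems;
}

// A valid plan, followed by one with three mistakes
console.log(checkClients(['10.0.0.2', '10.0.0.3']));
console.log(checkClients(['10.0.0.1', '10.0.0.5', '10.0.0.5', '192.168.1.7']));
```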