Sunday, August 19, 2012

Tiny TCP proxy server development continues

Two months ago I published a tiny TCP proxy server which helped me to proxy network traffic on hosts where I didn't have access to firewall configuration. Recently it turned out that it would be nice if it had at least some basic debugging capabilities. Because I'm not a big fan of reinventing the wheel, I decided I would just add an ability to cooperate with external tools through pipes and let them do the job. Thus the latest version of the proxy can pass all incoming and outgoing data through any command which can read it from the standard input and write it back to the standard output. For example if you want to log all incoming http data to in.txt file and outgoing data to out.txt you can use tee command for this purpose. It echoes all input and additionally writes it to a file:
proxy -l 8080 -h 192.168.1.2 -p 80 -i "tee -a in.txt" -o "tee -a out.txt"
The software is LGPL v3 licensed and can be downloaded from Github. On Linux you can compile it with standard development tools, on Windows you will need Cygwin with gcc package installed first.

Thursday, August 16, 2012

Software industry gone mad

Recently I was looking for an alternative to Javascript and a friend of mine recommended Dart, a new programming language announced last year by Google. At first I was very keen to give it a try until I saw a post at Stackoverflow showing the output of Dart to Javascript compiler - it produced 730 lines of Javascript code out of 3-line Dart "Hello world" source.
For me it's just pure madness. Writing code in a scripting language, which than gets translated to a 243 times bigger code in another scripting language, which than gets run by a Javascript engine embedded in a web browser working on top of an operating system... Dart official page strongly emphasizes that it increases programming productivity. Actually I can't figure out how adding more and more layers of software on top of existing ones can increase productivity. And I don't mean logic layers within one application or framework, but using one technology on top of another.
In fact, it's just the opposite. Tracking down non-trivial bugs becomes more problematic, because their source can be located deep below in one of existing layers. Also spawning software layers makes its overall performance drop dramatically. I remember laughing out loud back in 1997 when I read in PC Magazine that a 200 MHz Intel Pentium CPU with 32 MB RAM is recommended to surf the web comfortably, while I could do just fine on Intel Pentium 133 with 16 MB RAM. Now such computer wouldn't probably even boot any modern operating system, not to mention that I can barely open more than a few tabs with Firefox 13 on a computer with 2-core 1.6 GHz CPU and 4 GB RAM. Insane rush to release software more often and more quickly makes it require more and more efficient hardware. In the long rong run, the company may pay less to the developers (because they will produce new applications faster), but it will pay more for hardware infrastructure. Latest power outages in India, one of the world's largest IT centers, clearly show that "productivity" comes at a cost, and that this cost can be huge. To make things worse, hardware performance cannot be increased infinitely: CPU speed became more or less stale since 2005 (further increase in clock frequency became impossible due to amount of heat it produces) and now the processing power is being increased by adding more and more cores. However, there are many algorithms which cannot be parallelized, so they will run equally fast (or slow) on a one-core as well as on a multi-core CPU. Modern processors provide many mechanisms which are designed to help programs run faster, like L1 and L2 cache, but what benefit do you get from a few megabytes of cache if your application requires a virtual machine which does not event fit into the cache itself, not to mention that it needs to be shared among all running applications?
Many of my fellow programmers say that "it has to be like that". No, it hasn't. A computer which helped Apollo 11 land on the Moon was a 2 MHz 16-bit unit, something that modern programmers cannot even imagine. I remember when Kevlin Henney showed at GeeCON a complete chess software written in 672 bytes of 8-bit assembly code someone from the audience screamed: "But that's impossible!". And yet, you can find a web server running on bare Commodore64 with 1 MHz CPU and 64 KB RAM - hell, there is even Unix clone dedicated for it! Another example is a quadcopter controlled by a 16 MHz ATmega 8-bit CPU with 32 KB Flash RAM and 2 KB SRAM (it's based on modified Harvard architecture). If it was programmed by an average software developer, it would probably require the latest Intel Pentium hardware and of course would be much too heavy to fly, even if it run a single floppy MenuetOS...

Monday, July 9, 2012

Streaming Internet radio from Raspberry Pi to Samsung TV

Not much related to programming, but I spent almost a week trying to make it work, so I decided to share my experience in hope someone may find it useful. You've probably heard already about Raspberry Pi - I think it's a great project and I admire thousands of people from all over the world who contribute to its development, but it's still in its experimental stage and has some quirks which sometimes make it very hard to use. One of them is using it as a DLNA server for streaming media straight from the Internet right to your home theater.

I have a Samsung TV with DLNA client on board, which works well with Serviio installed on my laptop, but I didn't feel like turning on a computer just to listen to my favourite online radio station, once I got my hands on Raspberry Pi. So I installed SqueezePlug on an SD card and started to play with it. Unfortunately, it turned out that none of the four pre-installed media servers could provide me with what I needed: MiniDLNA did not support transcoding of live streams (it needed some patches that were not applied), TVMobili was not recognized by my TV, Twonky said that its license expired before I even managed to configure it, and Logitech Media Server simply didn't work (it had some problems with the GUI - it called an AJAX script which always responded with an empty contents).

The only option I was left with was MediaTomb, which fortunately has a very good documentation and a lot of supportive users across different forums. The first thing I learned was that Samsung needs some special headers to be sent by MediaTomb, otherwise it always responds with "file format not recognized" message. Very informative from Samsung I must say - it took me a lot of time just to discover that it was the response header from DLNA server that was not recognized, not the data itself. Next step was to read the radio stream from the Internet and pass it to the TV. I spent next few evenings analyzing why I can stream media from a laptop and I can't from Raspberry Pi using ffmpeg for transcoding, despite using exactly the same version of MediaTomb and the same configuration file. As it turned out, my TV has issues with streams encoded by ffmpeg on ARM CPU (they differ substantially from those produced by ffmpeg on Intel), and it seems that ffmpeg compilation for ARM is simply broken. If you try ffmpeg on different distributions of Debian for Raspberry Pi you will learn that it either crashes with core dump or "illegal instruction" message or produces mpeg audio streams full of bad bytes, which can be corrected by PC players like vlc or foobar, but apparently not by a TV built-in software. So I tried other transcoders, like mencoder (provided by mplayer), until I realized that I need no transcoding at all - it's enough just to pass the original http stream to the output file just as it is, and wget is actually everything that I needed.

Still, one problem remained. After a few seconds of playing the connection between the TV and MediaTomb was breaking up. It was driving me nuts and it took another few hours to discover that MediaTomb sends "Connection: close" header to the client, while the online radio streams use "Connection: keep-alive", which prevents client from disconnecting when there is no data in the stream for a while. I found out that the solution is to use apropriate buffer size for the radio stream - it should be fairly small so that it never stops serving data even for a while and thus prevents the TV from disconnecting, but not too small because it will make the ARM to choke.

So, if you have a Samsung TV and want to listen to Internet radio stations served by a Raspberry Pi, here is my recommended configuration:

1. Install Debian Squeeze following the instructions on Raspberry Pi download page. Then log in to it and install MediaTomb server:
sudo apt-get update
sudo apt-get install mediatomb
2. Modifiy the /etc/mediatomb/config.xml file: Enable web user interface by modifying <ui> entry in the <server> section:
<ui enabled="yes" show-tooltips="yes">
  <accounts enabled="yes" session-timeout="300">
    <account user="mediatomb" password="mediatomb"/>
  </accounts>
</ui>
Add the following entry to the <server> section (as described here):
<custom-http-headers>
   <add header="transferMode.dlna.org: Streaming"/>
   <add header="contentFeatures.dlna.org: 
DLNA.ORG_OP=01;DLNA.ORG_CI=0;DLNA.ORG_FLAGS=017000 
00000000000000000000000000"/>
</custom-http-headers>
Change:
<transcoding enabled="no">
to:
<transcoding enabled="yes">
Just below it, in <mimetype-profile-mappings> section add the following entry:
<transcode mimetype="audio/x-shoutcast" using="streaming"/>
Also, add the following section to <profiles>:
<profile name="streaming" enabled="yes" type="external">
  <mimetype>audio/mpeg</mimetype>
  <accept-url>yes</accept-url>
  <first-resource>yes</first-resource>
  <use-chunked-encoding>no</use-chunked-encoding>
  <agent command="wget" arguments="%in -O %out"/>
  <buffer size="8192" chunk-size="1024" fill-size="2048"/>
</profile>
3. Restart MediaTomb server:
sudo service mediatomb restart
Now log in to MediaTomb interface using a web browser (it will be listening at port 49152, so assuming that your Raspberry Pi has IP address 192.168.1.10 you should go to http://192.168.1.10:49152/) with user and password defined in beforementioned <accounts> section (in my example it's mediatomb/mediatomb) and add a new stream:
Type: External Link (URL)
Title: Your radio station title
URL: Radio URL (for example http://nl2.ah.fm:9000)
Protocol: http-get
Class: object.item.audioItem
Mimetype: audio/x-shoutcast
Choose DLNA (MediaTomb) as input source on your TV and enjoy listening to your favourite radio.

Wednesday, June 27, 2012

Short introduction to TDD

Test-driven development (or TDD in short) is a coding technique that makes creating unit tests a crucial element of the software development process. Using TDD is benefitial in several ways: research by IBM showed that it substantially decreases the ratio of bugs by single line of code (up to 40%) and positively affects code design, leading for example to lower cyclomatic complexity. Also, very high code coverage with unit tests, implied by TDD, allows safe code refactoring and effectively eliminates "do not touch a working code" syndrome. A great book to learn TDD is Kent Beck's "Test Driven Development: By Example".

One of the main principles of TDD is a Red-Green-Refactor cycle. "Red" is a phase when you write a test that checks your code for a desired condition, but it obviously fails, because no code is ready, yet. Then you move to the "green" phase, which is a state when your code passes the test. The important thing about this step is to move from red to green as quickly as possible - the code doesn't need to work, it just needs to pass the test. Now you can start the last phase: refactoring. In this step you can implement the full functionality, restructure your code or optimize it for performance, because the unit test protects you from bugs during code modifications.

Now let me show you how TDD works on a short example of Project Euler problem number one:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

The first sentence of the task is already a test case, so we are ready to dive into a red phase. Let's write a C++ code skeleton with a simple assertion: for the input value of 10 the expected result should be 23.
#include <cassert>

int main()
{
    Problem p;
    assert(p.Solve(10) == 23);
    return 0;
}

Obviously, the code won't even compile, because there is no class Problem, yet.

Important note: You should write your tests while you implement your solution, not before you start writing code or before you design it. This is a very common misunderstanding about TDD: many people think that TDD enforces you to create tests before you write any code, which is plain wrong. In this example I just assume I need a class I call Problem with one public method Solve() and only then I write a test to verify this method's output. And, in fact, this is the only test I need, because I only want the final solution to be correct.

Now the crucial step is to move to the green phase as fast as possible, so we can make sure that the test passes. So let's create the required class:
class Problem
{
    public:
        int Solve(int n);
};

int Problem::Solve(int n)
{
    return 23;
}

This is so called "fake it" solution, which means that you put some dummy code in your class just to make it pass the test. Now your program should compile and the assertion should not fail. You are ready for the next step: refactoring. Let's implement some obvious solution to Problem 1:
int Problem::Solve(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

The modified Solve method still passes the test, which means that we can assume that the code above is correct. Now you can modifiy the main method to get the result for input value of 1000 and the final code should look more or less like this:
#include <cassert>
#include <iostream>

using namespace std;

class Problem
{
    public:
        int Solve(int n);
};

int Problem::Solve(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

int main()
{
    Problem p;
    assert(p.Solve(10) == 23);
    cout << p.Solve(1000) << endl;
    return 0;
}

After solving each Project Euler problem you get access to the forum, where you can find solutions provided by other people. Sometimes you can also get access to a paper describing the most efficient algorithms used to find the correct answer. The solutions are different, but they all should pass the test assertion and give the same result. If you already have the correct solution you can write the second assertion for p.Solve(1000) and try to implement different algorithms to make your code smaller or faster - the TDD approach will prevent you from implementing an incorrect solution.

On the other hand, no unit test will make you sure that your code will work correctly for each possible case. For such purposes much more suitable is design by contract approach, which allows you to define constraints imposed on software modules.
Another interesting technique is fuzz testing, which randomizes input data and monitors the code for unexpected behaviour. However, unlike unit tests, fuzz tests are usually more complicated and not always necessary - they are used mainly for security tests or early identification of problems such as crashes or memory leaks.

Tuesday, June 12, 2012

Facebook Folly

A few days ago Facebook announced on its Engineering page that they decided to release as open source some of the C++ components propeling the most heavily loaded parts of their software infrastructure. This set of components is called Folly and can be downloaded from Github.

Folly code is mainly optimized for performance, so you can even find there Facebook's own implementations of such core C++ classes like string, vector and hashmap. All code is well documented and covered by unit tests.

However, there are some limitations to using Folly you have to remember about. First, the only supported platform so far is 64-bit Linux, and there are no plans of porting it to other platforms, yet. Second, Folly requires gcc compiler to work in C++ 11 mode, which means that as for today you need to provide -std=c++0x compilation flag in your Makefiles and make sure that your code complies with the new C++ standard.

Do you need Facebook Folly? In my opinion in most cases you don't. Remember that Facebook hires the smartest engineers they can find, so if they optimize native C++ libraries it means that they probably have no choice. For an average programmer it would be much more sensible to profile his or her code to identify and remove bottlenecks like suboptimal algorithms or overused I/O (too many disk operations or frequent memory allocations) and treat Folly as the last resort.

Monday, June 11, 2012

Tiny TCP proxy server

Recently I needed to set up a small transparent TCP proxy to forward requests from a specified port on a public host to another host working behind nat firewall in a local network. The task seems to be easy once have root privileges to set up required iptables rules. But even than, you need to be familiar with things like prerouting, postrouting, source nat and destination nat - otherwise packet forwarding will not work as expected. I needed something to let me set up port forwarding easily, without root access and without remembering all necessary firewall rules. I also wanted it to work seemlessly on my Linksys router with Tomato firmware. So I created a small utility in ANSI C, which can be compiled for Linux with standard gcc, for Windows with Cygwin, and for Openwrt/Tomato firmware with appropriate toolchain. In case you need such utility you can find it on Github.

Monday, May 28, 2012

The power of Software Design Patterns

The first day of GeeCON 2012 conference featured a presentation held by dr Ivar Jacobson. His speech called "Liberating the Essence from the Burden of the Whole: A Renaissance in Lean Thinking" made a huge impression on the audience, as it summarized in under an hour his enormous live-long experience in IT industry. It has been a great lesson to anybody who was lucky to attend it.

There was one point, however, I couldn't agree with. Dr Jacobson stated that despite years of its existance software design still lacks universal, solid foundations. Working as a coding architect I have seen a lot of really bad designed and ugly code, including my own. What's more, the abundance of different programming languages and coding styles seems to make things even worse.

There is no single solution that addresses all possible software design problems, but there are some that help solving a great number of those issues. Those universal solutions are called Software Design Patterns. The power of design patterns comes from their pragmatic nature - they are not another product of a software company, university or technical committee, but they are a collection of best practices used by the most skilled programmers to solve the most common programming challenges. The greatest value of software design patterns is that they are universal. It's a common mistake, made even by experienced programmers, to think that design patterns are related only to object oriented programming. They can be applied to any programming language and paradigm. Let me show you how it works.

One of the simplest and yet most powerful design patterns is the state pattern. It allows to easily implement even a very complicated state machine. In this pattern, each state is a separate entity (can be an object), and it is the current state which decides which state comes as the next step of the program. The main loop just holds a reference to the current state and requests it to take certain actions.

State pattern can be used for example to implement a simple ping-pong machine. In this scenario there are two states: ping and pong. Both implement only two functionalities - displaying "ping" or "pong" on the terminal and moving to the next step: in case of ping the next step is pong, and vice versa.

Below is a C++ implementation of a ping-pong state machine. Ping and pong are simple objects that implement two functions: show() and next(). The main program starts with the ping object and than starts to call both methods in a loop. The most crucial element is that the algorithm uses only one reference pointing to the current state. When a transition to the next step takes place, the reference to the previous state is replaced by the new one. This way the main program doesn't need to care about the logic of the state machine, the states take care about it themselves:
#include <iostream>

class State {
public:
    virtual void show() = 0;
    virtual State *next() = 0;
};

class StatePing: public State {
public:
    void show();
    State *next();
};

class StatePong: public State {
public:
    void show();
    State *next();
};

void StatePing::show() {
    std::cout << "ping" << std::endl;
}

void StatePong::show() {
    std::cout << "pong" << std::endl;
}

State *StatePing::next() {
    return new StatePong();
}

State *StatePong::next() {
    return new StatePing();
}

int main(int argc, char *argv[]) {
    State *st = new StatePing();
    for (int i = 0; i < 10; i++) {
        st->show();
        st = st->next();
    }
    return 0;
}
Now you may wonder how can you make use of the state pattern using a functional programming language like, let's say, Erlang? Well, as I sad before, design patterns are universal, so you can use actors instead of objects. Here is Erlang implementation of the state pattern:
-module(states).
-export([state/1, ping/0, pong/0, start/0]).

state(Pid) ->
    receive
        show ->
            Pid ! show,
            state(Pid);
        next ->
            Pid ! {next, self()},
            state(Pid);
        {new, From} ->
            state(From)
    end.

ping() ->
    receive
        show ->
            io:format("ping~n"),
            ping();
        {next, From} ->
            Pid = spawn(states, pong, []),
            From ! {new, Pid}
    end.

pong() ->
    receive
        show ->
            io:format("pong~n"),
            pong();
        {next, From} ->
            Pid = spawn(states, ping, []),
            From ! {new, Pid}
    end.

start() ->
    S1 = spawn(states, ping, []),
    spawn(states, state, [S1]).
"Ping", "pong" and "state" are actors. "State" actor acts as a current state reference - it holds the pid (process id) of the current actor. When you send message "next" to "state" it passes it to the current actor, which spawns a new actor and sends its pid back to "state". This way "state" always holds a pid of the current state represented by actor.
You can load this code into your erlang shell and use observe how it works:
S = states:start().
S ! show.
S ! next.
S ! show.
How about languages that don't support object oriented programming or actors? Have a look at this Scheme example:
(define ping
  (lambda (arg)
    (cond ((equal? arg "show") (display "ping\n") ping)
          ((equal? arg "next") pong))))

(define pong
  (lambda (arg)
    (cond ((equal? arg "show") (display "pong\n") pong)
          ((equal? arg "next") ping))))

(define state
  (let ((current-state ping))
    (lambda (arg)
      (set! current-state (current-state arg)))))

(define repeat
  (lambda (n)
    (let loop ((i 0))
      (cond ((< i n)
             (state "show")
             (state "next")
             (loop (+ i 1)))))))

(repeat 10)
In this example "ping" and "pong" are first class functions which can be passed as arguments to other expressions. Reference to ping or pong is stored in the "state" function, which acts as a closure. It means that the value of the current-state variable is being retained between subsequent calls of the "state" function.

But what if you are a real hardcore hacker who despises high level programming languages and codes only in assembly? Or maybe there are some cases when you may have to? Well, than I have somehing for you too - here is a short state machine written for 64-bit Linux in Intel x86 assembly using The Netwide Assembler:
global  _start

section .data
    ping_msg:  db  'ping', 0x0A
    ping_len:  equ $-ping_msg
    pong_msg:  db  'pong', 0x0A
    pong_len:  equ $-pong_msg

section .text

_start:
    ; init first state
    mov     rsi, _ping_show
    mov     rdi, _ping_next
    ; repeat 10 times
    mov     rcx, 10
_loop:
    push    rcx
    call    rsi            ; show
    call    rdi            ; next
    pop     rcx
    loop    _loop
    ; finish
    mov     rax, 60         ; sys_exit
    mov     rdi, 0          ; exit code
    syscall

_ping_show:
    mov     rax, 1          ; sys_write
    mov     rdi, 1          ; stdout
    mov     rsi, ping_msg
    mov     rdx, ping_len
    syscall
    mov     rsi, _ping_show
    mov     rdi, _ping_next
    ret

_ping_next:
    mov     rsi, _pong_show
    mov     rdi, _pong_next
    ret

_pong_show:
    mov     rax, 1          ; sys_write
    mov     rdi, 1          ; stdout
    mov     rsi, pong_msg
    mov     rdx, pong_len
    syscall
    mov     rsi, _pong_show
    mov     rdi, _pong_next
    ret

_pong_next:
    mov     rsi, _ping_show
    mov     rdi, _ping_next
    ret
To compile the assembly example in Linux use the following commands:
nasm -f elf64 -o states.o states.asm
ld -o states states.o
In this case index registers rsi and rdi are used to keep references to the current state. The main loop just calls "_show" and "_next" subsequently and the "ping" and "pong" states take responsibility for passing valid pointers to the CPU registers.

As you can see, you can use design patterns to properly implement soultions to some universal software desing problems. You only need to be aware of the strength and possibilities of your current programming language and use them wisely.

Sunday, April 22, 2012

AmigaE for PC

Some old computers are like cult cars - they have souls. Those computers were designed with passion by the greatest visionaries of their times, like Jay Miner or Steve Wozniak. Although Apple is the only brand which has successfully resisted the Wintel domination on the personal computer market (partly by adopting x86 architecture) there are still many devoted fans of other architectures, who not only cherish the memory of their favourite computers by writing new software and organising demoparties, but also by building entirely new machines. Amiga community stands out strongly in this area: you can buy not only hardware like AmigaOne (Sam440 / Sam460) or Natami, but also a new version of AmigaOS called AmigaOS 4. PC users can download AROS, which is an operating system designed to be as compatible as possible with original AmigaOS. I used it for a while on Acer Aspire One Z95 and was very impressed - it was not only blazing fast (booting in less than 8 seconds), but also allowed me to connect to a WiFi network and browse web sites.

The most popular programming languages for Amiga (except assembly and C) were Amos and AmigaE. Amos inspired some PC game programming systems like PlayBasic, DarkBASIC or sdlBasic, but AmigaE successors were for many years available only for MC680x0 CPUs. Now you can get PortablE, which is an improved recreation of AmigaE, and use it to write AmigaE programs on Windows. The only requirement is that you need MinGW installed (I use easy to install TDM-GCC bundle) and cannot use some multimedia libraries, which have not been ported to x86 architecture. However, all shell examples compile and run smoothly, so if you are an Amiga fan (like me) and want to have some fun, you can go back for a moment to writing software in AmigaE again.

Wednesday, April 4, 2012

Mongoose - small embeddable web server

David Bolton is a software developer and blogger I respect very much not only for his programming skills, but also for his willingness to share his knowledge with others. He runs the best blog dedicated to programming in all falvours of C (ANSI C / C++ / C#) I have seen so far. You can find there a lot of tips, tutorials, quizes and sometimes description of interesting and yet less known software.

Recently David mentioned an interesting project called Mongoose, a small cross-platform webserver written in C. On the contrary to another beforementioned C-based web server G-WAN, which is not embeddable, closed source, and supports only Linux (Windows version can be downloaded, but is not supported), Mongoose is fully embeddable, open source and runs on Linux, Windows, MacOS X and Android. Currently Mongoose provides bindings to Python and C# as well as support for CGI scripts (including PHP), and as addition to this it can be easily embedded into your C/C++ application. Of course, if you want to retain flexibility and run C servlets along with other scripts, you can still use Tiny C Compiler through CGI. Mongoose requires no installation and can be run with one click (on Windows) or command (on Linux), so it can be used for example to easily share some files on the local network via web interface.

By the way, there is another popular project with the same name. It is a Node.js library for MongoDB.

Thursday, March 29, 2012

GeeCON 2012 registration is open!

If you would like to attend one of the biggest Java conferences in Europe and hear some renowned speakers like Adam Bien, Isabel Drost, Bruce Eckel, Ivar Jacobson or Simon Willnauer live, you are welcome to register at http://2012.geecon.org/. It's a great place to share your thoughts and experiences with some most passionate Java experts from all over the world. It would be great to see you on 16th-18th May 2012 in Poznań, which is also one of the host cities of the UEFA 2012 European Football Championship.