Monday, September 29, 2008

LFE: Lisp Flavoured Erlang

Today Robert Virding released a new version of his Lisp syntax front-end to the Erlang compiler, which he called LFE (Lisp Flavoured Erlang). I saw the previous versions of his work, and I must say that it is the first version that looks really mature to me now.

LFE is not an implementation of Lisp or even Scheme. Its syntax reflects some specific constructs and limitations of Erlang, since it compiles programs to Erlang bytecode and runs them directly on Erlang VM.

Is a new trend for Erlang coming on?

Saturday, September 27, 2008

Cilk: C that scales

Do you like C? I do. I used to write software for embedded devices, where resources are priceless and each CPU cycle and every byte of RAM counts. I enjoyed it a lot, mostly because it was a real challenge and it reminded me of good old 8-bit computer times, when if your program was not fast enough you had to optimize its code or find a smarter solution instead of buying a better CPU, expanding memory, or building a home data center. Even today few people question using C to speed up the most critical parts of software, not to mention some obvious examples like Linux kernel, written almost entirely in ANSI C. Two years ago NVidia announced its CUDA platform, which allows developers to write software in C that can run concurrently on many GPUs, outperforming applications using standard CPUs.

For those who don't own NVidia hardware, there is an alternative called Cilk. Cilk is a language for multithreaded parallel programming based on ANSI C. It extends standard C with new keywords like cilk (identifying parallel function), spawn (which spawns a new computation thread) and sync (which gathers job results from all threads started with spawn).

I ran an example Cilk application that finds Fibonacci numbers of a given value:
#include <cilk-lib.cilkh>
#include <stdlib.h>
#include <stdio.h>

cilk int fib(int n)
{
     if (n < 2)
          return (n);
     else {
          int x, y;
          x = spawn fib(n - 1);
          y = spawn fib(n - 2);
          sync;
          return (x + y);
     }
}

cilk int main(int argc, char *argv[])
{
     int n, result;

     if (argc != 2) {
          fprintf(stderr, "Usage: fib [] <n>\n");
          Cilk_exit(1);
     }
     n = atoi(argv[1]);
     result = spawn fib(n);
     sync;

     printf("Result: %d\n", result);
     return 0;
}
As you can see it looks almost like an ordinary C program. You can run it with --nproc argument to determine the number of CPUs (or CPU cores) that are supposed to be used by the application.

As I expected, the Cilk program turned out to be faster than its Erlang equivalent. The speedup was from 5.110972 to 0.689607 seconds on dual Xeon E5430 2.66 Ghz (8 cores total), comparing to 25.635218 and 10.017358 seconds of Erlang version respectively. But what struck me the most about the results was that the Cilk code scaled in almost linear manner:
Wall-clock running time on 1 processor: 5.110972 s
Wall-clock running time on 2 processors: 2.696646 s
Wall-clock running time on 4 processors: 1.355353 s
Wall-clock running time on 8 processors: 689.607000 ms
Cilk code performance boost was approximately 740%, comparing to Erlang 250%. On 8 cores Cilk code outperformed Erlang code by more than an order of magnitude. This is something that should make Erlang developers feel respect for.

Friday, September 26, 2008

Living in a concurrent world

Erlang is a great platform for writing concurrent, distributed, fault-tolerant applications. But it's not the only one. Some of the popular programming languages also have tools that help developers build concurrent, highly scalable web services.

Java

One of the main sources of Java power is that it has libraries to do almost anything you can imagine. And there are of course libraries that offer various solutions to the scaling out problem, like JMS with JCache or Scala Actors library.
But the real killer application in my opinion is Terracotta. It's an open source clustering solution for Java that works beneath the application level. It uses transactional memory to share objects between applications running on different JVM instances, most notably on different physical machines. Object pool is stored on a master server, which can have multiple passive backups. As a result you receive a clustering solution transparent for the applications you write (you only need to define objects you want to share in Terracotta configuration files). What's more, you can use Terracotta to cluster applications created in languages other than Java, but running on JVM - like JRuby or Scala.

Python

Python programmers can use Disco. It's an open source map-reduce framework written in Erlang that allows you to scale your application easily. Unfortunately, it does not allow neither sharing any data between different application instances (like Terracotta), nor passing messages (like Erlang itself).

Ruby

Ruby looks much better than Python in this aspect. First, there is JRuby, which you can cluster with Terracotta. Second, there is soon-to-come MagLev by Gemstone. It's a Ruby implementation running on a Smalltalk virtual machine, and the first public presentation of MagLev at RailsConf 2008 was quite impressive.

PHP

Not much to say here... You cannot expect much from a language created to produce dynamic HTML pages. You can only rely on external libraries like memcached to store and retrieve objects, but it's by no means an extension to the language itself.

If you know of any other interesting projects providing transparent (or almost transparent) concurrency and distribution to already existing programming languages, you are welcome to tell me about it!

Scripting Erlang

Today I came across an interesting post about new scripting language Reia for Erlang virtual machine. It seems that some people within the Erlang community started to realize what .NET developers always knew and Java programmers understood a few years ago: that you have to distinguish between the language and the runtime. People who were tired with Java limitations or just wanted to introduce their own ideas started to develop things like JRuby, Jython, Groovy, Scala or Clojure, just to name a few. Erlang has already developed a strong position as middleware. Now it's time to make it scriptable.

You can read more about Reia on this blog. It discusses Reia design principles and goals, as well as some controversial ideas, like introducing classes and objects or multiple variable assignment. It is true that it goes up the stream against Erlang philosophy, but I think that there is nothing inherently stupid about any idea until you can prove it wrong. This is why I don't agree with Joe Armstrong's thesis that Object Oriented programming sucks. There are programming languages that succesfully combine OO and functional paradigms (OCaml, Scala, CLOS), which prove just the opposite. It would be nice to see a modern object system in Erlang, and ALGOL based syntax could convince more people to use this excellent platform. Or, at least, possibly alleviate some already exising Erlang pains.

Mnapi for Tokyocabinet

Finally, I found some time to patch Mnapi to work with tcerl. Now you can use it to build your web service with this highly promising Mnesia storage backend.

You can download the new Mnapi here.

Due to the nature of Tokyocabinet, remember to always stop Mnapi via
mnapi:stop().
before closing or shooting Erlang shell, so it could gracefully flush all RAM buffered data to disk before stopping Mnesia. Yeah, I know I could use gen_server behaviour to handle it, but I just cannot convince myself to Erlang behaviourism ;-)

Sunday, September 21, 2008

Sky is the limit for Mnesia

Until recently the biggest Mnesia flaw was its storage limit. It is not the fault of the database system, since Mnesia itself can handle data of virtually infinite size. The main problem lays in an outdated Erlang term storage engine called DETS, which is slow and uses 32-bit offsets (it limits single file size to 2GB). DETS could be a perfect fit for the famous AXD301 ATM switch, but is certainly not for modern web development. A few times I had to drop Mnesia in favour of Hbase to serve data and use Jinterface to make it communicate with Erlang code, which served application logic.

But those times are finally over. Thanks to Joel Reymont and the Dukes of Erl you can now use Tokyocabinet engine as a storage for Mnesia. What is the most exciting about this solution is that it is completely transparent to your application - only creating a table looks a bit different:
Table = testtab,
mnesia:create_table(Table,
    [{type, {external, ordered_set, tcbdbtab}},
     {external_copies, [node()]},
     {user_properties, [{deflate, true}, {bucket_array_size, 10000}]}]).
You also need to start tcerl before running Mnesia:
tcerl:start().
and synchronize table data with disk before closing Erlang, if you don't want to loose some of your data:
Port = mnesia_lib:val({Table, tcbdb_port}),
tcbdbets:sync(Port).
To sync all existing Mnesia tables you can use the following function:
F = fun(T) ->
            case catch mnesia_lib:val({T, tcbdb_port}) of
                {'EXIT', _} ->
                    skip;
                Port ->
                    tcbdbets:sync(Port)
            end
    end,
Tabs = mnesia_lib:val({schema, tables}),
lists:foreach(F, Tabs).
All other common operations, like reading and writing data, transactions, etc., don't need to be changed. You can learn more by looking at example provided with the library.

I managed to make ejabberd (one of the best Jabber/XMPP servers around) to run on Tokyocabinet, only with changing a few lines of its code. Now I'm on my way to make Mnapi work with it. Thank you guys!!!