Thursday, November 6, 2008

Distributed processing with JScheme and Terracotta

In my post about clustering JScheme with Terracotta I presented how to share Java objects through Terracotta using JScheme. But clustering is not only about sharing data. It is also about parallel (and possibly distributed) task processing.

One of the fundamental assumptions of Scheme (and other Lisps) is that code is data. Programs are just collections of evaluable expressions and thus you can represent data as code, and vice versa. This is exactly what you need when you want to distribute code over a system designed primarily to share data.

I prepared a simple system based on JScheme that allows distributed, concurrent code processing over a Terracotta cluster. It uses a concept of independent nodes (borrowed from Erlang). Each node polls a shared list to find a new expression to evaluate. Once it finds a job to be done, it evaluates the expression using a JScheme evaluator (instance of jscheme.JScheme class) and writes a result back on the list. The client process which initiated the task, reads back the result and returns it.
Since all nodes are independent entities, you can start as many of them as you need and use them concurrently. But in most cases the optimal number of nodes is equivalent to the number of CPUs (or CPU cores) to be used on each machine connected to Terracotta. So if you have 2 computers with a quad core CPU and want to use only half of their power, you can start 2 nodes on each of them. If you want to use them to the full, you should use 8 nodes, 4 per each machine, and so on. You can start the nodes on single machine using a single or multiple JScheme shells, it's up to you. For me, a single Scheme REPL seems to be the most convenient option.
Client jobs are started through tc-map. It's a function that is similar to the standard map, but it takes an additional argument - a list of nodes to use for the job. Unfortunately, the system is not fault tolerant, so if one of the nodes dies during doing the job, the whole processing task hangs up. The only way out then is either to restart the dead node or evaluate the whole tc-map expression again.

OK, enough for the theory, let's do some real work. First you need to download the library. The online folder contains the library itself (jstc.scm), sample Terracotta configuration (tc-config.xml) and some tests. After you get the library and the configuration file, you should start the Terracotta server (I described the whole procedure in detail previously). If you run the Terracotta server on a remote machine, you should also edit the tc-config.xml file and change server host to the IP of the Terracotta host. Now you can start JScheme through the Terracotta bootstrap:
java -Xbootclasspath/p:[terracotta boot jar] -Dtc.config=tc-config.xml -Dtc.install-root=[terracotta install dir] -jar jscheme.jar
You can to find the boot jar in lib/dso-boot folder of your Terracotta installation directory. If it isn't there, you can generate it with Terracotta make-boot-jar.sh script.
Now you can load the library with:
(load "jstc.scm")
and start playing with it. For starters, let's run two nodes:
(start-node "test1")
(start-node "test2")
and define a helper function to generate a list of integers from a specified range:
(define (range min max) (let loop ((x max) (l '())) (if (< x min) l (loop (- x 1) (cons x l)))))
Now we need to "teach" running evaluators the Fibonacci function:
(tc-load (list "test1" "test2") "(define (fib n) (cond ((= n 0) 0) ((= n 1) 1) (else (+ (fib (- n 1)) (fib (- n 2))))))" )
and we are ready to spread a test job across the nodes:
(tc-map (list "test1" "test2") "fib" (range 1 20))
After a few seconds you should receive the following list:
(1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765)
You can use time function to compare computation times with results received by using a single node or a regular, sequential map:
(time (tc-map (list "test1" "test2") "fib" (range 1 20)))
(time (tc-map (list "test1") "fib" (range 1 20)))
(define (fib n) (cond ((= n 0) 0) ((= n 1) 1) (else (+ (fib (- n 1)) (fib (- n 2))))))
(time (map fib (range 1 20)))
Take note that a function passed to map is a regular Scheme expression, while with tc-map it must be a string.

It is quite possible that you get no gain over sequential processing using less than 3 nodes running on 3 cores with this library. The main reason is that Terracotta introduces some overhead itself. The second one is that nodes poll Terracotta for job lists in 20ms intervals. Those intervals are necessary if you don't want to consume the whole CPU power just for loops and leave none for jobs. You can adjust them by changing the value of JSTC_EVAL_DELAY.

I did some tests and I must say that the results surprised me. On my home laptop (Core2 Duo T5450 1.66 Ghz, 2 cores, 2GB RAM) the results looked like this:
(time (map fib (range 1 25))) - 17234 msec
(time (tc-map (list "test1") "fib" (range 1 25))) - 29020 msec
(time (tc-map (list "test1" "test2") "fib" (range 1 25))) - 19620 msec
while on three servers (dual Xeon E5430 2.66 Ghz, 8 cores, 8GB RAM each):
(time (map fib (range 1 25))) - 12687 msec
(time (tc-map (list "test1") "fib" (range 1 25))) - 22502 msec
but:
(time (tc-map (list "test1" "test2") "fib" (range 1 25))) - 25256 msec
(time (tc-map (list "test1" "test2" "test3") "fib" (range 1 20))) - 22355 msec
when I ran the tests on a single machine, and:
(time (tc-map (list "test1" "test2") "fib" (range 1 25))) - 14216 msec
(time (tc-map (list "test1" "test2" "test3") "fib" (range 1 20))) - 11538 msec
when each node was on a different machine.
On my laptop I got almost 150% speedup by using two Terracotta nodes instead of one, but on a server machine two nodes actually slowed the tasks down. I could get faster job processing only by spreading nodes across different machines. Adding new nodes to the machines seemed to have no impact on the results, so I couldn't get past 250% speedup factor.
Weird, isn't it?

2 comments:

Unknown said...

Fascinating. We should get together via Webex and performance analyze this. I think there may be an issue or 2 here. Not sure though.

--Ari

kklis said...

Ari,
This is only a proof of concept, and I don't think you can get any more further using JScheme, without rebuilding its internals. JScheme is very well designed (one of its authors is Peter Norvig, director of research at Google) and elegant, but it's more an academic tool than a production environment. JScheme is rather slow, has no active community, no support, and there are practically no success stories to show off with.
If you are looking for interesting projects to run on Terracotta, I suggest contacting Rich Hickey, author of Clojure, a kick ass modern Lisp that has a lot of outstanding features like Software Transactional Memory, concurrent processing, etc. Clojure has been actively developed and has a very strong community around it.
Last month I suggested Rich to port Clojure to Terracotta (look at this thread). It turns out that Rich has already successfully experimented with Terracotta and some Clojure community members are not far from integrating Clojure with Terracotta. I am sure that Clojure will work much better with Terracotta than JScheme.