Sunday, September 6, 2009

Top gear(man)

Gearman is an open source project providing a flexible and universal framework for writing distributed applications. It differs from similar projects in easiness of use and the number of bindings for programming languages it provides: C, C++, Java, Perl, PHP and Python. In fact, Gearman has a simple command line client, that allows you to start jobs using any language you want - all you need to do is to provide the client with input data and then fetch the client's output. Gearman API is very simple, consistent, and makes writing distributed applications really easy, quick and fun.

Gearman architecture is equally simple: it consists of job servers, that accept task requests from clients and forward them to workers, and send results back to clients. Each worker can be connected to many job servers, and a client can choose which job server to use - this way there is no single point of failure that could break down the whole cluster. Job servers have their own queues and in case of worker failure they can reassign tasks to other workers. According to High Scalability Gearman has been successfully used by LiveJournal, Yahoo!, and Digg (which claims to run 300000 jobs a day through Gearman without any issues).

I decided to try out Gearman at home, and I must say that it was a really pleasant experience. I wrote a simple C++ worker and even simpler Python client. The worker recursively finds Fibonacci number for given n:
#include <cstring>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <libgearman/gearman.h>
#include <libgearman/worker.h>

using namespace std;

void *fib_worker(gearman_job_st *job, void *cb_arg, size_t *result_size, gearman_return_t *ret_ptr);
long fib(long n);
static void usage(char *name);

int main(int argc, char *argv[])
{
  int c;
  char *host = "127.0.0.1";
  in_port_t port = 0;
  gearman_worker_st worker;

  while ((c = getopt(argc, argv, "h:p:")) != -1) {
    switch(c) {
      case 'h':
        host = optarg;
        break;
      case 'p':
        port = (in_port_t) atoi(optarg);
        break;
      default:
        usage(argv[0]);
        exit(1);
    }
  }

  if (argc != optind) {
    usage(argv[0]);
    exit(1);
  }

  gearman_worker_create(&worker);
  gearman_worker_add_server(&worker, host, port);
  gearman_worker_add_function(&worker, "fib", 10, fib_worker, NULL);
  while (1) {
    gearman_worker_work(&worker);
  }

  return 0;
}

void *fib_worker(gearman_job_st *job, void *cb_arg, size_t *result_size, gearman_return_t *ret_ptr) {
  char ch[256];
  ostringstream os;
  int size = gearman_job_workload_size(job);
  strncpy(ch, (char *) gearman_job_workload(job), size);
  ch[size] = 0;
  long n = atol(ch);
  os << fib(n);
  string s = os.str();
  *result_size = s.size();
  *ret_ptr = GEARMAN_SUCCESS;
  return strdup(s.c_str());
}

long fib(long n) {
  if (n < 2) {
    return 1;
  } else {
    return fib(n - 2) + fib(n - 1);
  }
}

static void usage(char *name) {
  cout << "Usage: " << name << " [-h <host>] [-p <port>] <string>" << endl;
  cout << "\t-h <host> - job server host" << endl;
  cout << "\t-p <port> - job server port" << endl;
}
Python client prepares a set of jobs for a sequence of n numbers, runs them simultaneously through a job server and sums up the result:
import optparse
from gearman import *

parser = optparse.OptionParser()
parser.add_option('--host', help = "Specifies gearman job server")
parser.add_option('-n', '--num', help = "Amount of Fibonacci numbers to compute")
(opts, args) = parser.parse_args()

client = GearmanClient([opts.host])

ts = Taskset()
for i in range(1, int(opts.num)):
  t = Task(func = "fib", arg = i)
  ts.add(t)

client.do_taskset(ts)

sum = 0
for task in ts.values():
  sum += int(task.result)

print sum
You can download the source code of both worker and client here. After you compile and install Gearman with traditional:
./configure
make
sudo make install
sudo ldconfig
Install Python extension with:
easy_install gearman
And compile the C++ example with:
make
Then you can run a job server as a daemon:
gearmand -d -L 127.0.0.1
or in debug mode:
gearmand -vv -L 127.0.0.1
Next, run a couple of Gearman workers:
./GearmanWorker -h 127.0.0.1
And the Python client:
python GearmanClient.py --host 127.0.0.1 -n 45
For a single machine, it makes sense to run at most as many workers as there are CPUs (or CPU cores) available. For a network cluster, you can run more job servers and workers (and clients) respectively.

I've made some tests with the client and worker above. using my home laptop and an Intel Atom based net-top running together in a local network. For only one laptop worker, computing the sum of 45 Fibonacci numbers took 66.955 seconds, for two laptop workers it took 35.702 seconds, and adding a remote worker reduced the total time to 25.593 seconds. Adding more workers didn't reduce computation time, it even slightly slowed the cluster down - which is quite understandable, as the number of workers exceeded the number of free CPUs (Intel Atom in fact has only one physical core, although applications see it as dual-core CPU).