Thursday, August 7, 2008

Web services need balance

So far we have a nice, fast web service based on PHP and Yaws, or a not-so-fast but still nice one powered by Java and Tomcat. We also have a powerful Mnesia storage back-end that we can communicate with through a dedicated API. Now we are ready to scale.

There are many ways to balance a web cluster. You can use one of many software load balancers, configure a front-end web server as a reverse proxy, or even use a dedicated hardware load balancer. But hey, we have Erlang, so why bother with complicated software or spend money on expensive hardware? Writing a simple load balancer in Erlang took me about an hour, including testing, and I am not a very experienced Erlang programmer. This should give you some idea of how functional programming can improve your productivity as a developer.

What the balancer actually does is check the system load on all nodes in a cluster and return the name of the least loaded one. The software is GPL licensed and can be downloaded from here. You should compile it with:
erlc balancer.erl
and deploy it to all machines you want to monitor. Now you can check the system load of the current node:
balancer:load().
all of the nodes you are connected to:
balancer:show(nodes()).
all of the nodes including current node:
balancer:show([node()|nodes()]).
pick the least loaded one with:
balancer:pick(nodes()).
or with:
balancer:pick([node()|nodes()]).
Due to the nature of Erlang, dead nodes are removed from the nodes() list as soon as the runtime notices the lost connection, so they are not queried. Additionally, the balancer filters out all nodes that returned an error, so you always get valid results, with no timeouts, deadlocks, etc. However, the result only says that a machine is up: if a web server dies for some reason but the machine itself did not crash, the node will still appear on the list.
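To make the idea more concrete, here is a minimal sketch of how such a balancer could be put together. It is not the actual balancer.erl: the module name balancer_sketch, the helper functions node_load/0 and least_loaded/1, the 2-second timeout, and the use of the scheduler run-queue length as a stand-in for "system load" are all assumptions made for illustration.

```erlang
%% Hypothetical sketch, not the original balancer.erl.
-module(balancer_sketch).
-export([load/0, node_load/0, show/1, pick/1, least_loaded/1]).

%% Load of the local node.
%% Assumption: the run-queue length is used as the load figure.
load() ->
    erlang:statistics(run_queue).

%% Load tagged with the node name, so that results can be told
%% apart after a remote call.
node_load() ->
    {node(), load()}.

%% Ask every node in Nodes for its load (2-second timeout) and keep
%% only well-formed {Node, Load} answers. Unreachable nodes and
%% {badrpc, _} results are dropped.
show(Nodes) ->
    {Results, _BadNodes} =
        rpc:multicall(Nodes, ?MODULE, node_load, [], 2000),
    [{N, L} || {N, L} <- Results,
               is_atom(N), N =/= badrpc, is_integer(L)].

%% Least loaded node among Nodes, or 'none' if nobody answered.
pick(Nodes) ->
    least_loaded(show(Nodes)).

%% Pure helper: the {Node, Load} pair with the smallest load.
%% On a tie the earlier entry wins.
least_loaded([]) ->
    none;
least_loaded([First | Rest]) ->
    lists:foldl(fun({_, L} = Candidate, {_, Best}) when L < Best ->
                        Candidate;
                   (_, Acc) ->
                        Acc
                end, First, Rest).
```

Like the real balancer, this sketch would have to be compiled and loaded on every node that is queried, because the remote call runs node_load/0 there.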

Since the balancer is written in Erlang, it seems natural to deploy Yaws as a front-end web server and use it to redirect users to the servers that are least loaded at the moment. Suppose each web server runs on a different subdomain (s1.host.domain, s2.host.domain, etc.), and each of them runs a single Erlang node. In this case our index.yaws on http://host.domain/ can look like this:
<html>
<erl>
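out(_Arg) ->
    %% pick/1 returns a tuple whose first element is the node name.
    {Node, _} = balancer:pick(nodes()),
    %% A node name looks like name@host; keep the host part.
    [_Name, Host | _] = string:tokens(atom_to_list(Node), "@"),
    Url = "http://" ++ Host,
    {redirect, Url}.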
</erl>
</html>
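The host-name extraction used in this page can also be tried on its own. The sketch below is only an illustration: the module redirect_sketch, its functions, and the Default fallback URL are hypothetical names, added to show one way of avoiding a crash when no usable node name is available.

```erlang
%% Hypothetical helper module, not part of the balancer.
-module(redirect_sketch).
-export([node_host/1, redirect_url/2]).

%% 'web@s1.host.domain' -> {ok, "s1.host.domain"}
%% Anything that does not look like name@host -> error
node_host(Node) when is_atom(Node) ->
    case string:tokens(atom_to_list(Node), "@") of
        [_Name, Host] -> {ok, Host};
        _ -> error
    end.

%% Build the redirect target for Node. Default is a hypothetical
%% fallback URL, used when the host cannot be extracted.
redirect_url(Node, Default) ->
    case node_host(Node) of
        {ok, Host} -> "http://" ++ Host;
        error -> Default
    end.
```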
Connecting to http://host.domain/ will now automatically redirect you to the least loaded server. It is up to you whether to keep users bound to the back-end server or to rewrite the links on your web site so that they go through the redirector on every click. In the latter case, if you provide dynamic content, you can use Mnesia to store user sessions, so that users do not get logged out when they suddenly switch from one back-end server to another.
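As a rough illustration of the session idea, a shared session table could look something like the sketch below. Everything here is an assumption: the module session_sketch, the session record and its fields, and the choice of a RAM-only table. It also assumes that Mnesia is already running on every node passed to init/1 and that those nodes already share a schema.

```erlang
%% Hypothetical sketch of a shared session store.
-module(session_sketch).
-export([init/1, store/2, lookup/1]).

-record(session, {id, data}).

%% Create a RAM-only session table replicated on Nodes.
%% An already existing table is not treated as an error.
init(Nodes) ->
    case mnesia:create_table(session,
                             [{attributes, record_info(fields, session)},
                              {ram_copies, Nodes}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, session}} -> ok;
        {aborted, Reason} -> {error, Reason}
    end.

%% Save or overwrite the data kept for session Id.
store(Id, Data) ->
    Write = fun() -> mnesia:write(#session{id = Id, data = Data}) end,
    {atomic, ok} = mnesia:transaction(Write),
    ok.

%% Fetch the data for session Id from whichever node we are on.
lookup(Id) ->
    Read = fun() -> mnesia:read(session, Id) end,
    case mnesia:transaction(Read) of
        {atomic, [#session{data = Data}]} -> {ok, Data};
        {atomic, []} -> not_found
    end.
```

With a table like this, any back-end server could call lookup/1 with the session identifier taken from the user's cookie, no matter which server created the session.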
