Monday, August 4, 2008

Erlang tips and tricks: nodes

The title might be a bit exaggerated, but if you have just started your adventure with Erlang I would like to provide you with a couple of hints that can spare you a serious headache.

The basic tool for working with Erlang is its REPL shell, started in terminal mode with the erl command. The name REPL comes from the read-eval-print loop, and among functional languages it is a commonly used term for an interactive shell. Since Erlang was developed with distributed programming in mind, you can start as many shells as you like and make them communicate with each other. The basic rule to remember is that you can connect only shells that share the same cookie. A cookie is simply a string that acts as a shared password for all nodes in a cluster, whether they run on the same computer or on different machines across the network. You can set it either from the command line when starting the REPL, using the -setcookie parameter, or from the Erlang shell itself by calling erlang:set_cookie/2.
You can also edit the .erlang.cookie file in your home directory so you don't have to set the cookie every time you start the shell. However, this method has some nasty side effects, which is why it is not recommended. The first is that all Erlang applications you start from your user account, including all REPLs, will share the same cookie, which is not always the desired behaviour. Secondly, you are very likely to forget to edit this file on another machine you move your Erlang application to (or you will not have sufficient permissions to do it), and your application will not work in any environment other than your own.
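As a minimal sketch of setting the cookie from inside the shell: the session below assumes the shell was started as a named node (here test1, via -sname), and mysecretcookie is only a placeholder value.

```erlang
%% Assumes a named node, e.g. started with: erl -sname test1
%% mysecretcookie is a placeholder; the host part of the prompt will differ.
(test1@localhost)1> erlang:set_cookie(node(), mysecretcookie).
true
(test1@localhost)2> erlang:get_cookie().
mysecretcookie
```

Calling erlang:get_cookie() afterwards is a quick way to confirm which cookie the current node is actually using.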

Now that you have started your shells and set up a shared cookie, you may want to check the connectivity between them. But first you need to know how to address them, and here is where the real fun begins. Erlang lets you name a node (a shell instance) with either a short or a long name (the "-sname" and "-name" command line parameters respectively). A short name is a plain name like "test", while a long name includes a fully qualified domain name, like "test@host.domain". Short names work across machines in the same domain, while long names can be used (theoretically) across the whole Internet. So you start a short-name REPL with:
/usr/bin/erl -sname test1 -setcookie mysecretcookie
So far so good. Now try another one within the same domain:
/usr/bin/erl -sname test2 -setcookie mysecretcookie
And now you want to ping test2 from test1 to check that everything is OK. So you call net_adm:ping(test2) in the test1 REPL.
And you see "pang", which means that something went wrong. So you start to tear your hair out until you realize that test1 is not a complete node name in Erlang! Now go back to your REPL and look carefully at the prompt. You will probably see something like (test1@localhost)1>, where the part in parentheses is the real node name.
Now try to ping test2 again, but this time use the full name in the form displayed in the REPL prompt: net_adm:ping(test2@localhost).
What you should now see is "pong", which means that the nodes can see each other. Note that test2@localhost, although it may not look like one, is still a short name and not a fully qualified domain name (it lacks a domain part).
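Put together, the failed and the successful ping might look like this in the test1 shell. The host part (localhost) is an assumption taken from the example above; use whatever your own prompt shows.

```erlang
%% Run in the test1 shell while test2 is running with the same cookie.
%% The bare name test2 is not a full node name, so the ping fails.
(test1@localhost)1> net_adm:ping(test2).
pang
%% The full name, as shown in the test2 prompt, succeeds.
(test1@localhost)2> net_adm:ping(test2@localhost).
pong
%% After a successful ping the other node is listed.
(test1@localhost)3> nodes().
[test2@localhost]
```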

You can always see a list of all other nodes in the cluster by calling nodes() in the REPL. But remember one important thing: new nodes are not discovered by Erlang automatically. If you start a new Erlang instance and want it to show up on the nodes() list, you have to ping one of the nodes already in the cluster. Information about the new node will then be propagated automatically among all other nodes. To avoid pinging by hand, you can use a .hosts.erlang file in your home directory, whose role is similar to that of .erlang.cookie (including the side effects). It holds a list of host names, each written as a quoted Erlang atom followed by a period, and calling net_adm:world() on a new instance pings the nodes running on every listed host.
You need to have one empty line at the end of the file (the net_adm documentation describes the syntax in more detail).
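As an illustration of the file format, a .hosts.erlang file could look like the sketch below. The host names host1.domain and host2.domain are placeholders, not real machines, and the file ends with an empty line as noted above.

```erlang
%% .hosts.erlang - placeholder host names, one quoted atom per line,
%% each terminated by a period.
'host1.domain'.
'host2.domain'.

```

With such a file in place, calling net_adm:world() in a freshly started node should try to contact the nodes registered on each listed host, so a separate net_adm:ping/1 per node is not needed.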

So here are the basic things you should remember when building an Erlang cluster:
1) Choose carefully between short and fully qualified domain names, since the former cannot communicate with the latter (to put it simply: you cannot mix short-named and long-named nodes in one Erlang cluster).
2) When using more than one machine in your cluster, make sure all DNS records on all machines are set up properly to avoid communication problems.
3) Use the same cookie for all Erlang instances you want to put in a cluster.
4) When you start a new node always inform the cluster about it by pinging one of the running nodes.
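5) Open TCP port 4369 on your firewall for all clustered machines - it is used by the epmd daemon, which tells nodes on which port each named node is listening. The nodes themselves then talk to each other over separate, dynamically assigned TCP ports, so those must be reachable too (the range can be restricted with the inet_dist_listen_min and inet_dist_listen_max kernel parameters).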
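Points 3 and 4 can be wrapped in a small helper. This is only a sketch: the module name cluster_join and the function join/1 are made up for illustration and are not part of OTP; it assumes the calling node was started with the right name type and cookie.

```erlang
%% Hypothetical helper module; cluster_join and join/1 are illustrative
%% names, not part of OTP.
-module(cluster_join).
-export([join/1]).

%% Ping one node that is already in the cluster.
%% Returns {ok, OtherNodes} on pong, or {error, {unreachable, Seed}} on pang.
-spec join(node()) -> {ok, [node()]} | {error, {unreachable, node()}}.
join(Seed) when is_atom(Seed) ->
    case net_adm:ping(Seed) of
        pong -> {ok, nodes()};
        pang -> {error, {unreachable, Seed}}
    end.
```

After compiling it with c(cluster_join). you could call cluster_join:join(test2@localhost). from a new node. An error result usually points back to one of the items above: a wrong node name, a cookie mismatch, or a blocked port.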


grantmichaels said...

I just happened upon your blog and it's not only just right, but right on time (for my comprehension level re: erlang) ...

Thanks for all of the great, practical insight.


kklis said...

Thank you for a positive feedback! It's nice to hear that there are people who find this blog useful :-)

Tyler Gillies said...

Awesome blog. I *love* the title