Thursday, August 7, 2008

Erlang tips and tricks: Mnesia

Mnesia is one of Erlang killer applications. It's a distributed, fault tolerant, scalable, soft real-time database system which makes building data clusters really easy. It is not designed as a fully blown relational database, and it should not be treated as its replacement. Although Mnesia supports transactions (in a much more powerful way than traditional databases, as any Erlang function can be a transaction), it does not support SQL or any of its subsets. Mnesia is best suitable for a distributed, easily scalable data storage, which can be shared by a large cluster of servers. One of the most notably known examples are ejabberd - a famous distributed jabber/xmpp server - and cacherl - a distributed data caching system compatible with memcached interface.

Setting up a cluster of Mnesia nodes is easy. First you need to set up a cluster of Erlang nodes as I described in my post Erlang tips and tricks: nodes. Start Mnesia instances on all of them:
mnesia:start().
Now choose one of the nodes as initial Mnesia server node. It can be any node, as Mnesia works in peer-to-peer model and does not use any central management point. Go to a chosen Erlang shell and move the database schema to disc to make it persistent:
mnesia:change_table_copy_type(schema, node(), disc_copies).
You can now create a new table, for example:
mnesia:create_table(test, [{attributes, [key, value]}, {disc_copies, [node()]}]).
which creates table test with two columns: key and value. Attribute disc_copies means that a table will be held in RAM, but its copy will be also stored on disc. If you don't store any persistent data in your table, you can use ram_copies attribute for better performance. On the other hand, if you want to spare some of your system memory, you can use disc_only_copies attribute, but at the cost of reduced performance.
Now let's add a second node to Mnesia:
mnesia:change_config(extra_db_nodes, [node2@host.domain]).
Of course replace node2@host.domain with the name of a node you want to add (or a list of nodes: [node2@host.domain, node3@host.domain, ...]).
Now switch to Erlang shell at the node you have just added and move its schema to disc, so Mnesia remembers it's a part of the cluster:
mnesia:change_table_copy_type(schema, node(), disc_copies).
You can now create a copy of table test on this node:
mnesia:add_table_copy(test, node(), disc_copies).
Now you can add a third node, a fourth node, etc. in the same way as you added the second one. When you have many nodes in the cluster you can keep tables as disc_copies only on some nodes as backup, and use ram__copies on other nodes to improve their performance. It generally makes sense to keep tables on disc only on one of the nodes per single machine.

To remove node@host.domain from the cluster stop Mnesia on that node:
mnesia:stop().
Now switch to any other node in the cluster and do:
mnesia:del_table_copy(schema, node@host.domain).
Mnesia is strongly fault-tolerant, which means that generally you don't need to worry when one of your nodes crashes. Just restart it and reconnect it to the cluster - Mnesia node will synchronize itself and fix all broken and out-of-date tables. I really like to imagine Mnesia as a database equivalent of T-1000, which even heavily damaged or broken into pieces, every time reassembles itself to its original form.

Unfortunately Mnesia has its limits. A storage limit seems to be the most troublesome of them.

5 comments:

Francesco said...

Hi, I read your article, since I'm starting to work with erlang and mnesia.

Very quickly we have an ejabberd instance running, that filters some messages and saves them on a mnesia table (I'll call this write@localhost).

What we'd like to do is having a separate erlang application, deployed on another machine, read this table and process data (I'll call this read@localhost).

I'm trying to make read@localhost access write@localhost but even if I connect the two nodes seems the database has to be shared across nodes to be accessible from both. Is this correct? Isn't there any other solution to read data from a node without replicating it across nodes?

kklis said...

You don't have to replicate tables across nodes to make it accessible. If you only copy database schema from write@localhost to read@localhost and don't create a local copy on read@localhost (skip add_table_copy() step), you'll be getting data with mnesia:read() from table located on write@localhost. This way you can have a cluster of many nodes and only two or three replicas of each table, spread among different nodes for safety.

kklis said...

You can be also interested in this tutorial. It covers your problem with more details.

Francesco said...

I understand, I'll try it this way. Thanks a lot for your help! :)

Anonymous said...

Really good - concise tutorial - I'd struggled to find out why I was getting bad_arg errors when creating disc_copies tables - not moving the schema to disc first. Thanks!