Erlang tips and tricks: Mnesia

Thursday, August 7, 2008

Mnesia is one of Erlang's killer applications. It is a distributed, fault-tolerant, scalable, soft real-time database system that makes building data clusters really easy. It is not designed as a full-blown relational database and should not be treated as a replacement for one. Although Mnesia supports transactions (in a much more powerful way than traditional databases, as any Erlang function can be a transaction), it does not support SQL or any of its subsets. Mnesia is best suited to distributed, easily scalable data storage shared by a large cluster of servers. Two of the best-known examples are ejabberd - a famous distributed Jabber/XMPP server - and cacherl - a distributed data caching system compatible with the memcached interface.

Setting up a cluster of Mnesia nodes is easy. First you need to set up a cluster of Erlang nodes as I described in my post Erlang tips and tricks: nodes. Start Mnesia instances on all of them:

mnesia:start().

Now choose one of the nodes as the initial Mnesia node. It can be any node, as Mnesia works in a peer-to-peer model and does not use any central management point. Go to the Erlang shell of the chosen node and move the database schema to disc to make it persistent:

mnesia:change_table_copy_type(schema, node(), disc_copies).
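
A quick way to confirm that the schema really moved to disc is to ask Mnesia for its storage type. This is only a minimal sketch; it assumes Mnesia is running on the current node and that the call above returned {atomic, ok}.

```erlang
%% Ask Mnesia how the schema table is stored on this node.
%% Before the change this is ram_copies; after a successful change
%% it should be disc_copies.
SchemaStorage = mnesia:table_info(schema, storage_type).
```

If this still reports ram_copies, creating a disc_copies table on this node will fail.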

You can now create a new table, for example:

mnesia:create_table(test, [{attributes, [key, value]}, {disc_copies, [node()]}]).

This creates the table test with two columns: key and value. The disc_copies option means that the table is held in RAM and a copy is also stored on disc. If the table holds no persistent data, you can use ram_copies instead for better performance. If, on the other hand, you want to save system memory, you can use disc_only_copies, at the cost of reduced performance.
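
To see the table in use, and what "any Erlang function can be a transaction" means in practice, here is a small sketch of a write followed by a read. It assumes Mnesia is running and the test table from above exists; the key foo and the value bar are placeholders made up for illustration.

```erlang
%% A record in table test is the tuple {test, Key, Value}.
%% The fun passed to mnesia:transaction/1 runs as one transaction.
WriteResult = mnesia:transaction(fun() -> mnesia:write({test, foo, bar}) end).
%% WriteResult is {atomic, ok} on success, {aborted, Reason} otherwise.

%% Read the record back by its key.
ReadResult = mnesia:transaction(fun() -> mnesia:read(test, foo) end).
%% ReadResult is {atomic, [{test, foo, bar}]} if the write succeeded.
```
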
Now let's add a second node to Mnesia:

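mnesia:change_config(extra_db_nodes, ['node2@host.domain']).

Replace 'node2@host.domain' with the name of the node you want to add (or pass a list of nodes: ['node2@host.domain', 'node3@host.domain', ...]). The node name is written in single quotes because it contains a dot, which current Erlang releases do not accept in an unquoted atom.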
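
To check that the new node actually joined, you can ask Mnesia which nodes it knows about. This is a hedged sketch that assumes Mnesia is running on both nodes; the lists it returns depend entirely on your own cluster.

```erlang
%% Nodes where Mnesia is currently running and connected,
%% including the local node.
Running = mnesia:system_info(running_db_nodes).

%% All nodes recorded in the schema, whether running or not.
Known = mnesia:system_info(db_nodes).
```

The change_config call itself returns {ok, Nodes} with the nodes it managed to connect to, so an empty list there is an early sign that the new node was not reachable.
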
Now switch to Erlang shell at the node you have just added and move its schema to disc, so Mnesia remembers it's a part of the cluster:

mnesia:change_table_copy_type(schema, node(), disc_copies).

You can now create a copy of table test on this node:

mnesia:add_table_copy(test, node(), disc_copies).

Now you can add a third node, a fourth node, and so on, in the same way as you added the second one. When you have many nodes in the cluster, you can keep tables as disc_copies on only some nodes as a backup and use ram_copies on the other nodes to improve their performance. When several nodes run on the same machine, it generally makes sense to keep tables on disc on only one of them.
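
As a sketch of the mixed setup described above, a further node could take a RAM-only replica instead of a disc-backed one. This assumes the node has already joined the cluster and does not yet hold a copy of test.

```erlang
%% Add a RAM-only replica of table test on the current node.
AddResult = mnesia:add_table_copy(test, node(), ram_copies).
%% AddResult is {atomic, ok} on success, {aborted, Reason} otherwise.

%% See which nodes hold which kind of replica.
DiscNodes = mnesia:table_info(test, disc_copies).
RamNodes = mnesia:table_info(test, ram_copies).
```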

To remove node@host.domain from the cluster, first stop Mnesia on that node:

mnesia:stop().

Now switch to any other node in the cluster and do:

mnesia:del_table_copy(schema, 'node@host.domain').

Mnesia is strongly fault-tolerant, which means that you generally don't need to worry when one of your nodes crashes. Just restart it and reconnect it to the cluster - the Mnesia node will synchronize itself and fix all broken and out-of-date tables. I really like to imagine Mnesia as the database equivalent of the T-1000, which, even when heavily damaged or broken into pieces, reassembles itself into its original form every time.
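
After restarting a crashed node, it can be useful to wait until its tables are loaded before using them. The following is a rough sketch, not a recipe: the 30-second timeout is an arbitrary value chosen for illustration, and the atoms ready and still_loading are made-up labels.

```erlang
%% Restart Mnesia on the recovered node.
mnesia:start().

%% Block until table test is loaded, or give up after 30000 ms.
Status = case mnesia:wait_for_tables([test], 30000) of
    ok                 -> ready;
    {timeout, Missing} -> {still_loading, Missing};
    {error, Reason}    -> {error, Reason}
end.
```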

Unfortunately Mnesia has its limits. A storage limit seems to be the most troublesome of them.

5 comments:

Francesco said...: Hi, I read your article, since I'm starting to work with erlang and mnesia.

Very quickly we have an ejabberd instance running, that filters some messages and saves them on a mnesia table (I'll call this write@localhost).

What we'd like to do is having a separate erlang application, deployed on another machine, read this table and process data (I'll call this read@localhost).

I'm trying to make read@localhost access write@localhost but even if I connect the two nodes seems the database has to be shared across nodes to be accessible from both. Is this correct? Isn't there any other solution to read data from a node without replicating it across nodes?; July 27, 2009 at 5:32 PM
kklis said...: You don't have to replicate tables across nodes to make them accessible. If you only copy the database schema from write@localhost to read@localhost and don't create a local copy on read@localhost (skip the add_table_copy() step), mnesia:read() will fetch the data from the table located on write@localhost. This way you can have a cluster of many nodes and only two or three replicas of each table, spread among different nodes for safety.; July 27, 2009 at 10:03 PM
kklis said...: You may also be interested in this tutorial. It covers your problem in more detail.; July 27, 2009 at 10:05 PM
Francesco said...: I understand, I'll try it this way. Thanks a lot for your help! :); July 28, 2009 at 10:49 AM
Anonymous said...: Really good - concise tutorial - I'd struggled to find out why I was getting bad_arg errors when creating disc_copies tables - not moving the schema to disc first. Thanks!; October 17, 2009 at 7:27 PM