Wednesday, 2 January 2013

Setting up a Riak cluster in Amazon EC2

A few notes on installing a Riak 1.2.1 cluster inside the Amazon AWS network. The official documentation is poor and some of these tips were discovered by trial and error, so this post may help you if you are also trying to build a Riak cluster and fighting the same gaps.

I used 64-bit Ubuntu Server 12.04 LTS micro instances, for cost reasons.

I use the following terminology to refer to the nodes in the cluster: one Master node and one or more Slave nodes. The Slave nodes join the Master as they are needed.

The first step is to install Riak on every node, as described here.

First of all, do not use public IPs when configuring the individual nodes. Use the private IPs assigned inside the Amazon AWS network.
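As a minimal sketch of that step, the helper below rewrites the node name in vm.args to use a given IP. The function name set_node_ip is my own, and it assumes vm.args contains a single line starting with "-name" (as in the stock /etc/riak/vm.args) and that GNU sed is available, as it is on Ubuntu.

```shell
# Hypothetical helper: set the Riak node name in a vm.args file to riak@<ip>.
# Assumes one line starting with "-name " and GNU sed (for -i).
set_node_ip() {
    vm_args="$1"   # path to vm.args, e.g. /etc/riak/vm.args
    ip="$2"        # private IP of this instance
    sed -i "s/^-name .*/-name riak@${ip}/" "$vm_args"
}

# On the instance itself, run as root (not executed here):
#   PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
#   set_node_ip /etc/riak/vm.args "$PRIVATE_IP"
```

The commented usage reads the private IP from the EC2 instance metadata service, which saves copying it by hand from the AWS console.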

Even though I followed the steps exactly, when I tried to start the Riak service I ran into the following error:

Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.

Running Riak in the foreground with the

riak console

command gives more detail. The useful part of the output is:

** Found 1 name clashes in code paths
08:06:51.374 [info] Application lager started on node 'riak@XX.XXX.XX.XXX'
08:06:51.418 [warning] No ring file available.
08:06:51.457 [error] CRASH REPORT Process <0.165.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320

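The eaddrnotavail reason means that Riak could not bind to the IP address it was configured with, which is exactly what happens with a public IP in EC2, since it is not assigned to the instance's network interface. In my case there were also Riak processes left running in memory from earlier start attempts.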

The solution is to issue a

ps aux | grep riak

command and kill all the running riak processes, then restart the service.
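A small sketch of that clean-up, assuming the usual ps aux layout with the PID in the second column. The function name riak_pids is hypothetical. The [r]iak pattern is a common trick to keep the grep process itself out of the results.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PIDs
# of every process whose line mentions riak (beam.smp, epmd, and so on).
# The pattern [r]iak matches "riak" but not the grep command line itself.
riak_pids() {
    grep '[r]iak' | awk '{print $2}'
}

# On the node (not executed here):
#   ps aux | riak_pids                        # inspect first
#   ps aux | riak_pids | xargs -r sudo kill   # then terminate them
```

Looking at the list before killing anything is worthwhile, because the pattern matches any process with "riak" in its command line or user name.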

If the virtual machine hosting Riak is stopped for any reason, restarting it changes its IP, so the configuration steps must be repeated. The ring information is then no longer valid. Mine was a test installation and I didn't need the stored data, so I deleted the ring information by running:

sudo sh -c 'rm -rf /var/lib/riak/ring/*'

If the data is valuable, the ring must instead be updated to the new node name by running:

riak-admin reip <old_nodename> <new_nodename>

where <old_nodename> and <new_nodename> are the node names set in /etc/riak/vm.args before and after the IP change.
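To avoid typing the new name by hand, it can be read back from vm.args. This is only a sketch: the function name node_name is my own, and it assumes the name is set on a line of the form "-name riak@<ip>". As far as I know, reip has to be run while the node is stopped.

```shell
# Hypothetical helper: print the node name configured in a vm.args file,
# i.e. the second field of the line whose first field is "-name".
node_name() {
    awk '$1 == "-name" {print $2}' "$1"
}

# On the node, after editing vm.args (not executed here):
#   OLD="riak@<old_private_ip>"          # the name the ring was built with
#   NEW=$(node_name /etc/riak/vm.args)   # the name now configured
#   sudo riak stop
#   sudo riak-admin reip "$OLD" "$NEW"
```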

Then restart the Riak service. It should work now.

If you did this on the Master node, leave it as it is and move on to the Slave node(s). On a Slave node, the next step is to join it to the Master node. This is done by issuing the command:

riak-admin cluster join <master_nodename>

This is a staged command: the join is only planned, and it is not applied until an explicit commit command is issued:

riak-admin cluster commit

Perform these steps for every Slave node in the cluster.
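The join and commit can be wrapped in one small function to run on each Slave node. This is a sketch under my own assumptions: the function name join_master and the RIAK_ADMIN variable are hypothetical, and I added a riak-admin cluster plan step between the two commands so the staged changes can be reviewed before they are committed.

```shell
# Hypothetical wrapper around the staged join.
# RIAK_ADMIN can be overridden (for example RIAK_ADMIN=echo) for a dry run
# that only prints the commands' arguments.
RIAK_ADMIN="${RIAK_ADMIN:-riak-admin}"

join_master() {
    master="$1"   # node name of the Master, e.g. riak@<master_private_ip>
    # Stage the join, show the planned changes, then apply them.
    # Each step runs only if the previous one succeeded.
    $RIAK_ADMIN cluster join "$master" &&
    $RIAK_ADMIN cluster plan &&
    $RIAK_ADMIN cluster commit
}

# On each Slave node (not executed here):
#   join_master "riak@<master_private_ip>"
```

Chaining with && means a failed join never reaches the commit, which is the main reason to wrap the commands at all.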

I hope this post helps you set up a Riak cluster in Amazon faster and more smoothly than I did.

1 comment:

  1. Thanks for this. I was completely stuck until I read what you said about using the private IPs instead of the public ones.