Mnesia can't connect to another node

I jumped into the #rabbitmq channel on freenode. Here's the discussion that followed:

14:29 shakakai: hey all, i'm having a little issue with clustering rabbitmq http://stackoverflow.com/questions/6948624/mnesia-cant-connect-to-another-node
14:30 shakakai: has anyone run into that problem before?
14:30 antares_: shakakai: make sure that epmd is running on every node
14:30 antares_: shakakai: and that port it uses (4369) is open in your firewall
14:31 |Blaze|: shakakai: is your dns correct?  Can you ping worker1 from celery and celery from worker1
14:31 shakakai: |Blaze|: hmm...i'll check
14:32 shakakai: |Blaze|: this is where I'm a little confused, the rabbitmq nodename is worker1@worker1 but the fqdn to ping the box is "ping worker1.mydomain.com"
14:33 |Blaze|: can you "ping worker1"
14:34 shakakai: |Blaze|: no
14:34 |Blaze|: k, you'll need to fix that
14:37 shakakai: |Blaze|: gotcha, so I setup a hosts file and i should be good
14:37 |Blaze|: yup
14:37 |Blaze|: in both directions
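
The epmd check suggested by antares_ can be scripted. The following is a minimal Python sketch, not part of the original discussion, that tests whether the epmd port (4369, as mentioned in the chat) accepts TCP connections on a given host. The function name `port_open` and the 3-second timeout are illustrative choices.

```python
import socket

EPMD_PORT = 4369  # epmd port mentioned in the chat above


def port_open(host, port=EPMD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False
```

Running `port_open("worker1")` on celery and `port_open("celery")` on worker1 would cover both directions. A `False` result does not say whether epmd is down, the firewall is closed, or the name does not resolve, so it is only a first filter.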

TL;DR

Make sure you can ping the host in each rabbit nodename (the part after the @) from each of the boxes you are clustering. If you can't, set up a hosts file entry for each one.
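
As a rough way to automate that check, here is a Python sketch that takes node names and reports those whose host part does not resolve on the local machine. The helper names `host_of` and `unresolvable` are hypothetical, and a successful lookup only shows that the name resolves, not that the box answers a ping.

```python
import socket


def host_of(node_name):
    """Return the host part of a node name such as 'worker1@worker1'."""
    _, sep, host = node_name.rpartition("@")
    if not sep or not host:
        raise ValueError(f"not a node name: {node_name!r}")
    return host


def unresolvable(node_names, resolve=socket.gethostbyname):
    """Return the node names whose host part does not resolve locally."""
    failed = []
    for name in node_names:
        try:
            resolve(host_of(name))
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(name)
    return failed
```

Each name returned by `unresolvable([...])` needs a hosts file entry on the box where the script ran. Run it on every box, since resolution has to work in both directions.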


I ran RabbitMQ in Docker and hit a similar problem.

The main cause was that the /var/lib/rabbitmq/mnesia/rabbit/cluster_nodes.config file listed a node that could not be reached.

Mnesia is a distributed, soft real-time database management system written in the Erlang programming language. RabbitMQ stores its cluster state in it.

There are several ways to repair this problem:

  1. Fix the configuration file so that it uses the correct cluster node name. From the log we can see that our node name is rabbit@cb43449d5d72:
// log info
...
rabbitmq    |   Starting broker...2019-11-27 16:18:22.621 [info] <0.304.0>
rabbitmq    |  node           : rabbit@cb43449d5d72
...

// This is the wrong configuration file:
$ cat ./mnesia/rabbit/cluster_nodes.config
{[rabbit@cb43449d5d72,rabbit@dc3288264c34],[rabbit@dc3288264c34]}.

// Update it with the correct node name, then restart the RabbitMQ server:
$ cat ./mnesia/rabbit/cluster_nodes.config
{[rabbit@cb43449d5d72],[rabbit@cb43449d5d72]}.
  2. The simplest way is to remove the mnesia directory (which discards that node's stored state) and configure a node name that resolves, such as rabbit@my-rabbit, with the entry 127.0.0.1 my-rabbit in /etc/hosts. After that, you should see the following configuration details:
$ find . -name cluster_nodes.config
./mnesia/rabbit/cluster_nodes.config
./mnesia/rabbit@my-rabbit/cluster_nodes.config

$ cat ./mnesia/rabbit@my-rabbit/cluster_nodes.config
{['rabbit@my-rabbit'],['rabbit@my-rabbit']}.
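
To spot the kind of stale entry described in the first fix, the file can be compared against the node name from the log. The Python sketch below assumes the file always has the two-list shape shown above. The function names are hypothetical and the regular expressions are a simplification, not a full Erlang term parser.

```python
import re

# Matches '{[...],[...]}.' and captures the contents of both lists.
_TERM = re.compile(r"\s*\{\s*\[(.*?)\]\s*,\s*\[(.*?)\]\s*\}\s*\.\s*", re.S)
# A node name, with or without the surrounding single quotes.
_NODE = re.compile(r"[\w.-]+@[\w.-]+")


def parse_cluster_nodes(text):
    """Return the node names in the first and second list of the file."""
    match = _TERM.fullmatch(text)
    if match is None:
        raise ValueError("unexpected cluster_nodes.config layout")
    first, second = (_NODE.findall(group) for group in match.groups())
    return first, second


def foreign_nodes(text, local_node):
    """Return every listed node name that is not the local node."""
    first, second = parse_cluster_nodes(text)
    found = []
    for name in first + second:
        if name != local_node and name not in found:
            found.append(name)
    return found
```

For the broken file above, `foreign_nodes(text, "rabbit@cb43449d5d72")` returns `["rabbit@dc3288264c34"]`, the entry that had to be removed. For the corrected file it returns an empty list.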

There are several things to check before you can get the cluster to work well:

0) Ensure you are running the exact same rabbitmq version on each node.
1) Set up the network until you are able to ping each server from every other one.
2) Cookies: you have to have the exact same erlang cookie in the .erlang.cookie file on each server.

One useful trick is to run this command on one node to see whether it can reach another one from rabbitmq:

rabbitmqctl eval 'net_adm:ping(rabbit@othernode).'

This should say pang if it's not OK or pong if it's OK. Be careful not to forget the dot near the end of the eval expression.
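
The cookie and ping checks can be wrapped in a few small helpers. This is a Python sketch under stated assumptions: the helper names are hypothetical, ignoring surrounding whitespace in the cookie is an assumption, and the expression is meant to be passed to `rabbitmqctl eval` as in the command above.

```python
import hashlib


def cookie_fingerprint(cookie_bytes):
    """Short SHA-256 digest of a cookie, so it can be compared without
    printing the secret. Surrounding whitespace is ignored (assumption)."""
    return hashlib.sha256(cookie_bytes.strip()).hexdigest()[:12]


def ping_expression(node_name):
    """Build the expression for rabbitmqctl eval, trailing dot included."""
    # Single quotes keep names containing '-' or '.' valid as Erlang atoms.
    return f"net_adm:ping('{node_name}')."


def reachable(eval_output):
    """Interpret the ping output: 'pong' means the other node answered."""
    return eval_output.strip() == "pong"
```

Computing `cookie_fingerprint(open(path, "rb").read())` on every server, with `path` pointing at that server's .erlang.cookie file, should give the same value everywhere. The quoting in `ping_expression` matters for names such as rabbit@my-rabbit, where the hyphen would otherwise not parse as part of the atom.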

I got it working fine after several hours of unsuccessful trials.

3) Bear in mind that there may be an issue when restarting a node of a cluster if that node was not the last one to be stopped: it won't start until the last node that was stopped has been restarted. When all the above (0 to 2) are correct, 3 may well be the root cause of your problem.
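
One start order that satisfies point 3 is simply the reverse of the stop order, which puts the last node stopped first. A tiny Python sketch, with a hypothetical function name:

```python
def start_order(stop_order):
    """Return a start order in which the last node stopped comes first."""
    return list(reversed(stop_order))
```

For example, if rabbit@a was stopped first and rabbit@c last (made-up names), `start_order(["rabbit@a", "rabbit@b", "rabbit@c"])` puts rabbit@c at the front.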

Hope this helps, cheers, jb