
MySQL Arbitration question

Hi all.
Our company is very interested in the MySQL Cluster system, and we set up a test environment to check how the HA solution behaves.

So we built the following system (all hosts are Xen-based HVM guests running CentOS 7, with MySQL Cluster 7.4.7 built from source).

Node1 (data node, node ID 10): IP 192.168.4.61
Node2 (data node, node ID 11): IP 192.168.4.62
Node3 (management node, node ID 1): IP 192.168.4.63
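
For reference, our config.ini looks roughly like this (reproduced from memory, so treat it as a sketch; only the node IDs, host IPs and the DataDir visible in the trace paths below are exact):

# config.ini on the management node (simplified sketch, from memory)
[ndbd default]
NoOfReplicas=2

[ndb_mgmd]
NodeId=1
HostName=192.168.4.63
DataDir=/var/lib/mysql-cluster

[ndbd]
NodeId=10
HostName=192.168.4.61
DataDir=/var/lib/mysql-cluster

[ndbd]
NodeId=11
HostName=192.168.4.62
DataDir=/var/lib/mysql-cluster

[mysqld]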

All nodes started fine, and we began testing failure scenarios (killing ndbmtd on the data nodes, killing ndb_mgmd on the management node, taking network interfaces down, etc.).
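
The individual failure tests were simple, along these lines (illustrative commands only; the interface name ens3 is just a placeholder):

# on a data node: kill the ndbmtd processes (angel + worker)
pkill -9 ndbmtd
# on the management node: kill the management server
pkill -9 ndb_mgmd
# on any node: take the cluster-facing interface down (ens3 is a placeholder)
ip link set ens3 down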

We ran into a strange situation:

Initial state of the MySQL Cluster:

[ndbd(NDB)] 2 node(s)
id=10 @192.168.4.61 (mysql-5.6.25 ndb-7.4.7, Nodegroup: 0, *)
id=11 @192.168.4.62 (mysql-5.6.25 ndb-7.4.7, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.4.63 (mysql-5.6.25 ndb-7.4.7)

After that, we used the firewall to block all incoming traffic from the cluster subnet on node 11:

iptables -A INPUT -s 192.168.4.0/24 -j DROP
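
The rule can be removed again afterwards with the matching delete, and the node status re-checked from the management host (the connect string below simply points at node 3 from the list above):

# on node 11: remove the blocking rule
iptables -D INPUT -s 192.168.4.0/24 -j DROP
# from anywhere: re-check node status via the management node
ndb_mgm -c 192.168.4.63 -e "SHOW"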

Right after the rule was applied, we got a shutdown of the whole cluster.

Error log from node 11:

Time: Tuesday 10 November 2015 - 18:30:50
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 6235) 0x00000002
Program: ndbmtd
Pid: 3196 thr: 0
Version: mysql-5.6.25 ndb-7.4.7
Trace: /var/lib/mysql-cluster/ndb_11_trace.log.5 [t1..t4]



Error log from node 10:

Time: Tuesday 10 November 2015 - 18:30:50
Status: Temporary error, restart node
Message: Node declared dead. See error log for details (Arbitration error)
Error: 2315
Error data: We(10) have been declared dead by 11 (via 11) reason: Heartbeat failure(4)
Error object: QMGR (Line: 4210) 0x00000002
Program: ndbmtd
Pid: 3479 thr: 0
Version: mysql-5.6.25 ndb-7.4.7
Trace: /var/lib/mysql-cluster/ndb_10_trace.log.8 [t1..t4]
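
The arbitration events can also be followed in the management node's cluster log (the path below assumes the management node uses the same DataDir as the data nodes, so it is just a guess):

# on the management node: arbitration-related messages around the failure
grep -i arbit /var/lib/mysql-cluster/ndb_1_cluster.log
# current node status; the node marked with '*' is the master/president
ndb_mgm -c 192.168.4.63 -e "SHOW"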

We suspect this may be a bug in the cluster logic, since we ended up with a complete cluster failure.
Can anybody comment on this situation and perhaps suggest how to improve high availability?
With best regards
