I am running a MySQL Cluster. The version and node details are listed below:
======================================================================
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=10 @172.16.7.52 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 0, Master)
id=11 @172.16.7.53 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 0)
id=12 @61.56.8.154 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 1)
id=13 @61.56.8.154 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 1)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @172.17.7.51 (mysql-5.1.51 ndb-7.2.0)
[mysqld(API)] 4 node(s)
id=20 @172.16.7.52 (mysql-5.1.51 ndb-7.2.0)
id=21 @172.16.7.53 (mysql-5.1.51 ndb-7.2.0)
id=22 @61.56.8.154 (mysql-5.1.51 ndb-7.2.0)
id=23 @61.56.8.155 (mysql-5.1.51 ndb-7.2.0)
=======================================================================
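(For reference, the status above is what the ndb_mgm management client prints; I capture it on the management host with something like the command below, assuming the management server listens on the default port 1186.)
=============================================================================
# Run on the management host; -c gives the connect string, -e runs a single client command
ndb_mgm -c localhost:1186 -e "SHOW"
=============================================================================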
My problem is that one of the four data nodes always gets shut down because of some kind of connection failure. After that, the cluster runs stably with the three remaining data nodes. It looks like:
========================================================================
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=10 @172.16.7.52 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 0, Master)
id=11 @172.16.7.53 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 0)
id=12 @61.56.8.154 (mysql-5.1.51 ndb-7.2.0, Nodegroup: 1)
id=13 (not connected, accepting connect from ndb-node4)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @172.17.7.51 (mysql-5.1.51 ndb-7.2.0)
[mysqld(API)] 4 node(s)
id=20 @172.16.7.52 (mysql-5.1.51 ndb-7.2.0)
id=21 @172.16.7.53 (mysql-5.1.51 ndb-7.2.0)
id=22 @61.56.8.154 (mysql-5.1.51 ndb-7.2.0)
id=23 @61.56.8.155 (mysql-5.1.51 ndb-7.2.0)
=========================================================================
After I restart the dead data node, the cluster runs normally with all four data nodes for 2~3 hours, and then it happens again. By the way, the dead data node is not always the same one.
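(For reference, I bring the dead node back with roughly the command below, run on the host of the failed node; the connect string pointing at ndb-manager:1186 is an assumption based on my config.ini.)
=============================================================================
# Run on the failed data node's host (e.g. ndb-node4 for node id 13);
# the connect string assumes the management server at ndb-manager, port 1186
ndbd --ndb-connectstring=ndb-manager:1186
=============================================================================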
I run all cluster nodes on VMs; the OS is RHEL 5.5. Each data node is assigned 8 GB of memory and 4 GB of swap. I also checked the memory usage on the data nodes while ndbd is running; the average memory usage is about 2.4 GB.
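(Roughly how I check memory usage: OS-level memory and swap with free on the data node, and NDB DataMemory/IndexMemory via the management client's REPORT command; the exact figures and output format may differ on your setup.)
=============================================================================
# On a data node: overall memory and swap usage in MB
free -m

# On the management host: DataMemory / IndexMemory usage of all data nodes
ndb_mgm -c localhost:1186 -e "ALL REPORT MemoryUsage"
=============================================================================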
The log on the management node looks like:
=========================================================================
2012-03-06 17:12:59 [MgmtSrvr] ALERT -- Node 11: Node 13 Disconnected
2012-03-06 17:12:59 [MgmtSrvr] INFO -- Node 10: Communication to Node 13 closed
2012-03-06 17:12:59 [MgmtSrvr] INFO -- Node 11: Communication to Node 13 closed
2012-03-06 17:12:59 [MgmtSrvr] INFO -- Node 12: Communication to Node 13 closed
2012-03-06 17:12:59 [MgmtSrvr] ALERT -- Node 1: Node 13 Disconnected
2012-03-06 17:12:59 [MgmtSrvr] ALERT -- Node 10: Arbitration check won - node group majority
2012-03-06 17:12:59 [MgmtSrvr] INFO -- Node 10: President restarts arbitration thread [state=6]
2012-03-06 17:12:59 [MgmtSrvr] ALERT -- Node 10: Node 13 Disconnected
2012-03-06 17:12:59 [MgmtSrvr] ALERT -- Node 12: Node 13 Disconnected
2012-03-06 17:13:00 [MgmtSrvr] INFO -- Node 10: Local checkpoint 125 completed
2012-03-06 17:13:00 [MgmtSrvr] ALERT -- Node 13: Forced node shutdown completed. Caused by error 2315: 'Node declared dead. See error log for details(Arbitration error). Temporary error, restart node'.
2012-03-06 17:13:20 [MgmtSrvr] INFO -- Node 12: Communication to Node 13 opened
2012-03-06 17:13:20 [MgmtSrvr] INFO -- Node 10: Communication to Node 13 opened
2012-03-06 17:13:20 [MgmtSrvr] INFO -- Node 11: Communication to Node 13 opened
2012-03-06 18:11:12 [MgmtSrvr] INFO -- Node 10: Local checkpoint 126 started. Keep GCI = 191298 oldest restorable GCI = 190007
2012-03-06 18:11:59 [MgmtSrvr] INFO -- Node 10: Local checkpoint 126 completed
=============================================================================
And the error log on node 13 looks like:
=============================================================================
Time: Tuesday 6 March 2012 - 17:12:59
Status: Temporary error, restart node
Message: Node declared dead. See error log for details (Arbitration error)
Error: 2315
Error data: We(13) have been declared dead by 11 (via 12) reason: Connection failure(5)
Error object: QMGR (Line: 3657) 0x00000002
Program: ndbd
Pid: 4274
Version: mysql-5.1.51 ndb-7.2.0-beta
Trace: /users1/mysql-cluster/ndb_13_trace.log.9
***EOM***
=============================================================================
And here is the config.ini:
=============================================================================
[NDBD DEFAULT]
NoOfReplicas= 2
ServerPort= 2202
# Data Memory, Index Memory, and String Memory #
DataMemory= 1024M
IndexMemory= 256M
StringMemory= 5
MaxNoOfOrderedIndexes= 1024
MaxNoOfAttributes= 10000
MaxNoOfTables= 2500
MaxNoOfConcurrentOperations= 250000
MaxNoOfConcurrentIndexOperations= 250000
MaxNoOfFiredTriggers= 4000
TransactionBufferMemory= 1M
# Scans and buffering #
MaxNoOfConcurrentScans= 300
MaxNoOfLocalScans= 32
BatchSizePerLocalScan= 64
LongMessageBuffer= 1M
# Controlling Timeouts, Intervals, and Disk Paging #
TimeBetweenWatchDogCheck= 6000
TimeBetweenWatchDogCheckInitial= 6000
StartPartialTimeout= 30000
StartPartitionedTimeout= 60000
StartFailureTimeout= 1000000
HeartbeatIntervalDbDb= 5000
HeartbeatIntervalDbApi= 5000
TimeBetweenLocalCheckpoints= 20
TimeBetweenGlobalCheckpoints= 2000
TransactionInactiveTimeout= 0
TransactionDeadlockDetectionTimeout= 1200
ArbitrationTimeout= 3000
# Buffering and Logging #
UndoIndexBuffer= 2M
UndoDataBuffer= 1M
RedoBuffer= 32M
# Backup Parameters #
BackupDataBufferSize= 2M
BackupLogBufferSize= 2M
BackupMemory= 64M
BackupWriteSize= 32K
BackupMaxWriteSize= 256K
[MGM DEFAULT]
PortNumber= 1186
DataDir= /var/lib/mysql-cluster # Directory for this management node's pidfiles
[NDB_MGMD]
NodeId= 1
ArbitrationRank= 1
HostName= ndb-manager # Hostname or IP address of management node
DataDir= /var/lib/mysql-cluster # Directory for this management node's pidfiles
LogDestination= FILE:filename=/var/log/mysql-cluster/ndb_manager.log, maxsize=500000, maxfiles=4
[NDBD]
NodeId= 10
HostName= ndb-node1 # Hostname or IP address
DataDir= /users1/mysql-cluster # Directory for this data node's datafiles
[NDBD]
NodeId= 11
HostName= ndb-node2 # Hostname or IP address
DataDir= /users1/mysql-cluster # Directory for this data node's datafiles
[NDBD]
NodeId= 12
HostName= ndb-node3 # Hostname or IP address
DataDir= /users1/mysql-cluster # Directory for this data node's datafiles
[NDBD]
NodeId= 13
HostName= ndb-node4 # Hostname or IP address
DataDir= /users1/mysql-cluster # Directory for this data node's datafiles
#
# Note: The following can be MySQLD connections or
# NDB API application connecting to the cluster
#
[API]
NodeId= 20
HostName= ndb-node1
[API]
NodeId= 21
HostName= ndb-node2
[API]
NodeId= 22
HostName= ndb-node3
[API]
NodeId= 23
HostName= ndb-node4
============================================================================
Any ideas on how to fix this?
Thanks.