I don't mean this post to be rude or impolite in any way, but goodbye, MySQL Cluster!
We have been running a cluster with the following setup for a bit over a year:
2 dedicated management nodes
2 dedicated query nodes
2 dedicated ndb storage nodes
We always keep all nodes fully up to date with the latest stable version of NDB by using the RHEL5 RPMs provided by MySQL.
About 3 releases ago we began to experience NDB nodes shutting down with the mysterious error:
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DblqhMain.cpp
Error object: DBLQH (Line: 9785) 0x00000006
Program: ndbmtd
Pid: 1068 thr: 2
Version: mysql-5.5.22 ndb-7.2.6
Trace: /data/mysqlcluster//ndb_3_trace.log.13 [t1..t4]
This bug has been reported several times since the beginning of April 2012, and those reports have now been merged, yet still nobody has even claimed ownership of it.
It appears that this happens only when NDB is running under moderate load. Under low load or high load it's fine; it fails at exactly the load it is supposed to be used at.
The lovely behavior that results is that either one node crashes or, about 1 time in 10, both nodes decide to shut down simultaneously.
This now happens to our NDB cluster about once per day.
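To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes the rough rates quoted above (about one crash event per day, about 1 in 10 of them taking out both data nodes) stay constant; the constant and function names are made up for illustration and are not measurements from any monitoring tool.

```python
# Rough figures taken from this post (assumed constant over time):
# about one crash event per day, and about 1 in 10 of those events
# shuts down both data nodes at once.
CRASH_EVENTS_PER_DAY = 1.0
P_BOTH_NODES_DOWN = 1 / 10


def expected_full_outages(days: float) -> float:
    """Expected number of events in `days` where every data node is down."""
    return days * CRASH_EVENTS_PER_DAY * P_BOTH_NODES_DOWN


def mean_days_between_full_outages() -> float:
    """Average gap, in days, between complete cluster outages."""
    return 1 / (CRASH_EVENTS_PER_DAY * P_BOTH_NODES_DOWN)


if __name__ == "__main__":
    print(f"Full outages per 30 days: {expected_full_outages(30):.1f}")
    print(f"Mean days between full outages: {mean_days_between_full_outages():.1f}")
```

In other words, at these rates we should expect a complete outage of the data layer roughly every ten days, or about three times a month.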
How this is not flagged as a super-high critical bug is beyond my understanding; the very idea of NDB and MySQL Cluster is that there is no single point of failure. Honestly, if the entirety of the bug were that one of the NDB nodes crashed every week and we had to restart it, well, it would be kind of a pain, but we could wait for a bug fix. But seriously, a bug that causes EVERY NDB NODE TO SHUT DOWN SIMULTANEOUSLY makes NDB and the Cluster product itself less reliable than if we were just running a single machine with nightly backups to power everything.
I'm just really disappointed, as we invested a lot of time in building a pretty awesome cluster only to find out that the underlying software is faulty :(
So with a tear I say "Goodbye NDB Cluster, hello Percona".