I'm running a MySQL Cluster 7.1.8 setup (2 SQL/MGMT nodes, 2 NDB data nodes) on Ubuntu 10.04.1 (64-bit).
I've found that ndbmtd on each NDB node stops logging after some time. (I'm referring to the "ndb_x_out.log" file, not a redo log.)
New lines stop being appended to the file, sometimes cutting off mid-line. I've checked: both ndbmtd processes (daemon and angel) do in fact still have the file open. The tail of the file usually looks something like this:
send lock node 19 waiting for lock, contentions: 9 spins: 25922
jbalock thr: 1 waiting for lock, contentions: 200 spins: 29922
send lock node 19 waiting for lock, contentions: 10 spins: 25923
send lock node 19 waiting for lock, contentions: 11 spins: 29330
send lock node 19 waiting for lock, contentions: 12 spins: 32464
send lock node 19 waiting for lock, contentions: 13 spins: 35449
send lock node 19 waiting for lock, contentions: 14 spins: 38
(Notice that the last line should have included three more digits on the number of spins.)
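The check that the file is still held open amounted to something like this; a throwaway background process stands in for ndbmtd here, and the PID/path are illustrative (in practice, use `pidof ndbmtd` and the real ndb_x_out.log path):

```shell
# Illustrative check that a process still holds a log file open, via /proc.
# A 'sleep' redirected into a demo file stands in for ndbmtd.
sleep 5 > /tmp/ndb_demo_out.log &
pid=$!
# Each entry under /proc/<pid>/fd is a symlink to an open file;
# fd 1 here points at the demo log file.
ls -l /proc/"$pid"/fd | grep ndb_demo_out.log
kill "$pid" 2>/dev/null
```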
There is no indication of any problem anywhere: all queries work, ndb_mgmd reports everything up, and there are no errors in the SQL nodes' error log or anywhere else.
It's as though ndbmtd's file write buffer has gone to never-never land :-(
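If a userspace write buffer really is involved, this is exactly the behaviour you'd see: output accumulates in memory and only reaches the file when the buffer fills or the process flushes/exits, which would also explain the mid-line cutoff. A quick way to see the effect, using dd's output block buffering as a stand-in for a stdio buffer (the log line is just a placeholder):

```shell
# With a large output block size, dd holds data in its buffer until the
# block fills or it sees EOF -- nothing reaches the file in between.
{ printf 'send lock node 19 waiting for lock, contentions: 14 spins: 38449\n'
  sleep 3
} | dd obs=1M of=/tmp/buf_demo.log 2>/dev/null &
sleep 1
wc -c < /tmp/buf_demo.log   # 0: the line is still sitting in dd's buffer
wait
wc -c < /tmp/buf_demo.log   # nonzero: flushed when the writer closed the pipe
```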
This does not happen on both NDB nodes at the same time; sometimes one of them stops logging quite a while before the other.
If I perform a rolling restart, what appear to be the most recently buffered log lines do get written (more "send lock node..." lines), though with a clear gap, followed by the shutdown/startup messages I'd expect from the restart. Logging then works again for a while, until the same problem eventually recurs.
The cluster configuration is nothing unusual, and is largely based on the severalnines.com/config tool.
Any ideas? Thanks...