Dear all,
I was deleting about 180,000 records when the cluster failed.
Is this caused by a lack of hardware resources, or is there a configuration change I can make to prevent it from happening again?
Please help, thank you!
Here is the failure log:
###
2021-10-22 00:35:39 [ndbd] WARNING -- Watchdog: Warning overslept 471 ms, expected 100 ms.
2021-10-22 00:35:39 [ndbd] INFO -- timerHandlingLab, expected 10ms sleep, not scheduled for: 376 (ms), exec_time 12 us, sys_time 0 us
2021-10-22 17:10:33 [ndbd] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2021-10-22 17:10:33 [ndbd] INFO -- findNeighbours from: 5950 old (left: 2 right: 2) new (65535 65535)
2021-10-22 17:10:33 [ndbd] ALERT -- Network partitioning - arbitration required
2021-10-22 17:10:33 [ndbd] INFO -- President restarts arbitration thread [state=7]
2021-10-22 17:10:33 [ndbd] ALERT -- Arbitration won - positive reply from node 1
2021-10-22 17:10:33 [ndbd] INFO -- NR Status: node=2,OLD=Initial state,NEW=Node failed, fail handling ongoing
2021-10-22 17:10:33 [ndbd] INFO -- Master takeover started from 2
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Started failure handling for node 2
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Starting take over of node 2
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Step NF_CHECK_SCAN completed, failure handling for node 2 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION, NF_BLOCK_HANDLE.
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Step NF_BLOCK_HANDLE completed, failure handling for node 2 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION.
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: GCP completion 30633610/1 waiting for node failure handling (1) to complete. Seizing record for GCP.
2021-10-22 17:10:33 [ndbd] INFO -- DBTC 0: Step NF_CHECK_TRANSACTION completed, failure handling for node 2 waiting for NF_TAKEOVER.
start_resend(0, empty bucket (30633610/1 30633610/0) -> active
2021-10-22 17:10:33 [ndbd] INFO -- Started arbitrator node 1 [ticket=be200068e2bfb284]
2021-10-22 17:10:33 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart ongoing
2021-10-22 17:10:35 [ndbd] INFO -- DBTC 0: Completed take over of failed node 2
2021-10-22 17:10:35 [ndbd] INFO -- DBTC 0: Step NF_TAKEOVER completed, failure handling for node 2 complete.
2021-10-22 17:10:35 [ndbd] INFO -- DBTC 0: Completing GCP 30633610/1 on node failure takeover completion.
2021-10-22 17:10:35 [ndbd] INFO -- NR Status: node=2,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2021-10-22 17:10:35 [ndbd] INFO -- Node 2 has completed node fail handling
2021-10-22 17:10:36 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart finished
job buffer full
Dumping non-empty job queues:
job buffer 0 --> 2, used 31 FULL!
job buffer full
Dumping non-empty job queues:
job buffer 0 --> 2, used 31 FULL!
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
2021-10-22 17:14:11 [ndbd] INFO -- Received signal 6. Running error handler.
stack_bottom = 0 thread_stack 0x0
ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x89fc7d]
ndbmtd(ndb_print_stacktrace()+0x52) [0x84a0e2]
ndbmtd(handler_error+0xab) [0x4f04ab]
/lib64/libc.so.6(+0x36400) [0x7fb433e6a400]
/lib64/libc.so.6(gsignal+0x37) [0x7fb433e6a387]
/lib64/libc.so.6(abort+0x148) [0x7fb433e6ba78]
ndbmtd() [0x87041c]
ndbmtd() [0x874f7e]
ndbmtd(SimulatedBlock::sendSignal(unsigned int, unsigned short, Signal*, unsigned int, JobBufferLevel) const+0x195) [0x866915]
ndbmtd(Dbtc::releaseAndAbort(Signal*, Dbtc::ApiConnectRecord*)+0x144) [0x6821f4]
ndbmtd(Dbtc::abort015Lab(Signal*, Ptr<Dbtc::ApiConnectRecord>)+0x299) [0x695e69]
ndbmtd(Dbtc::execCONTINUEB(Signal*)+0x931) [0x6b92b1]
ndbmtd() [0x870c68]
ndbmtd() [0x875413]
ndbmtd(mt_job_thread_main+0x249) [0x87a009]
ndbmtd() [0x848308]
/lib64/libpthread.so.0(+0x7ea5) [0x7fb43550aea5]
/lib64/libc.so.6(clone+0x6d) [0x7fb433f328dd]
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x89fc7d]
ndbmtd(ndb_print_stacktrace()+0x52) [0x84a0e2]
ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x80678f]
ndbmtd(handler_error+0x100) [0x4f0500]
/lib64/libc.so.6(+0x36400) [0x7fb433e6a400]
/lib64/libc.so.6(gsignal+0x37) [0x7fb433e6a387]
/lib64/libc.so.6(abort+0x148) [0x7fb433e6ba78]
ndbmtd() [0x87041c]
ndbmtd() [0x874f7e]
ndbmtd(SimulatedBlock::sendSignal(unsigned int, unsigned short, Signal*, unsigned int, JobBufferLevel) const+0x195) [0x866915]
ndbmtd(Dbtc::releaseAndAbort(Signal*, Dbtc::ApiConnectRecord*)+0x144) [0x6821f4]
ndbmtd(Dbtc::abort015Lab(Signal*, Ptr<Dbtc::ApiConnectRecord>)+0x299) [0x695e69]
ndbmtd(Dbtc::execCONTINUEB(Signal*)+0x931) [0x6b92b1]
ndbmtd() [0x870c68]
ndbmtd() [0x875413]
ndbmtd(mt_job_thread_main+0x249) [0x87a009]
ndbmtd() [0x848308]
/lib64/libpthread.so.0(+0x7ea5) [0x7fb43550aea5]
/lib64/libc.so.6(clone+0x6d) [0x7fb433f328dd]
2021-10-22 17:14:11 [ndbd] INFO -- Signal 6 received; Aborted
2021-10-22 17:14:11 [ndbd] INFO -- /export/home2/pb2/build/sb_1-39758149-1592609781.29/rpm/BUILD/mysql-cluster-gpl-8.0.21/mysql-cluster-gpl-8.0.21/storage/ndb/src/kernel/ndbd.cpp
2021-10-22 17:14:11 [ndbd] INFO -- Error handler signal shutting down system
2021-10-22 17:14:11 [ndbd] INFO -- Error handler shutdown completed - exiting
2021-10-22 17:14:11 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2021-10-22 19:22:59 [ndbd] INFO -- Angel pid: 26660 started child: 26661
###