NDB data node failed and sent another data node into a crash loop

I have an NDB cluster running 3 data nodes and 3 management nodes.

It appears that an error occurred on one data node, which caused it to restart:

Node 19:
2022-01-15 04:20:46 [ndbd] INFO -- findNeighbours from: 2905 old (left: 17 right: 17) new (17 18)
2022-01-15 04:20:47 [ndbd] INFO -- NR Status: node=18,OLD=Node failure handling complete,NEW=All nodes permitted us
2022-01-15 04:20:47 [ndbd] INFO -- Switch to 17 multi trp for node 18
2022-01-15 04:21:24 [ndbd] INFO -- NR Status: node=18,OLD=All nodes permitted us,NEW=Include node in LCP/GCP protocols
2022-01-15 04:21:24 [ndbd] INFO -- NR Status: node=18,OLD=Include node in LCP/GCP protocols,NEW=Synchronize start node with live nodes
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::execCOPY_FRAGREQ(Signal*)+0xa29) [0x65f949]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x230) [0x8a11e0]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7f7b742a3ea5]
/lib64/libc.so.6(clone+0x6d) [0x7f7b72ccb96d]
2022-01-15 04:21:25 [ndbd] INFO -- /var/lib/pb2/sb_1-2918142-1619218179.52/rpm/BUILD/mysql-cluster-com-8.0.25/mysql-cluster-com-8.0.25/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
2022-01-15 04:21:25 [ndbd] INFO -- DBLQH (Line: 19166) 0x00000002 Check getFragmentrec(fragId) failed
2022-01-15 04:21:25 [ndbd] INFO -- Error handler shutting down system
2022-01-15 04:21:26 [ndbd] ALERT -- Node 19: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2022-01-15 04:21:27 [ndbd] INFO -- Angel pid: 34452 started child: 34453

A second data node failed at almost the same moment and went into a crash loop:

Node 18:
2022-01-15 04:21:25 [ndbd] INFO -- LDM(8): Completed copy of fragment T175F3. Changed +0/-0 rows, 0 bytes. 0 pct churn to 0 rows.
2022-01-15 04:21:26 [ndbd] INFO -- Node 19 disconnected in state: 0
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Qmgr::execDISCONNECT_REP(Signal*)+0x21f) [0x79be7f]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc05]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7efd8a3e9ea5]
/lib64/libc.so.6(clone+0x6d) [0x7efd88e1196d]
2022-01-15 04:21:26 [ndbd] INFO -- Node 19 disconnected in state: 0
2022-01-15 04:21:26 [ndbd] INFO -- Node 19 disconnected in phase: 3
2022-01-15 04:21:26 [ndbd] INFO -- QMGR (Line: 4245) 0x00000002
2022-01-15 04:21:26 [ndbd] INFO -- Error handler shutting down system
2022-01-15 04:21:26 [ndbd] ALERT -- Node 18: Forced node shutdown completed. Occurred during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2022-01-15 04:21:27 [ndbd] INFO -- Angel pid: 14167 started child: 14168

It appears that the two nodes then got caught in a cycle in which one would crash during its restart and take the other down with it:

Node 19:
2022-01-15 13:33:22 [ndbd] INFO -- Node 18 disconnected in state: 0
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Qmgr::failReportLab(Signal*, unsigned short, FailRep::FailCause, unsigned short)+0x96d) [0x7956ad]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc05]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7ff4b634bea5]
/lib64/libc.so.6(clone+0x6d) [0x7ff4b4d7396d]
2022-01-15 13:33:22 [ndbd] INFO -- Node 18 failed
2022-01-15 13:33:22 [ndbd] INFO -- QMGR (Line: 5039) 0x00000002
2022-01-15 13:33:22 [ndbd] INFO -- Error handler shutting down system
2022-01-15 13:33:22 [ndbd] ALERT -- Node 19: Forced node shutdown completed. Occurred during startphase 2. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

Node 18:
22-01-15 13:33:02 [ndbd] INFO -- (16), tab(6,3), lcpNo: 65535, m_max_restorable_gci: 2429, crestartNewestGci: 2430, srStartGci: 0
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
stack_bottom = 0 thread_stack 0x0
stack_bottom = 0 thread_stack 0x0
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::send_restore_lcp(Signal*)+0x9d9) [0x624b59]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7fe8c2222ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fe8c0c4a96d]
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::send_restore_lcp(Signal*)+0x9d9) [0x624b59]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7fe8c2222ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fe8c0c4a96d]
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::send_restore_lcp(Signal*)+0x9d9) [0x624b59]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7fe8c2222ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fe8c0c4a96d]
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::send_restore_lcp(Signal*)+0x9d9) [0x624b59]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7fe8c2222ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fe8c0c4a96d]
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::send_restore_lcp(Signal*)+0x9d9) [0x624b59]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7fe8c2222ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fe8c0c4a96d]
/usr/sbin/ndbmtd(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x8c676d]
/usr/sbin/ndbmtd(ErrorReporter::handleError(int, char const*, char const*, NdbShutdownType)+0x2f) [0x81f59f]
/usr/sbin/ndbmtd(SimulatedBlock::progError(int, int, char const*, char const*) const+0xf9) [0x887ca9]
/usr/sbin/ndbmtd(Dblqh::send_restore_lcp(Signal*)+0x9d9) [0x624b59]
/usr/sbin/ndbmtd() [0x89604c]
/usr/sbin/ndbmtd() [0x89bc98]
/usr/sbin/ndbmtd(mt_job_thread_main+0x4c9) [0x8a1479]
/usr/sbin/ndbmtd() [0x869366]
/lib64/libpthread.so.0(+0x7ea5) [0x7fe8c2222ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fe8c0c4a96d]
2022-01-15 13:33:02 [ndbd] INFO -- /var/lib/pb2/sb_1-2918142-1619218179.52/rpm/BUILD/mysql-cluster-com-8.0.25/mysql-cluster-com-8.0.25/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
2022-01-15 13:33:02 [ndbd] INFO -- DBLQH (Line: 27173) 0x00000002 Check c_local_sysfile.m_max_restorable_gci >= crestartNewestGci failed
2022-01-15 13:33:02 [ndbd] INFO -- Error handler shutting down system
2022-01-15 13:33:02 [ndbd] ALERT -- Node 18: Forced node shutdown completed. Occurred during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Eventually node 19 recovered on its own and started properly.
Node 18, however, remained stuck in a crash loop and was eventually taken offline.
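
To make the order of the crashes easier to follow, here is a minimal sketch that pulls the "Forced node shutdown" alerts out of the node logs and sorts them into one timeline. It is my own helper, not an NDB tool: the names `ALERT_RE`, `shutdown_timeline` and `SAMPLE` are made up for illustration, the regular expression only assumes the ALERT line layout shown in the excerpts above, and the sample lines are abridged copies of those alerts.

```python
import re

# Assumed layout, taken from the ALERT lines in the excerpts above:
#   <timestamp> [ndbd] ALERT -- Node <id>: Forced node shutdown completed.
#   [Occurred during startphase <n>.] Caused by error <code>: '<message>'
ALERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed\."
    r"(?: Occurred during startphase (?P<phase>\d+)\.)?"
    r" Caused by error (?P<error>\d+):"
)


def shutdown_timeline(lines):
    """Return (timestamp, node, startphase or None, error code) tuples, oldest first."""
    events = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue  # not a forced-shutdown alert
        phase = match.group("phase")
        events.append((
            match.group("ts"),
            int(match.group("node")),
            int(phase) if phase is not None else None,
            int(match.group("error")),
        ))
    # Sort on timestamp and node id only; the startphase may be None.
    return sorted(events, key=lambda event: (event[0], event[1]))


# Abridged copies of the four ALERT lines quoted in this post.
SAMPLE = [
    "2022-01-15 04:21:26 [ndbd] ALERT -- Node 19: Forced node shutdown completed. "
    "Caused by error 2341: 'Internal program error (failed ndbrequire)'",
    "2022-01-15 04:21:26 [ndbd] ALERT -- Node 18: Forced node shutdown completed. "
    "Occurred during startphase 5. Caused by error 2308: 'Another node failed during system restart'",
    "2022-01-15 13:33:22 [ndbd] ALERT -- Node 19: Forced node shutdown completed. "
    "Occurred during startphase 2. Caused by error 2308: 'Another node failed during system restart'",
    "2022-01-15 13:33:02 [ndbd] ALERT -- Node 18: Forced node shutdown completed. "
    "Occurred during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)'",
]

if __name__ == "__main__":
    for ts, node, phase, error in shutdown_timeline(SAMPLE):
        where = f"startphase {phase}" if phase is not None else "no startphase reported"
        print(f"{ts}  node {node}  error {error}  ({where})")
```

In practice the lines would come from the concatenated out logs of both data nodes rather than from a hard-coded list. Seen side by side, each 2308 ("another node failed") follows a 2341 (failed ndbrequire) on the other node, which is the pattern I am describing above.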

Can you advise on how to recover from this state?
