Unexpected shutdown

Hi Guys,

I am evaluating Cluster for use in my company. This morning, out of the blue, NDB shut itself down. It did not restart on its own, and I had to restart the 'ndbd' processes manually after realising that they were not running.

This would be very worrying in a production environment. I wonder whether anybody could give an opinion on what happened, and on how to prevent it in the future.

Version 7.2.5

Two data nodes, 4 and 5.

Node 5 logged this line at the time of the failure:

2012-09-10 10:18:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Performing Send elapsed=103

From there, both nodes reported Watchdog warnings, which ended with both nodes shutting down. (Logs are included at the end of this posting.)
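To put a number on the stall, the watchdog and timer warnings can be pulled out of the logs. Below is a minimal sketch in Python, assuming the line format shown in the Node 5 excerpt at the end of this posting (a 19-character timestamp prefix followed by the message). The function name `watchdog_stalls` is hypothetical, and the sample text is a subset of the Node 5 lines.

```python
import re

# Subset of the Node 5 data node log quoted at the end of this posting.
NODE5_LOG = """\
2012-09-10 10:18:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Performing Send elapsed=103
2012-09-10 10:18:28 [ndbd] WARNING -- Time moved forward with 10568 ms
2012-09-10 10:18:28 [ndbd] WARNING -- Watchdog: Warning overslept 10544 ms, expected 100 ms.
2012-09-10 10:18:28 [ndbd] WARNING -- Watchdog: Warning overslept 578 ms, expected 100 ms.
"""

# Patterns match the message text as it appears in the log excerpts.
OVERSLEPT = re.compile(r"overslept (\d+) ms, expected (\d+) ms")
TIME_JUMP = re.compile(r"Time moved forward with (\d+) ms")


def watchdog_stalls(log_text):
    """Return (timestamp, kind, milliseconds) for each oversleep or time-jump warning."""
    stalls = []
    for line in log_text.splitlines():
        timestamp = line[:19]  # "YYYY-MM-DD HH:MM:SS" prefix
        match = OVERSLEPT.search(line)
        if match:
            # How long the watchdog actually slept, versus the expected 100 ms.
            stalls.append((timestamp, "overslept", int(match.group(1))))
            continue
        match = TIME_JUMP.search(line)
        if match:
            # How far the node's clock jumped between two timer checks.
            stalls.append((timestamp, "time jump", int(match.group(1))))
    return stalls


for entry in watchdog_stalls(NODE5_LOG):
    print(entry)
print("longest stall (ms):", max(ms for _, _, ms in watchdog_stalls(NODE5_LOG)))
```

On the Node 5 lines this reports a clock jump of 10568 ms and an oversleep of 10544 ms against an expected 100 ms, all stamped 10:18:28. Running it over the full logs of both data nodes gives a quick side-by-side comparison.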

Any advice on what happened, or on how to stop it happening again?

Regards, Ben.

----------------------------LOGS---------------------------

Data (Node 5)

2012-09-10 10:18:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Performing Send elapsed=103
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2506963 System time: 1647468
2012-09-10 10:18:28 [ndbd] INFO -- timerHandlingLab now: 11355449007 sent: 11355448828 diff: 179
2012-09-10 10:18:28 [ndbd] WARNING -- Time moved forward with 10568 ms
2012-09-10 10:18:28 [ndbd] WARNING -- timerHandlingLab now: 11355459420 sent: 11355449007 diff: 10413
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2506967 System time: 1647473
2012-09-10 10:18:28 [ndbd] WARNING -- Watchdog: Warning overslept 10544 ms, expected 100 ms.
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2506967 System time: 1647473
2012-09-10 10:18:28 [ndbd] WARNING -- Watchdog: Warning overslept 578 ms, expected 100 ms.
2012-09-10 10:18:36 [ndbd] INFO -- findNeighbours from: 4861 old (left: 4 right: 4) new (65535 65535)
2012-09-10 10:18:36 [ndbd] INFO -- Arbitrator decided to shutdown this node
2012-09-10 10:18:36 [ndbd] INFO -- QMGR (Line: 5975) 0x00000002
2012-09-10 10:18:36 [ndbd] INFO -- Error handler shutting down system
2012-09-10 10:18:36 [ndbd] INFO -- Error handler shutdown completed - exiting
2012-09-10 10:18:37 [ndbd] ALERT -- Node 5: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

Data (Node 4)

2012-09-10 10:18:28 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=100
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2998422 System time: 1847367
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2998422 System time: 1847367
2012-09-10 10:18:28 [ndbd] WARNING -- Watchdog: Warning overslept 2206 ms, expected 100 ms.
2012-09-10 10:18:28 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=100
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2998422 System time: 1847367
2012-09-10 10:18:28 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=200
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2998422 System time: 1847367
2012-09-10 10:18:28 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=300
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2998422 System time: 1847367
2012-09-10 10:18:28 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=401
2012-09-10 10:18:28 [ndbd] INFO -- Watchdog: User time: 2998422 System time: 1847367
2012-09-10 10:18:28 [ndbd] INFO -- Arbitrator decided to shutdown this node
2012-09-10 10:18:28 [ndbd] INFO -- QMGR (Line: 5975) 0x00000002
2012-09-10 10:18:28 [ndbd] INFO -- Error handler shutting down system
2012-09-10 10:18:28 [ndbd] INFO -- Error handler shutdown completed - exiting
2012-09-10 10:18:31 [ndbd] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

Management (Node 1):

2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 4: GCP Monitor: GCP_COMMIT lag 0 seconds (max lag: 13)
2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 4: Node 1 missed heartbeat 2
2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 4: Node 1 missed heartbeat 3
2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 4: Node 5 missed heartbeat 2
2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 4: Node 5 missed heartbeat 3
2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 5: Node 1 missed heartbeat 2
2012-09-10 10:18:28 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2012-09-10 10:18:28 [MgmtSrvr] ALERT -- Node 1: Node 5 Disconnected
2012-09-10 10:18:34 [MgmtSrvr] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2012-09-10 10:18:36 [MgmtSrvr] INFO -- Node 1: Node 5 Connected
2012-09-10 10:18:37 [MgmtSrvr] ALERT -- Node 1: Node 5 Disconnected
2012-09-10 10:18:37 [MgmtSrvr] ALERT -- Node 5: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
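Interleaving the three logs by timestamp makes the order of events easier to follow. Below is a minimal sketch in Python, assuming each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp as above and that the three hosts' clocks are roughly in sync. That is only an assumption: the management server stamps Node 4's shutdown at 10:18:34, while Node 4's own log says 10:18:31. The function name `merged_timeline` and the source labels are hypothetical, and only a few lines from each log are included.

```python
from datetime import datetime

# A few lines from each log above, keyed by a hypothetical source label.
LOGS = {
    "node4": [
        "2012-09-10 10:18:28 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=100",
        "2012-09-10 10:18:28 [ndbd] INFO -- Arbitrator decided to shutdown this node",
    ],
    "node5": [
        "2012-09-10 10:18:17 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Performing Send elapsed=103",
        "2012-09-10 10:18:36 [ndbd] INFO -- Arbitrator decided to shutdown this node",
    ],
    "mgmt": [
        "2012-09-10 10:18:27 [MgmtSrvr] WARNING -- Node 4: Node 5 missed heartbeat 2",
        "2012-09-10 10:18:28 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected",
    ],
}


def merged_timeline(logs):
    """Interleave per-source log lines into one list ordered by timestamp."""
    events = []
    for source, lines in logs.items():
        for line in lines:
            when = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            events.append((when, source, line[20:]))  # text after the timestamp
    # sorted() is stable: lines sharing the same second keep their original
    # per-file order, and ties between files follow the dict order above.
    return sorted(events, key=lambda event: event[0])


for when, source, text in merged_timeline(LOGS):
    print(when.strftime("%H:%M:%S"), f"{source:<5}", text)
```

Because the logs only have one-second resolution, lines within the same second cannot be ordered reliably across hosts. The merged view is a rough guide rather than an exact sequence.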