Channel: MySQL Forums - NDB clusters

DB down - Missed Heartbeat - Bonding (no replies)

Hello

I ran into a problem with my NDB cluster.
My database became unreachable because of missed heartbeats.
At 03:15 all of the nodes reported missed heartbeats:

2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 2
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 51 missed heartbeat 2
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 53 missed heartbeat 2
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 3
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 51 missed heartbeat 3
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 53 missed heartbeat 3
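To line these cluster-log warnings up with the kernel log, they can be grouped per peer node. This is a minimal Python sketch that assumes the exact `[MgmtSrvr] WARNING -- Node X: Node Y missed heartbeat N` wording quoted above; `HEARTBEAT_RE` and `summarize_missed_heartbeats` are hypothetical helper names, not part of any MySQL tool.

```python
import re
from datetime import datetime

# Matches ndb_mgmd cluster-log lines of the form quoted above, e.g.
# "2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 2"
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] WARNING -- "
    r"Node (?P<reporter>\d+): Node (?P<peer>\d+) missed heartbeat (?P<count>\d+)$"
)


def summarize_missed_heartbeats(lines):
    """Return {peer_node_id: (first_warning_time, highest_missed_count)}."""
    summary = {}
    for line in lines:
        m = HEARTBEAT_RE.match(line.strip())
        if m is None:
            continue  # ignore anything that is not a missed-heartbeat warning
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        peer, count = int(m["peer"]), int(m["count"])
        first, highest = summary.get(peer, (ts, 0))
        summary[peer] = (min(first, ts), max(highest, count))
    return summary


LOG = """\
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 2
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 51 missed heartbeat 2
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 53 missed heartbeat 2
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 3
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 51 missed heartbeat 3
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 53 missed heartbeat 3
"""

if __name__ == "__main__":
    summary = summarize_missed_heartbeats(LOG.splitlines())
    for peer, (first, highest) in sorted(summary.items()):
        print(f"node {peer}: first warning {first:%H:%M:%S}, highest missed count {highest}")
```

For the six lines above it reports nodes 50, 51 and 53, each first warned about at 03:15:08 and each reaching a missed count of 3 one second later.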

I noticed that at that time two switches rebooted, which took some of the eth interfaces down.

The interfaces are bonded in primary/backup mode, so at 03:15 the backup interfaces became active:

Jan 14 03:15:05 x kernel: tg3 0000:03:00.0 eno1: Link is down
Jan 14 03:15:05 x kernel: tg3 0000:03:00.1 eno2: Link is down
Jan 14 03:15:05 x kernel: bond0: link status definitely down for interface eno1, disabling it
Jan 14 03:15:05 x kernel: bond1: link status definitely down for interface eno2, disabling it
Jan 14 03:15:05 x kernel: bond0: making interface eno3 the new active one
Jan 14 03:15:05 x kernel: bond1: making interface eno4 the new active one
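The kernel log can be parsed the same way to timestamp each bond failover and measure how long after it the first heartbeat warning was logged. This sketch assumes English month abbreviations and the `bondN: making interface X the new active one` message shown above. Classic syslog timestamps carry no year, so the caller passes one (2019 here, taken from the cluster log as an assumption). `FAILOVER_RE` and `failover_events` are hypothetical names.

```python
import re
from datetime import datetime

# Matches kernel bonding messages such as
# "Jan 14 03:15:05 x kernel: bond0: making interface eno3 the new active one"
FAILOVER_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"(?P<bond>\S+): making interface (?P<iface>\S+) the new active one"
)


def failover_events(lines, year):
    """Return [(timestamp, bond, new_active_iface)] from syslog-style kernel lines.

    Syslog timestamps omit the year, so it is supplied by the caller
    (assumed to be known from another log covering the same event).
    """
    events = []
    for line in lines:
        m = FAILOVER_RE.match(line.strip())
        if m is None:
            continue
        # Collapse the double space syslog uses for single-digit days ("Jan  4").
        stamp = " ".join(m["ts"].split())
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["bond"], m["iface"]))
    return events


KERNEL_LOG = """\
Jan 14 03:15:05 x kernel: bond0: link status definitely down for interface eno1, disabling it
Jan 14 03:15:05 x kernel: bond1: link status definitely down for interface eno2, disabling it
Jan 14 03:15:05 x kernel: bond0: making interface eno3 the new active one
Jan 14 03:15:05 x kernel: bond1: making interface eno4 the new active one
"""

if __name__ == "__main__":
    events = failover_events(KERNEL_LOG.splitlines(), year=2019)
    first_warning = datetime(2019, 1, 14, 3, 15, 8)  # first [MgmtSrvr] warning quoted above
    for ts, bond, iface in events:
        lag = (first_warning - ts).total_seconds()
        print(f"{bond} -> {iface} at {ts:%H:%M:%S}, first heartbeat warning {lag:.0f}s later")
```

On the excerpt above both bonds switch at 03:15:05, and the first `missed heartbeat` warning is logged 3 seconds later. Both logs have one-second resolution, so that lag is only approximate.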

When the failed interfaces came back up at 03:30, I again saw some missed heartbeats :(

Jan 14 03:30:03 x kernel: tg3 0000:03:00.1 eno2: Link is up at 1000 Mbps, full duplex
Jan 14 03:30:03 x kernel: tg3 0000:03:00.1 eno2: Link is up at 1000 Mbps, full duplex

Jan 14 03:30:03 db1site1 kernel: tg3 0000:03:00.1 eno2: Link is up at 1000 Mbps, full duplex
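One possible reason for the second burst at 03:30, stated as an assumption the logs above do not confirm: in active-backup mode with eno1/eno2 set as each bond's `primary` and the kernel default `primary_reselect=always`, the bond moves traffic back to the primary as soon as its link returns, which is a second path change for the cluster. The sketch below reads the standard bonding attributes from sysfs and flags that case; `read_bond_settings` and `failback_warnings` are hypothetical helpers, and they only check active-backup bonds.

```python
from pathlib import Path

# Bonding attributes exposed under /sys/class/net/<bond>/bonding/ by the Linux kernel.
ATTRS = ("mode", "primary", "primary_reselect", "miimon", "updelay", "downdelay", "active_slave")


def read_bond_settings(bond, sysfs_root="/sys/class/net"):
    """Read the bonding attributes of `bond`; missing files are returned as ""."""
    base = Path(sysfs_root) / bond / "bonding"
    settings = {}
    for name in ATTRS:
        path = base / name
        settings[name] = path.read_text().strip() if path.is_file() else ""
    return settings


def failback_warnings(settings):
    """Return notes on whether a recovering primary link triggers a second switchover."""
    # sysfs prints values like "active-backup 1" and "always 0"; keep only the name.
    mode = settings.get("mode", "").split(" ")[0]
    primary = settings.get("primary", "")
    reselect = settings.get("primary_reselect", "").split(" ")[0] or "always"  # kernel default
    notes = []
    if mode != "active-backup" or not primary:
        return notes  # this check only covers active-backup bonds with a primary set
    if reselect == "always":
        notes.append(f"{primary} becomes active again as soon as its link is usable (failback)")
        if settings.get("updelay", "") in ("", "0"):
            notes.append("updelay is 0, so the failback happens right after carrier returns")
    return notes


if __name__ == "__main__":
    import sys

    bond = sys.argv[1] if len(sys.argv) > 1 else "bond1"
    for note in failback_warnings(read_bond_settings(bond)):
        print(f"{bond}: {note}")
```

If failback turns out to be the trigger, the kernel bonding documentation describes `updelay` (wait before using a recovered link) and `primary_reselect=failure` (return to the primary only when the current active link fails) as ways to delay or avoid it; whether either removes the heartbeat misses here would still need to be verified.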


Is NDB Cluster sensitive to the interface switchover that bonding performs?

Thanks
