Hello
I faced a problem with my NDB Cluster.
My DB became unreachable due to missed heartbeats.
I saw that at 03:15 AM all the nodes reported missed heartbeats:
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 2
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 51 missed heartbeat 2
2019-01-14 03:15:08 [MgmtSrvr] WARNING -- Node 2: Node 53 missed heartbeat 2
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 50 missed heartbeat 3
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 51 missed heartbeat 3
2019-01-14 03:15:09 [MgmtSrvr] WARNING -- Node 2: Node 53 missed heartbeat 3
I noticed that at that time two switches rebooted, which took down some of the Ethernet interfaces.
Bonding (primary/backup) is used for these interfaces, so at 03:15 AM the "backup" interface became the active one (bonding setup sketched after the log excerpt below):
Jan 14 03:15:05 x kernel: tg3 0000:03:00.0 eno1: Link is down
Jan 14 03:15:05 x kernel: tg3 0000:03:00.1 eno2: Link is down
Jan 14 03:15:05 x kernel: bond0: link status definitely down for interface eno1, disabling it
Jan 14 03:15:05 x kernel: bond1: link status definitely down for interface eno2, disabling it
Jan 14 03:15:05 x kernel: bond0: making interface eno3 the new active one
Jan 14 03:15:05 x kernel: bond1: making interface eno4 the new active one
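For reference, the bonds are set up in active-backup mode. The configuration looks roughly like the following; the file path and option values are an approximation of our setup from memory, not a verbatim copy:

/etc/sysconfig/network-scripts/ifcfg-bond0 (sketch)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
# active-backup: only one slave carries traffic, the other stays on standby
# miimon=100: check link state every 100 ms; primary=eno1 is preferred when its link is up
BONDING_OPTS="mode=active-backup miimon=100 primary=eno1"
ONBOOT=yes
BOOTPROTO=none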
When the faulty interfaces came back up at 03:30 AM, I again noticed some missed heartbeats :(
Jan 14 03:30:03 x kernel: tg3 0000:03:00.1 eno2: Link is up at 1000 Mbps, full duplex
Jan 14 03:30:03 x kernel: tg3 0000:03:00.1 eno2: Link is up at 1000 Mbps, full duplex
Jan 14 03:30:03 db1site1 kernel: tg3 0000:03:00.1 eno2: Link is up at 1000 Mbps, full duplex
Is NDB Cluster this sensitive to the interface switchover during a bonding failover?
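If it is, would raising the heartbeat intervals in config.ini give the cluster more tolerance for such a short network blip? I am thinking of something like the sketch below; the values are only a guess on my side, not tested, and the defaults I mention are from the 7.x documentation as far as I remember:

[ndbd default]
# Heartbeat between data nodes; a node is declared dead after a handful of
# missed intervals (default interval 5000 ms in recent versions, if I recall correctly)
HeartbeatIntervalDbDb=10000
# Heartbeat between data nodes and API/SQL nodes (default 1500 ms)
HeartbeatIntervalDbApi=5000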
Thanks