
Additional heartbeat in SafeKit

split-brain vs additional heartbeat

A split-brain situation in a mirror module with file replication is costly. When the split brain is resolved, the former secondary server is sacrificed: its files are reintegrated from the primary server, and any data stored on the secondary during the split brain is lost.

For this reason, two heartbeats on two physically separate networks are recommended. Typically, a cable between the two servers makes it possible to (1) avoid split brain by providing an additional heartbeat network and (2) carry the replication flow on a separate network:

      <heart>
         <heartbeat>
            <server addr="172.26.76.113"/>
            <server addr="172.26.76.114"/>
         </heartbeat>

         <heartbeat ident="flow">
            <server addr="172.26.32.61"/>
            <server addr="172.26.33.62"/>
         </heartbeat>
      </heart>


      <rfs nbthread="12" nfsbox_options="nocross">
         <flow>
            <server addr="172.26.32.61"/>
            <server addr="172.26.33.62"/>
         </flow>
         ...
      </rfs>

13.3 Heartbeats ([heart], [heartbeat] tags)

Heartbeats must be used only for the mirror architecture.

The basic mechanism for synchronizing two servers and detecting server failures is the heartbeat, which is a monitoring data flow on a network shared by a pair of servers. Normally, there are as many heartbeats as there are networks shared by the two servers. In normal operation, the two servers exchange their states (PRIM, SECOND, the resource states) through the heartbeat mechanism and synchronize their application start and stop procedures. If all heartbeats are lost, the local server interprets this as the other server being down and switches to the ALONE state. Although not mandatory, it is better to have two heartbeat channels on two different networks to synchronize the two servers, so that the loss of a single network does not lead to split brain.

ident="flow" is a reserved name associated with a heartbeat declared on a replication flow. If you set a heartbeat with ident="flow", the replication flow is automatically set on the same network. If you set ident="flow" without an [rfs] configuration, the module blocks in the WAIT state when it starts.
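
As an illustration of the rule above, the following is a minimal sketch of a configuration that relies on ident="flow" alone to place the replication flow, without an explicit [flow] section. The addresses are reused from the example earlier on this page. Whether the [flow] section can be omitted, and which other [rfs] attributes are required, is an assumption to check against the documentation for your SafeKit version.

```xml
<heart>
   <!-- First heartbeat, on the first shared network -->
   <heartbeat>
      <server addr="172.26.76.113"/>
      <server addr="172.26.76.114"/>
   </heartbeat>

   <!-- Second heartbeat; ident="flow" also places the replication flow here -->
   <heartbeat ident="flow">
      <server addr="172.26.32.61"/>
      <server addr="172.26.33.62"/>
   </heartbeat>
</heart>

<!-- An [rfs] section must be present, otherwise the module blocks in WAIT.
     Assumption: no explicit <flow> is declared, since ident="flow" sets it.
     Replicated directories are omitted from this sketch. -->
<rfs>
</rfs>
```

With this layout, losing the first network still leaves the heartbeat on the replication network, so the two servers keep exchanging their states instead of both switching to ALONE.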