6/08/2009

Disaster Recovery setup


I'm trying to set up a Red Hat Cluster that can survive a site disaster.

To illustrate this, see the diagram.

The nodes would serve database instances (not in a RAC fashion) or NFS services, with each service running on one node at a time.

The idea is to use Clustered LVM (CLVM) with LVM mirroring on top of it (although this does not yet support online resizing and still needs a third device to carry the metadata).

The main question with this kind of setup (with RHCS) concerns quorum: if one of the two sites goes down, the surviving site no longer holds a majority, so the whole cluster goes down.
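
To make the arithmetic concrete, here is a rough sketch of a simple majority rule. It assumes one vote per node and a hypothetical layout of two nodes per site; the function names are mine and this is not RHCS code, just the counting behind the problem.

```python
def quorum(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int) -> bool:
    """True if the partition holding votes_present may keep running."""
    return votes_present >= quorum(total_votes)


# Hypothetical layout: two nodes per site, one vote each.
site_a_votes = 2
site_b_votes = 2
total_votes = site_a_votes + site_b_votes

# Site A is lost; only site B's votes remain.
print("quorum needed:", quorum(total_votes))
print("site B quorate alone:", is_quorate(site_b_votes, total_votes))
```

With an even split, neither site can reach a majority on its own, so losing either site stops the whole cluster.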

We already manage a few hundred clusters this way using non-Red Hat clustering software (HP ServiceGuard), which handles the site-loss scenario by providing an "out-of-cluster" tie-breaker on a third site, guaranteeing a split-brain-proof setup.
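
As a rough illustration of why a third-site tie-breaker helps, the sketch below models it as one extra vote that only one side can hold at a time. This is a generic majority model under my own assumptions (one vote per node, one vote for the tie-breaker, hypothetical function name), not a description of how ServiceGuard or RHCS actually implement it.

```python
def side_is_quorate(own_votes: int, other_votes: int,
                    tiebreaker_votes: int, holds_tiebreaker: bool) -> bool:
    """Majority check for one site, counting the tie-breaker if this site holds it."""
    total = own_votes + other_votes + tiebreaker_votes
    present = own_votes + (tiebreaker_votes if holds_tiebreaker else 0)
    return present >= total // 2 + 1


# Hypothetical layout: two nodes per site plus a one-vote tie-breaker on a third site.
# Site loss: the surviving site reaches the tie-breaker and stays up.
print(side_is_quorate(2, 2, 1, holds_tiebreaker=True))

# Inter-site link failure: the side that does not get the tie-breaker stops,
# so both sides can never run at once.
print(side_is_quorate(2, 2, 1, holds_tiebreaker=False))
```

Because the tie-breaker vote can go to only one side, at most one partition ever holds a majority, which is the property I am looking for with RHCS.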