Sunday, 11 December 2022

What is split brain syndrome?

Split brain syndrome describes a situation that can occur in Oracle Real Application Clusters (RAC) when communication between nodes fails, typically over the private interconnect. In a split brain scenario, the nodes in the cluster become isolated from each other and can no longer coordinate their actions, even though each of them may still be able to reach the shared storage.

In a RAC environment, split brain syndrome can cause serious problems if it is not addressed quickly. Because the nodes are unable to communicate with each other, they may start to act independently, which can lead to data inconsistencies and even data corruption. For example, two nodes may write to the same data block at the same time, resulting in conflicting data being written to the shared storage.

To prevent split brain syndrome, Oracle Clusterware uses a mechanism called I/O fencing, a form of split-brain protection. Cluster Synchronization Services (CSS) tracks cluster membership through network heartbeats and the voting disks. When a node is suspected to be out of sync with the rest of the cluster, it is evicted, which blocks its access to shared storage and effectively "fences" it off from the rest of the cluster. This prevents the node from writing to shared data and so helps to prevent the data corruption that a split brain scenario could otherwise cause.
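The fencing decision can be illustrated with a small Python model. This is only a sketch of a commonly described rule (the largest sub-cluster survives, and a tie goes to the sub-cluster containing the lowest node number); it is not Oracle's actual implementation, which also relies on the voting disks and may apply other criteria depending on the release. The function names and the node and link representation are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def find_subclusters(nodes: List[int], links: List[Tuple[int, int]]) -> List[Set[int]]:
    """Group node numbers into sub-clusters joined by working interconnect links.

    Every node mentioned in `links` must also appear in `nodes`.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: Set[int] = set()
    groups: List[Set[int]] = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        group: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def choose_survivors(groups: List[Set[int]]) -> Set[int]:
    """Assumed rule: largest group wins; ties go to the group with the lowest node number."""
    return max(groups, key=lambda g: (len(g), -min(g)))


def nodes_to_fence(nodes: List[int], links: List[Tuple[int, int]]) -> Set[int]:
    """Return the nodes that this simplified model would evict."""
    survivors = choose_survivors(find_subclusters(nodes, links))
    return set(nodes) - survivors
```

For example, in a four-node cluster where only the links 1-2 and 3-4 still work, the two halves are the same size, so this model keeps nodes 1 and 2 and fences nodes 3 and 4.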

How to resolve split brain syndrome?

To resolve split brain syndrome in an Oracle Real Application Clusters (RAC) system, the first step is to restore communication between the nodes. This may involve troubleshooting the network connectivity between them, replacing faulty network hardware such as a switch, cable, or network card, or correcting a misconfigured interface.
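As a first troubleshooting step, it can help to confirm from one node whether its peers are reachable at all. The Python sketch below attempts a TCP connection to each peer and reports the result. It is only a basic reachability check: the host names and port in the usage example are hypothetical, and a successful connection does not prove that the interconnect traffic used by the cluster itself is healthy.

```python
import socket
from typing import Dict, List, Tuple


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe_peers(peers: List[Tuple[str, int]], timeout: float = 2.0) -> Dict[str, bool]:
    """Probe each (host, port) pair and map "host:port" to its reachability."""
    return {f"{host}:{port}": is_reachable(host, port, timeout) for host, port in peers}
```

A call such as probe_peers([("rac2-priv", 22), ("rac3-priv", 22)]) would return a dictionary showing which peers accepted the connection, where rac2-priv and rac3-priv stand in for your own private host names.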

Once communication between the nodes has been restored, the evicted node can rejoin the cluster, usually after Clusterware has been restarted on it. Clusterware (CRS) does not merge conflicting versions of data blocks after the fact; the purpose of fencing is to stop an isolated node before it can write them. Changes made by the failed instance that were recorded in its redo but not yet written to the data files are applied by a surviving instance during instance recovery, which brings the database back to a consistent state.

If the split brain syndrome was caused by a failure of the cluster manager (CRS) itself, you may need to restart CRS on the affected nodes to restore its functionality. This can be done by running the crsctl start crs command as root on each node. The crsctl check crs command can then be used to confirm that the stack is back online.
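If the restart has to be repeated on several nodes, the commands can be wrapped in a small script. The Python sketch below builds and runs crsctl check crs, crsctl start crs, or crsctl stop crs. The Grid Infrastructure home path in the usage note is an assumption to replace with your own, the helper names are hypothetical, and the script must be run as root on the node in question.

```python
import posixpath
import subprocess
from typing import List

# Only the actions discussed here are allowed, to avoid running arbitrary input.
ALLOWED_ACTIONS = ("check", "start", "stop")


def crsctl_command(action: str, grid_home: str) -> List[str]:
    """Build the argument list for 'crsctl <action> crs' under the given Grid home."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [posixpath.join(grid_home, "bin", "crsctl"), action, "crs"]


def run_crsctl(action: str, grid_home: str) -> subprocess.CompletedProcess:
    """Run the command and return the completed process (exit code, stdout, stderr)."""
    return subprocess.run(
        crsctl_command(action, grid_home),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect the return code and output
    )
```

For example, run_crsctl("check", "/u01/app/grid") returns an object whose returncode and stdout can be inspected before deciding whether run_crsctl("start", "/u01/app/grid") is needed; /u01/app/grid is a placeholder path.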

Overall, the steps for resolving split brain syndrome in RAC will depend on the specific circumstances of the situation and the underlying cause of the problem. It's generally recommended to seek the guidance of a qualified Oracle DBA or other experienced professional when dealing with split brain syndrome.
