8.1 Understanding Cluster Settings

8.1.1 Cluster Policies

Cluster policies are configuration settings that determine how a cluster is accessed, how many nodes are needed to form a quorum on startup, and whether email notifications are sent. You can manage cluster policies in iManager by going to Clusters > My Clusters, selecting a cluster, then selecting Action > Edit Properties > Policies. Table 8-1 describes the configurable cluster policies:

Table 8-1 Cluster Policies

Property

Description

Cluster IP address

Specifies the IP address for the cluster.

You specify the IP address when you install Cluster Services on the first node of the cluster. The IP address is bound to the master node and remains with the master node regardless of which server is the master node.

Rarely, you might need to modify this value. See Section 8.12, Viewing or Modifying the Cluster Master IP Address or Port.

IMPORTANT: If you modify the master IP address, you must restart the cluster, exit iManager, then re-launch iManager before iManager can continue to manage the cluster.

Port

Specifies the port used for cluster communication.

The default cluster port number is 7023, and is automatically assigned when the cluster is created. You might need to modify this value if there is a port conflict. You can change the port number to any other value that does not cause a conflict. See Section 8.12, Viewing or Modifying the Cluster Master IP Address or Port.

Quorum membership - Number of nodes

Specifies the number of nodes that must be up and running in the cluster before resources begin to load.

Specify a value between 1 and the number of nodes. Set this value to a number greater than 1 so that resources are not all loaded automatically on the first server that is brought up in the cluster. For example, if you set the membership value to 4, there must be four servers up in the cluster before any resource will load and start.

For instructions, see Section 8.4, Configuring Quorum Membership and Timeout Policies.

Quorum timeout

Specifies the maximum amount of time to wait for the specified quorum to be met. If the timeout period elapses before a quorum is achieved, resources automatically begin loading on whatever number of nodes are actually up and running in the cluster.

You can specify the time in seconds or minutes.

For instructions, see Section 8.4, Configuring Quorum Membership and Timeout Policies.

Email notification

Enables or disables email notification for the cluster. If it is enabled, you can specify up to eight administrator email addresses for cluster events notification.

Specifies the type of cluster events for notifications. You can receive notifications for only critical events such as node failure or a resource going comatose, or you can receive notifications for all cluster state changes and resource state changes.

Specifies whether to receive messages in XML format. XML format messages can be interpreted with a parser and formatted to customize the message information for your specific needs.

For instructions, see Section 8.6, Configuring Cluster Event Email Notification.
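The way the quorum membership and quorum timeout policies in Table 8-1 interact can be sketched in a few lines of Python. This is only an illustrative model of the behavior described above, not OES Cluster Services code; the function name `resources_may_load` and all of its parameters are hypothetical.

```python
def resources_may_load(nodes_up: int, quorum_nodes: int,
                       elapsed_seconds: float,
                       quorum_timeout_seconds: float) -> bool:
    """Hypothetical model: may cluster resources begin loading yet?"""
    # With no node running there is nowhere to load resources.
    if nodes_up < 1:
        return False
    # Quorum met: enough nodes are up, so loading can start.
    if nodes_up >= quorum_nodes:
        return True
    # Quorum not met: loading starts anyway once the timeout has elapsed,
    # on whatever number of nodes are actually up.
    return elapsed_seconds >= quorum_timeout_seconds
```

For example, with a membership value of 4 and a 60-second timeout, two running nodes do not load resources at 10 seconds, but they do once 60 seconds have elapsed.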

8.1.2 Cluster Priorities

Cluster priorities determine the load priority of individual cluster resources. You can manage cluster resource priorities in iManager by going to Clusters > My Clusters, selecting a cluster, then selecting Action > Edit Properties > Priorities. See Section 10.11, Configuring Resource Priorities for Load Order.

8.1.3 Cluster Protocols

Table 8-2 describes the configurable cluster protocol properties that govern inter-node communication transmission and tolerances. You can manage cluster protocol settings in iManager by going to Clusters > My Clusters, selecting a cluster, then selecting Action > Edit Properties > Protocols. See Section 8.5, Configuring Cluster Protocols.

Table 8-2 Cluster Protocols

Property

Description

Heartbeat

Specifies the interval of time in seconds between signals sent by each of the non-master nodes in the cluster to the master node to indicate that it is alive. The default is 1 second.

Tolerance

Specifies the maximum amount of time in seconds that a master node waits to get an alive signal from a non-master node before considering that node to have failed and removing it from the cluster. The default is 8 seconds.

Master watchdog

Specifies the interval of time in seconds between alive signals sent from the master node to non-master nodes to indicate that it is alive. The default is 1 second.

Modify this parameter setting only when supervised by Micro Focus Technical Support.

Slave watchdog

Specifies the maximum amount of time in seconds that the non-master nodes wait to get an alive signal from the master node before considering that the master node has failed, assigning another node to become the master node, and removing the old master node from the cluster. The default is 8 seconds.

Modify this parameter setting only when supervised by Micro Focus Technical Support.

Maximum retransmits

This value is set by default and should not be changed. The default is 30.
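The relationship between the heartbeat interval and the tolerance in Table 8-2 can be illustrated with a small Python sketch. It models only the idea that the master node treats a non-master node as failed when no alive signal has arrived within the tolerance; it is not the NCS implementation, and the function name `failed_nodes` and its arguments are hypothetical. The 8-second default is taken from the table.

```python
from typing import Dict, List


def failed_nodes(last_heartbeat: Dict[str, float], now: float,
                 tolerance_seconds: float = 8.0) -> List[str]:
    """Hypothetical model: nodes whose last alive signal exceeds the tolerance.

    last_heartbeat maps a node name to the time (in seconds) at which the
    master last received an alive signal from that node.
    """
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen > tolerance_seconds
    )
```

With a 1-second heartbeat and an 8-second tolerance, a node must miss several consecutive signals before it is considered failed, which is why the tolerance is set well above the heartbeat interval.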

8.1.4 RME Groups

The Resource Mutual Exclusion (RME) Groups page allows you to view or define sets of resources that must not run on the same node at the same time. You can manage RME Groups in iManager by going to Clusters > My Clusters, selecting a cluster, then selecting Action > Edit Properties > RME Groups. See Section 10.12, Configuring Resource Mutual Exclusion Groups.
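The constraint that an RME group expresses can be sketched as a simple set check. The Python below is an illustrative model only, not how OES Cluster Services evaluates RME groups; the function name `rme_conflicts`, its parameters, and the resource names in the example are hypothetical.

```python
from typing import Iterable, List, Set


def rme_conflicts(candidate: str, running_on_node: Iterable[str],
                  rme_groups: List[Set[str]]) -> Set[str]:
    """Hypothetical model: resources on the node that exclude the candidate."""
    running = set(running_on_node)
    conflicts: Set[str] = set()
    for group in rme_groups:
        # Only groups that contain the candidate restrict its placement.
        if candidate in group:
            conflicts |= (group & running) - {candidate}
    return conflicts
```

For example, with groups `[{"db1", "db2"}, {"web", "mail"}]`, the resource `db2` conflicts with a node that already runs `db1`, while `web` does not. An empty result means that no RME group prevents the candidate from running on that node.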

8.1.5 BCC

If OES Business Continuity Clustering (BCC) is installed in the cluster, the BCC properties page allows you to view or change policies for the selected cluster. You can manage BCC policies in iManager by going to Clusters > My Clusters, selecting a cluster, then selecting Action > Edit Properties > BCC. See the Novell Business Continuity Clustering website.

8.1.6 Cascade Failover Prevention

OES Cluster Services provides a cascade failover prevention function that detects whether a node has failed because of a bad cluster resource and prevents that bad resource from failing over to other servers in the cluster. It is enabled by default. See Section 8.7, Configuring Cascade Failover Prevention.

8.1.7 Monitoring the eDirectory Daemon

Some clustered services rely on the eDirectory daemon (ndsd) to be running and available in order to function properly. NCS provides the ability to monitor the status of the eDirectory daemon (ndsd) at the NCS level. It is disabled by default. The monitor can be set independently on each node. On a node, if the eDirectory daemon does not respond to a status request within a specified timeout period, NCS can take one of three configurable actions: an ndsd restart, a graceful node restart, or a hard node restart. See Section 8.8, Configuring NCS to Monitor the eDirectory Daemon (ndsd).

8.1.8 Cluster Node Reboot Behavior

The OES Cluster Services reboot behavior conforms to the kernel panic setting for the Linux operating system. By default, the kernel panic setting is configured for no reboot after a node shutdown. On certain occasions, you might want to prevent a downed cluster node from rebooting so you can troubleshoot problems. To control the cluster node reboot behavior, you can use the kernel.panic directive in the Linux /etc/sysctl.conf file to prevent a reboot, or to allow an automatic reboot and specify the number of seconds to delay the reboot. See Section 8.9, Configuring the Cluster Node Reboot Behavior.
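As an illustration of how the kernel.panic directive is read, the following Python sketch parses sysctl.conf-style text and describes the resulting reboot behavior. The helper names `panic_reboot_delay` and `describe_panic_setting` are hypothetical, and the parsing is deliberately simplified (for example, it does not handle malformed values). The interpretation of the value follows the standard Linux meaning: 0 means no automatic reboot, a positive number is the delay in seconds before rebooting, and a negative number means an immediate reboot.

```python
from typing import Optional


def panic_reboot_delay(sysctl_text: str) -> Optional[int]:
    """Return the kernel.panic value found in the text, or None if not set."""
    value = None
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.panic":
            value = int(val.strip())  # a later assignment overrides an earlier one
    return value


def describe_panic_setting(delay: int) -> str:
    """Describe the reboot behavior for a given kernel.panic value."""
    if delay == 0:
        return "no automatic reboot"
    if delay < 0:
        return "reboot immediately"
    return f"reboot after {delay} seconds"
```

For example, a file that contains the line `kernel.panic = 60` yields a delay of 60, which corresponds to an automatic reboot 60 seconds after the panic.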

8.1.9 STONITH

The STONITH (shoot-the-other-node-in-the-head) capability allows OES Cluster Services to kill a suspect node by using remote power control. Unlike a poison pill, it does not require a response from the suspect node. STONITH is used after a poison pill is issued; it does not replace the poison pill. See Section 8.10, Configuring STONITH.

8.1.10 Cluster Information

After you install OES Cluster Services, you use the Open Enterprise Server > OES Install and Configuration tool in YaST to set up the cluster or to add a node to an existing cluster. The YaST-based configuration is not used to modify the settings for an existing cluster. For information about modifying the settings for an existing cluster, see: