Chap 2 Cluster planning
2.1 High availability
2.2 Cluster nodes
2.2.1 Operating system level
Required OS level for HACMP
PSSP versions for SP installation
Parallel System Support Programs for AIX (PSSP) Version 3
2.2.2 CPU options
HACMP supported RS/6000 models
2.2.3 Disk storage
2.2.4 Cluster node considerations
2.3 Cluster networks
2.3.1 TCP/IP networks
Integrating HACMP with network services
To ensure the most rapid name resolution of cluster nodes, it is
recommended that you change the default order for name serving so that
/etc/hosts is used first (at least for cluster nodes).
To do this, edit the /etc/netsvc.conf file so that this line appears as shown here:
hosts=local,nis,bind
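One way to confirm the resolver order on each node is to read the hosts line back from the file. The following is a minimal sketch; the function names hosts_order and first_is_local are illustrative assumptions, not AIX or HACMP commands, and the keyword is assumed to be written in lowercase.

```shell
# hosts_order [FILE]: print the resolver order from a netsvc.conf-style file.
# The function name is hypothetical; the default path is the AIX one.
hosts_order() {
    conf="${1:-/etc/netsvc.conf}"
    # Keep the last "hosts = ..." line, drop the keyword and all blanks.
    sed -n 's/^[[:space:]]*hosts[[:space:]]*=[[:space:]]*//p' "$conf" |
        tail -n 1 | tr -d '[:space:]'
}

# first_is_local [FILE]: succeed only if /etc/hosts ("local") is tried first.
first_is_local() {
    case "$(hosts_order "$1")" in
        local|local,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running first_is_local on every cluster node (for example, from a loop over rsh) gives a quick yes/no answer before the cluster is configured.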
Note:
You cannot use DHCP to allocate IP addresses to HACMP cluster nodes.
Clients may use this method, but cluster nodes cannot.
Special network considerations
The SP Switch is a high-speed packet switching network, running on the RS/6000
SP system only. It runs bidirectionally up to 300 MBps.
2.3.2 Non-TCP/IP networks
Serial (RS232)
Configuration of RS232 port should be as follows:
BAUD rate: 9600 bps
PARITY: none
BITS per character: 8
Number of STOP BITS: 1
Enable LOGIN: disable
2.4 Cluster disks
2.4.1 SSA (Serial Storage Architecture) disks
There are two types of SSA disk subsystems for RS/6000 available:
7131 SSA Multi-Storage Tower Model 405
7133 Serial Storage Architecture (SSA) Disk Subsystem Models 010, 500, 020, 600, D40, and T40.
SSA adapters
Note: You cannot have more than four adapters in a single system.
Rules for SSA loops
RAID on the 7133 Disk Subsystem
The only RAID level supported by the 7133 SSA Disk Subsystem is RAID 5.
RAID 0 and RAID 1 can be achieved with the striping and mirroring facility of the Logical Volume Manager (LVM).
Advantages
2.4.2 SCSI disks
The 7135 RAIDiant Array (Models 110 and 210)
2.4.3 Fibre Channel adapters
AIX and HACMP support Gigabit Fibre Channel adapters such as the FC6227 and the FC6228 (for the 64-bit PCI bus).
ESS technology
2.5 Resource planning
2.5.1 Resource group options
Cascading resource groups
Inactive Takeover
Cascading Without Fallback (CWOF)
You can use a cascading configuration when the servers (nodes) are
configured differently, for example, when the first node has better
performance and many more resources (memory, processors, and so on) than
the second node.
Rotating resource groups
Concurrent resource groups
The
resources that can be part of a concurrent resource group are limited
to volume groups with raw logical volumes, raw disks, and application
servers.
2.5.2 Shared LVM components
If a 7135 RAIDiant
Array Subsystem is used for storage, you can have a maximum of four
nodes concurrently accessing a set of storage resources. If you are
using the 7133 SSA Disk Subsystem, you can have up to eight nodes
concurrently accessing it.
2.5.3 IP address takeover
Networks
Network name
Network attribute: public, private, or serial
Network adapters
Adapter label
For TCP/IP networks, the adapter label is the name in the /etc/hosts file associated with a specific IP address.
Adapter function
Service adapter
Standby adapter
A node can have no standby adapter, or it can have from one to seven standby adapters for each network to which it connects.
In
an HACMP for AIX environment on the RS/6000 SP, for an IP address
takeover configuration using the SP switch, standby adapters are not
required.
Boot adapter
IP address takeover is an AIX facility
that allows one node to acquire the network address of another node in
the cluster. To enable IP address takeover, a boot adapter label
(address) must be assigned to the service adapter on each cluster node.
Nodes use the boot label after a
system reboot and before the HACMP for AIX software is started.
When
the HACMP for AIX software is started on a node, the node's service
adapter is re-configured to use the service label (address) instead of
the boot label. If the node should fail, a takeover node acquires the
failed node's service address on its standby adapter, thus making the
failure transparent to clients using that specific service address.
During
the reintegration of the failed node, which comes up on its boot
address, the takeover node will release the service address it acquired
from the failed node.
The boot address does not use a separate
physical adapter, but instead is a second name and IP address
associated with a service adapter.
Defining hardware addresses
The hardware address swapping facility works in tandem with IP address takeover.
Hardware
address swapping maintains the binding between an IP address and a
hardware address, which eliminates the need to flush the ARP cache of
clients after an IP address takeover.
This facility, however, is supported only for Ethernet, Token-Ring, and FDDI adapters. It does not work with the SP Switch.
If
you do not use Hardware Address Takeover, the ARP cache of clients can
be updated by adding the clients' IP addresses to the PING_CLIENT_LIST
variable in the /usr/sbin/cluster/etc/clinfo.rc file.
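For illustration, the relevant line in clinfo.rc might look like the fragment below. The client names and the address are hypothetical placeholders; substitute the clients whose ARP caches need refreshing after a takeover.

```shell
# Fragment of /usr/sbin/cluster/etc/clinfo.rc (entries are hypothetical).
# Each entry is a client, given by name or IP address, whose ARP cache
# should be refreshed after a cluster event.
PING_CLIENT_LIST="client-a client-b 192.168.2.200"
```

Entries are separated by blanks because clinfo.rc is itself a shell script and treats the variable as a word list.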
2.5.4 NFS exports and NFS mounts
Filesystems/Directories to Export
Filesystems/Directories to NFS mount
Network for NFS mount
2.6 Application planning
To
put the application under HACMP control, you create an application
server cluster resource that associates a user-defined name with the
names of specially written scripts to start and stop the application.
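As a rough sketch of what such scripts contain, the functions below show one possible start/stop pair. The application name demoapp, its paths, the APP_ARGS variable, and the PID-file convention are all hypothetical; returning 0 on success is a conventional choice here, not a documented HACMP requirement.

```shell
#!/bin/sh
# Hypothetical application wrapper; every path and name below is illustrative.
APP_BIN="${APP_BIN:-/usr/local/demoapp/bin/demoapp}"
APP_ARGS="${APP_ARGS:-}"
APP_PID="${APP_PID:-/tmp/demoapp.pid}"

# start_app: launch the application in the background and record its PID.
start_app() {
    [ -x "$APP_BIN" ] || { echo "cannot execute $APP_BIN" >&2; return 1; }
    # APP_ARGS is left unquoted on purpose so that it splits into words.
    "$APP_BIN" $APP_ARGS >/dev/null 2>&1 &
    echo $! > "$APP_PID"
}

# stop_app: stop the recorded process; succeed if it is already gone.
stop_app() {
    [ -f "$APP_PID" ] || return 0
    kill "$(cat "$APP_PID")" 2>/dev/null || true
    rm -f "$APP_PID"
    return 0
}
```

In practice the start script and the stop script would be two separate files, each ending with a call to its own function, and both would have to exist at the same path on every node that can own the resource group.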
2.6.1 Performance requirements
2.6.2 Application startup and shutdown routines
2.6.3 Licensing methods
floating licenses
node-locked licenses
2.6.4 Coexistence with other applications
2.6.5 Critical/non-critical prioritization
2.7 Customization planning
2.7.1 Event customization
Note: HACMP without customization will cover only three events: node failure, network failure, and adapter failure.
The event customization facility includes the following features:
Event notification
Pre- and post-event processing
Event recovery and retry
Special application requirements
In case of a failover, you can customize events through the definition of pre- and post-events,
to act according to your application's needs.
Event notification
Predictive event error correction
2.7.2 Error notification
2.8 User ID planning
2.8.1 Cluster user and group IDs
For
users of an HACMP for AIX cluster, system administrators must create
duplicate accounts on each cluster node. The user account information
stored in the /etc/passwd file, and in other files stored in the
/etc/security directory, should be consistent on all cluster nodes.
2.8.2 Cluster passwords
2.8.3 User home directory planning
Home directories on shared volumes
NFS-mounted home directories
NFS-mounted home directories on shared volumes
Chap 3 Cluster hardware and software preparation
3.1 Cluster node setup
3.1.1 Adapter slot placement
You can use two different methods to add PCI buses to your system. These two methods are:
Adding secondary PCI buses off the primary PCI bus.
Implementing multiple primary buses.
Secondary PCI bus
PCI-to-PCI bridge chip
lower performance
Multiple primary PCI buses
Integrated adapters
32-bit versus 64-bit PCI slots
Higher-speed adapters use 64-bit slots because they can transfer 64 bits of data for each data transfer phase.
most 64-bit adapters can operate in 32-bit PCI slots
32-bit adapters still operate in 32-bit mode and offer no performance advantage in a 64-bit slot.
33 MHz versus 50 MHz 64-Bit PCI slots
3.1.2 rootvg mirroring
Normally,
in an HACMP environment, it is not necessary to think about mirroring
the root volume group, because the node failure facilities of HACMP can
cover for the loss of any of the rootvg physical volumes.
Procedure
The following steps assume the user has rootvg contained on hdisk0 and is
attempting to mirror the rootvg to a new disk: hdisk1.
# extendvg rootvg hdisk1
# chvg -Qn rootvg
# mklvcopy hd1 2 hdisk1 # /home file system
# mklvcopy hd2 2 hdisk1 # /usr file system
# mklvcopy hd3 2 hdisk1 # /tmp file system
# mklvcopy hd4 2 hdisk1 # / (root) file system
# mklvcopy hd5 2 hdisk1 # blv, boot logical volume
# mklvcopy hd6 2 hdisk1 # paging space
# mklvcopy hd8 2 hdisk1 # file system log
# mklvcopy hd9var 2 hdisk1 # /var file system
# mklvcopy hd10opt 2 hdisk1 # /opt file system
If
hd5 consists of more than one logical partition, then, after mirroring
hd5 you must verify that the mirrored copy of hd5 resides on contiguous
physical partitions. This can be verified with the following command:
# lslv -m hd5
If
the mirrored hd5 partitions are not contiguous, you must delete the
mirror copy of hd5 (on hdisk1) and rerun the mklvcopy for hd5, using
the -m option.
# syncvg -v rootvg
# bosboot -a -d /dev/hdisk?
where hdisk? is the first hdisk listed under the "PV" heading after the
command lslv -l hd5 has executed.
# bootlist -m normal hdisk0 hdisk1
# shutdown -Fr
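The hd5 contiguity check described above can be scripted instead of read by eye. The sketch below is an assumption-laden helper, not an AIX command: the name pps_contiguous is hypothetical, and it assumes the usual lslv -m layout (a title line, a header line beginning with LP, then rows of LP, PP1, PV1 and optionally PP2, PV2, PP3, PV3).

```shell
# pps_contiguous PV: read "lslv -m <lv>" output on stdin and succeed only if
# the physical partitions the LV occupies on PV are numbered consecutively.
pps_contiguous() {
    awk -v pv="$1" '
        $1 !~ /^[0-9]+$/ { next }             # skip title and column header
        {
            for (i = 3; i <= NF; i += 2)      # PV names sit in fields 3, 5, 7
                if ($i == pv) {
                    pp = $(i - 1) + 0         # PP number paired with that PV
                    if (seen && pp != prev + 1) bad = 1
                    prev = pp; seen = 1
                }
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

Typical use on a node would be lslv -m hd5 | pps_contiguous hdisk1; a non-zero exit status means the copy on hdisk1 is missing or not contiguous.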
3.1.3 AIX parameter settings
I/O pacing
AIX users have occasionally seen poor interactive performance from some
applications when another application on the system is doing heavy
input/output.
I/O pacing limits the number of pending write requests per file. You do this by setting high- and low-water marks.
If a process tries to write to a file at the high-water mark, it must
wait until enough I/O operations have finished to reach the low-water
mark.
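On AIX the two marks are the maxpout and minpout attributes of sys0. The helper below is only a sketch: the function name io_pacing_cmd is hypothetical, it prints the chdev command rather than running it, and the values 33 and 24 used in the example are an assumed starting point to be tuned per site.

```shell
# io_pacing_cmd HIGH LOW: print the chdev command that would set the marks.
# Nothing is changed; run the printed command as root on an AIX node to apply.
io_pacing_cmd() {
    high="$1"; low="$2"
    # The low-water mark must sit below the high-water mark.
    if [ "$low" -ge "$high" ]; then
        echo "low-water mark ($low) must be below high-water mark ($high)" >&2
        return 1
    fi
    echo "chdev -l sys0 -a maxpout=$high -a minpout=$low"
}
```

For example, io_pacing_cmd 33 24 prints chdev -l sys0 -a maxpout=33 -a minpout=24, which can be reviewed before it is run on each node.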
Checking network option settings
To ensure that HACMP for AIX requests for memory are handled correctly, you
can check (on every cluster node) the thewall network option.
# no -a | grep thewall
Note:
In AIX Version 4.3.2 and later, for CHRP machines, the default value is the smaller of
1/2 of real memory or 1048576 KB (1 GB). thewall is a run-time attribute.
Editing the /etc/hosts file and name server configuration
# vi /etc/hosts
127.0.0.1 loopback localhost # loopback (lo0) name/address
192.168.2.13 tucana
192.168.2.79 tucana-boot
192.168.3.2 tucana-stby
192.168.2.14 perseus
192.168.2.80 perseus-boot
192.168.3.3 perseus-stby
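Because every service, boot, and standby label has to be present on every node, a small check can catch omissions early. This is a minimal sketch; the function name missing_labels is hypothetical, and it only tests that a label appears as a name in the file, not that its address is correct.

```shell
# missing_labels HOSTS_FILE LABEL...: print each label that has no entry.
missing_labels() {
    file="$1"; shift
    for label in "$@"; do
        # Strip comments, then look for the label among the names (fields 2+).
        sed 's/#.*//' "$file" |
            awk -v l="$label" '
                { for (i = 2; i <= NF; i++) if ($i == l) found = 1 }
                END { exit !found }' ||
            echo "$label"
    done
}
```

With the entries above, missing_labels /etc/hosts tucana tucana-boot tucana-stby perseus perseus-boot perseus-stby should print nothing on a correctly prepared node.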
cron and NIS considerations
If
your HACMP cluster nodes use NIS services, which include the mapping of
the /etc/passwd file, and IPAT is enabled, users that are known only in
the NIS-managed version of the /etc/passwd file will not be able to
create crontabs.
Editing the /.rhosts file
Make sure that each node's service adapters and boot addresses are listed in the /.rhosts file on each cluster node.
3.2 Network connection and testing
3.2.1 TCP/IP networks
3.3 Cluster disk setup
3.3.1 SSA
3.3.2 SCSI
3.4 Shared LVM component configuration
3.4.1 Creating shared VGs
Creating non-concurrent VGs
# smit mkvg
Creating VGs for concurrent access
3.4.2 Creating shared LVs and file systems
# smit crjfs
3.4.3 Mirroring strategies
3.4.4 Importing to other nodes
Varying off a volume group on the source node
# varyoffvg volume_group_name
Make sure that all the file systems of the volume group have been unmounted;
otherwise, the varyoffvg command will not work.
Importing a volume group onto the destination node
You
can also use the TaskGuide utility for this task. The TaskGuide uses a
graphical interface to guide you through the steps of adding nodes to
an existing volume group.
Importing the volume group onto the
destination nodes synchronizes the ODM definition of the volume group
on each node on which it is imported.
# smit importvg
Changing a volume group's startup status
By
default, a volume group that has just been imported is configured to
automatically become active at system restart. In an HACMP for AIX
environment, a volume group should be varied on as appropriate by the
cluster event scripts. Therefore, after importing a volume group, use
the SMIT Change a Volume Group screen to reconfigure the volume group
so that it is not activated
automatically at system restart.
smit chvg options
Activate volume group automatically at system restart?
Set this field to no.
A QUORUM of disks required to keep the volume group online?
This field is site-dependent.
3.4.5 Quorum
Quorum
is a feature of the AIX LVM that determines whether or not a volume
group can be placed online using the varyonvg command, and whether or
not it can remain online after a failure of one or more of the physical
volumes in the volume group.
Each physical volume in a volume group has a Volume Group Descriptor Area (VGDA) and a Volume Group Status Area (VGSA):
Quorum at varyon
When
a volume group is brought online using the varyonvg command, VGDA and
VGSA data structures are examined. If more than half of the copies are
readable and identical in content, quorum is achieved, and the varyonvg
command succeeds.
Quorum after varyon
If a write to a physical
volume fails, the VGSAs on the other physical volumes within the volume
group are updated to indicate that one physical volume has failed. As
long as more than half of all VGDAs and VGSAs can be written, quorum is
maintained and the volume group remains varied on. If half or more
of the VGDAs and VGSAs are inaccessible, quorum is lost, the
volume group is varied off, and its data becomes unavailable.
Keep in mind that a volume group can be varied on or remain varied on with one or more of the physical volumes unavailable.
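The "more than half" rule is plain integer arithmetic, sketched below. The function name quorum_state is hypothetical, and the copy counts passed to it are whatever the volume group actually holds; nothing here queries the LVM.

```shell
# quorum_state TOTAL ACCESSIBLE: print "held" when more than half of the
# VGDA/VGSA copies are accessible, otherwise "lost".
quorum_state() {
    total="$1"; ok="$2"
    if [ $((ok * 2)) -gt "$total" ]; then echo held; else echo lost; fi
}
```

For instance, quorum_state 3 2 reports held and quorum_state 4 2 reports lost, because exactly half is not a majority. A two-disk volume group on AIX is usually described as keeping two VGDAs on the first disk and one on the second, so losing the second disk leaves 2 of 3 (held) while losing the first leaves 1 of 3 (lost).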
Disabling and enabling quorum
Quorum
checking is enabled by default. Quorum checking can be disabled using
the chvg -Qn vgname command, or by using the smit chvg fast path.
Quorum enabled
With
quorum enabled, a volume group will be forced offline if one or more
disk failures cause a majority of the physical volumes to be
unavailable.
Quorum disabled
With quorum disabled, a volume
group will remain varied on until the last physical volume in the
volume group becomes unavailable.
Forcing a varyon
A volume
group with quorum disabled and one or more physical volumes unavailable
can be "forced" to vary on by using the -f flag with the varyonvg
command.
Quorum in non-concurrent access configurations
Quorum provides very little actual protection in non-concurrent access configurations.
Quorum in concurrent access configurations
Quorum must be enabled for an HACMP for AIX concurrent access configuration.
Disabling quorum could result in data corruption.
3.4.6 Alternate method - TaskGuide
The
TaskGuide is a graphical interface that simplifies the task of creating
a shared volume group within an HACMP cluster configuration.
TaskGuide requirements
Before starting the TaskGuide, make sure:
You have a configured HACMP cluster in place.
You are on a graphics capable terminal.
Starting the TaskGuide
You can start the TaskGuide from the command line by typing:
/usr/sbin/cluster/tguides/bin/cl_ccvg or you can use the SMIT interface as
follows:
1. Type smit hacmp.
2. From the SMIT main menu, choose Cluster System Management ->
Cluster Logical Volume Manager ->Taskguide for Creating a Shared
Volume Group. After a pause, the TaskGuide Welcome panel appears.
Chap 4 HACMP installation and cluster definition
This chapter describes issues concerning the actual installation of HACMP
Version 4.4.1 and the definition of a cluster and its resources.
4.1 Installing HACMP
HACMP for AIX 4.4.1 requires AIX Version 4.3.3.25 or higher
4.1.1 First time install
You must install the HACMP for AIX software on each server machine.
There are a number of filesets involved in an HACMP Installation.
cluster.base
(This is the basic component that has to be installed on all server nodes in the
cluster.)
cluster.cspoc
(This
component includes all of the commands and environment for the C-SPOC
utility, the Cluster-Single Point Of Control feature.)
cluster.adt
(This component contains demo clients and their include files, for example, for
building a clinfo client on a non-AIX machine.)
cluster.man.en_US
cluster.man.en_US.haview
This component contains the man pages for HAView
cluster.msg.en_US
cluster.vsm
The Visual Systems Management fileset contains icons and bitmaps for the
graphical management of HACMP resources, as well as the xhacmpm command.
cluster.hativoli
The filesets needed when you plan to monitor this cluster with Tivoli
cluster.haview
This fileset contains the files for including HACMP cluster views into a Tivoli NetView environment.
cluster.man.en_US.haview.data
cluster.msg.en_US.haview
cluster.taskguides
This is the fileset that contains the TaskGuide for easy creation of shared volume groups
The installable images on the CRM installation media are listed here:
cluster.clvm
This fileset contains the Concurrent Resource Manager (CRM) option
cluster.hc
This fileset contains the Application Heart Beat Daemon. Oracle Parallel
Server is an application that makes use of it
The installation of CRM requires the following software:
bos.rte.lvm.usr 4.3.2.0 AIX Run-time Executable
HAView installation notes
HAView requires Tivoli NetView for AIX. Install NetView before installing HAView.
The
HAView fileset includes a server image and a client image. If NetView
is installed using a client/server configuration, the HAView server
image should be installed on the NetView server, and the client image
on the NetView client.
Otherwise, you can install both the HAView client and server images on the NetView server.
Note:
Tivoli
NetView for AIX must be installed on any system where you will install
HAView. If NetView is installed using a client/server configuration,
HAView should be installed on the NetView client; otherwise, install it
on the NetView server node. Also, be aware that the NetView client
should not be configured as a cluster node to avoid NetView's failure
after a failover.
Note:
It is recommended that you install
the HAView components on a node outside the cluster. Installing HAView
outside the cluster minimizes the probability of losing monitoring
capabilities during a cluster node failure.
Install server nodes
1. Insert the installation medium and enter:
# smit install_selectable_all
2.
Enter the device name of the installation medium or Install Directory
in the INPUT device/directory for software field and press Enter.
5. Enter field values as follows:
Enter cluster* or all to install all server and client images, or press F4 for a software listing.
Next, press F7 to select either an image or a module.
Rebooting servers
The final step in installing the HACMP for AIX software is to reboot each server in your HACMP for AIX environment.
4.1.2 Upgrading from a previous version
There are some things you have to take care of in order to get your existing
cluster back the way you want it after the upgrade is through:
Archive any localized script and configuration files to prevent losing them during an upgrade.
Commit your current HACMP for AIX Version 4.* software (if it is
applied but not committed) so that the HACMP for AIX 4.4.1 software can
be installed over the existing version.
If it is not committed, run the smit install_commit utility before installing the Version 4.4.1 software.
Make a mksysb backup on each node. This saves a backup of the AIX root volume group.
Save the current configuration, using the cluster snapshot utility, and
save any customized event scripts in a directory of your own.
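Archiving the localized scripts and customized event scripts can be as simple as a dated tar file kept outside the HACMP directories. The sketch below assumes nothing about where those scripts live: archive_custom is a hypothetical helper and both directories are supplied by the caller.

```shell
# archive_custom SRC_DIR DEST_DIR: tar SRC_DIR into a dated archive under
# DEST_DIR and print the archive path on success.
archive_custom() {
    src="$1"; dest="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    out="$dest/hacmp-custom-$(date +%Y%m%d).tar"
    # -C keeps the paths in the archive relative to the parent of SRC_DIR.
    tar -cf "$out" -C "$(dirname "$src")" "$(basename "$src")" && echo "$out"
}
```

Running it once per node before the upgrade, and listing the result with tar -tf, gives a quick confirmation that the scripts can be restored afterwards.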
Note:
Although your objective in performing a migration installation is to
keep the cluster operational and to preserve essential configuration
information, do not run your cluster with mixed versions of the HACMP
for AIX software for an extended period of time.
Upgrading from 4.2.2, 4.3.0, or 4.3.1 to 4.4.1 version
Note: Earlier software versions are no longer supported and should be removed prior to upgrading to HACMP for AIX Version 4.4.1.
If
you plan to upgrade from HACMP Versions 4.2.2 through 4.3.1 to Version
4.4.1, you must perform a migration installation of AIX to upgrade to
AIX Version 4.3.3.25 or later on all cluster nodes.
Note: You
cannot migrate directly from HACMP software Version 4.2.2 or 4.3.0 to
Version 4.4.1. To migrate to version 4.4.1, you must first do an
upgrade from Version 4.2.2 or 4.3.0 to Version 4.3.1.
Upgrade AIX on one node
1. If you wish to save your cluster configuration,
2. Shut down the first node (gracefully with takeover) using the smit clstop fast path.
3.
Perform a Migration Installation on Node A. The Migration Installation
option preserves the current version of the HACMP for AIX software and
upgrades the existing base operating system to AIX 4.3.3. Product
(application) files and configuration data are also saved.
4. Check the Migration Installation.
Verify that all the disks are available. Run the lppchk -v and oslevel commands to ensure that the system is in a stable state.
Install HACMP for AIX 4.4.1 on node A
2. The HACMP conversion utilities are cl_convert and clconvert_snapshot.
Upgrading
HACMP software to the newest version involves converting the ODM from a
previous release to that of the current release.
4. Start the HACMP
for AIX Version 4.4.1 software on Node A using the smit clstart fast
path. After HACMP is running, start the previous version of HACMP
software on Node B, if it is not still running. Check to ensure that
the nodes successfully join the cluster.
Important: In a
multi-node cluster, do not synchronize the node configuration or the
cluster topology until the last node has been upgraded.
Check upgraded configuration
1. If using tty devices, check that the tty device is configured as a serial network using the smit chgtty fast path.
2. In order to verify and synchronize the configuration (if desired), you must
have /.rhosts files on cluster nodes. If they do not exist, create the /.rhosts file
on Node A using the following command:
# /usr/sbin/cluster/utilities/cllsif -x >> /.rhosts
3. Verify the cluster topology on all nodes using the clverify utility.
5. Synchronize the node configuration and the cluster topology from Node A to all nodes (this step is optional).
Client-only migration
4.1.3 Migrating from HANFS to HACMP Version 4.4.1
HACMP Version 4.4.1 supports the NFS export behavior of the HANFS cluster.
In order to perform HANFS 4.3.1 to HACMP 4.4.1 node-by-node migration, the following prerequisites must apply:
1. Both nodes in the cluster must have HANFS Version 4.3.1 installed and committed.
2. Both nodes in the cluster must be up and running the HANFS software.
3. The cluster must be in a stable state.
4.1.4 Installing the concurrent resource manager
1. Insert the installation media and enter:
# smit install_selectable_all
4.1.5 Problems during the installation
4.2 Defining cluster topology
4.2.1 Defining the cluster
4.2.2 Defining nodes
4.2.3 Defining networks
Defining networks with automatic network discovery
4.2.4 Defining adapters
Defining adapters for IP-based networks
Defining adapters for non-IP-based networks
4.2.5 Configuring network modules
Each
supported cluster network in a configured HACMP cluster has a
corresponding cluster network module. Each network module monitors all
I/O to its cluster network.
Note: The Network Modules are
pre-loaded when you install the HACMP software. You do not need to
enter information in the Network Module SMIT screens unless you want to
change some field associated with a network module, such as the failure
detection rate.
It is highly unlikely that you will add or
remove a network module. For information about changing a
characteristic of a Network Module, such as the failure detection rate,
see 5.3 Network modules services.
In
HACMP/ES, topology services and group services are used instead of
Network Interface Modules (NIMs) in order to keep track of the status
of nodes, adapters or resources.
4.2.6 Synchronizing the cluster definition across nodes
Synchronization of the cluster topology ensures that the ODM data on
all cluster nodes is in sync. The HACMP ODM entries must be the same on
each node in the cluster.
Note: Even if you have a cluster defined with only one node, you must still synchronize the cluster.
Note:
Before attempting to synchronize a cluster configuration, ensure that
all nodes are powered on, that the HACMP for AIX software is installed,
and that the /etc/hosts and /.rhosts files on all nodes include all
HACMP for AIX boot and service IP labels.
4.3 Defining resources
The
HACMP for AIX software provides a highly available environment by
identifying a set of cluster-wide resources essential to uninterrupted
processing, and then by defining relationships among nodes that ensure
these resources are available to client processes.
4.3.1 Configuring resource groups
Resource Group Name
Node Relationship
Participating Node Names
General considerations for configuring resources
Resource Group Name
Node Relationship
Participating Node Names
Service IP Label
Filesystems (default is All)
Filesystems Consistency Check: fsck (default) or logredo (for fast recovery)
Filesystems
Recovery Method: Identifies the recovery method for the file systems,
parallel (for fast recovery) or sequential (default). Do not set this
field to parallel if you have shared, nested file systems. These must
be recovered sequentially.
Filesystems/Directories to Export
Filesystems/Directories to NFS Mount
Network for NFS Mount (This field is optional.)
Volume Groups
Concurrent Volume Groups
Raw Disk PVIDs
AIX Connections Services
Note: You cannot configure both AIX Connections and AIX Fast Connect in the same resource group.
AIX Fast Connect Resources
Tape resources
Application servers
Highly Available Communications Links
Miscellaneous Data
Automatically Import Volume Groups
By default, the Automatically Import Volume Groups flag is set to false.
Inactive Takeover Activated
Cascading without Fallback Enabled
9333 Disk Fencing Activated
SSA Disk Fencing Activated
SSA disk fencing is only available for concurrent access configurations.
Filesystems Mounted Before IP Configured
Configuring run-time parameters
There are two types of run-time parameters that can be chosen for a node.
One of them is the Debug Level, which can be set to high (the default),
meaning all cluster manager actions are logged, or to low, meaning only
errors are logged.
The other is Host uses NIS or Name Server. If the cluster uses Network
Information Services (NIS) or name serving, set this field to true.
Both of these parameters can be changed while the cluster is running.
Defining application servers
Application servers are another resource that can be configured into a resource group.
They
consist of a (hopefully meaningful) name, in order to enable the
cluster manager to identify the application server uniquely, as well as
the path locations for start and stop scripts for the application.
Synchronizing cluster resources
Note:
All configured nodes must be on their boot addresses when a cluster has
been configured and the nodes are synchronized for the first time.
Any node not on its boot address will not have its /etc/rc.net file updated with the
HACMP for AIX entry; this causes problems for the reintegration of this node
into the cluster.
4.4 Initial testing
4.4.1 clverify
Running /usr/sbin/cluster/diag/clverify is probably a good start to the testing.
Cluster verification is divided into topology and configuration checking.
# smit clverify
If you have configured Kerberos on your system, the clverify utility also verifies that:
All IP labels listed in the configuration have the appropriate service
principals in the .klogin file on each node in the cluster.
All nodes have the proper service principals.
Kerberos is installed on all nodes in the cluster.
All nodes have the same security mode setting.
4.4.2 Initial startup
At this point in time, the cluster is not yet started.
So
the cluster manager has to be started first. To check whether the
cluster manager is up, you can either look for the process with the ps
command:
# ps -ef | grep clstr
or look for the status of the cluster group subsystems:
# lssrc -g cluster
or
look for the status of the network interfaces. If you have IP Address
Takeover (IPAT) configured, you should see that the network interface
is on its boot address with the netstat -i command.
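The lssrc check can be turned into a yes/no test for use in scripts. This is a sketch only: subsys_active is a hypothetical helper, and it assumes the usual lssrc table in which the subsystem name is the first column and the status is the last.

```shell
# subsys_active NAME: read "lssrc -g cluster" output on stdin and succeed
# only if subsystem NAME is listed with a status of "active".
subsys_active() {
    awk -v s="$1" '$1 == s && $NF == "active" { up = 1 } END { exit !up }'
}
```

On a node this would be used as lssrc -g cluster | subsys_active clstrmgr, with the exit status telling a calling script whether the cluster manager is running.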
Then start HACMP through smit clstart. In the panel that appears, choose the following parameters and press Enter:
1. start now
2. broadcast message true
3. start cluster lock services false
4. start cluster information daemon true
4.4.3 Takeover and reintegration
When the cluster is up and running, stop the cluster manager on one of the
nodes with smitty clstop and choose graceful with takeover. One way
to check whether the takeover went through smoothly is to look at the
/tmp/hacmp.out file (or in the directory to which HACMP log files are
redirected) during the takeover, preferably on the takeover node. You
can use the tail -f /tmp/hacmp.out command for this.
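After the takeover has finished, the same log can be summarized rather than watched. The helpers below are a sketch under one stated assumption: that hacmp.out marks each event with lines containing "EVENT START:" and "EVENT COMPLETED:" followed by the event name. The function names are hypothetical.

```shell
# events_seen [LOGFILE]: list the event start and completion lines.
events_seen() {
    grep -E 'EVENT (START|COMPLETED):' "${1:-/tmp/hacmp.out}"
}

# unfinished_events [LOGFILE]: print event names that started but have no
# later completion line (matched on the event name only).
unfinished_events() {
    awk '
        /EVENT START:/ {
            sub(/.*EVENT START: */, ""); pending[$1] = 1; next
        }
        /EVENT COMPLETED:/ {
            sub(/.*EVENT COMPLETED: */, ""); delete pending[$1]
        }
        END { for (e in pending) print e }
    ' "${1:-/tmp/hacmp.out}"
}
```

An empty result from unfinished_events once the cluster is stable suggests that every event that began also completed; any name it prints is a place to start reading the full log.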
After the
cluster has become stable, you might check the netstat -i output again
to verify that the takeover node has acquired the IP address of the
"failed" node.
For cascading resource groups, the failed node is
going to reacquire its resources once it is up and running again. So,
you have to restart HACMP on it through smitty clstart and check again
for the log file, as well as the cluster's status.
4.5 Cluster snapshot
The
cluster snapshot utility allows you to save, in a file, a record of all
the data that defines a particular cluster configuration.
You can use the cluster snapshot files as a cluster configuration and definition backup.
In addition, a snapshot can provide useful information for troubleshooting cluster problems.
Note: You cannot use the cluster snapshot facility in a cluster concurrently running different versions of HACMP for AIX.
What information is saved in a cluster snapshot
The primary information saved in a cluster snapshot is the data stored in the HACMP for AIX ODM classes.
Essentially,
a snapshot saves all the ODM classes HACMP has generated during its
configuration. It does not save user customized scripts, such as start
or stop scripts for an application server.
However, the location
and names of these scripts are in an HACMP ODM class, and are therefore
saved. It is very helpful to put all the customized data in one defined
place, in order to make saving these customizations easier.
Format of a cluster snapshot
The cluster snapshot utility stores the data it saves in two separate files:
ODM Data File (.odm file extension)
Cluster State Information File (.info file extension)
4.5.1 Applying a cluster snapshot
Applying
a cluster snapshot overwrites the data in the existing HACMP for AIX
ODM classes on all nodes in the cluster with the new ODM data contained
in the snapshot.
You can apply a cluster snapshot from any cluster node.
Note:
A cluster snapshot used for dynamic reconfiguration may contain changes
to either the cluster topology or to cluster resources, but not both.
You cannot change both the cluster topology and cluster resources in a
single dynamic reconfiguration event.
Chap 5 Cluster customization
5.1 Event customization
Whenever
a state change is detected by the cluster manager, it decides which
event will be started. It then executes the script for that event in a
shell, as well as the subevents associated with it. These predefined
events can be found under /usr/sbin/cluster/events.
The HACMP
for AIX software provides an event customization facility that allows
you to tailor event processing to your site. This facility can be used
to include the following types of customization:
Adding, changing, and removing custom cluster events
Pre- and post-event processing
Event notification
Event recovery and retry
5.1.1 Predefined cluster events
Node events
The following sections describe the sequence of node_up and node_down events.
Network events
Network adapter events
Cluster status events
Application monitor events
Other events
5.1.2 Pre- and post-event processing
For
pre-processing, for example, you may want to send a message to specific
users informing them to stand by while a certain event occurs.
5.1.3 Event notification
5.1.4 Event recovery and retry
5.1.5 Notes on customizing event processing
You
must declare a shell (for example, #!/bin/sh) at the beginning of each
script executed by the notify, recovery, and pre- or post-event
processing commands.
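A minimal pre-event script that follows this rule might look like the sketch below. It assumes, without confirmation from these notes, that the event name arrives as the first argument followed by the event's own arguments; the function name and the message wording are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-event script. Assumption: $1 is the event name and the
# remaining arguments are the ones passed to the event itself.

# standby_message EVENT [ARGS...]: build the text shown to logged-in users.
standby_message() {
    event="$1"; shift
    echo "HACMP event '$event' ($*) is starting on $(uname -n); please stand by."
}

# Broadcast only when the script is invoked with arguments.
if [ $# -gt 0 ]; then
    standby_message "$@" | wall
fi
```

Keeping the message text in a function makes it easy to reuse the same wording in a matching post-event script.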
5.1.6 Event emulator
The emulation runs on all active nodes in your cluster, and the output is stored in an output file.
5.2 Error notification
Each time an error is logged in the system error log, the error notification
daemon determines if the error log entry matches the selection
criteria. If it does, an executable is run. This executable, called a
notify method, can range from a simple command to a complex program.
For example, the notify method might be a mail message to the system
administrator or a command to shut down the cluster.
Take the example of a cluster where an owner node and a takeover node share a
SCSI disk. The owner node is using the disk. If the SCSI adapter on the
owner node fails, an error may be logged, but neither the HACMP for AIX
software nor the AIX Logical Volume Manager responds to the error. If
the error has been defined to the Error Notification facility, however,
an executable that shuts down the node with the failed adapter could be
run, allowing the surviving node to take over the disk.
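A notify method can be as small as a shell script that turns the error log entry into a one-line summary. The sketch below assumes the positional arguments that AIX error notification is generally documented to pass to a notify method ($1 sequence number, $3 error class, $4 error type, $6 resource name, $9 error label); verify this against your AIX level before relying on it. The function name and the sample values in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch of a notify method helper. The argument positions are an
# assumption based on the usual AIX errnotify convention:
#   $1 sequence number   $3 error class   $4 error type
#   $6 resource name     $9 error label
format_error_notice() {
    printf 'errlog seq=%s label=%s class=%s type=%s resource=%s\n' \
        "$1" "${9}" "$3" "$4" "$6"
}
```

For example, format_error_notice 42 0xABCD H PERM 0 scsi0 adapter adapter SCSI_ERR1 prints a single summary line that could be mailed to the administrator. A method that shuts down the node, as in the SCSI adapter example above, would act on the same arguments.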
5.3 Network modules services
The HACMP for AIX SMIT interface allows you to add, remove, or change an HACMP for AIX network module.
However, you may want to change the failure detection rate of a network module.
You can choose between the predefined values slow, normal, and fast.
If you decide to change the failure detection rate of a network module, keep the following considerations in mind:
Failure detection is dependent on the fastest network linking two nodes.
Faster heartbeat rates may lead to false failure detections, particularly on busy networks.
If your networks are very busy and you experience false failure
detections, you can try changing the failure detection speed on the
network modules to slow to avoid this problem.
For example, for an Ethernet, the normal failure detection rate is two keepalives per
second; fast is about four per second; slow is about one per second.
For an HPS network, because no network traffic is allowed when a node joins
the cluster, normal failure detection is 30 seconds; fast is 10
seconds; slow is 60 seconds.
5.4 NFS considerations
The HACMP scripts have only minimal NFS support.
5.4.1 Creating shared volume groups
5.4.2 Exporting NFS file systems and directories
5.4.3 NFS mounting
5.4.4 Cascading takeover with cross mounted NFS file systems
Chap 6 Cluster testing
Check the state of the following components:
Devices
System parameters
Processes
Network adapters
LVM
Cluster
Other items, such as SP Switch, printers, and SNA configuration
6.1 Node verification
6.1.1 Device state
You can use the following ways to verify the state of the device:
Run diag -a in order to clean up the VPD.
Look in the error log for unusual errors by issuing the errpt | more or errpt -a | more command.
Check that all devices are in the available state (use lsdev -C | more).
Check that the SCSI addresses of adapters on shared buses are unique (use lsattr -E -l ascsi0).
To check a serial line between two nodes, type stty < /dev/tty# on both nodes.
To test a target-mode SSA heartbeat connection between two nodes, have
one node listen on the SSA target-mode device by issuing the
cat < /dev/tmssa#.tm command; then, on the other node, issue the
cat /etc/hosts > /dev/tmssa#.im command to send data to the device.
The contents of the /etc/hosts file should appear on the listening node.
If you are using a target-mode SCSI heartbeat connection between two nodes,
the procedure is much the same as for testing target-mode SSA. Have
one node listen on the SCSI target-mode device by issuing
cat < /dev/tmscsi#.tm (press Enter twice); then, on the other node, issue the
cat /etc/hosts > /dev/tmscsi#.im command.
The contents of the /etc/hosts file should appear on the listening node.
6.1.2 System parameters
You can use the following ways to verify the system parameters:
Type date on all nodes to check that all the nodes in the cluster have their clocks set to the same time.
Ensure that the number of user licenses has been correctly set (use lslicense).
Check high water mark and other system settings (use smitty chgsys).
Type sysdumpdev -l and sysdumpdev -e to ensure that the dump space is
correctly set and that the primary dump device (lslv hd7) is large
enough to accommodate a dump.
Check that applications to be
controlled by HACMP are not started here, and that extraneous processes
which might interfere with HACMP and/or dominate system resources are
not started (more /etc/inittab).
Check list of cron jobs (crontab -l).
6.1.3 Process state
You can use the following ways to verify the state of the process:
Check the paging space usage by issuing the lsps -a command.
Look for all expected processes with the ps -ef | more command.
Check that the run queue is < 5 and that the CPU usage is at an acceptable level (use vmstat 2 5).
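The run queue check can be scripted by reading the first column (r) of the vmstat output. This is a sketch only: it assumes that the run queue is the first numeric field of each sample line, as in the usual vmstat layout, and the function names are hypothetical.

```shell
#!/bin/sh
# Print the largest run-queue value found in vmstat output on stdin.
# Header lines are skipped because their first field is not a number.
max_run_queue() {
    awk 'BEGIN { max = 0 }
         $1 ~ /^[0-9]+$/ { if ($1 + 0 > max) max = $1 + 0 }
         END { print max }'
}

# Succeed if every sample has a run queue below the limit.
# $1 = limit (defaults to 5, the value used in the check above).
run_queue_ok() {
    [ "$(max_run_queue)" -lt "${1:-5}" ]
}
```

Typical use would be vmstat 2 5 | run_queue_ok || echo "run queue too high".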
6.1.4 Network state
You can use the following ways to verify the state of the network:
Type, for example, ifconfig lo0, ifconfig en0 and ifconfig en1 to check
the network adapter interface configuration, if you are using Ethernet
adapters.
To check the configuration of an SP Switch adapter, type the command:
# /usr/lpp/ssp/css/ifconfig css0
Use netstat -i or netstat -in to show the network configuration of the node.
To check the alternate Ethernet MAC address, issue the netstat -v ent0 | more command.
Look at mbufs sizing relative to requests for memory denied (use netstat -m | more).
Type netstat -r or netstat -rAn to ensure that there are valid routes to the other cluster node interfaces and to clients.
Run no -a | more and look at the setting of ipforwarding and ipsendredirects.
Verify that all boot, service, and standby addresses have the same
subnet mask and that standby addresses are not on the same subnet as
the boot and service addresses.
Check that all service and boot addresses on the same subnet can communicate.
Verify that all standby interfaces on the same subnet can communicate.
Check the status of the TCP/IP daemons (use lssrc -g tcpip).
List the ARP table entries with arp -a.
Ensure that there are no bad entries in the /etc/hosts file, especially at the bottom of the file.
Verify that, if DNS is in use, the DNS servers are correctly defined (use more /etc/resolv.conf).
If NIS or DNS is used, verify that those nodes have "Host uses NIS or Name Server" set to true.
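The subnet rule in the list above (same subnet mask everywhere, standby addresses on a different subnet from the boot and service addresses) can be checked with a little shell arithmetic. This is a sketch for dotted-decimal IPv4 addresses; the function names and the addresses in the usage note are hypothetical.

```shell
#!/bin/sh
# Print the network address for an IPv4 address and subnet mask.
# $1 = address, $2 = subnet mask, both in dotted-decimal form.
# The body runs in a subshell so the IFS change stays local.
network_of() (
    IFS=.
    set -- $1 $2        # split into 8 octets: 4 address, 4 mask
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# Succeed if two addresses fall in the same subnet.
# $1, $2 = addresses, $3 = the subnet mask shared by both.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}
```

With a service address of 192.168.10.21 and a standby address of 192.168.20.21 under mask 255.255.255.0, same_subnet fails, which is the result the rule asks for.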
6.1.5 LVM state
6.1.6 Cluster state
You can use the following ways to verify the state of the cluster:
Check the status of the cluster daemons by issuing lssrc -g cluster and lssrc -g lock.
Run /usr/sbin/cluster/clstat to check the status of the cluster and the status of the network interfaces.
Check the cluster log files with tail -f /tmp/hacmp.out, more
/usr/sbin/cluster/history/cluster.mmdd (mmdd = current date), tail -f
/var/adm/cluster.log, and more /tmp/cm.log.
Check that the nodename is correct (use odmget HACMPcluster).
Verify that all the HACMP Configuration is synchronized. This is done by using the clverify command.
To show the clstrmgr version, type the following command:
# snmpinfo -m dump -o /usr/sbin/cluster/hacmp.defs clstrmgr
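Because the history log name listed above embeds the date as mmdd, a small helper can build the file name for the current day. The function name is hypothetical; the path is the one given in the list above.

```shell
#!/bin/sh
# Print the cluster history log path for a given date.
# $1 = date as mmdd (optional; defaults to today's date).
history_log_for() {
    echo "/usr/sbin/cluster/history/cluster.${1:-$(date +%m%d)}"
}
```

For example, more "$(history_log_for)" pages through today's history file.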
6.2 Simulate errors
Note that the /tmp/hacmp.out file is the most useful to monitor, especially
if the Debug Level of the HACMP Run Time Parameters for the nodes has
been set to high, which is the default, and if the
Application Server Scripts include the set -x command and periodic echo commands.
6.2.1 Adapter failure
Ethernet or token ring interface failure
1. Check that all the nodes in the cluster are up and running.
2. Optional: Prune the error log on austin (use errclear 0).
3. Monitor the cluster log files on boston.
4. Use the ifconfig command to shut off the appropriate service interface
(but not the Administrative SP Ethernet) on austin (for example,
ifconfig en0 down). This will cause the service IP address to fail over
to the standby adapter on austin.
5. Verify that the adapter swap has occurred (including MAC address
failover if you configured HWAT) and that HACMP has turned the original
service interface back on as the standby interface.
6. Use the ifconfig command to swap the service address back to the
original service interface (ifconfig en1 down). This will cause the
service IP address to fail back to the service adapter on austin.
Ethernet or token ring adapter or cable failure
1. Check, by way of the verification commands, that all the nodes in the cluster
are up and running.
2. Optional: Prune the error log on austin (use errclear 0).
3. Monitor the cluster log files on boston.
4. Disconnect the network cable from the appropriate service interface
(but not the Administrative SP Ethernet) on austin. This will cause the
service IP and MAC addresses to fail over to the standby adapter on
austin.
5. Verify that the adapter swap has occurred.
6. Reconnect the network cable to the service interface. This will cause
the original service interface to become the standby interface.
7. Initiate a swap back to the original service interface by
disconnecting the network cable from the new service interface
(originally the standby interface). This will cause the service IP and
MAC addresses to fail back to the service adapter on austin.
8. Verify that the adapter swap has occurred.
9. Reconnect the cable to the original standby interface.
10. Verify that the original standby interface is operating with the standby IP address.
Switch adapter failure
This scenario shows how the cascading Resource Group boston_rg on node
boston behaves when it fails over to standby node austin, and also
how the Resource Group boston_rg falls back to its primary node
boston.
1. Check, by way of the verification commands, that all the nodes in the cluster are up and running.
2. Assign boston to be Eprimary.
3. Optional: Prune the error log on boston (use errclear 0).
4. Monitor the cluster log files on austin.
5. For the status and location of a Resource Group in the cluster, use the clfindres command.
6. Generate the switch error in the error log that is being monitored by HACMP Error Notification.
7. If the first failure simulation method is used, the switch failure will
be detected in the error log (errpt -a | more) on boston and cause a
node failover to austin.
8. Verify that failover has occurred (use netstat -i and ping for networks,
lsvg -o and vi of a test file for volume groups, ps -U for application
processes, and Eprimary for Eprimary).
9. Use the clfindres command to verify that the Resource Group boston_rg
is now located on node austin:
# /usr/sbin/cluster/utilities/clfindres
10. Start HACMP on boston (use smit clstart). austin will release boston's
Resource Groups and boston will take them back over, but austin (or a lower
alphanumeric node) will remain Eprimary.
11. Verify that reintegration has occurred (use netstat -i and ping for
networks, lsvg -o and vi of a test file for volume groups, ps -U for
application processes, and Eprimary for Eprimary).
12. Verify again, with the clfindres command, that the Resource Group boston_rg has fallen back to its primary node boston.
Failure of a 7133 adapter
6.2.2 Node failure/reintegration
AIX crash
7. Crash austin by entering cat /etc/hosts > /dev/kmem. (The LED on austin will display 888.)
8. The OS failure on austin will cause a node failover to boston.
9. Verify that failover has occurred by monitoring /tmp/hacmp.out on node
boston (use netstat -i and ping for networks, lsvg -o and vi of a
test file for volume groups, and ps -U for application processes).
10. The state of the cluster can be found by using the /usr/sbin/cluster/clstat command.
12. Power cycle austin. If HACMP is not configured to start from /etc/inittab
on restart, start HACMP on austin (use smit clstart). austin will not
take back its cascading without fallback (CWOF) Resource Groups.
15. To bring the Resource Group austin_rg back to its primary node austin,
you have to migrate it. A Resource Group can be migrated to any node that
is part of the Resource Group. To migrate austin_rg back to austin, use the
/usr/sbin/cluster/utilities/cldare command or the SMIT HACMP menu fast path
smit cl_resgrp_start.select
CPU failure
TCP/IP subsystem failure
6.2.3 Network failure
6.2.4 Disk failure
6.2.5 Application failure
6.2.6 Configure the process application monitor parameters
6.2.7 Configure the custom application monitor parameters
Chap 7 Cluster troubleshooting
There are different ways to become aware that the HACMP cluster
is experiencing problems, for example, by monitoring the cluster with
HATivoli or by verifying the status of the cluster and its substate,
the network state, and the participating nodes in the cluster and their
states.
When an HACMP for AIX script or daemon generates a
message, the message is written to the system console and to one or
more cluster log files. Therefore, you must scan the HACMP log files
for error messages, triggered event scripts, and other events that
should not appear in those log files.
7.1 Cluster log files
HACMP for AIX scripts, daemons, and utilities write messages to the following log files:
/var/adm/cluster.log
/tmp/hacmp.out --HACMP for AIX scripts and commands.
system error log
/usr/sbin/cluster/history/cluster.mmdd
/tmp/cm.log --HACMP for AIX clstrmgr activity.
/tmp/cspoc.log --HACMP for AIX C-SPOC commands.
/tmp/dms_logs.out
/tmp/emuhacmp.out
7.2 config_too_long
If the cluster manager recognizes a state change in the cluster, it acts upon it by executing an event script.
However, some circumstances, like errors within the script or special conditions
of the cluster, might cause the event script to hang. After a certain
amount of time (by default, 360 seconds), the cluster manager
writes a config_too_long message to the /tmp/hacmp.out file.
The config_too_long message continues to be appended to /tmp/hacmp.out every 30 seconds until action is taken.
The message issued looks like this:
config_too_long 360 $event_name $argument
Where:
_ $event_name is the reconfig event that has failed.
_ $argument are the parameters used by the event.
In most cases, this happens because an event script has failed or is
hanging. Use the ps -ef command to find the script's PID and then
terminate it by issuing:
# kill -9 PID
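To find out which event to look for in the ps -ef output, the event name can be pulled out of the config_too_long messages. The sketch below assumes that each message line begins with the literal token config_too_long in the layout shown above; real hacmp.out lines may carry extra leading text, in which case the field positions need adjusting. The function name and the sample event in the usage note are hypothetical.

```shell
#!/bin/sh
# List the distinct event names reported in config_too_long messages.
# Reads log lines on stdin in the form:
#   config_too_long <seconds> <event_name> <arguments...>
hung_events() {
    awk '$1 == "config_too_long" { print $3 }' | sort -u
}
```

Typical use would be hung_events < /tmp/hacmp.out, followed by ps -ef | grep with the event name it prints (for example, node_up) to find the PID of the hanging script.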
7.3 Deadman switch
The reason to use a DMS is to protect the data on the external disks.
The deadman switch halts a node when it enters a hung state that extends
beyond a certain time limit. This enables another node in the cluster
to acquire the hung node's resources in an orderly fashion, avoiding
possible problems, in particular for the external shared disks.
1. Tune the system using I/O pacing.
2. Increase the syncd frequency.
3. If needed, increase the amount of memory available for the communications subsystem.
4. Change the Failure Detection Rate.
Advanced Performance Tuning Parameters
#smit cm_configure_menu
7.3.1 Tuning the system using I/O pacing
An initial high-water mark of 33 and a low-water mark of 24 provide a
good starting point. If you want to change the recommended values, use the
following formula:
High water mark = m * 4 +1
Low water mark = n * 4
Where m and n are non-negative integers.
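The formula can be checked with shell arithmetic. In this sketch the function names are hypothetical; the recommended starting values correspond to m = 8 (4 * 8 + 1 = 33) and n = 6 (4 * 6 = 24).

```shell
#!/bin/sh
# Compute water marks from the multipliers m and n in the formula above.
high_water_mark() { echo $(( $1 * 4 + 1 )); }   # $1 = m
low_water_mark()  { echo $(( $1 * 4 )); }       # $1 = n

# Succeed if a value has the form required by the formula.
is_valid_high_mark() { [ "$1" -ge 1 ] && [ $(( ($1 - 1) % 4 )) -eq 0 ]; }
is_valid_low_mark()  { [ "$1" -ge 0 ] && [ $(( $1 % 4 )) -eq 0 ]; }
```

For example, is_valid_high_mark 33 and is_valid_low_mark 24 both succeed, while is_valid_high_mark 32 fails.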
Note:
If I/O pacing needs to be used in the cluster, it must be configured
manually on each node where it is needed, because this parameter
is not stored in the HACMP ODM object classes. It does not matter whether
I/O pacing is configured through the smit chgsys or the smit
cm_configure_menu fast path.
7.3.2 Extending the syncd frequency
The syncd daemon is responsible for flushing all unwritten system buffers
to disk. By default, it is started automatically at IPL from
/sbin/rc.boot and is invoked by AIX every 60 seconds.
To change the syncd frequency, edit the /sbin/rc.boot file or select Advanced Performance Tuning Parameters through the smit cm_configure_menu fast path.
Note:
If the syncd frequency needs to be increased, this must be done
manually on every node in the cluster, because this parameter is not
stored in the HACMP ODM object classes. It does not matter if the syncd
frequency is configured through the /sbin/rc.boot or the smit
cm_configure_menu fast path. If the syncd frequency is changed by
editing the /sbin/rc.boot, do not forget to run the /sbin/rc.boot to
make the new syncd frequency available.
7.3.3 Increase amount of memory for communications subsystem
If you experience network performance problems, setting the extendednetstats
option with the no -o extendednetstats=1 command will give you more information
when using the netstat -am command.
#no -o thewall=624208
7.3.4 Changing the Failure Detection Rate
In an HACMP cluster, every node is connected to at least one TCP/IP network
and should also have one or more non-TCP/IP networks. Every supported
network has a corresponding Network Interface Module (NIM), which
maintains a connection to the other nodes' NIMs in the cluster.
HACMP HAS uses the network modules (NIMs) to exchange heartbeats. These communicate
their results straight through to the HACMP Cluster Manager. HACMP/ES
uses the facilities of RSCT, namely Topology Services, Group Services,
and Event Management, for its heartbeats.
Both the TCP/IP networks and the non-TCP/IP networks are used for sending and receiving keepalive packets.
There are two parameters in ES 4.4 that determine the heartbeat: frequency and sensitivity.
The time that ES Version 4.4 needs to detect a failure can be calculated by using the following formula:
Frequency * Sensitivity * 2
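As a worked example of the formula, the sketch below assumes that frequency is the interval between heartbeats in whole seconds and that sensitivity is the number of missed heartbeats tolerated; these units are an assumption, as the text above does not state them. The function name and the sample values are hypothetical.

```shell
#!/bin/sh
# Failure detection time = frequency * sensitivity * 2
# $1 = frequency (assumed: seconds between heartbeats, integer)
# $2 = sensitivity (assumed: number of missed heartbeats)
failure_detection_time() {
    echo $(( $1 * $2 * 2 ))
}
```

With a frequency of 1 and a sensitivity of 4, failure_detection_time 1 4 gives 8, that is, about 8 seconds to detect a failure under these assumptions.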
Tuning the network interface module (NIM)
#smit cm_config_networks
7.4 ES 4.4.0 and later creates entries in AIX error log
RS/6000 Cluster Technology (RSCT)
7.4.1 The topology services subsystem
7.4.2 Missing heartbeat creates entries in AIX error log
7.5 Node isolation and partitioned clusters
Node isolation occurs when all the networks connecting nodes fail but the nodes remain up and running.
One or more nodes can then be completely isolated from the others. A
cluster in which this has happened is called a partitioned cluster.
7.6 The DGSP message
A Diagnostic Group Shutdown Partition (DGSP) message is sent when a node
loses communication with the cluster and then tries to re-establish
communication.
7.7 Troubleshooting SSA
7.8 User ID problems
As the node providing the service can change, the system administrator has
to ensure that the same users and groups are known to all nodes
potentially running an application. Then, if one node fails
and the application is taken over by the standby node, a user can go on
working, because the takeover node knows that user under exactly the
same user and group ID.
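One way to check this is to compare the user definitions of two nodes. The sketch below assumes that you have copied /etc/passwd from each node into a local file; it reports users that exist on both nodes but with a different user ID or primary group ID. The function name and file names are hypothetical, and users defined on only one node are not reported.

```shell
#!/bin/sh
# Print the names of users whose UID or primary GID differ between
# two passwd-format files.
# $1 = passwd file copied from the first node
# $2 = passwd file copied from the second node
id_mismatches() {
    awk -F: 'NR == FNR { ids[$1] = $3 ":" $4; next }
             ($1 in ids) && ids[$1] != ($3 ":" $4) { print $1 }' "$1" "$2"
}
```

For example, id_mismatches passwd.austin passwd.boston prints nothing when the two nodes agree on every shared user.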
7.9 Troubleshooting strategy
Chap 8 Cluster management and administration
Chap 9 Special RS/6000 SP topics
SP system.
9.1 High availability control workstation (HACWS)
9.2 Kerberos security
9.2.1 Configuring Kerberos security with HACMP
9.3 VSDs - RVSDs
VSDs (Virtual Shared Disks) and RVSDs (Recoverable Virtual Shared Disks) are
SP-specific facilities that you are likely to use in an HACMP
environment.
Virtual Shared Disk (VSD) allows data in logical
volumes on disks physically connected to one node to be transparently
accessed by other nodes.
Recoverable Virtual Shared Disk (RVSD) adds availability to VSD. RVSD allows
you to twin-tail disks, that is, physically connect the same group of disks
to two or more nodes, and provides transparent failover of VSDs among the
nodes. RVSD is a separately priced IBM LPP.
9.4 SP switch as an HACMP network
One of the fascinating things about an RS/6000 SP is the switch network. It
has developed over time, and the currently supported type of switch for
HACMP at customer sites is the SP Switch, also known as the TB3 switch.
9.4.1 SP Switch support
In an SP system with the SP Switch, nodes can run different levels of
PSSP, including PSSP Version 3.2; also, the system can be partitioned.
9.4.3 Eprimary management
The SP Switch has an internal primary backup concept, where the primary
node, known as the Eprimary, is backed up automatically by a backup
node.
Chap 10 HACMP classic versus HACMP/ES
This chapter will give you some idea of which HACMP version might best suit your environment.
The following options are referred to as:
HACMP classic
High Availability Subsystem (HAS)
Concurrent Resource Manager (CRM)
High Availability Network File System (HANFS); this is included in HACMP and HACMP/ES since Version 4.4.0
HACMP/ES
Enhanced Scalability (ES)
Enhanced Scalability Concurrent Resource Manager (ESCRM)
10.1 HACMP classic
HAS and CRM
The CRM includes the HAS, which provides a distributed locking facility to support access to shared data.
10.2 HANFS
The HANFS for AIX software supports only two nodes in a cluster.
10.3 HACMP/ES and ESCRM
This was originally available only for the SP environment, where tools were
already in place with Parallel System Support Programs (PSSP) to manage
larger clusters.
HACMP/ES and ESCRM build on the Event
Management and Group Services facilities of RSCT for AIX to scale HACMP
up to 32 HACMP nodes.
ESCRM optionally adds concurrent shared-access management for supported RAID and SSA disk subsystems.
10.4 Similarities and differences
The technique of keeping track of the status of a cluster by sending and
receiving heartbeat messages is the major difference between HACMP HAS
and HACMP/ES. HACMP HAS uses the network modules (NIMs) for this
purpose. These communicate their results straight through to the HACMP
Cluster Manager. HACMP/ES uses the facilities of RSCT, namely Topology
Services, Group Services, and Event Management, for its heartbeat.