Oracle9i Real Application Clusters (RAC) on HP-UX
Authors: Rebecca Kühn, Rainer Marekwia
Contents:
2. Overview: What is Oracle9i Real Application Clusters?
3. Oracle 9i Real Application Clusters - Cache Fusion technology
4. New HP Cluster Interconnect technology
5. HP/Oracle Hardware and Software Requirements
5.1 General Notes
5.2 System Requirements
5.3 HP-UX Operating System Patches
5.4 Kernel Parameters
5.5 Asynchronous I/O
6. Configure the HP/Oracle 9i Real Application Cluster
6.1 Hardware configuration (Hardware planning, Network and disk layout)
6.2 Configure logical volumes
6.3 Configure HP ServiceGuard cluster
6.4 Create a user who will own the Oracle RAC software
6.5 Install Oracle Software
6.6 Create an Oracle9i database using DBCA
Purpose
This module focuses on what Oracle9i Real Application Clusters (RAC) is and how it can be properly configured on HP-UX to tolerate failures with minimal downtime. Oracle9i Real Application Clusters is an important Oracle9i feature that addresses high availability and scalability issues.
Objectives
Upon completion of this module, you should be able to:
2. Overview: What is Oracle9i Real Application Clusters?
Oracle9i Real Application Clusters is a computing environment that harnesses the processing power of multiple, interconnected computers. Oracle9i Real Application Clusters software and a collection of hardware known as a "cluster" unite the processing power of each component to become a single, robust computing environment. A cluster generally comprises two or more computers, or "nodes."
In Oracle9i Real Application Clusters (RAC) environments, all nodes concurrently execute transactions against the same database. Oracle9i Real Application Clusters coordinates each node's access to the shared data to provide consistency and integrity.
Oracle9i Real Application Clusters serves as an important component of robust high availability solutions. A properly configured Oracle9i Real Application Clusters environment can tolerate failures with minimal downtime.
Oracle9i Real Application Clusters is also applicable for many other system types. For example, data warehousing applications accessing read-only data are prime candidates for Oracle9i Real Application Clusters. In addition, Oracle9i Real Application Clusters successfully manages increasing numbers of online transaction processing systems as well as hybrid systems that combine the characteristics of both read-only and read/write applications.
Harnessing the power of multiple nodes offers obvious advantages. If you divide a large task into sub-tasks and distribute the sub-tasks among multiple nodes, you can complete the task faster than if only one node did the work. This type of parallel processing is clearly more efficient than sequential processing. It also provides increased performance for processing larger workloads and for accommodating growing user populations. Oracle9i Real Application Clusters can effectively scale your applications to meet increasing data processing demands. As you add resources, Oracle9i Real Application Clusters can exploit them and extend their processing powers beyond the limits of the individual components.
From a functional perspective RAC is equivalent to single-instance Oracle. What the RAC environment does offer is significant improvements in terms of availability, scalability and reliability.
In recent years, the requirement for highly available systems that can scale on demand has fostered the development of increasingly robust cluster solutions. Prior to Oracle9i, HP and Oracle, with the combination of Oracle Parallel Server and HP ServiceGuard OPS edition, provided cluster solutions that led the industry in functionality, high availability, management and services. Now, with the release of Oracle 9i Real Application Clusters (RAC) and its new Cache Fusion architecture, based on an ultra-high-bandwidth, low-latency cluster interconnect technology, RAC cluster solutions have become more scalable without the need for data and application partitioning.
This document covers the installation and configuration of Oracle Real Application Clusters in a typical environment: a two-node HP cluster running the HP-UX operating system.
3. Oracle 9i Real Application Clusters - Cache Fusion technology
Oracle 9i cache fusion utilizes the collection of caches made available by all nodes in the cluster to satisfy database requests. Requests for a data block are satisfied first by a local cache, then by a remote cache before a disk read is needed. Similarly, update operations are performed first via the local node and then the remote node caches in the cluster, resulting in reduced disk I/O. Disk I/O operations are only done when the data block is not available in the collective caches or when an update transaction performs a commit operation.
Oracle 9i cache fusion thus gives Oracle users an expanded database cache for queries and updates with reduced disk I/O synchronization, which speeds up database operations overall.
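The lookup order described above can be illustrated with a toy model. This is only a sketch: the function name lookup_block and the representation of each cache as a space-separated list of block numbers are assumptions made for illustration, and they say nothing about how Oracle implements Cache Fusion internally.

```sh
#!/bin/sh
# Toy model of the block lookup order: local cache, then remote cache, then disk.
# lookup_block BLOCK "LOCAL_BLOCKS" "REMOTE_BLOCKS"   (hypothetical helper)
lookup_block() {
    block=$1
    local_cache=$2
    remote_cache=$3
    # Pad with spaces so that block 7 does not match block 17.
    case " $local_cache " in
        *" $block "*) echo "local cache"; return 0 ;;
    esac
    case " $remote_cache " in
        *" $block "*) echo "remote cache"; return 0 ;;
    esac
    # Neither cache holds the block, so it has to be read from disk.
    echo "disk read"
}

lookup_block 7 "1 2 7" "3 4"    # block is cached on the local node
lookup_block 3 "1 2 7" "3 4"    # block is cached on another node
lookup_block 9 "1 2 7" "3 4"    # block is in no cache
```

Only the third call reaches the disk, which is the case the collective cache is meant to make rare.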
However, the improved performance depends greatly on the efficiency of the inter-node message passing mechanism, which handles the data block transfers between nodes.
The efficiency of inter-node messaging depends on three primary factors:
4. New HP Cluster Interconnect technology
5. HP/Oracle Hardware and Software Requirements
For additional information and latest updates please refer to the Oracle9i Release Note Release 1 (
$ /usr/sbin/dmesg | grep "Physical:"
$ /usr/sbin/swapinfo -a (requires root privileges)
$ /bin/getconf KERNEL_BITS
$ uname -a
$ cd /usr/lib
$ ln -s /usr/lib/libX11.3 libX11.sl
$ ln -s /usr/lib/libXIE.2 libXIE.sl
$ ln -s /usr/lib/libXext.3 libXext.sl
$ ln -s /usr/lib/libXhp11.3 libXhp11.sl
$ ln -s /usr/lib/libXi.3 libXi.sl
$ ln -s /usr/lib/libXm.4 libXm.sl
$ ln -s /usr/lib/libXp.2 libXp.sl
$ ln -s /usr/lib/libXt.3 libXt.sl
$ ln -s /usr/lib/libXtst.2 libXtst.sl
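Before or after creating the links, it can be useful to see which of them are still missing. The following is a minimal sketch, assuming the nine link names from the commands above; the helper name missing_x11_links is hypothetical, and the script only reports and creates nothing.

```sh
#!/bin/sh
# List the X11 library links from the commands above that are missing in a directory.
# missing_x11_links DIR   (hypothetical helper; read-only, it creates no links)
X11_LINKS="libX11.sl libXIE.sl libXext.sl libXhp11.sl libXi.sl libXm.sl libXp.sl libXt.sl libXtst.sl"

missing_x11_links() {
    dir=$1
    for link in $X11_LINKS; do
        # -h is true for a symbolic link (even a dangling one), -e for any existing file.
        if [ ! -h "$dir/$link" ] && [ ! -e "$dir/$link" ]; then
            echo "$link"
        fi
    done
}

# Check /usr/lib unless another directory is given as the first argument.
missing_x11_links "${1:-/usr/lib}"
```

An empty result means every link name is already present in the directory.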
5.3 HP-UX Operating System Patches
11.0 (64bit):
11i (64bit):
Optional Patch: For DSS applications running on machines with more than 16 CPUs, we recommend installation of the HP-UX patch PHKL_22266. This patch addresses performance issues with the HP-UX Operating System.
HP provides patch bundles at
http://www.software.hp.com/SUPPORT_PLUS
Individual patches can be downloaded from
http://itresourcecenter.hp.com
To determine which operating system patches are installed, enter the following command:
$ /usr/sbin/swlist -l patch
To determine if a specific operating system patch has been installed, enter the following command:
$ /usr/sbin/swlist -l patch patch_number
To determine which operating system bundles are installed, enter the following command:
$ /usr/sbin/swlist -l bundle
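When several patches have to be checked, the listing can be filtered by a small helper. This is a sketch under assumptions: patch_listed is a hypothetical function, it does a plain substring search on whatever text it receives on standard input, and the sample lines below are made up, so real swlist output may be laid out differently.

```sh
#!/bin/sh
# Report whether a patch name appears in swlist-style output read from stdin.
# patch_listed PATCH_NAME   (hypothetical helper)
patch_listed() {
    patch=$1
    # -F treats the patch name as a fixed string, -q suppresses the matching lines.
    if grep -F -q -- "$patch"; then
        echo "$patch: installed"
    else
        echo "$patch: not found"
    fi
}

# Made-up sample listing; on HP-UX the input would come from:
#   /usr/sbin/swlist -l patch | patch_listed PHKL_22266
printf '%s\n' 'PHKL_22266.CORE-KRN 1.0 sample line' 'PHNE_00000.NET 1.0 sample line' |
    patch_listed PHKL_22266
```

Because the match is a substring match, a shorter name that is a prefix of another patch name would also be reported as installed.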
5.4 Kernel Parameters
Kernel Parameter | Setting | Purpose
KSI_ALLOC_MAX | (NPROC * 8) | Defines the system-wide limit of queued signals that can be allocated.
MAXDSIZ | 1073741824 bytes | Refers to the maximum data segment size for 32-bit systems. Setting this value too low may cause processes to run out of memory.
MAXDSIZ_64 | 2147483648 bytes | Refers to the maximum data segment size for 64-bit systems. Setting this value too low may cause processes to run out of memory.
MAXSSIZ | 134217728 bytes | Defines the maximum stack segment size in bytes for 32-bit systems.
MAXSSIZ_64BIT | 1073741824 | Defines the maximum stack segment size in bytes for 64-bit systems.
MAXSWAPCHUNKS | (available memory)/2 | Defines the maximum number of swap chunks, where SWCHUNK is the swap chunk size (1 KB blocks). SWCHUNK is 2048 by default.
MAXUPRC | (NPROC + 2) | Defines the maximum number of user processes.
MSGMAP | (NPROC + 2) | Defines the maximum number of message map entries.
MSGMNI | NPROC | Defines the number of message queue identifiers.
MSGSEG | (NPROC * 4) | Defines the number of segments available for messages.
MSGTQL | NPROC | Defines the number of message headers.
NCALLOUT | (NPROC + 16) | Defines the maximum number of pending timeouts.
NCSIZE | ((8 * NPROC + 2048) + VX_NCSIZE) | Defines the Directory Name Lookup Cache (DNLC) space needed for inodes. VX_NCSIZE is 1024 by default.
NFILE | (15 * NPROC + 2048) | Defines the maximum number of open files.
NFLOCKS | NPROC | Defines the maximum number of file locks available on the system.
NINODE | (8 * NPROC + 2048) | Defines the maximum number of open inodes.
NKTHREAD | (((NPROC * 7) / 4) + 16) | Defines the maximum number of kernel threads supported by the system.
NPROC | 4096 | Defines the maximum number of processes.
SEMMAP | ((NPROC * 2) + 2) | Defines the maximum number of semaphore map entries.
SEMMNI | (NPROC * 2) | Defines the maximum number of semaphore sets in the entire system.
SEMMNS | (NPROC * 2) * 2 | Sets the number of semaphores in the system. The default value of SEMMNS is 128, which is, in most cases, too low for Oracle9i software.
SEMMNU | (NPROC - 4) | Defines the number of semaphore undo structures.
SEMVMX | 32768 | Defines the maximum value of a semaphore.
SHMMAX | Available physical memory | Defines the maximum allowable size of one shared memory segment. The SHMMAX setting should be large enough to hold the entire SGA in one shared memory segment. A low setting can cause creation of multiple shared memory segments, which may lead to performance degradation.
SHMMNI | 512 | Defines the maximum number of shared memory segments in the entire system.
SHMSEG | 32 | Defines the maximum number of shared memory segments one process can attach.
VPS_CEILING | 64 | Defines the maximum System-Selected Page Size in kilobytes.
Note: These are minimum kernel requirements for Oracle9i. If you have previously tuned your kernel parameters to levels equal to or higher than these values, continue to use the higher values. A system restart is necessary for kernel changes to take effect.
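Most of the settings in the table are formulas in NPROC, so the concrete numbers can be worked out once NPROC is chosen. The sketch below only does that arithmetic; the function name kernel_formulas is hypothetical, and nothing here reads or changes the running kernel.

```sh
#!/bin/sh
# Evaluate the NPROC-based formulas from the kernel parameter table.
# kernel_formulas NPROC [VX_NCSIZE]   (hypothetical helper; VX_NCSIZE defaults to 1024)
kernel_formulas() {
    nproc=$1
    vx_ncsize=${2:-1024}
    echo "KSI_ALLOC_MAX $((nproc * 8))"
    echo "MAXUPRC $((nproc + 2))"
    echo "MSGMAP $((nproc + 2))"
    echo "MSGMNI $nproc"
    echo "MSGSEG $((nproc * 4))"
    echo "MSGTQL $nproc"
    echo "NCALLOUT $((nproc + 16))"
    echo "NCSIZE $(((8 * nproc + 2048) + vx_ncsize))"
    echo "NFILE $((15 * nproc + 2048))"
    echo "NFLOCKS $nproc"
    echo "NINODE $((8 * nproc + 2048))"
    echo "NKTHREAD $(((nproc * 7) / 4 + 16))"
    echo "SEMMAP $((nproc * 2 + 2))"
    echo "SEMMNI $((nproc * 2))"
    echo "SEMMNS $((nproc * 2 * 2))"
    echo "SEMMNU $((nproc - 4))"
}

# Values for the NPROC setting given in the table.
kernel_formulas 4096
```

With NPROC set to 4096, for example, NFILE works out to 63488 and NKTHREAD to 7184. Since the table lists minimum values, higher settings already in place should be kept.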
5.5 Asynchronous I/O
1. Create the /dev/async character device
$ /sbin/mknod /dev/async c 101 0x0
$ chown oracle:dba /dev/async
$ chmod 660 /dev/async
2. Configure the async driver in the kernel using SAM
=> Kernel Configuration
=> Kernel
=> the driver is called 'asyncdsk'
Generate new kernel
Reboot
3. Set the HP-UX kernel parameter max_async_ports using SAM. max_async_ports limits the maximum number of processes that can concurrently use /dev/async. Set this parameter to the sum of 'processes' from init.ora and the number of background processes. If max_async_ports is reached, subsequent processes will use synchronous I/O.
4. Set the HP-UX kernel parameter aio_max_ops using SAM. aio_max_ops limits the maximum number of asynchronous I/O operations that can be queued at any time. Set this parameter to the default value (2048), and monitor it over time using glance.
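The sum described in step 3 can be computed from the parameter file. This is a minimal sketch, assuming a plain-text init.ora with a line of the form "processes = N"; the helper name suggest_max_async_ports, the sample file and the background process count of 10 are all made up for illustration.

```sh
#!/bin/sh
# Suggest a max_async_ports value: 'processes' from init.ora plus background processes.
# suggest_max_async_ports INIT_ORA BACKGROUND_COUNT   (hypothetical helper)
suggest_max_async_ports() {
    init_ora=$1
    background=$2
    # Use the last uncommented "processes = N" line; the name is matched case-insensitively.
    processes=$(sed -n 's/^[[:space:]]*[Pp][Rr][Oo][Cc][Ee][Ss][Ss][Ee][Ss][[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$init_ora" | tail -n 1)
    if [ -z "$processes" ]; then
        echo "processes is not set in $init_ora" >&2
        return 1
    fi
    echo $((processes + background))
}

# Example with a made-up parameter file and an assumed 10 background processes.
sample=${TMPDIR:-/tmp}/init_sample.$$.ora
printf 'db_name = rac\nprocesses = 150\n' > "$sample"
suggest_max_async_ports "$sample" 10    # prints 160
rm -f "$sample"
```

The actual number of background processes depends on the instance configuration and has to be supplied by the administrator.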
6. Configure the HP/Oracle 9i Real Application Cluster
6.1 Hardware configuration (Hardware planning, Network and disk layout)
Hardware Planning
In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more systems and two or more independent disk subsystems. This redundancy eliminates single points of failure.
The nodes in an Oracle9i RAC cluster are HP 9000 systems with similar memory configuration and processor architecture. A node can be any Series 800 model. It is recommended that both nodes be of similar processing power and memory capacity.
An RAC cluster must have:
Network and Disk Layout
Draw a diagram of your cluster using information gathered from these two sets of commands. You'll use this information later in configuring the system, the logical volumes and the cluster.
1. Use the LAN commands
$ lanscan
$ ifconfig lanX
$ netstat
to determine the number of LAN interfaces on each node, the names and addresses of each LAN card, and the subnet information.
2. Use the I/O command
$ ioscan -fnCdisk
to find the disks connected to each node. Note the type of disks installed. List the hardware addresses and device file names of each disk. Also note which are shared between nodes.
Network Planning
Minimally, a 9i RAC cluster requires three distinct subnets:
o Dedicated cluster heartbeat LAN
o Dedicated Global Cache Management (GCM) LAN
o User/Data LAN, which will also carry a secondary heartbeat
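Whether three planned interfaces really sit on three different subnets can be checked by comparing their network addresses. The sketch below assumes dotted-decimal IPv4 addresses and netmasks; the function names and all addresses are hypothetical, and the check compares network addresses only, so it is most meaningful when the masks are chosen consistently.

```sh
#!/bin/sh
# Compute the network address of an IP/netmask pair and compare three of them.
# network_of IP NETMASK   (hypothetical helper, dotted-decimal arguments)
network_of() (
    # The body runs in a subshell so the IFS change does not leak to the caller.
    IFS=.
    # Split both arguments into octets: $1..$4 = address, $5..$8 = netmask.
    set -- $1 $2
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
)

# distinct_subnets NET1 NET2 NET3 succeeds only if all three network addresses differ.
distinct_subnets() {
    [ "$1" != "$2" ] && [ "$1" != "$3" ] && [ "$2" != "$3" ]
}

hb_net=$(network_of 192.168.1.1 255.255.255.0)     # heartbeat LAN (made-up address)
gcm_net=$(network_of 192.168.2.1 255.255.255.0)    # GCM LAN (made-up address)
user_net=$(network_of 10.1.1.1 255.255.0.0)        # user/data LAN (made-up address)

if distinct_subnets "$hb_net" "$gcm_net" "$user_net"; then
    echo "three distinct subnets: $hb_net $gcm_net $user_net"
else
    echo "at least two interfaces share a subnet: $hb_net $gcm_net $user_net"
fi
```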
Because the GCM is now integrated into the Oracle9i kernel, the GCM will use the IP address associated with the default host name.
The network should be configured in the /etc/rc.config.d/netconf file. Any time you change the LAN configuration, you need to stop the network and restart it:
$ /sbin/rc2.d/S340net stop
$ /sbin/rc2.d/S340net start
GCM requires a high speed network to handle high bandwidth network traffic. In the Oracle literature this is referred to as the host interconnect. We recommend using either Hyperfabric or Gigabit Ethernet for this network.
Remote copy (rcp) needs to be enabled for both the root and oracle accounts on all nodes to allow remote copy of cluster configuration files.
There are two ways to enable rcp for root. You can choose the one that fits your site's security requirements. Include the following lines in either the .rhosts file in root's home directory or in the /etc/cmcluster/cmclnodelist file:
node1name root
node2name root
To enable remote copy (rcp) for Oracle, include the following lines in the .rhosts file in the oracle user's home directory:
node1name oracle
node2name oracle
where node1name and node2name are the names of the two systems in the cluster and oracle is the user name of the Oracle owner. rcp works only if a password has been set for the respective user (root and oracle).
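For clusters with more nodes, the entries follow the same pattern and can be generated rather than typed. A minimal sketch, assuming the "node user" line format shown above; rhosts_lines is a hypothetical helper that only prints the lines and edits no file.

```sh
#!/bin/sh
# Print the .rhosts lines that allow USER to use rcp from each listed node.
# rhosts_lines USER NODE...   (hypothetical helper; output only, no file is modified)
rhosts_lines() {
    user=$1
    shift
    for node in "$@"; do
        echo "$node $user"
    done
}

# Entries for the oracle user on the two example nodes.
rhosts_lines oracle node1name node2name
```

The output can be reviewed and then appended to the relevant .rhosts file by hand.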
General Recommendations
When disk drives held at most 1 or 2 GB, the usual wisdom was to do the following:
· Place redo logs and database files onto different drives
· Ensure that data and indexes were on separate spindles
· Spread the I/O load across as many disk devices as possible
Today, with the greatly increased capacity of a single disk mechanism (up to 181 GB drives on an XP512) and much faster I/O rates and transfer speeds, these rules must be revisited.
The real reason for these rules of thumb was to make sure that the I/O load resulting from an Oracle database would wind up being fairly well spread across all the disk mechanisms. Before the advent of large capacity disk drives housed in high performance storage systems, if the same disk drive wound up hosting two or more fairly active database objects, performance could deteriorate rapidly, especially if any of these objects needed to be accessed sequentially.
Today, in the era of huge disk arrays, the concept of "separate spindles" is a bit more vague, as the internal structure of the array is largely hidden from the view of the system administrator. The smallest independent unit of storage in an XP array is substantially larger than 1 or 2 GB, which means you have far fewer "spindles" to play with, at a time when there are more database objects (tables, indexes, etc.) to "spread", so it won't be possible to keep all the objects separate. The good news is that the architecture of the XP array is much more tolerant of multiple simultaneous I/O streams to/from the same disk mechanism than the previous generation of individual small disks.
Given all these advances in the technology, we have found it best to use a simple method for laying out an Oracle database on an XP array (under HP-UX) with volume manager striping of all of the database objects across large numbers of disk mechanisms. The result is to average out the I/O to a substantial degree. This method does not guarantee the avoidance of disk hotspots, but we believe it to be a reasonable "first pass" which can be improved upon with tuning over time. It is not only a lot faster to implement than a customized one-object-at-a-time layout, but we believe it to be much more resistant to the inevitable load fluctuations which occur over the course of a day, month, or year.
The layout approach that we are advocating might be described as "Modified Stripe-Everything-Across-Everything". Our goal is to provide a simple method which will yield good I/O balance, yet still provide some means of manual adjustment. Oracle suggests the same strategy. Their name for this strategy is SAME (Stripe and Mirror Everything).
XP basics: an XP512 can be configured with one to four pairs of disk controller modules (ACPs). Each array group is controlled by only one of these ACP pairs (it is in the domain of only one ACP pair). Our suggestion is that you logically "separate" the XP's array groups into four to eight sets. Each set should have array groups from all the ACP domains. Each set of array groups would then be assigned to a single volume group. All LUNs in the XP array will have paths defined via two distinct host-bus adapters; the paths should be assigned within each volume group in such a fashion that their primary path alternates back and forth between these two host-bus adapters. The result of all this: each volume group will consist of space which is "stripable" across multiple array groups spread across all the ACP pairs in the array, and any I/O done to these array groups will be spread evenly across the host-bus adapters on the server.
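The alternation of primary paths can be generated mechanically. The sketch below is an illustration under assumptions: it treats the first path listed for a LUN as its primary path, the helper name alternate_paths is hypothetical, and the controller/target prefixes c0t1 and c1t0 stand in for whatever two host-bus adapter paths ioscan reports on your system.

```sh
#!/bin/sh
# Print both device paths of each LUN so that the first-listed (primary) path
# alternates between two host-bus adapters.
# alternate_paths HBA_A HBA_B LUN...   (hypothetical helper)
# HBA_A and HBA_B are device-file prefixes such as c0t1 and c1t0; LUN is the unit number.
alternate_paths() {
    hba_a=$1
    hba_b=$2
    shift 2
    i=0
    for lun in "$@"; do
        if [ $((i % 2)) -eq 0 ]; then
            # Even position: adapter A first.
            echo "/dev/dsk/${hba_a}d${lun} /dev/dsk/${hba_b}d${lun}"
        else
            # Odd position: adapter B first.
            echo "/dev/dsk/${hba_b}d${lun} /dev/dsk/${hba_a}d${lun}"
        fi
        i=$((i + 1))
    done
}

# Four LUNs reachable through both adapters.
alternate_paths c0t1 c1t0 0 1 2 3
```

Each output line gives the two paths for one LUN in the order in which they would be passed to vgcreate or vgextend.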
LVM Steps
1. Disks must be initialized with the pvcreate command before they are added to volume groups. Do the following for every disk (LUN) you want to configure for your 9i RAC volume group(s):
$ pvcreate -f /dev/rdsk/cxtydz (where x=instance, y=target, and z=unit)
2. Create the volume group directory with the character special file called group:
$ mkdir /dev/vg_rac
$ mknod /dev/vg_rac/group c 64 0x060000
Note: The minor numbers for the group file should be unique among all the volume groups on the system.
3. Create PV-LINKs and extend the volume group:
$ vgcreate /dev/vg_rac /dev/dsk/c0t1d0 /dev/dsk/c1t0d0
$ vgextend /dev/vg_rac /dev/dsk/c1t0d1 /dev/dsk/c0t1d1
Continue with vgextend until you have included all the needed disks for the volume group(s).
4. Create logical volumes for the 9i RAC database with the command
$ lvcreate -i 10 -I 1024 -L 100 -n Name /dev/vg_rac
-i: number of disks to stripe across
-I: stripe size in kilobytes
-L: size of logical volume in MB
-n: name of the logical volume
5. Logical Volume Configuration