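HACMP ran into trouble twice last week, and both incidents were really frustrating!
The OS is AIX 5.3 ML02 with HACMP 5.2, initially with no fixes applied. Last Thursday afternoon a heartbeat error was suddenly reported and the standby node was simply shut down; errpt did not even show a shutdown entry: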
LABEL: TS_LOC_DOWN_ST
IDENTIFIER: 173C787F
Date/Time: Thu Jul 28 17:02:51 BEIST 2005
Sequence Number: 17125
Machine Id: 00C41FCE4C00
Node Id: hepr3prd
Class: S
Type: INFO
Resource Name: topsvcs
Description
Possible malfunction on local adapter
Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Recommended Actions
Verify adapter configuration
Verify network connectivity
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39,4143
ERROR ID
6zV5DL.vw7u0/8D61/6.e.1...................
REFERENCE CODE
Adapter interface name
tty0
Adapter offset
2
Adapter IP address
255.255.0.0
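To line the entries up on a timeline, it can help to boil the long `errpt -a` output down to one line per entry. The following is only a sketch: `errpt_summary` is a made-up helper name, and it assumes the field order shown in the entries in this post (LABEL, then Date/Time, then Resource Name).

```sh
# Hypothetical helper: condense `errpt -a` text (read from stdin)
# into "date | label | resource" lines, one per error log entry.
errpt_summary() {
    awk '
        /^LABEL:/         { label = $2 }
        /^Date\/Time:/    { sub(/^Date\/Time:[ \t]*/, ""); when = $0 }
        /^Resource Name:/ { printf "%s | %s | %s\n", when, label, $3 }
    '
}
```

On the node itself this would be used as `errpt -a | errpt_summary`, which makes it easier to see which entries share a timestamp.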
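The second time was even worse: around 5 a.m. clstrmgrES reported an error and the node itself went down. The standby node then started taking over before the resources had been released, so of course the takeover failed: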
LABEL: OPMSG
IDENTIFIER: AA8AB241
Date/Time: Fri Jul 29 04:45:14 BEIST 2005
Sequence Number: 17131
Machine Id: 00C41FCE4C00
Node Id: hepr3prd
Class: O
Type: TEMP
Resource Name: OPERATOR
Description
OPERATOR NOTIFICATION
User Causes
ERRLOGGER COMMAND
Recommended Actions
REVIEW DETAILED DATA
Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
Because the takeover started before the resources were released, a duplicate IP address problem occurred:
# errpt -aj FE2DEE00
---------------------------------------------------------------------------
LABEL: AIXIF_ARP_DUP_ADDR
IDENTIFIER: FE2DEE00
Date/Time: Fri Jul 29 04:45:14 BEIST 2005
Sequence Number: 17132
Machine Id: 00C41FCE4C00
Node Id: hepr3prd
Class: S
Type: PERM
Resource Name: SYSXAIXIF
Description
DUPLICATE IP ADDRESS DETECTED IN THE NET
Failure Causes
ARP RESPONSE RECEIVED FOR MY IP ADDRESS
Recommended Actions
CONTACT NETWORK ADMINISTRATOR
Detail Data
DUPLICATE IP ADDRESS
0A01 020C
MAC ADDRESS
0009 6BDD FCCA
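The DUPLICATE IP ADDRESS field above is printed as hex bytes, so it is worth decoding it to see which service address collided. A minimal sketch follows; `hex_to_ip` is just an illustrative name, and it assumes the field is always four bytes written as eight hex digits.

```sh
# Hypothetical helper: turn the errpt detail data "0A01 020C"
# into dotted-decimal notation.
hex_to_ip() {
    # Drop the blank between the two halves: "0A01 020C" -> "0A01020C"
    hex=$(printf '%s' "$1" | tr -d ' ')
    a=$(printf '%s' "$hex" | cut -c1-2)
    b=$(printf '%s' "$hex" | cut -c3-4)
    c=$(printf '%s' "$hex" | cut -c5-6)
    d=$(printf '%s' "$hex" | cut -c7-8)
    # printf %d accepts a 0x prefix and prints each byte in decimal
    printf '%d.%d.%d.%d\n' "0x$a" "0x$b" "0x$c" "0x$d"
}

hex_to_ip "0A01 020C"   # prints 10.1.2.12
```

The MAC address in the same entry (0009 6BDD FCCA) can then be compared with the adapter addresses that `netstat -in` reports on each node to find out which machine answered for that IP.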
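hacmp.out also contains messages saying that the disk resources could not be acquired.
The fixes have now been applied; I don't know yet how it will behave.
One more problem: the varyoffvg statement in the stop script of the HACMP application server never completes normally on the standby node, whereas it runs fine on the production node.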
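One possible explanation for the varyoffvg problem (an assumption, nothing in the logs above confirms it) is that a filesystem in the volume group is still mounted or in use on that node when the stop script runs. The sketch below checks for that before calling varyoffvg. The function names and the volume group name `datavg` are made up for illustration, and the column positions assume the usual `lsvg -l` layout (LV STATE in column 6, MOUNT POINT in column 7).

```sh
# Hypothetical helper: list mount points of logical volumes in a
# volume group that are still open, deepest paths first.
open_mounts() {
    # Skip the two header lines of `lsvg -l`; ignore LVs without a mount point
    lsvg -l "$1" | awk 'NR > 2 && $6 ~ /^open/ && $7 != "N/A" { print $7 }' | sort -r
}

# Hypothetical helper: unmount whatever is still mounted, then vary off.
safe_varyoff() {
    vg=$1
    for mp in $(open_mounts "$vg"); do
        # fuser -c reports the processes using the filesystem
        echo "still mounted: $mp, in use by: $(fuser -c "$mp" 2>/dev/null)"
        umount "$mp" || return 1
    done
    varyoffvg "$vg"
}
```

Calling `safe_varyoff datavg` from the stop script would at least show what is holding the volume group. As far as I know, HACMP also varies off the volume groups that belong to the resource group by itself, so it may be worth checking whether the stop script needs the varyoffvg call at all.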