Hi Guys

Thanks for all the advice last night.

We found the problem - I still don't understand exactly how it's related, but IPMP was config'ed incorrectly. Once my colleagues fixed it, had and gab were 'happy'!

This is what ifconfig looked like yesterday:

# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 10.113.124.12 netmask ffffff00 broadcast 10.113.124.255
        groupname drp-db
        ether 0:14:4f:a3:81:e4
bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 10.113.124.102 netmask ffffff00 broadcast 10.113.124.255
bge3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
        inet 10.113.124.112 netmask ffffff00 broadcast 10.113.124.255
        groupname drp-db
        ether 0:14:4f:a3:81:e7

After today's changes:

# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
        inet 10.113.124.112 netmask ffffff00 broadcast 10.113.124.255
        groupname drp-db
        ether 0:14:4f:a3:81:e7
bge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
        inet 10.113.124.102 netmask ffffff00 broadcast 10.113.124.255
        groupname drp-db
        ether 0:14:4f:a3:81:e4
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 10.113.124.12 netmask ffffff00 broadcast 10.113.124.255
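
One visible difference between the two listings: yesterday every address in group drp-db carried NOFAILOVER (the flag Solaris IPMP puts on test addresses), while today 10.113.124.12 on bge0:1 is a plain data address. The thread does not say which part of the IPMP setup was actually wrong, so treat the following only as a rough sketch of how that one condition could be checked. The function names are made up for illustration, and the sample strings are abridged from the listings above (mtu, netmask and ether dropped).

```python
import re

# Matches the header of each entry, e.g. "bge0:1: flags=1000843<UP,...>"
HEADER = re.compile(r"(\w+(?::\d+)?): flags=[0-9a-f]+<([^>]*)>")

# Abridged from the two "ifconfig -a" listings in this message.
BEFORE = """
bge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER>
        inet 10.113.124.12 groupname drp-db
bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER>
        inet 10.113.124.102
bge3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER>
        inet 10.113.124.112 groupname drp-db
"""

AFTER = """
bge3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER>
        inet 10.113.124.112 groupname drp-db
bge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER>
        inet 10.113.124.102 groupname drp-db
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4>
        inet 10.113.124.12
"""


def parse_ifconfig(text):
    """Return one dict per interface entry found in ifconfig-style text."""
    entries = []
    headers = list(HEADER.finditer(text))
    for i, m in enumerate(headers):
        # The body of an entry runs up to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[m.end():end]
        inet = re.search(r"inet (\S+)", body)
        group = re.search(r"groupname (\S+)", body)
        entries.append({
            "name": m.group(1),
            "flags": set(m.group(2).split(",")),
            "inet": inet.group(1) if inet else None,
            "group": group.group(1) if group else None,
        })
    return entries


def groups_without_data_address(entries):
    """List IPMP groups in which every address is flagged NOFAILOVER."""
    # Logical interfaces (bge0:1) show no groupname of their own, so look
    # up the group of the physical interface they sit on (bge0).
    group_of = {e["name"]: e["group"] for e in entries if e["group"]}
    data_addrs = {}
    for e in entries:
        group = group_of.get(e["name"].split(":")[0])
        if group is None:
            continue
        addrs = data_addrs.setdefault(group, [])
        if "NOFAILOVER" not in e["flags"]:
            addrs.append(e["inet"])
    return sorted(g for g, addrs in data_addrs.items() if not addrs)


if __name__ == "__main__":
    print("before:", groups_without_data_address(parse_ifconfig(BEFORE)))
    print("after: ", groups_without_data_address(parse_ifconfig(AFTER)))
```

On a live system the text could be captured from `ifconfig -a` and passed to `parse_ifconfig` unchanged, since the parser does not depend on line breaks.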
Marianne

________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marianne Van Den Berg
Sent: 13 November 2007 22:42
To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error

Brand new systems: Sun Fire V245 Servers, 2 Ultra Sparc IIIi 1.5Ghz CPU, 2Gb DDR-1 SDRAM
Only O/S and SF/HA loaded.

I recently presented vcs 5.0 training in Israel on little single-cpu blades with 1Gb memory, clustering 2 Oracle instances. It was dog slow but no errors.... Only difference was using 5.0 - no patches.

Maybe time to log a support call - I simply don't have the energy.... My experience with Symantec Support is getting progressively more disappointing/frustrating. Was hoping somebody on the list has seen the same problem and could point me in the right direction.

Kind regards
Marianne

________________________________
From: Jim Senicka [mailto:[EMAIL PROTECTED]
Sent: 13 November 2007 20:36
To: Marianne Van Den Berg; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error

HAD is not talking to GAB. Excessive system utilization, or a blocked /var file system or some such issue.

________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marianne Van Den Berg
Sent: Tuesday, November 13, 2007 1:17 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error

Hi all

Brand new installation - 2-node cluster, Solaris 9 with latest O/S patches, SF/HA 5.0 with MP1.
IPMultiNICB config'ed as parallel sg (using mpathd) and ClusterService group.

Getting these errors about 3 minutes after hastart. Any ideas??

/var/adm/messages:

Nov 13 15:59:11 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 7 sec
Nov 13 15:59:12 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 8 sec
Nov 13 15:59:13 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 9 sec
Nov 13 15:59:14 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 10 sec
Nov 13 15:59:15 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD heartbeat to GAB (10 seconds)
Nov 13 15:59:15 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 11 sec
Nov 13 15:59:16 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 12 sec
Nov 13 15:59:17 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 13 sec
Nov 13 15:59:18 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 14 sec
Nov 13 15:59:19 drp-db-1 gab: [ID 191522 kern.notice] GAB WARNING V-15-1-20058 Port h process 140: heartbeat failed, killing process
Nov 13 15:59:19 drp-db-1 gab: [ID 975177 kern.notice] GAB INFO V-15-1-20059 Port h heartbeat interval 15000 msec. Statistics:
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 3000 msec: 3869
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 3000 ~ 6000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 9000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 9000 ~ 12000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 15000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 259915 kern.notice] GAB INFO V-15-1-20094 number of processes: 158
Nov 13 15:59:19 drp-db-1 gab: [ID 631272 kern.notice] GAB INFO V-15-1-20095 load average in 1 min: 0. 6
Nov 13 15:59:19 drp-db-1 gab: [ID 587815 kern.notice] GAB INFO V-15-1-20096 load average in 5 min: 0. 8
Nov 13 15:59:19 drp-db-1 gab: [ID 980060 kern.notice] GAB INFO V-15-1-20097 load average in 15 min: 0.10
Nov 13 15:59:19 drp-db-1 gab: [ID 559196 kern.notice] GAB INFO V-15-1-20098 pagein rate: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 582491 kern.notice] GAB INFO V-15-1-20099 pageout rate: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 940236 kern.notice] GAB INFO V-15-1-20041 Port h: client process failure: killing process
Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS WARNING V-16-1-53034 HAD Signal SIGABRT received
Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53038 Beginning execution of the diagnostics script
Nov 13 15:59:21 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53039 Completed execution of the diagnostics script
Nov 13 15:59:22 drp-db-1 gab: [ID 397130 kern.notice] GAB INFO V-15-1-20032 Port h closed
Nov 13 15:59:22 drp-db-1 syslog[29181]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11103 VCS exited. It will restart

had restarts, but the same thing happens again after a couple of minutes.

Regards
Marianne
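
For anyone reading the log extract above: the statistics show all 3869 earlier heartbeats arriving within 0 ~ 3000 msec, with low load averages and zero paging, so the log looks more like HAD stalling abruptly than like a steadily overloaded box. That reading is only an interpretation of the numbers. Below is a small sketch for pulling the same summary out of a messages file; the function name and the dictionary layout are invented for illustration, and the regular expressions assume the message wording shown in this extract.

```python
import re

# Patterns follow the GAB message text seen in the extract above.
INACTIVE = re.compile(r"V-15-1-20057 Port \w process \d+ inactive (\d+) sec")
KILLED = re.compile(r"V-15-1-20058 Port \w process \d+: heartbeat failed")
BUCKET = re.compile(r"V-15-1-20129 Port \w: heartbeats in (\d+) ~ (\d+) msec: (\d+)")

# A few lines copied from the log, with the syslog prefix shortened.
SAMPLE = [
    "gab: GAB WARNING V-15-1-20057 Port h process 140 inactive 7 sec",
    "gab: GAB WARNING V-15-1-20057 Port h process 140 inactive 14 sec",
    "gab: GAB WARNING V-15-1-20058 Port h process 140: heartbeat failed, killing process",
    "gab: GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 3000 msec: 3869",
    "gab: GAB INFO V-15-1-20129 Port h: heartbeats in 3000 ~ 6000 msec: 0",
]


def summarize_gab(lines):
    """Collect the longest inactivity, the kill event and the histogram."""
    summary = {"max_inactive_sec": 0, "killed": False, "buckets": {}}
    for line in lines:
        m = INACTIVE.search(line)
        if m:
            summary["max_inactive_sec"] = max(
                summary["max_inactive_sec"], int(m.group(1)))
            continue
        if KILLED.search(line):
            summary["killed"] = True
            continue
        m = BUCKET.search(line)
        if m:
            # Key each bucket by its (low, high) bounds in milliseconds.
            low, high, count = (int(x) for x in m.groups())
            summary["buckets"][(low, high)] = count
    return summary


if __name__ == "__main__":
    print(summarize_gab(SAMPLE))
```

To run it against a real file, something like `summarize_gab(open("/var/adm/messages"))` should work, since the function only iterates over lines.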
_______________________________________________
Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha