NO/AMRF, was: how to lose a sysplex in 30 seconds

2005-12-01 Thread Barbara Nitz
Bill, >At the risk of starting a tangential thread, why would a site choose to >run with NOAMRF? Did you by any chance also talk to console level2 and ask them how many problems were introduced by running AMRF=Y, especially before console restructure? One very bad problem (that as far as I know

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Ed Gould
On Dec 1, 2005, at 1:01 PM, Alan C. Field wrote: We've run with AMRF(N) for years. Health Checker says this is the preferred state. One must presume the folks in CONSOLES know what they are doing How true how true... BTW the question needs to be asked. When was the last time IBM had an o

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Bill Neiman
On Thu, 1 Dec 2005 15:37:25 -0600, Bill Neiman <[EMAIL PROTECTED]> wrote: >That check says that if you are running AMRF(Y), and >if you are retaining eventual action messages, you should update your MPF >specifications to retain only immediate and critical action messages. Sorry, that should have

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Bill Neiman
On Thu, 1 Dec 2005 13:01:58 -0600, Alan C. Field <[EMAIL PROTECTED]> wrote: >We've run with AMRF(N) for years. > >Health Checker says this is the preferred state. > >One must presume the folks in CONSOLES know what they are doing Alan, I've been discussing this with the Consoles folks, and

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Alan C. Field
We've run with AMRF(N) for years. Health Checker says this is the preferred state. One must presume the folks in CONSOLES know what they are doing >Don't you need to have AMRF=Y set in order to capture (and display) >outstanding action messages? (Not all sites run with AMRF=Y... my current

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Bill Neiman
On Thu, 1 Dec 2005 07:36:55 -0600, Tom Schmidt <[EMAIL PROTECTED]> wrote: >Don't you need to have AMRF=Y set in order to capture (and display) >outstanding action messages? (Not all sites run with AMRF=Y... my current >site, sadly, is one that uses NOAMRF.) Tom, Yes, you do. At the risk o

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Tom Schmidt
On Thu, 1 Dec 2005 07:13:56 -0600, Bill Neiman wrote: >>On 11/28/05, Bill Neiman wrote: >>> > >>>Hence the IXC256A message. I'm not sure why >>> a D R,R failed to display the outstanding message, since IXC256A is >>>issued >>> with descriptor code 11. > >Just to tie up this loose end - D R,R doesn

Re: how to lose a sysplex in 30 seconds

2005-12-01 Thread Bill Neiman
>On 11/28/05, Bill Neiman <[EMAIL PROTECTED]> wrote: >> >>Hence the IXC256A message. I'm not sure why >> a D R,R failed to display the outstanding message, since IXC256A is >>issued >> with descriptor code 11. Just to tie up this loose end - D R,R doesn't display an outstanding IXC256A because I

Re: how to lose a sysplex in 30 seconds

2005-11-29 Thread ibm-main
From: "Edward E. Jaffe" > > Some shops (like ours) run "dark". No operator in sight. Some run staffed 24x7. Same comment applies generally. Shane ...

Re: how to lose a sysplex in 30 seconds

2005-11-29 Thread Edward E. Jaffe
Gil Peleg wrote: The reason IXC256A eventually escaped our eyes was of course a combination of several configuration errors which we have taken actions to correct. But, I wonder if this is something that can/should be inserted into the z/OS and Sysplex Health Checker. The Health Checker already

Re: how to lose a sysplex in 30 seconds

2005-11-29 Thread Gil Peleg
Peter, The reason IXC256A eventually escaped our eyes was of course a combination of several configuration errors which we have taken actions to correct. But, I wonder if this is something that can/should be inserted into the z/OS and Sysplex Health Checker. The Health Checker already alerts about

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Peter Relson
If you are going to run your consoles with DEL=R which explicitly tells the system not to keep things where you can see them (i.e., it tells the system to let them roll off the screen), even when the message was issued to do exactly that, then you had better provide some means of noticing those mes

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Bill Neiman
On Mon, 28 Nov 2005 14:51:48 +0100, TISLER Zaromil <[EMAIL PROTECTED] AUSTRIA.COM> wrote: >Bill, > ><- snip -> >The existence of signalling >connectivity created a race condition, in which MVSA and MVSB were >competing to detect and report the loss of access to the CDS at their >respective

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Gil Peleg
Bill, Thanks a lot for the explanation. Gil. On 11/28/05, Bill Neiman <[EMAIL PROTECTED]> wrote: > > Gil, > > When any system detects a permanent I/O error during an attempt to > access a couple data set, it initiates removal of that CDS from service. > The removal protocol involves notifyin

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Chase, John
> -Original Message- > From: IBM Mainframe Discussion List On Behalf Of Gil Peleg > > [ snip ] > > But in the case I encountered, the 1 system was telling all > the rest that they should switch to the alternate, when they > tried to switch they entered the disabled wait. > > It seems th

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Bill Neiman
On Mon, 28 Nov 2005 14:59:00 +0200, Gil Peleg <[EMAIL PROTECTED]> wrote: >Maybe they should introduce the same kind of processing done by APAR OA07640 >in case of an operator initiated SETXCF COUPLE,PSWITCH ?? Gil, APAR OA07640 does not change the processing for removal of a couple data set

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread TISLER Zaromil
Bill, <- snip -> The existence of signalling connectivity created a race condition, in which MVSA and MVSB were competing to detect and report the loss of access to the CDS at their respective sites. MVSB won the race, detecting and signalling the loss of the primary CDS before MVSA detec

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Bill Neiman
On Sun, 27 Nov 2005 14:30:16 +0200, Gil Peleg <[EMAIL PROTECTED]> wrote: > We have 2 LPARs in a sysplex, running on 2 different machines in 2 >different sites. >What happened was we lost connectivity between our 2 sites for a few >seconds. >As a result, MVSB (running in site B) lost its connectivi

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Gil Peleg
Maybe they should introduce the same kind of processing done by APAR OA07640 in case of an operator initiated SETXCF COUPLE,PSWITCH ?? Gil. On 11/28/05, Barbara Nitz <[EMAIL PROTECTED]> wrote: > > I understand what you are saying, but I don't think that is how the > architecture works. > --

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Gil Peleg
You are absolutely right. I did not get into the details of how the loss of connectivity actually happened. All the links in this site go through a box that multiplexes them over a cable provided by a local cables company to the other site. In fact, we do have 2 cables between the sites, and under

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread TISLER Zaromil
<- snip -> We have 2 LPARs in a sysplex, running on 2 different machines in 2 different sites. What happened was we lost connectivity between our 2 sites for a few seconds. As a result, MVSB (running in site B) lost its connectivity to the primary SYSPLEX couple data set residing on dasd in

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Barbara Nitz
>But in the case I encountered, the 1 system was telling all the rest that >they should switch to the alternate, when they tried to switch they >entered the disabled wait. SFM deals with losing *signalling* connectivity, not with losing I/O connectivity to the sysplex CDS. If a system cannot acces

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread R.S.
Gil Peleg wrote: Hi all, We had a small incident here last week and I wanted to hear your take about it... [...] IMHO the incident was not so small, you lost all the connectivity. Needless to say the connectivity should be fully redundant. The cables can be broken. Parallel sysplex is hard

Re: how to lose a sysplex in 30 seconds

2005-11-28 Thread Gil Peleg
Barbara, thanks for your help. I've been reading up on SFM. In MVS Setting Up a Sysplex, under 7.2.3 Handling Signaling Connectivity Failures they have an example of a 3-system sysplex, and they mention that: "SFM determines that the sysplex can be reconfigured as SYSA and SYSC or as SYSB and SYSC.

Re: how to lose a sysplex in 30 seconds

2005-11-27 Thread Barbara Nitz
>As a result, MVSB (running in site B) lost its connectivity to the >primary SYSPLEX couple data set residing on dasd in site A, and issued the >following message: IXC253I > >The above message was then issued by MVSA as well. Sadly enough, our >alternate SYSPLEX couple data set resides on dasd in s

Re: how to lose a sysplex in 30 seconds

2005-11-27 Thread Paul Hanrahan
-MAIN@BAMA.UA.EDU Subject: how to lose a sysplex in 30 seconds Hi all, We had a small incident here last week and I wanted to hear your take about it... We have 2 LPARs in a sysplex, running on 2 different machines in 2 different sites. What happened was we lost connectivity between our 2 sites for

how to lose a sysplex in 30 seconds

2005-11-27 Thread Gil Peleg
Hi all, We had a small incident here last week and I wanted to hear your take about it... We have 2 LPARs in a sysplex, running on 2 different machines in 2 different sites. What happened was we lost connectivity between our 2 sites for a few seconds. As a result, MVSB (running in site B) lost it