Dave Markham wrote: Sorry correction :-
hours=12
Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago $hours | awk '$14 == "<policy>" { print "Client ["$12"], STATUS ["$19"]" }'`
if [ "$Prev_Job" ]; then
    echo "ERROR: A previous job has run in the past [$hours] hours" >> $log
    echo "$Prev_Job" >> $log
    exit 1
fi

> ....and if it's of use to anyone else, here is what I shall implement :-
>
> Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago 12 |
> awk '$14 == "<policy>" { print "Client ["$12"], STATUS ["$19"]" }'`
> if [ "$Prev_Job" ];then
> echo "ERROR: A previous job has ran in the past [$hours] hours" >> $log
> echo "$Prev_Job" >> $log
> exit 1
> fi
>
>
> ken_zuf...@goodyear.com wrote:
>
>> Wow, glad I don't have your job...that's pretty convoluted :P
>>
>> But may have an answer, building off the lock file idea...but much
>> simpler. Just put the logic in the bpstart to check and see if the
>> policy you're executing has run in the past X hours and failed...if it
>> has, exit gracefully, if it hasn't, continue the backup.
>>
>> Quick and dirty logic:
>>
>> bperror -backstat -hoursago [hours] -l | awk '{print $19,$14}' | grep -v "^0" | grep [policy_name]
>>
>> In the above, $19 = backup status code, $14 = policy name. Strip out
>> any successful backups, grep for the policy name...if it's not null,
>> you've had a failure in the past X hours.
>>
>> Of course, there are different ways to parse the bperror output, but
>> the above would work. In fact, you shouldn't even have to grep out
>> successes because the process shouldn't be trying to submit the policy
>> if it's run successfully.
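The filter Ken describes can also be done in a single awk pass instead of chained greps. What follows is only a sketch: it takes the field positions on trust from the thread ($12 = client, $14 = policy name, $19 = status code) and has not been checked against real bperror output. The function name failed_runs is hypothetical, and it reads lines on stdin so it can be tried on saved output before being wired into a bpstart script.

```shell
#!/bin/sh
# Sketch only. Reads "bperror -backstat ... -l" style lines on stdin.
# Assumed field layout (taken from the thread, not verified here):
#   $12 = client, $14 = policy name, $19 = backup status code.
# failed_runs is a hypothetical helper name.
failed_runs() {
    policy=$1
    # Print one line per job of the given policy that ended non-zero.
    awk -v policy="$policy" '
        $14 == policy && $19 != 0 {
            print "Client [" $12 "], STATUS [" $19 "]"
        }'
}
```

In a bpstart script this would be fed from the command already shown in the thread, for example `bperror -backstat -hoursago $hours -l | failed_runs "<policy>"`, with a non-empty result treated as "a failure has happened in the past X hours". Dropping the `$19 != 0` test gives the stricter behaviour of the script at the top, which refuses to run if any previous job is found.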
>>
>> Ken Zufall
>> Technical Analyst
>> D660C
>> The Goodyear Tire & Rubber Company
>> GTN 446.0592 or 330.796.0592
>>
>> *Dave Markham <dave.mark...@fjserv.net>*
>>
>> 04/07/2009 06:16 AM
>> Please respond to dave.mark...@fjserv.net
>>
>> To: ken_zuf...@goodyear.com
>> cc: "veritas-bu@mailman.eng.auburn.edu" <veritas-bu@mailman.eng.auburn.edu>
>> Subject: Re: [Veritas-bu] Number of retries query
>>
>> Thanks guys, there are some useful options there.
>>
>> To give more info, we run the RMAN job as follows :-
>>
>> -We have an oracle admin station which holds various oracle dba scripts.
>> -We have a policy which controls the scheduling and kicks off a client
>> backup of this oracle management station and backs up a single file to a
>> disk storage unit on the master (a simple directory). We back up one file
>> to stop any status 71.
>> -The reason the policy and schedule is there is to run a bpstart script
>> on the management station.
>> -This bpstart script checks no oracle tape dba script is already running
>> (if it is, it exits non zero and obviously gives status 73 in netbackup).
>> -Once the checks are passed it launches an oracle dba script (not
>> maintained by me).
>> -This oracle script talks to 3 oracle RAC servers and works out which
>> one is running the particular db instance.
>> -These oracle RAC servers are all Netbackup media servers and they then
>> initiate the oracle backup through a Netbackup oracle agent on the
>> relevant media server. This backs up using the application schedules on
>> the master server for the associated policy with each media server.
>> (sorry that sounds confusing).
>> -If the oracle script fails and exits with non zero then in turn our
>> bpstart script fails with status 73 and we can alert the dbas.
>>
>> We want to launch via netbackup this way so we can trap the exit status
>> and report to the dbas there has been a problem, plus for it to appear
>> on a daily report.
>>
>> The case we have experienced is if a backup fails, which could be due to
>> no tapes or various oracle failures, the dba's don't want an automatic
>> one running again as it starts doing things with flash recovery areas
>> and starts running into the normal working day.
>>
>> Indeed perhaps some logic in the bpstart script to create a lockfile is
>> useful, but the lock file would need to be removed upon completion or
>> failure and this would then not give us any benefit when try 2 happens.
>>
>> If a lock file was used we could do some date matching and perhaps only
>> run a job if the lockfile was older than x hours (a lot of date parsing
>> though, which could be difficult), then touch it again and run the backup.
>> I'll have to explore this method.
>>
>> Cheers
>>
>> ken_zuf...@goodyear.com wrote:
>>
>>> Dave,
>>>
>>> This isn't an ideal fix, but it will work--schedule the backups from
>>> the client. Basically, just put entries in cron (root or oracle will
>>> work) with the commands (or script wrapper around the command) to
>>> launch the backup instead of using the NBU scheduler (will have to
>>> remove current full/incremental schedules and replace with a
>>> user-directed schedule that has the appropriate windows). Reason this
>>> will work is because the automatic retries only affect backups launched
>>> from the master...if it's submitted by the client, it will not retry on
>>> failure.
>>>
>>> Only real issues off the top of my head are:
>>>
>>> 1) If client is down or doesn't have network connectivity, you won't
>>> see failure to run backup in NBU because the backup will never be
>>> submitted.
>>>
>>> 2) You lose visibility to backup schedules within NBU.
>>>
>>> Ken Zufall
>>> Technical Analyst
>>> D660C
>>> The Goodyear Tire & Rubber Company
>>> GTN 446.0592 or 330.796.0592
>>>
>>> *Len Boyle <len.bo...@sas.com>*
>>> Sent by: veritas-bu-boun...@mailman.eng.auburn.edu
>>>
>>> 04/06/2009 09:30 AM
>>>
>>> To: "dave.mark...@fjserv.net" <dave.mark...@fjserv.net>,
>>>     "veritas-bu@mailman.eng.auburn.edu" <veritas-bu@mailman.eng.auburn.edu>
>>> cc:
>>> Subject: Re: [Veritas-bu] Number of retries query
>>>
>>> Good Morning Dave,
>>>
>>> I know of no way to change the number of job retries on a policy or
>>> client or schedule object.
>>> I can see where this would be a nice feature to have.
>>>
>>> There are many different reasons that an rman backup job can fail.
>>>
>>> From the netbackup end of things one could have a 96 error (no scratch
>>> tapes), a media fault, a network issue, etc.
>>> Or it could be an oracle issue.
>>>
>>> For something like a media issue that is cleared up on the netbackup
>>> end of things I would think that the dba's would want the backup to be
>>> retried. For an oracle issue I do not know enough.
>>>
>>> But either way I believe that you could add the control you require
>>> into the script that netbackup runs on the client to run the rman
>>> commands. Might not be easy.
>>>
>>> I am sure others who know oracle can give you a better answer than
>>> this, and I look forward to learning.
>>> As a simple case of go or nogo, without any variance based on the prior
>>> failure, you could try this.
>>> In the beginning of the script you could set a state value of
>>> "STARTED" into a file on the client. At the end of the script the value
>>> could be changed to "COMPLETE".
>>> At the start of the script, if the value is not "COMPLETE", the script
>>> could give an error return and exit. Someone would have to change the
>>> state value back to "COMPLETE" to enable the script to run. This could be
>>> done after clearing the problem.
>>> This can also be used to bypass the running of the backup at the
>>> script level when the oracle dba's are doing maintenance work on the
>>> oracle database. If you use and check for some state value of "BYPASS"
>>> then the script could exit with a normal return code and netbackup
>>> would not have a backup but would think that everything is ok and not
>>> retry.
>>> You could also use touch files instead of one state file.
>>>
>>> Let us know what you end up doing to solve this issue.
>>>
>>> len
>>>
>>> -----Original Message-----
>>> From: veritas-bu-boun...@mailman.eng.auburn.edu
>>> [mailto:veritas-bu-boun...@mailman.eng.auburn.edu] On Behalf Of Dave Markham
>>> Sent: Monday, April 06, 2009 8:17 AM
>>> To: veritas-bu@mailman.eng.auburn.edu
>>> Subject: [Veritas-bu] Number of retries query
>>>
>>> Guys does anyone know if you can change the number of job retries in xx
>>> time period on a per client basis?
>>>
>>> I currently have the global set at 2 tries per 12 hours which is fine
>>> for our needs and good in the fact it will retry a failed backup.
>>>
>>> However the DBA for an RMAN and oracle policy doesn't want this to
>>> happen and re-run a backup if there is a failure, so I need to try and
>>> find a way of setting it to 1 try for just one client.
>>>
>>> Any ideas?
>>>
>>> Cheers
>>> _______________________________________________
>>> Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu
>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
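
Len's state-file idea from earlier in the thread can be sketched in a few lines of portable shell. This is an illustration under assumptions, not anything posted to the list: the function names check_state and mark_complete, the return-code convention, and the choice to treat a missing state file as "COMPLETE" are all made up for the example.

```shell
#!/bin/sh
# Sketch of a state-file gate for a bpstart-style script.
# Hypothetical helpers; the state file path is supplied by the caller.
#
# check_state FILE returns:
#   0 = last run completed; the file is marked STARTED and the backup may run
#   1 = last run did not complete; refuse to run until someone resets the file
#   2 = BYPASS requested (e.g. DBA maintenance); skip the backup quietly
check_state() {
    state_file=$1
    # Assumption: no state file yet means nothing has failed, so allow the run.
    state=COMPLETE
    if [ -f "$state_file" ]; then
        state=$(cat "$state_file")
    fi
    case $state in
        COMPLETE) echo STARTED > "$state_file"; return 0 ;;
        BYPASS)   return 2 ;;
        *)        return 1 ;;
    esac
}

# mark_complete FILE records a successful run so the next one is allowed.
mark_complete() {
    echo COMPLETE > "$1"
}
```

A calling script would exit non-zero when check_state returns 1, so the job fails and gets reported, and exit 0 without doing any work when it returns 2. After a failure the file stays at STARTED until someone writes COMPLETE into it by hand, which is what stops the automatic second try.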