My coworker here once modified one of the key files for NetBackup 6 but
didn't bother to bounce NetBackup to see what effect his change had.

Later he went on vacation and I bounced NetBackup while dealing with a
fairly minor problem. It didn't come back up, and only after spending 24
hours on the phone with NetBackup support did I figure out what the
problem was. Support never suggested looking at this file; I just
happened to notice that the file date was newer than the previous
NetBackup bounce while I was on one of my interminable holds with them.
What was worse, they couldn't tell me what the fields in the file he
changed were for even after I discovered it. Luckily I'd taught him at a
previous job to always save originals when editing files for quick
back-out, so I was able to revert.
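
The two habits in this story, keeping an original before editing and
checking which files changed since the last restart, could be sketched in
shell roughly as follows. The function names, the .orig.<timestamp>
suffix, and the idea of a marker file touched at each restart are
illustrative assumptions, not anything NetBackup provides.

```shell
# Hypothetical helper: keep a timestamped copy of a file before editing it.
# Prints the path of the saved copy on success.
save_original() {
    file=$1
    stamp=$(date +%Y%m%d%H%M%S)
    # -p preserves mode and timestamps, so the copy still shows when the
    # file was last changed before this edit.
    cp -p "$file" "$file.orig.$stamp" && echo "$file.orig.$stamp"
}

# Hypothetical helper: list regular files under a directory that were
# modified more recently than a reference file (for example, a marker
# file touched at the last restart).
changed_since() {
    dir=$1
    ref=$2
    find "$dir" -type f -newer "$ref"
}
```

For example, run save_original /usr/openv/netbackup/bp.conf before an
edit, and changed_since /usr/openv /tmp/last_bounce afterwards, where
/tmp/last_bounce is a made-up marker file touched at each restart.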

Now any time he says "I'm going to..." heading into a weekend when I'm
on call, I tell him "No, you're not!" He's a pretty smart guy, but I
figure he should have the ...um... joy of dealing with the results of his
experiments.

________________________________

From: veritas-bu-boun...@mailman.eng.auburn.edu
[mailto:veritas-bu-boun...@mailman.eng.auburn.edu] On Behalf Of WEAVER,
Simon (external)
Sent: Thursday, January 21, 2010 6:47 AM
To: Jeff Cleverley; Justin Piszcz
Cc: VERITAS-BU@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Bpjobd and other failures.

Jeff

Good idea about not touching anything again...

A few years back, one colleague did a firmware upgrade on a tape
library on a Friday and then went on a few weeks' vacation.

On the Monday, a new person, totally unfamiliar with the backups, found
every single one was failing. It took 2 weeks to resolve.

"Never make big changes on the last day of the week, or before you go on
vacation" springs to mind :-)

Enjoy the break!

Simon

________________________________

From: veritas-bu-boun...@mailman.eng.auburn.edu
[mailto:veritas-bu-boun...@mailman.eng.auburn.edu] On Behalf Of Jeff
Cleverley
Sent: Wednesday, January 20, 2010 6:03 PM
To: Justin Piszcz
Cc: VERITAS-BU@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Bpjobd and other failures.

Justin,

Thanks for the reply.  For whatever reason things seem to have magically
started working again.  All I did was shut down Veritas (again), turn
up the verbosity in bp.conf, and restart it.  When it first started I
still didn't have bpdbm, bpjobd, etc, running.  The vnetd log had a lot
of errors.  When I ran bpdbjobs from the command line, nothing came
back.

While looking through the bpdbm log I found no errors but a lot of
entries that looked like it was doing backups.  About 10 minutes later I
ran bpdbjobs again and everything showed up and some jobs were running.
I think these are restarts of some failed jobs so we'll see how they do.
So far 4 of them have completed successfully.

Since I leave the country on vacation tomorrow morning I don't plan on
touching anything else on it today :-)

Thanks again for the help.

Jeff

On Wed, Jan 20, 2010 at 2:11 AM, Justin Piszcz <jpis...@lucidpixels.com>
wrote:

Hi,

Taking a shot in the dark here: for the TCP issues, try adding

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

to your /etc/sysctl.conf, then reboot.

For vnetd, check your /etc/xinetd.d/vnetd*
Also check the logs to make sure xinetd is not throttling connections;
that can happen if too many servers are trying to back up too fast.
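
A rough way to see whether a service file sets any of xinetd's
connection limits is to pull out the cps, instances and per_source
attributes. This is only a sketch: the function name is made up for
illustration, and the file name under /etc/xinetd.d is left as a glob
because it is not known here.

```shell
# Hypothetical helper: print any cps, instances or per_source settings
# found in the given xinetd configuration files. These attributes cap the
# connection rate, the number of running servers, and the number of
# connections per source address.
show_xinetd_limits() {
    grep -E '^[[:space:]]*(cps|instances|per_source)[[:space:]]*=' "$@"
}
```

It could be run as show_xinetd_limits /etc/xinetd.d/vnetd* /etc/xinetd.conf.
No output would mean none of the three attributes is set explicitly in
those files, in which case xinetd's built-in defaults apply.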

Justin. 



On Tue, 19 Jan 2010, Jeff Cleverley wrote:

Greetings,

While continuing to work on this it seems there may be issues with
vnetd.  The netstat -a |grep vnet shows this:

tcp        0      0 *:vnetd                     *:*                         LISTEN
tcp        0      0 sgpbkp04.sgp.avagotec:35781 agt604.sgp.avagotech.:vnetd ESTABLISHED
tcp        0      0 sgpbkp04.sgp.avagotec:35720 sgpbkp04.sgp.avagotec:vnetd ESTABLISHED
tcp        0      0 sgpbkp04.sgp.avagotec:vnetd sgpbkp04.sgp.avagotec:35720 ESTABLISHED
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35846 TIME_WAIT
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35853 TIME_WAIT
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35839 TIME_WAIT
unix  2      [ ACC ]     STREAM     LISTENING     146403 /usr/openv/var/vnetd/vmd.uds
unix  2      [ ACC ]     STREAM     LISTENING     145874 /usr/openv/var/vnetd/bpcompatd.uds
unix  2      [ ACC ]     STREAM     LISTENING     146786 /usr/openv/var/vnetd/tldcd.uds
unix  3      [ ]         STREAM     CONNECTED     152574 /usr/openv/var/vnetd/bpcompatd.uds

The TIME_WAIT entries seem to stick around a lot.  I've restarted xinetd
on the system and we have rebooted but things are still wedged.
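
To tell whether the TIME_WAIT entries are really piling up or are just
being replaced by new ones, it may help to count connections per state a
few times in a row. On Linux a closed connection normally sits in
TIME_WAIT for about a minute, so a steady count can simply mean that
short-lived connections keep being opened. The sketch below uses a
made-up function name and assumes unwrapped netstat output, where the
state is the sixth field of each tcp line.

```shell
# Hypothetical helper: count TCP connections per state for lines that
# mention a given service name or port. Reads "netstat -a" style output
# on stdin and prints "STATE count" lines, sorted by state name.
count_tcp_states() {
    pattern=$1
    awk -v pat="$pattern" '
        $1 ~ /^tcp/ && index($0, pat) { count[$6]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

It could be run as netstat -a | count_tcp_states vnetd and repeated
after a minute or two to compare the numbers.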

Thanks,

Jeff

On Tue, Jan 19, 2010 at 6:00 PM, Jeff Cleverley <
jeff.clever...@avagotech.com> wrote:

Greetings,

Our environment is NB6.5.1 on a RHEL4 server.  It also has an HP-UX SAN
media server.  All other clients are backed up over the network.  Most
are RHEL4x.

The tape library in our Singapore office failed over the weekend, which
caused a lot of things to fail and left them wedged.  Some jobs seemed
to have run but some failed with errors 13, 63, and 233.  This varied
across policies.  I decided to try to restart all processes and get
things cleaned up.  This hasn't worked well.

When I start everything using service netbackup start or
/etc/init.d/netbackup start, everything looks OK.  When I look at things
like bpps -a I notice that bpjobd isn't running anymore.  When I try to
start it manually it fails with "File size limit exceeded".  bpdbjobs
returns no output.  I haven't been able to figure out which file it is
complaining about.
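
"File size limit exceeded" is the text the system prints when a process
is killed by SIGXFSZ, which may point at a file that has reached the
shell's ulimit -f value or, for a program built without large-file
support, the 2 GB mark. One way to hunt for the file is to list anything
unusually large under the job database. The function name below is made
up, and the directory in the usage note is an assumption about this
install.

```shell
# Hypothetical helper: list regular files under a directory that are
# larger than the given number of megabytes.
find_large_files() {
    dir=$1
    # Convert megabytes to bytes so the portable "c" (bytes) suffix of
    # find -size can be used.
    min_bytes=$(( $2 * 1024 * 1024 ))
    find "$dir" -type f -size +"${min_bytes}c" -print
}
```

For example, find_large_files /usr/openv/netbackup/db/jobs 1024 would
list files over 1 GB, assuming the job database lives under that path.
Running ulimit -f in the shell that starts bpjobd shows the limit it
would inherit.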

I'm sure I have a lot of things that need to be cleaned up.  There are a
lot of files in the restart and trylogs.  I was thinking it was safe to
move those out of the way but wanted to make sure.

Any help on tracking the bpjobd error along with advice on cleaning up
all the restart and trylogs would be appreciated.  Naturally I'm leaving
on vacation Thursday so I need to help clean this up before I go.  I
won't be doing any replies to this after Wednesday night because of that.

Thanks,

Jeff

--
Jeff Cleverley
Unix Systems Administrator
4380 Ziegler Road
Fort Collins, Colorado 80525
970-288-4611




