Hi all,

 

I'm hoping someone has seen this before. I'm running NetBackup 5.1MP6
with AIT3. I've got a ~126GB backup that kicks off weekly but hangs
within a few hours every time. The error I get is always "media manager
terminated by parent process", but the logs don't seem to show anything
odd. No other backups hang like this. This is also the only job that
runs on the server itself.

 

bptm gives me:

 

03:28:45.470 [4999] <2> io_ioctl: command (1)MTFSF 1 from (bptm.c.8307) on drive index 1
03:28:45.530 [4999] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/AK6503, from bptm.c.8310
03:28:45.530 [4999] <2> catch_signal: EXITING with status 82

 

So I checked bpbrm:

 

02:05:33.882 [4992] <2> bpbrm spawn_child: /usr/openv/netbackup/bin/bptm
    bptm -w -c foo.bar.com -den 17 -rt 6 -rn 0 -stunit Spectra2 -cl inbound
    -bt 1183968330 -b foo.bar.com _1183968330 -st 0 -cj 1 -p inbound
    -hostname foo.bar.com -ru root -rclnt foo.bar.com -rclnthostname
    foo.bar.com -rl 5 -rp 8035200 -sl ftpif -ct 0 -maxfrag 1048576 -tir -v
    -Z -mediasvr foo.bar.com -jobid 117926 -jobgrpid 117926 -masterversion
    510000 -shm
02:05:33.884 [4992] <2> bpbrm write_continue_backup: wrote CONTINUE BACKUP on COMM_SOCK <4>
02:05:33.884 [4992] <2> bpbrm main: wrote /na270/pub/inbound on COMM_SOCK
02:05:33.884 [4992] <2> bpbrm main: wrote /na270/pub/ftp on COMM_SOCK
02:05:33.884 [4992] <2> bpbrm main: wrote CONTINUE on COMM_SOCK
02:05:33.885 [4992] <2> bpbrm main: ESTIMATE -1 -1 nbu0 foo.bar.com _1183968330
02:09:44.763 [4992] <2> bpbrm mm_sig: received ready signal from media manager
02:09:44.763 [4992] <2> bpbrm readline: retrying partial read from fgets ::
03:27:22.261 [4992] <2> bpbrm sighandler: signal 14 caught by bpbrm
03:27:22.272 [4992] <2> bpbrm sighandler: bpbrm timeout after 3600 seconds
03:27:22.287 [4992] <2> clear_held_signals: clearing signal mask stack, mask_stack_depth = 0
03:27:22.287 [4992] <2> bpbrm kill_child_process: start
03:27:22.287 [4992] <2> bpbrm wait_for_child: start
03:28:48.546 [4992] <2> bpbrm wait_for_child: child exit_status = 82 signal_status = 0
03:28:48.557 [4992] <2> inform_client_of_status: INF - Server status = 41

 

But I can't figure out why there was a timeout. I checked all the
related logs. bpbkar just shows file writing stopping at 2:42am, as if
the process simply hangs there, with no errors. Looking right now, the
bpbrm and bpbkar processes for this backup are still running, but
nothing is happening. The job shows as active and everything is queueing
up behind it. I've also adjusted CLIENT_READ_TIMEOUT in
/usr/openv/netbackup/bp.conf, to no avail.
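
For anyone who wants to line the logs up against each other, here is a
rough Python sketch of how the longest silent stretch in one of these
debug logs could be found. It assumes every entry starts with an
HH:MM:SS.mmm timestamp, a bracketed PID and a <level> tag, as in the
excerpts above, and that any line without that prefix is a wrapped
continuation of the previous entry. The names parse_entries, largest_gap
and SAMPLE are made up for illustration and are not part of NetBackup.

```python
import re
from datetime import datetime, timedelta

# Start of a log entry, e.g. "03:27:22.261 [4992] <2> bpbrm sighandler: ..."
# (format assumed from the pasted excerpts, not from NetBackup documentation)
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <\d+> (.*)$")


def parse_entries(text):
    """Return (time, pid, message) tuples; wrapped lines are joined to the previous entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            entries.append([stamp, int(m.group(2)), m.group(3)])
        elif entries:
            # No timestamp prefix: treat as a continuation of the last entry.
            entries[-1][2] += " " + line
    return [tuple(e) for e in entries]


def largest_gap(entries):
    """Return (gap, before, after) for the longest silence between consecutive entries."""
    best = None
    for prev, cur in zip(entries, entries[1:]):
        gap = cur[0] - prev[0]
        if gap < timedelta(0):
            # The log has no dates, so assume a negative gap means midnight passed.
            gap += timedelta(days=1)
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


# Three entries copied from the bpbrm excerpt, wraps included.
SAMPLE = """\
02:09:44.763 [4992] <2> bpbrm mm_sig: received ready signal from media
manager
02:09:44.763 [4992] <2> bpbrm readline: retrying partial read from fgets
::
03:27:22.261 [4992] <2> bpbrm sighandler: signal 14 caught by bpbrm
"""

gap, before, after = largest_gap(parse_entries(SAMPLE))
print(gap, "|", before[2], "->", after[2])
```

On the bpbrm excerpt this puts the longest quiet period, roughly 1 hour
17 minutes, between the "retrying partial read" entry at 02:09:44 and
the signal 14 entry at 03:27:22. Running the same thing over the bpbkar
and bptm logs would show whether they go quiet at the same point.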

 

Can anyone point me in the right direction as to what I'm missing? I'm
guessing there's something I'm not seeing in one of the logs.

 

            -Aaron

Aaron Mills
Systems Administrator
Return Path, Inc.
http://www.returnpath.net
[EMAIL PROTECTED]

 

 

_______________________________________________
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
