Re: [Bacula-users] Mysteriously failing jobs

2007-06-08 Thread Frank Sweetser
Arno Lehmann wrote: > Or, alternatively, using tcpdump to find if the sequence numbers get out > of sync somewhere, which would cause a RST on both ends. Okay, I got a tcpdump and logfile of -d1000 on the fd. I'm a little rusty debugging TCP issues by hand, but I couldn't find anything that loo

Re: [Bacula-users] Mysteriously failing jobs

2007-06-06 Thread Frank Sweetser
Arno Lehmann wrote: > I'm not a good debugger user, but strace might be the next thing to > try... like capturing all socket operations, or something. Perhaps you > get to know if the error is caused by the OS on one end. Knowing how verbose strace can be, I'm a little hesitant to jump right to t

Re: [Bacula-users] Mysteriously failing jobs

2007-06-06 Thread Arno Lehmann
Hello, On 6/6/2007 3:38 PM, Frank Sweetser wrote: > Well, I had a failure last night while I was monitoring memory usage. I had a > script snagging the output of ps -o rss for both bacula-sd and bacula-dir > every 60 seconds. Based on that, memory usage for both jumped only by a few > megs when

Re: [Bacula-users] Mysteriously failing jobs

2007-06-06 Thread Frank Sweetser
Well, I had a failure last night while I was monitoring memory usage. I had a script snagging the output of ps -o rss for both bacula-sd and bacula-dir every 60 seconds. Based on that, memory usage for both jumped only by a few megs when the jobs started. The dir was around 20M, and the sd aro
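For anyone wanting to repeat the memory check described in this message, here is a minimal sketch of that kind of sampler. It is not the script Frank used; it assumes a Linux host with the procps `ps` (for the `-C` name selector), and the function names `parse_rss_kib`, `sample_rss`, and `monitor` are made up for illustration.

```python
import subprocess
import time


def parse_rss_kib(ps_output: str) -> int:
    """Sum the RSS values (KiB) found in `ps -o rss=` output, one number per line."""
    return sum(int(field) for field in ps_output.split())


def sample_rss(process_name: str) -> int:
    """Return the total RSS in KiB of all processes with this name (0 if none run).

    Assumption: procps `ps`, where `-C` selects by command name and
    `-o rss=` prints the resident set size without a header line.
    """
    result = subprocess.run(
        ["ps", "-C", process_name, "-o", "rss="],
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero when no process matches
    )
    return parse_rss_kib(result.stdout)


def monitor(names=("bacula-sd", "bacula-dir"), interval=60, samples=None):
    """Print one timestamped line per sample; loop forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        readings = " ".join(f"{name}={sample_rss(name)}KiB" for name in names)
        print(f"{stamp} {readings}", flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Calling `monitor()` with its defaults samples both daemons every 60 seconds until interrupted; redirecting the output to a file gives a log that can be lined up against the job failure timestamps.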

Re: [Bacula-users] Mysteriously failing jobs

2007-06-04 Thread Frank Sweetser
Arno Lehmann wrote: > If you need a minimal Nagios plugin - I wrote some shell script for that > purpose once :-) Oddly enough, nothing actually crashes - a handful of jobs fail, but all subsequent ones go through just fine. >>> A workaround would be to not start all your jobs at once but run t

Re: [Bacula-users] Mysteriously failing jobs

2007-06-04 Thread Arno Lehmann
Hi, On 6/4/2007 6:17 PM, Frank Sweetser wrote: > Arno Lehmann wrote: >> Well, this one looks difficult. > > At least it's not just me, then =) > >> I suggest to monitor the memory usage of your server. I experienced >> problems with (usually) the DIR or (seldomly) the SD using up all >> availa

Re: [Bacula-users] Mysteriously failing jobs

2007-06-04 Thread Frank Sweetser
Arno Lehmann wrote: > Well, this one looks difficult. At least it's not just me, then =) > I suggest to monitor the memory usage of your server. I experienced > problems with (usually) the DIR or (seldomly) the SD using up all > available memory. Which probably might affect the kernel so that it

Re: [Bacula-users] Mysteriously failing jobs

2007-06-04 Thread Arno Lehmann
Hi, On 6/2/2007 7:43 AM, Frank Sweetser wrote: > A couple of weeks ago, a problem started cropping up. Jobs started failing > with what look like network errors: > > 02-Jun 01:10 lorien-sd: gkar-daily.2007-06-02_01.05.02 Fatal error: > append.c:259 Network error on data channel. ERR=Input/output

[Bacula-users] Mysteriously failing jobs

2007-06-01 Thread Frank Sweetser
A couple of weeks ago, a problem started cropping up. Jobs started failing with what look like network errors: 02-Jun 01:10 lorien-sd: gkar-daily.2007-06-02_01.05.02 Fatal error: append.c:259 Network error on data channel. ERR=Input/output error 02-Jun 01:10 lorien-sd: Job write elapsed time = 0