Arno Lehmann wrote:
> Or, alternatively, using tcpdump to find if the sequence numbers get out
> of sync somewhere, which would cause a RST on both ends.
Okay, I got a tcpdump and logfile of -d1000 on the fd. I'm a little rusty
debugging TCP issues by hand, but I couldn't find anything that loo
Arno Lehmann wrote:
> I'm not a good debugger user, but strace might be the next thing to
> try... like capturing all socket operations, or something. Perhaps you
> get to know if the error is caused by the OS on one end.
Knowing how verbose strace can be, I'm a little hesitant to jump right to t
Hello,
On 6/6/2007 3:38 PM, Frank Sweetser wrote:
> Well, I had a failure last night while I was monitoring memory usage. I had a
> script snagging the output of ps -o rss for both bacula-sd and bacula-dir
> every 60 seconds. Based on that, memory usage for both jumped only by a few
> megs when
Well, I had a failure last night while I was monitoring memory usage. I had a
script snagging the output of ps -o rss for both bacula-sd and bacula-dir
every 60 seconds. Based on that, memory usage for both jumped only by a few
megs when the jobs started. The dir was around 20M, and the sd aro
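
For anyone wanting to repeat this kind of measurement, here is a minimal sketch of a sampler that polls `ps -o rss` once a minute. It is not the script used above, which was not posted. The function names are made up for illustration, and `ps -C <name> -o rss=` assumes the Linux procps version of `ps`.

```python
import subprocess
import time


def parse_rss_kib(ps_output: str) -> int:
    """Sum the RSS values (KiB) printed by `ps -o rss=`, one number per process."""
    return sum(int(token) for token in ps_output.split())


def sample_rss_kib(process_name: str) -> int:
    """Total RSS in KiB of all processes with this command name (0 if none).

    Assumption: procps `ps`, where -C selects by command name and
    `rss=` prints the column without a header.
    """
    result = subprocess.run(
        ["ps", "-C", process_name, "-o", "rss="],
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero when nothing matches
    )
    return parse_rss_kib(result.stdout)


def monitor(names=("bacula-sd", "bacula-dir"), interval=60, samples=3):
    """Print one timestamped line per daemon every `interval` seconds."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for name in names:
            print(f"{stamp} {name} rss_kib={sample_rss_kib(name)}")
        time.sleep(interval)
```

Calling `monitor(samples=1440)` and redirecting the output to a file would cover roughly one day, so the figures can be lined up against job start times and failure timestamps afterwards.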
Arno Lehmann wrote:
> If you need a minimal Nagios plugin - I wrote some shell script for that
> purpose once :-)
Oddly enough, nothing actually crashes - a handful of jobs fail, but all
subsequent ones go through just fine.
>>> A work around would be to not start all your jobs at once but run t
Hi,
On 6/4/2007 6:17 PM, Frank Sweetser wrote:
> Arno Lehmann wrote:
>> Well, this one looks difficult.
>
> At least it's not just me, then =)
>
>> I suggest to monitor the memory usage of your server. I experienced
>> problems with (usually) the DIR or (seldomly) the SD using up all
>> availa
Arno Lehmann wrote:
> Well, this one looks difficult.
At least it's not just me, then =)
> I suggest to monitor the memory usage of your server. I experienced
> problems with (usually) the DIR or (seldomly) the SD using up all
> available memory. Which probably might affect the kernel so that it
Hi,
On 6/2/2007 7:43 AM, Frank Sweetser wrote:
> A couple of weeks ago, a problem started cropping up. Jobs started failing
> with what look like network errors:
>
> 02-Jun 01:10 lorien-sd: gkar-daily.2007-06-02_01.05.02 Fatal error:
> append.c:259 Network error on data channel. ERR=Input/output
A couple of weeks ago, a problem started cropping up. Jobs started failing
with what look like network errors:
02-Jun 01:10 lorien-sd: gkar-daily.2007-06-02_01.05.02 Fatal error:
append.c:259 Network error on data channel. ERR=Input/output error
02-Jun 01:10 lorien-sd: Job write elapsed time = 0