On Tue, Sep 29, 2009 at 3:10 PM, Dustin J. Mitchell wrote:
> Thanks! I'll fix up the compile problem -- not sure how it worked for me :)
Duh, it worked for me because I'm not running OpenBSD..
Anyway, fixed in r2149. Thanks to everyone on this 100+-message thread!
Dustin
--
Open Source Stor
On Tue, Sep 29, 2009 at 12:55 PM, Michael Burk wrote:
> I added the following lines to util.c as a temporary fix:
> #define DATA_FD_COUNT 3 /* number of general-use pipes */
> #define DATA_FD_OFFSET 50
>
> I ran three test backups, and each succeeded:
> - forced full, without comp
I added the following lines to util.c as a temporary fix:
#define DATA_FD_COUNT 3 /* number of general-use pipes */
#define DATA_FD_OFFSET 50
I ran three test backups, and each succeeded:
- forced full, without compression, no index
- forced full, with compression and index
- unfo
On Tue, Sep 29, 2009 at 09:31:17 -0400, Dustin J. Mitchell wrote:
> When a version of OpenBSD is released where this is no longer an issue
> (5.0?), we can conditionalize it on that version.
Well, I brought up this issue on the general principle that this sort of
workaround hack might have its own
On Mon, Sep 28, 2009 at 9:42 PM, Nathan Stratton Treadway
wrote:
> As you say, having this extra call shouldn't really hurt anything, but I'm
> wondering if it would make sense to tweak the #ifdef so that it is possible
> to compile Amanda on OpenBSD without having to include the work-around
> even in
* that an amandad service inherits. This won't be necessary once the new
> + * threading library is available (OpenBSD 5.0?), but won't hurt anyway. See the
> + * thread "Backup issues with OpenBSD 4.5 machines" from September 2009. */
> +#ifdef __OpenBSD__
On Mon, Sep 28, 2009 at 13:48:37 -0400, Dustin J. Mitchell wrote:
> The mysterious fcntl() calls, however, serve as a warning to uthreads
> that the index file exists. Uthreads sets the O_NONBLOCK flag when
> performing the fcntl(), but then clears it on execve(), so everything
> works as expected
Hi Dustin,
Great analysis; thanks for sharing the details of the problem.
On Mon, Sep 28, 2009 at 12:11 PM, Dustin J. Mitchell wrote:
> while putting together the patch
> (attached),
I patched the 0928 snapshot, but it didn't compile:
util.c: In function `openbsd_fd_inform':
util.c:1290: err
OK, I have a more in-depth summary of exactly what's going on here,
and why the fcntl() calls fix it. The good news: we've stumbled on a
pretty stable "fix" for this problem.
As background, the Amanda client operates something like this:
amandad is invoked by (x)inetd or some other mechanism
ama
hreading library is available (OpenBSD 5.0?), but won't hurt anyway. See the
+ * thread "Backup issues with OpenBSD 4.5 machines" from September 2009. */
+#ifdef __OpenBSD__
+void openbsd_fd_inform(void);
+#else
+#define openbsd_fd_inform()
+#endif
+
#endif /* UTIL_H */
diff --gi
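For readers following the patch, here is a minimal sketch of what a helper like openbsd_fd_inform() could look like. It is not the committed code. The two macro values are the ones from Michael's temporary fix in this thread; the loop bounds assume that each of the DATA_FD_COUNT pipes occupies two consecutive descriptors starting at DATA_FD_OFFSET, and the name fd_inform_sketch and its return value are made up for illustration.

```c
#include <fcntl.h>

#define DATA_FD_COUNT 3   /* number of general-use pipes (value from the thread) */
#define DATA_FD_OFFSET 50 /* first inherited descriptor (value from the thread) */

/* Hypothetical helper: issue a harmless fcntl(F_GETFL) on every descriptor
 * in the inherited range so that the threading library "looks" at each one.
 * Assumption: two descriptors per pipe, hence DATA_FD_COUNT * 2.
 * Returns how many of those descriptors were actually open. */
static int fd_inform_sketch(void)
{
    int fd;
    int open_count = 0;

    for (fd = DATA_FD_OFFSET; fd < DATA_FD_OFFSET + DATA_FD_COUNT * 2; fd++) {
        /* F_GETFL only reads the flags; it fails with EBADF on a closed fd */
        if (fcntl(fd, F_GETFL, 0) != -1)
            open_count++;
    }
    return open_count;
}
```

A service started by amandad would call such a helper once, early in main(), before reading from or writing to any of the inherited descriptors.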
On Wed, Sep 9, 2009 at 11:57 AM, Michael Burk wrote:
> Hi Dustin - one question before I post on the OpenBSD list. In reviewing a
> post Stan made a couple weeks ago to the OpenBSD list, someone asked if
> Amanda uses pthreads. I noticed that ldd reports that the binaries link to
> libpthread. Does
Hi Dustin - one question before I post on the OpenBSD list. In reviewing a
post Stan made a couple weeks ago to the OpenBSD list, someone asked if
Amanda uses pthreads. I noticed that ldd reports that the binaries link to
libpthread. Does Amanda use pthreads, either directly or through some other
l
On Tue, Sep 8, 2009 at 10:20 PM, Nathan Stratton
Treadway wrote:
> At that time the discussion was focused on common-src/stream.c, which
> hadn't changed significantly between those versions, but it would be
> interesting to know if there were any changes in the sendbackup code
> path after 2.5.0p1
On Tue, Sep 8, 2009 at 12:42 PM, Michael Burk wrote:
> Sorry - here they are.
OK, I don't see anything funny there. I think it's finally time to
take this to the OpenBSD list and see if they can find anything in
this wilderness. It might help to provide a pointer to the archive of
this (now *ver
On Tue, Sep 8, 2009 at 11:45 AM, Michael Burk wrote:
> Thanks Dustin. Attached are the gory details.
What about the amandad logfile?
Dustin
--
Open Source Storage Engineer
http://www.zmanda.com
OK, sorry for the delay. Attached is a patch, also at
http://github.com/djmitche/amanda/commit/435ed5a820819188578df4a8a730a6b084a9f29f.patch
which adds a whole bunch of debugging. I'd like to hear how this
works, and to see the sendbackup and amandad debug log files.
This should elucidate a
On Fri, Sep 4, 2009 at 6:21 PM, Nathan Stratton Treadway wrote:
>
>
> But it sounds like you are saying that the one-line patch, which touched
> only datafd, actually fixed both problems as well
>
> Exactly.
On Fri, Sep 04, 2009 at 19:40:28 -0400, Nathan Stratton Treadway wrote:
> On Fri, Sep 04, 2009 at 16:49:24 -0600, Michael Burk wrote:
> > Sorry I didn't make that clear. The patch was the one-liner (datafd only). I
> > ran it again with Dustin's 2-liner with the same results.
>
> Ah, interesti
On Fri, Sep 04, 2009 at 16:49:24 -0600, Michael Burk wrote:
> Sorry I didn't make that clear. The patch was the one-liner (datafd only). I
> ran it again with Dustin's 2-liner with the same results.
Ah, interesting.
I wonder if doing a one-line patch against (say) indexfd instead would
also p
Sorry I didn't make that clear. The patch was the one-liner (datafd only). I
ran it again with Dustin's 2-liner with the same results.
BTW, I ran these latest tests (all 8 runs) on the 0904 snapshot. It builds
cleanly on OpenBSD now.
-- Michael
On Fri, Sep 4, 2009 at 4:47 PM, Nathan Stratton
On Fri, Sep 04, 2009 at 16:41:31 -0600, Michael Burk wrote:
> Nathan, I think the following truth table will answer all your questions!
> (view in monospace font)
>
> Patch  gzip  Index  Result
> no     no    no     dump failed (end of tape)
> no     no    yes    index tee cannot
Nathan, I think the following truth table will answer all your questions!
(view in monospace font)
Patch  gzip  Index  Result
no     no    no     dump failed (end of tape)
no     no    yes    index tee cannot write
no     yes   no     gzip strange, dump failed
no     yes
On Fri, Sep 04, 2009 at 17:13:19 -0400, Dustin J. Mitchell wrote:
> On Fri, Sep 4, 2009 at 4:57 PM, Michael Burk wrote:
> > Thanks again for your help. Here's the output of the test prog:
>
> Looks just like it does locally. If the test had managed to reproduce
> this failure, then I would have e
Hi Dustin,
Thanks again for your help. Here's the output of the test prog:
bu...@selenium$ ./test parent
pipe = r...@3 w...@4
parent closing
parent sleeping
child closing p[0]
child exec'ing
child duping
child closing B
child writing
child write done
parent reading
parent got 4 bytes
bu...@seleni
On Fri, Sep 4, 2009 at 4:57 PM, Michael Burk wrote:
> Thanks again for your help. Here's the output of the test prog:
Looks just like it does locally. If the test had managed to reproduce
this failure, then I would have expected to see
write: Resource temporarily unavailable
That means somethi
On Wed, Sep 02, 2009 at 11:11:39 -0600, Michael Burk wrote:
> This was a good idea; I tried it with one modification: I determined
> earlier that the failure happens without indexing also, so I added just the
> line:
> fcntl(datafd, F_GETFL, 0);
> and that fixed the problem as well. So I guess
Attached is a test program I just put together which does about what
Jean-Louis specified above (with the addition of some closed fd's).
This works fine on my mac, which is the closest approximation to
OpenBSD I have access to at the moment. How does it work on 4.5?
The file-descriptor gymnastics
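A test along these lines might look like the sketch below. It is not the attached program: it skips the exec step and the extra closed descriptors, and the function name pipe_roundtrip and the target descriptor HIGH_FD are assumptions chosen for illustration. It only shows the basic shape: create a pipe, fork, have the child move the write end to a high descriptor and write four bytes, and have the parent read them back.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HIGH_FD 50 /* assumption: an arbitrary high descriptor number */

/* Hypothetical test: returns the number of bytes the parent read from the
 * child (4 when everything works), or -1 if pipe() or fork() failed. */
static long pipe_roundtrip(void)
{
    int p[2];
    pid_t pid;
    char buf[16];
    ssize_t n;

    if (pipe(p) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* child: keep only the write end, moved to HIGH_FD */
        close(p[0]);
        if (p[1] != HIGH_FD) {
            if (dup2(p[1], HIGH_FD) == -1)
                _exit(1);
            close(p[1]);
        }
        /* a blocking pipe should accept this small write immediately */
        if (write(HIGH_FD, "data", 4) != 4)
            _exit(2);
        _exit(0);
    }

    /* parent: keep only the read end and wait for the child's data */
    close(p[1]);
    n = read(p[0], buf, sizeof buf);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return (long)n;
}
```

If the child's write failed, the parent would see end-of-file and the function would return 0 instead of 4.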
On Wed, Sep 02, 2009 at 11:11:39 -0600, Michael Burk wrote:
> This was a good idea; I tried it with one modification: I determined
> earlier that the failure happens without indexing also, so I added just the
> line:
> fcntl(datafd, F_GETFL, 0);
> and that fixed the problem as well. So I guess
On Wed, Sep 2, 2009 at 1:07 PM, Michael Burk wrote:
> So I suspect my approach is not correct. Any other ideas how I might get
> some useful trace output?
Can you have amandad sleep for, say, 120 seconds just before it
launches sendbackup, and somehow notify you of the pid to which you
should atta
I modified sendbackup-dump.c to run ktrace, e.g.:
"/usr/bin/ktrace -id -t censw -f /tmp/sendbackup.trc /sbin/dump 0usf 1048576
- /dev/rsd0d"
Unfortunately, I don't get a backup, even with the patch applied. The trace
output shows write errors because of a broken pipe with or without the
patches, l
This was a good idea; I tried it with one modification: I determined
earlier that the failure happens without indexing also, so I added just the
line:
fcntl(datafd, F_GETFL, 0);
and that fixed the problem as well. So I guess this is truly the minimal
patch!
-- Michael
On Tue, Sep 1, 2009 at
On Tue, Sep 1, 2009 at 3:18 PM, Jean-Louis
Martineau wrote:
> I have nothing else to try.
> The order of system calls is as follows:
If it's not too hard, it would be nice to have a ktrace or equivalent
of this, first to look at here, and second to take to the OpenBSD
list. I know that's tricky sinc
Michael Burk wrote:
So it seems reliable that those 3 lines fix the problem somehow.
Anything else you want to try before I ask for help on the OpenBSD list?
I have nothing else to try.
The order of system calls is as follows:
In amandad process:
pipe(pipefd)
dup2(pipefd[1],b)
fork
in the chi
I applied the 3-line patch to the 0831 snapshot and ran a full backup on
both machines, with 4 file systems each. All 8 completed successfully with
no "strange" messages.
Next, I commented out the 3 new lines and tried the backup again on one of
the machines. This time all 4 file systems failed; e
Since the logs indicate an error with the "index tee", I thought I'd try
turning off the index generation. The backup still failed, but with a
different error:
1251826429.341455: sendbackup: pid 26726 ruid 150 euid 150 version
2.6.2alpha: start at Tue Sep 1 11:33:49 2009
1251826429.342070: sendba
On Tue, Sep 1, 2009 at 7:45 AM, Jean-Louis
Martineau wrote:
> We need to find a minimal patch that fixes the problem.
> Can you try the attached patch?
This is starting to look like a kernel bug -- is there an associated
OpenBSD bug or something that we could reference in comments in the
code to exp
I checked the errata for OpenBSD 4.5, but saw nothing that looked related.
I applied the patch to the 0831 snapshot and am building it now. After we
find the minimal patch, as Jean-Louis said, I'll post on the OpenBSD-misc
list to see if anyone has an explanation.
Thanks guys for working on this -
We need to find a minimal patch that fixes the problem.
Can you try the attached patch?
Jean-Louis
Michael Burk wrote:
I applied the patches to the 0827 version again (had to do the
sendbackup-dump.c patch by hand since the patch was for another
version). I ran a full backup again, and all 3 fil
Amanda doesn't call fcntl on these file descriptors.
But my patch does: it calls fcntl(fd, F_GETFL, 0) to check whether O_NONBLOCK
is set; the patch doesn't change the flag.
Can you try the attached patch? It does the same trick for the index file
descriptor (remove the previous patch before applying).
Jean-Louis
Mich
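The check described here can be sketched as a small helper. This is an illustration only, not the patch that was attached; the function name fd_is_nonblocking is hypothetical. It reads the file status flags with fcntl(fd, F_GETFL, 0) and tests the O_NONBLOCK bit without modifying anything.

```c
#include <fcntl.h>

/* Hypothetical helper: report whether a descriptor is in non-blocking mode.
 * Returns 1 if O_NONBLOCK is set, 0 if the descriptor is blocking,
 * and -1 if fcntl() fails (for example, because fd is not open). */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0); /* read-only query of the flags */

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling this before and after a write() on the data or index descriptor, and logging when it returns 1, is one way to confirm whether the pipe really is in blocking mode at the point where EAGAIN shows up.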
Hi Jean-Louis,
I thought I had applied the patches on this machine also, but it turns out I
didn't (sorry about that). I applied the patches and ran a new dump. This
time all 4 file systems succeeded, though /usr got the same "strange"
message as on the other machine:
1251752282.483677: sendbacku
Do you have the patch I sent to stan on this system?
The patch checks, before and after the write, whether the pipe is in
O_NONBLOCK mode, and gives an error if it is.
I'm totally lost, since it is in blocking mode and you get EAGAIN, which
is impossible.
Jean-Louis
Michael Burk wrote:
Here are
Here are the two sendbackup.*.debug files for the / fs.
First:
1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version
2.6.2alpha: start at Sun Aug 30 22:31:42 2009
1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version
2.6.
You get the error in the index pipe instead of the data path. The backup
is correct but your index is empty. That's why you get a STRANGE result
instead of a failure.
Can you post a sendbackup.*.debug for a dle that failed?
Jean-Louis
Michael Burk wrote:
Hello,
I applied the patches to the
Hello,
I applied the patches to the 20090827 snapshot. I tried it on two OpenBSD
4.5 sparc64 systems, forcing both to do full backups. One system seemed to
work on all 3 file systems, the other failed on all 4 file systems with the
same errors as before (exactly like what Stan reported). I'm using
stan wrote at 16:59 -0400 on Aug 24, 2009:
> The first thing I notice when comparing this function in 2.5.0 vs 2.5.2 is
> that 2.5.0 does:
>
> tv.tv_usec = 0;
>
> and 2.5.2 does not. Could this make a difference? Both do
>
> tv.tv_sec = timeout;
In 2.5.2, the memset sets the entire str
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
> stan wrote at 10:56 -0400 on Aug 21, 2009:
> > OK here is the latest on this saga :-)
> >
> > On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
> > back this machine up successfully (using classic UDP based aut
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
>
> You can also try the attached; it checks that the pipes are opened in blocking
> mode.
>
OK, I applied that to 2.61 (after figuring out it would not apply to 2.5.2
:-)).
Of 4 DLE's on the test OpenBSD machine 3 failed with PARTI
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
>
> You can also try the attached; it checks that the pipes are opened in blocking
> mode.
>
I will try to test that today.
Thanks.
--
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
> This bug can't be fixed until we understand it.
Agreed.
>
> Ask on an OpenBSD list how a write to a blocking pipe can return EAGAIN.
> Or perhaps the pipe semantics changed and pipes no longer default to blocking.
I already posted the cod
This bug can't be fixed until we understand it.
Ask on an OpenBSD list how a write to a blocking pipe can return EAGAIN.
Or perhaps the pipe semantics changed and pipes no longer default to blocking.
You can also try the attached; it checks that the pipes are opened in blocking
mode.
Jean-Louis
stan wrote:
On F
I'm using bsdtcp auth.
I'll try 2.5.1 today or tomorrow, to see if I can narrow down the range of
releases in which OpenBSD support broke.
-- Michael
On Mon, Aug 24, 2009 at 5:17 AM, stan wrote:
> On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote:
> > Stan's not alone on this one. I
On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote:
> Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
> SPARC hardware. I had this same trouble a couple months ago with 2.6.1, but
> didn't have time to look deeper. After seeing this discussion, I built
> amanda-2
Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
SPARC hardware. I had this same trouble a couple months ago with 2.6.1, but
didn't have time to look deeper. After seeing this discussion, I built
amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting exactly
th
On Fri, Aug 21, 2009 at 01:23:29PM -0600, John Hein wrote:
> stan wrote at 13:56 -0400 on Aug 21, 2009:
> > OK, I reproduced the failure with only a crossover cable between the test
> > client and the Amanda Master:
>
> Just because you're using a crossover cable doesn't rule out firewall
> or o
stan wrote at 13:56 -0400 on Aug 21, 2009:
> OK, I reproduced the failure with only a crossover cable between the test
> client and the Amanda Master:
Just because you're using a crossover cable doesn't rule out firewall
or other such socket level interference. I'm not saying that's your
proble
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
> stan wrote at 10:56 -0400 on Aug 21, 2009:
> > OK here is the latest on this saga :-)
> >
> > On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
> > back this machine up successfully (using classic UDP based aut
stan wrote at 10:56 -0400 on Aug 21, 2009:
> OK here is the latest on this saga :-)
>
> On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
> back this machine up successfully (using classic UDP based authentication)
>
> On another of them, I built 2.5.2p1. The first at
On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
> stan wrote:
> >Any thoughts as to why I have twice as many debug files as I expect?
> >
>
One thing I forgot to put in the previous message is that 10.209.129.22 is
the Amanda Master machine's address.
--
One of the main c
On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
> stan wrote:
> >Any thoughts as to why I have twice as many debug files as I expect?
> >
>
> When a dump fails, Amanda tries it a second time.
>
OK here is the latest on this saga :-)
On one of the OpenBSD 4.5 machines I have
On Tue, Aug 18, 2009 at 10:39:45AM -0400, Jean-Louis Martineau wrote:
> stan wrote:
> > I know I have asked this before, but I can't recall getting a definitive
> > answer. Is port usage different between 2.5.x and 2.6 clients, when
> > talking to a 2.6 server?
> >
>
> No, the same port are use
On Tue, Aug 18, 2009 at 01:52:56PM -0400, Dewey Hylton wrote:
> On Mon, Aug 17, 2009 at 4:32 PM, stan wrote:
> > On Tue, Aug 18, 2009 at 09:32:21AM -0400, Jean-Louis Martineau wrote:
> >> From the write man page:
> >>
> >> [EAGAIN] The file was marked for non-blocking I/O, and no data
> >
On Mon, Aug 17, 2009 at 4:32 PM, stan wrote:
> On Tue, Aug 18, 2009 at 09:32:21AM -0400, Jean-Louis Martineau wrote:
>> From the write man page:
>>
>> [EAGAIN] The file was marked for non-blocking I/O, and no data could
>> be written immediately.
>>
>>
>> But sen
stan wrote:
I know I have asked this before, but I can't recall getting a definitive
answer. Is port usage different between 2.5.x and 2.6 clients, when
talking to a 2.6 server?
No, the same ports are used if configured with the same port range.
Do you see something in the firewall log?
J
On Tue, Aug 18, 2009 at 10:39:45AM -0400, Jean-Louis Martineau wrote:
> stan wrote:
> > I know I have asked this before, but I can't recall getting a definitive
> > answer. Is port usage different between 2.5.x and 2.6 clients, when
> > talking to a 2.6 server?
> >
>
> No, the same port are use
On Tue, Aug 18, 2009 at 09:32:21AM -0400, Jean-Louis Martineau wrote:
> From the write man page:
>
> [EAGAIN] The file was marked for non-blocking I/O, and no data could
> be written immediately.
>
>
> But sendbackup or gzip write the index to a blocking pipe.
From the write man page:
[EAGAIN] The file was marked for non-blocking I/O, and no data could
be written immediately.
But sendbackup or gzip write the index to a blocking pipe.
Maybe it's the firewall that returns that error.
You can try to switch to the 'bsdtcp' a
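To make the man page excerpt concrete, here is a minimal sketch of the one situation in which write() on a pipe is expected to fail with EAGAIN: the descriptor has O_NONBLOCK set and the pipe buffer is full. The function name errno_when_pipe_full is hypothetical, and this does not reproduce the OpenBSD problem from the thread, where the pipe was believed to be in blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: put the write end of a pipe into non-blocking mode,
 * fill the pipe without ever reading from it, and return the errno of the
 * first failing write(). Returns -1 if the setup itself fails. */
static int errno_when_pipe_full(void)
{
    int p[2];
    int flags;
    int err = 0;
    char buf[4096];

    if (pipe(p) == -1)
        return -1;

    flags = fcntl(p[1], F_GETFL, 0);
    if (flags == -1 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    memset(buf, 'x', sizeof buf);
    for (;;) {
        /* with O_NONBLOCK set, a full pipe makes write() fail, not block */
        if (write(p[1], buf, sizeof buf) == -1) {
            err = errno;
            break;
        }
    }

    close(p[0]);
    close(p[1]);
    return err;
}
```

On a blocking pipe the same loop would simply stall once the buffer filled, which is why an EAGAIN from a pipe that was never set non-blocking is so surprising.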
On Mon, Aug 17, 2009 at 11:30:51AM -0400, Jean-Louis Martineau wrote:
> And what is the error?
>
> It's probably not a "gzip: ..."
>
One more data point that I need to clarify. These 2 OpenBSD machines I am
fighting with did have older versions of Amanda on them, before we
upgraded. They had 2.5.
On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
> stan wrote:
> >Any thoughts as to why I have twice as many debug files as I expect?
> >
>
> When a dump fails, Amanda tries it a second time.
>
Got it.
Now, here are what I think are the pertinent lines from a dump this morni
stan wrote:
Any thoughts as to why I have twice as many debug files as I expect?
When a dump fails, Amanda tries it a second time.
On Tue, Aug 18, 2009 at 07:34:11AM -0400, Jean-Louis Martineau wrote:
> man amanda.conf
> compress [client|server] string
> Default: client fast.
>
> Set it to none if you don't want compression.
> e.g. compress none
Thanks, I was just going to write an email noting that I had fou
man amanda.conf
compress [client|server] string
Default: client fast.
Set it to none if you don't want compression.
e.g. compress none
Jean-Louis
stan wrote:
On Fri, Aug 14, 2009 at 03:26:10PM -0400, Jean-Louis Martineau wrote:
Check system log
Post complete amandad.*.debug
On Mon, Aug 17, 2009 at 11:30:51AM -0400, Jean-Louis Martineau wrote:
> And what is the error?
>
> It's probably not a "gzip: ..."
Interesting. I set up a test doing just one of these machine (4 DLE's), and
it worked in non-compressed. I will leave a couple of the OpenBSD machines
set for non-com
And what is the error?
It's probably not a "gzip: ..."
Jean-Louis
stan wrote:
On Fri, Aug 14, 2009 at 03:26:10PM -0400, Jean-Louis Martineau wrote:
Check system log
Post complete amandad.*.debug and sendbackup.*.debug.
You can try to disable client compression.
It still fails with
On Fri, Aug 14, 2009 at 03:26:10PM -0400, Jean-Louis Martineau wrote:
> Check system log
> Post complete amandad.*.debug and sendbackup.*.debug.
>
> You can try to disable client compression.
>
It still fails with client side compression turned off.
--
One of the main causes of the fall of the
Check system log
Post complete amandad.*.debug and sendbackup.*.debug.
You can try to disable client compression.
Jean-Louis
stan wrote:
On Fri, Aug 14, 2009 at 09:50:22AM -0400, Jean-Louis Martineau wrote:
gzip: stdout: Resource temporarily unavailable
I have no idea what's that error me
On Fri, Aug 14, 2009 at 09:50:22AM -0400, Jean-Louis Martineau wrote:
> gzip: stdout: Resource temporarily unavailable
>
> I have no idea what that error means.
>
> Can you try the latest 2.6.1p1 snapshot from
> http://www.zmanda.com/community-builds.php
>
> Jean-Louis
>
OK, I compiled and in
gzip: stdout: Resource temporarily unavailable
I have no idea what that error means.
Can you try the latest 2.6.1p1 snapshot from
http://www.zmanda.com/community-builds.php
Jean-Louis
stan wrote:
We deployed 3 OpenBSD machines yesterday to replace older OpenBSD machines
that had been backi
We deployed 3 OpenBSD machines yesterday to replace older OpenBSD machines
that had been backing up happily. I honestly cannot remember which version
of Amanda was on these machines. The new ones have 2.6.1. The Amanda master
machine is 2.6.1, and has been happily backing up 55 machines every night