Re: Possible race in pipe device driver, esp on multi-cpu machines.

1999-05-31 Thread John S. Dyson
Matthew Dillon said: > > Alan and I are working on it. We are testing a fix for pipe_read() now > and I'm working on one for pipe_write(). The fixes basically involve > holding the pipe's lock throughout all calculations and I/O ops except > when the code needs to explicitly tsl

Re: Possible race in pipe device driver, esp on multi-cpu machines.

1999-05-31 Thread Matthew Dillon
:While I really doubt that this is related, I discovered today that I'm able :to repeatably lock up my -current machine with: : :find /home -print | afio -T 3k -G 6 -Z -z -v -o - | tee /scratch/backup.afio >/dev/rsa0 : :It runs for about 5 minutes, then hangs completely. Removing the tee and :writ

Re: Possible race in pipe device driver, esp on multi-cpu machines.

1999-05-31 Thread Kevin Day
> A friend of mine upgraded one of his machines to a dual-cpu > box and upgraded the OS to -STABLE, and he noticed that his > backups were being corrupted. The corruption appears to occur when > he transfers huge gzip'd tar files over a 100BaseTX network: > > I believe that t

Re: Possible race in pipe device driver, esp on multi-cpu machines.

1999-05-29 Thread John S. Dyson
Matthew Dillon said: > > We are attempting to reproduce the problem with a smaller dataset, but > if anyone is hot on the pipe code in the kernel and can give it a > once-over > we may be able to find the bug more quickly. > After a quick code inspection (and I really don't remember

Re: Possible race in pipe device driver, esp on multi-cpu machines.

1999-05-29 Thread Matthew Dillon
Here is a diff of one example of the corruption that is occurring which I believe to be a bug in the pipe device. This diff is out of a multi-hundred-megabyte file: staid# diff t3.cuthex t4.cuthex 86c86 < 0550 f7 7e f4 05 48 2f 28 ef 1f 9b b6 49 5d 76 f5 13 |.~..H/(I]v..| -

Possible race in pipe device driver, esp on multi-cpu machines.

1999-05-29 Thread Matthew Dillon
A friend of mine upgraded one of his machines to a dual-cpu box and upgraded the OS to -STABLE, and he noticed that his backups were being corrupted. The corruption appears to occur when he transfers huge gzip'd tar files over a 100BaseTX network: rsh remote -n "cat remote