Re: Many processes stuck in zfs

2010-03-27 Thread Stefan Bethke
On 10.03.2010, at 12:02, Pawel Jakub Dawidek wrote: > Once the deadlock occurs, enter DDB and send me the output of: > > ps > show alllocks > show lockedvnods > show allchains > alltrace panic: deadlkres: possible deadlock detected for 0xff000c66e000, blocked fo

Re: Many processes stuck in zfs

2010-03-12 Thread Alexander Leidinger
Quoting Borja Marcos (from Thu, 11 Mar 2010 18:26:09 +0100): Of course CPUs have bugs, I don't doubt it. I was just wondering how I could reproduce the problem with different hardware :) That's why I said it was unlikely. Besides, such a low level fault should produce many more problems

Re: Many processes stuck in zfs

2010-03-11 Thread Borja Marcos
On Mar 11, 2010, at 3:08 PM, Alexander Leidinger wrote: >>> Borja, can you confirm that the CPU is correctly announced in FreeBSD (just >>> look at "dmesg | grep CPU:" output, if it tells you it is an AMD or Intel >>> XXX CPU it is correctly detected by the BIOS)? >> >> A CPU bug? Weird. Very.

Re: Many processes stuck in zfs

2010-03-11 Thread Ivan Voras
On 03/11/10 15:09, Alexander Leidinger wrote: Quoting Ivan Voras (from Thu, 11 Mar 2010 11:59:01 +0100): On 03/11/10 09:54, Borja Marcos wrote: I don't know about the rest but this: CPU: Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (2496.25-MHz K8-class CPU) does not agree with this: FreeBSD/S

Re: Many processes stuck in zfs

2010-03-11 Thread Alexander Leidinger
Quoting Ivan Voras (from Thu, 11 Mar 2010 11:59:01 +0100): On 03/11/10 09:54, Borja Marcos wrote: I don't know about the rest but this: CPU: Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (2496.25-MHz K8-class CPU) does not agree with this: FreeBSD/SMP: 1 package(s) x 8 core(s) T

Re: Many processes stuck in zfs

2010-03-11 Thread Alexander Leidinger
Quoting Borja Marcos (from Thu, 11 Mar 2010 09:54:47 +0100): On Mar 11, 2010, at 8:45 AM, Alexander Leidinger wrote: Quoting Pawel Jakub Dawidek (from Wed, 10 Mar 2010 18:31:43 +0100): There is a 4th possibility, if you can rule out everything else: bugs in the CPU. I stumbled upon t

Re: Many processes stuck in zfs

2010-03-11 Thread Borja Marcos
On Mar 11, 2010, at 8:45 AM, Alexander Leidinger wrote: > Quoting Pawel Jakub Dawidek (from Wed, 10 Mar 2010 > 18:31:43 +0100): > > There is a 4th possibility, if you can rule out everything else: bugs in the > CPU. I stumbled upon this with ZFS (but UFS was exposing the problem much > faste

Re: Many processes stuck in zfs

2010-03-10 Thread Alexander Leidinger
Quoting Pawel Jakub Dawidek (from Wed, 10 Mar 2010 18:31:43 +0100): On Wed, Mar 10, 2010 at 04:12:36PM +0100, Borja Marcos wrote: On Mar 10, 2010, at 12:02 PM, Pawel Jakub Dawidek wrote: > Once the deadlock occurs, enter DDB and send me the output of: > > ps > show alllocks > show

Re: Many processes stuck in zfs

2010-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 10, 2010 at 07:42:43PM +0200, Andriy Gapon wrote: > on 10/03/2010 19:31 Pawel Jakub Dawidek said the following: > > This should be impossible. If we are that deep in zfsvfs_teardown(), it > > means > > that we hold the z_teardown_lock exclusively. And we do as 'show alllocks' > > outpu

Re: Many processes stuck in zfs

2010-03-10 Thread Andriy Gapon
on 10/03/2010 19:31 Pawel Jakub Dawidek said the following: > This should be impossible. If we are that deep in zfsvfs_teardown(), it means > that we hold the z_teardown_lock exclusively. And we do as 'show alllocks' > output confirms. But if we are holding this lock exclusively we shouldn't be > t

Re: Many processes stuck in zfs

2010-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 10, 2010 at 04:12:36PM +0100, Borja Marcos wrote: > > On Mar 10, 2010, at 12:02 PM, Pawel Jakub Dawidek wrote: > > > Once the deadlock occurs, enter DDB and send me the output of: > > > > ps > > show alllocks > > show lockedvnods > > show allchains > >

Re: Many processes stuck in zfs

2010-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 10, 2010 at 01:32:02PM +0100, Borja Marcos wrote: > Trying. I started my typical test: > > Machine 1 doing a make buildworld on a dataset with src and obj on it. > > Machine 1 replicating incremental snapshots of the dataset to machine 2. > > Machine 2 running some "tar cf - . | ( cd

Re: Many processes stuck in zfs

2010-03-10 Thread Ollivier Robert
According to Stefan Bethke: > $ sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes > kern.maxvnodes: 10 > vfs.numvnodes: 87681 > vfs.freevnodes: 7600 > > Is there a rule of thumb what maxvnodes should be tuned to? Not sure, I max'ed it to 20 and the machine has not locked up since. Try
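For readers who want to watch how close a machine gets to its vnode limit, the following is a minimal sketch, not something posted in the thread. It parses the `name: value` lines that `sysctl` prints and reports the fraction of `kern.maxvnodes` in use. The helper names `parse_sysctl` and `vnode_usage` and the sample numbers are hypothetical, chosen only for illustration.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'name: value'
        try:
            values[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric values
    return values


def vnode_usage(values):
    """Return the fraction of kern.maxvnodes currently in use."""
    return values["vfs.numvnodes"] / values["kern.maxvnodes"]


# Hypothetical sample numbers, NOT taken from the thread.
sample = """kern.maxvnodes: 200000
vfs.numvnodes: 150000
vfs.freevnodes: 5000
"""

usage = vnode_usage(parse_sysctl(sample))
print(f"vnode usage: {usage:.0%}")
```

On a live system the text could come from running `sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes`, as in the quoted message. Whether a given ratio is a problem depends on the workload, and the thread does not state a threshold.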

Re: Many processes stuck in zfs

2010-03-10 Thread Borja Marcos
On Mar 10, 2010, at 12:02 PM, Pawel Jakub Dawidek wrote: > On Wed, Mar 10, 2010 at 10:24:49AM +0100, Borja Marcos wrote: >> Tested. Same deadlock remains. > > Ok, to track this down I need the following: > > Uncomment 'CFLAGS+=-DDEBUG=1' line in sys/modules/zfs/Makefile. > > Add the fo

Re: Many processes stuck in zfs

2010-03-10 Thread Stefan Bethke
On 10.03.2010, at 12:35, Ollivier Robert wrote: > According to Stefan Bethke: >> The situation seems to be triggered by zfs receive'ing snapshots from the >> sister machine (both synchronize their active ZFS filesystems to each other, >> using zfs send and zfs receive). It appears it's the rece

Re: Many processes stuck in zfs

2010-03-10 Thread Ollivier Robert
According to Stefan Bethke: > The situation seems to be triggered by zfs receive'ing snapshots from the > sister machine (both synchronize their active ZFS filesystems to each other, > using zfs send and zfs receive). It appears it's the receiving causing > trouble. Have you tuned kern.maxvnod

Re: Many processes stuck in zfs

2010-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 10, 2010 at 10:24:49AM +0100, Borja Marcos wrote: > Tested. Same deadlock remains. Ok, to track this down I need the following: Uncomment 'CFLAGS+=-DDEBUG=1' line in sys/modules/zfs/Makefile. Add the following lines to your kernel config: options WITNESS opti

Re: Many processes stuck in zfs

2010-03-10 Thread Borja Marcos
On Mar 9, 2010, at 3:18 PM, Borja Marcos wrote: > > On Mar 9, 2010, at 1:58 PM, Pawel Jakub Dawidek wrote: > What kind of hardware do you have there? There is a 3-way deadlock I've a fix for which would be hard to trigger on single or dual core machines. Feel free to try the

Re: Many processes stuck in zfs

2010-03-09 Thread Kevin Oberman
Sigh. My brain is fried. I replied to the wrong thread. Please ignore this. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: ober...@es.net Phone: +1 510 486-8634 Key fingerprint: 059B 2DDF 0

Re: Many processes stuck in zfs

2010-03-09 Thread Kevin Oberman
> Date: Tue, 9 Mar 2010 21:53:55 +1100 > From: Peter Jeremy > Sender: owner-freebsd-sta...@freebsd.org > > On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote: > >Over the past couple of months, I've more or less regularly observed > >machines having more and more processes stuck in the zfs wcha

Re: Many processes stuck in zfs

2010-03-09 Thread Borja Marcos
On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote: > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote: >> Over the past couple of months, I've more or less regularly observed >> machines having more and more processes stuck in the zfs wchan. The >> processes never recover from

Re: Many processes stuck in zfs

2010-03-09 Thread Pawel Jakub Dawidek
On Tue, Mar 09, 2010 at 01:57:07PM +0100, Borja Marcos wrote: > > On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote: > > > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote: > >> Over the past couple of months, I've more or less regularly observed > >> machines having more and mo

Re: Many processes stuck in zfs

2010-03-09 Thread Stefan Bethke
On 09.03.2010, at 13:29, Pawel Jakub Dawidek wrote: > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote: >> Over the past couple of months, I've more or less regularly observed >> machines having more and more processes stuck in the zfs wchan. The >> processes never recover from tha

Re: Many processes stuck in zfs

2010-03-09 Thread Pawel Jakub Dawidek
On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote: > Over the past couple of months, I've more or less regularly observed machines > having more and more processes stuck in the zfs wchan. The processes never > recover from that, and trying to reboot only gets the entire system stuck,

Re: Many processes stuck in zfs

2010-03-09 Thread Borja Marcos
On Mar 9, 2010, at 1:58 PM, Pawel Jakub Dawidek wrote: >>> What kind of hardware do you have there? There is a 3-way deadlock I've a >>> fix for which would be hard to trigger on single or dual core machines. >>> >>> Feel free to try the fix: >>> >>> http://people.freebsd.org/~pjd/patches/zfs

Re: Many processes stuck in zfs

2010-03-09 Thread Peter Jeremy
On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote: >Over the past couple of months, I've more or less regularly observed machines >having more and more processes stuck in the zfs wchan. The processes never >recover from that, How long have you waited? There seems to be a problem with low fre

Re: Many processes stuck in zfs

2010-03-09 Thread Stefan Bethke
On 09.03.2010, at 11:53, Peter Jeremy wrote: > On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote: >> Over the past couple of months, I've more or less regularly observed >> machines having more and more processes stuck in the zfs wchan. The >> processes never recover from that, > > How long

Re: Many processes stuck in zfs

2010-03-09 Thread Frédéric Bour
On Tue, 9 Mar 2010 10:15:53 +0100, Stefan Bethke wrote: > Over the past couple of months, I've more or less regularly observed > machines having more and more processes stuck in the zfs wchan. The > processes never recover from that, and trying to reboot only gets the > entire system stuck, w

Many processes stuck in zfs

2010-03-09 Thread Stefan Bethke
Over the past couple of months, I've more or less regularly observed machines having more and more processes stuck in the zfs wchan. The processes never recover from that, and trying to reboot only gets the entire system stuck, without any console messages. I can enter the debugger, and I have