Hello.  Thanks for the reply.  I'm still trying to work out what's going on.
It may very well be a memory shortage issue, but I'm thinking it's either
some kind of memory fragmentation issue or a network-related problem.  The
issue appears to be triggered when ssh sessions are uncleanly terminated,
specifically when stateful firewalls between client and server time out and
leave connections dangling, causing the server side to shut down uncleanly.
What appears to happen is that something gets hung up, a bunch of processes
start, things get stuck in fstchg and everything hangs, though the kernel
doesn't crash.  Sometimes I see proc table full messages, but not always.
The next time it happens I'll call fstrans_dump from ddb to see if that
yields any results, but right now I'm at a loss as to which process gets
stuck initially and causes the pileup.  And, while I am pretty sure I know
what triggers the problem, I haven't quite figured out how to reproduce it
at will.
Anyone seen anything like this?
This is on amd64, NetBSD-10.99.12 on a Xen VM with 2 processors.
I have a bunch of other machines, both VMs and bare metal, running the same
code without trouble.

-thanks
-Brian
On Jun 29,  1:45pm, "J. Hannken-Illjes" wrote:
} Subject: Re: Processes getting stuck in "fstchg" with NetBSD-10.99.12/amd6
} 
} These processes are waiting for a file system suspension.  From ddb you may run
} 
}       call fstrans_dump(1)
} 
} to dump the current state of the suspension subsystem.  You will see which
} processes / lwps are "inside" a file system and which file systems are
} suspending / suspended.
} 
} The syncer (ioflush) waiting is generally bad, is there still free kmem?
} 
} --
} J. Hannken-Illjes - hann...@mailbox.org
} 
>-- End of excerpt from "J. Hannken-Illjes"

