Hello. Thanks for the reply. I'm still trying to work out what's going on. It may very well be a memory shortage, but I suspect either some kind of memory fragmentation or a network-related problem.

The issue appears to be triggered when ssh sessions are terminated uncleanly. Specifically, stateful firewalls between client and server time out and leave connections dangling, causing the server side to shut down uncleanly. What appears to happen is that something gets hung up, a bunch of processes start, things get stuck in fstchg, and everything hangs, though the kernel doesn't crash. Sometimes I see proc table full messages, but not always.

The next time it happens I'll call fstrans_dump from ddb to see if that yields any results, but right now I'm at a loss as to which process gets stuck initially, causing the pileup. And while I'm pretty sure I know what triggers the problem, I haven't figured out how to reproduce it at will.

Anyone seen anything like this? This is on amd64, NetBSD-10.99.12, on a Xen VM with 2 processors. I have a bunch of other machines, both VMs and bare metal, running the same code without trouble.
-thanks
-Brian

On Jun 29, 1:45pm, "J. Hannken-Illjes" wrote:
} Subject: Re: Processes getting stuck in "fstchg" with NetBSD-10.99.12/amd6
}
} These processes are waiting for a file system suspension. From ddb you may run
}
}   call fstrans_dump(1)
}
} to dump the current state of the suspension subsystem. You will see which processes / lwps
} are "inside" a file system and which file systems are suspending / suspended.
}
} The syncer (ioflush) waiting is generally bad, is there still free kmem?
}
} --
} J. Hannken-Illjes - hann...@mailbox.org
}
>-- End of excerpt from "J. Hannken-Illjes"