Hello. Following up on this thread, I have more data and more questions.
First, while I can't reproduce the issue 100% reliably, I can get into
the state fairly reliably by closing ssh sessions into the affected
machine when connected through a particular Juniper firewall on our
network. What appears to happen is that I close the session and one of
my csh processes gets stuck in specio wait, which leaves the root
filesystem in suspended state. Then cron starts firing off jobs, each of
which gets stuck in fstchg state, until the process table fills up.
Using ddb, I was able to gather the information below. I have more data
than is shown here, but I don't have a full crash dump.
Running call fstrans_dump(1), I see:
[ 306390.6288439] Fstrans state by mount:
[ 306390.6288439] / owner 0xffffa6c9fb1c1c00 state suspended
Then, the entry whose address matches that owner:
17174 17174 3 1 0 ffffa6c9fb1c1c00 csh specio
Then,
[ 306390.6288439] 17174.17 @0xffffa6ca2639e400 (/) shared 2 cow 0 alias 0
Questions:
I'm assuming it's bad for the / filesystem to be in suspended state?
What does the 2 after the word "shared" represent in the last line of
output above?
Assuming I can get another crash, what additional details should I
gather next time?
-thanks
-Brian