On Wed, 5 Oct 2016, Slawa Olhovchenkov wrote:

On Wed, Oct 05, 2016 at 11:19:10AM +1100, Bruce Evans wrote:

On Tue, 4 Oct 2016, Gleb Smirnoff wrote:

On Mon, Sep 26, 2016 at 03:30:30PM +0000, Eric van Gyzen wrote:
E> ...
E> Modified: head/sys/kern/kern_mutex.c
E> ==============================================================================
E> --- head/sys/kern/kern_mutex.c    Mon Sep 26 15:03:31 2016        (r306345)
E> +++ head/sys/kern/kern_mutex.c    Mon Sep 26 15:30:30 2016        (r306346)
E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c
E>  {
E>   const struct mtx *m;
E>
E> - if (panicstr != NULL || dumping)
E> + if (panicstr != NULL || dumping || SCHEDULER_STOPPED())
E>           return;

I wonder if this whole disjunction can be reduced to just SCHEDULER_STOPPED()?
A non-NULL panicstr and a set 'dumping' imply that the scheduler is stopped.

'dumping' doesn't imply SCHEDULER_STOPPED().

Checking 'dumping' here seems to be just an old bug.  It just breaks
__mtx_assert(), while all other mutex operations work normally for dumping
without panicking.
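
To make the disagreement concrete, here is a minimal userland sketch of the
two candidate conditions. It is not kernel code: struct kstate and all of the
function names are hypothetical stand-ins for panicstr, dumping and
SCHEDULER_STOPPED(), and it models only the one case pointed out above
(dumping set while the scheduler is still running).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state tested in __mtx_assert(). */
struct kstate {
	const char *panicstr;		/* non-NULL once a panic has started */
	bool dumping;			/* a dump is in progress */
	bool scheduler_stopped;		/* models SCHEDULER_STOPPED() */
};

/* The early-return condition as committed in r306346. */
bool
skip_assert_full(const struct kstate *ks)
{
	return (ks->panicstr != NULL || ks->dumping || ks->scheduler_stopped);
}

/* The proposed reduction to SCHEDULER_STOPPED() alone. */
bool
skip_assert_reduced(const struct kstate *ks)
{
	return (ks->scheduler_stopped);
}

/* True when the two conditions give different answers for this state. */
bool
conditions_differ(const struct kstate *ks)
{
	return (skip_assert_full(ks) != skip_assert_reduced(ks));
}

/* The case from the thread: dumping without a stopped scheduler. */
bool
dump_only_case_differs(void)
{
	struct kstate ks = {
		.panicstr = NULL,
		.dumping = true,
		.scheduler_stopped = false
	};

	assert(skip_assert_full(&ks));		/* committed code returns early */
	assert(!skip_assert_reduced(&ks));	/* reduced code would still check */
	return (conditions_differ(&ks));
}
```

In this model the two conditions disagree only when 'dumping' is set without
the scheduler being stopped, so the reduction would change behaviour exactly
for dumps that happen without a panic.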

[...]

Is this related to 11.0 halting (not rebooting) after ~^B and `panic`?

There might be related problems, but I don't see any here.

What I see on serial console:
=====
db> panic
panic: from debugger

I wouldn't trust panic from the debugger, but it is safer than dump
from the debugger (both are ddb commands, but this is another bug).

cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8031fadb = db_trace_self_wrapper+0x2b/frame 0xfffffe1f9e198120
vpanic() at 0xffffffff804a0302 = vpanic+0x182/frame 0xfffffe1f9e1981a0
panic() at 0xffffffff804a0383 = panic+0x43/frame 0xfffffe1f9e198200
db_panic() at 0xffffffff8031d987 = db_panic+0x17/frame 0xfffffe1f9e198210
db_command() at 0xffffffff8031d019 = db_command+0x299/frame 0xfffffe1f9e1982e0
db_command_loop() at 0xffffffff8031cd74 = db_command_loop+0x64/frame 0xfffffe1f9e1982f0
db_trap() at 0xffffffff8031fc1b = db_trap+0xdb/frame 0xfffffe1f9e198380
kdb_trap() at 0xffffffff804dd8c3 = kdb_trap+0x193/frame 0xfffffe1f9e198410
trap() at 0xffffffff806e3065 = trap+0x255/frame 0xfffffe1f9e198620
calltrap() at 0xffffffff806cafd1 = calltrap+0x8/frame 0xfffffe1f9e198620
--- trap 0x3, rip = 0xffffffff804dd11e, rsp = 0xfffffe1f9e1986f0, rbp = 0xfffffe1f9e198710 ---
kdb_alt_break_internal() at 0xffffffff804dd11e = kdb_alt_break_internal+0x18e/frame 0xfffffe1f9e198710
kdb_alt_break() at 0xffffffff804dcf8b = kdb_alt_break+0xb/frame 0xfffffe1f9e198720
uart_intr_rxready() at 0xffffffff803e38a8 = uart_intr_rxready+0x98/frame 0xfffffe1f9e198750
uart_intr() at 0xffffffff803e4621 = uart_intr+0x121/frame 0xfffffe1f9e198790
intr_event_handle() at 0xffffffff8046c74b = intr_event_handle+0x9b/frame 0xfffffe1f9e1987e0
intr_execute_handlers() at 0xffffffff8076d2d8 = intr_execute_handlers+0x48/frame 0xfffffe1f9e198810
lapic_handle_intr() at 0xffffffff8077163f = lapic_handle_intr+0x3f/frame 0xfffffe1f9e198830
Xapic_isr1() at 0xffffffff806cb6b7 = Xapic_isr1+0xb7/frame 0xfffffe1f9e198830
--- interrupt, rip = 0xffffffff8032fedf, rsp = 0xfffffe1f9e198900, rbp = 0xfffffe1f9e198940 ---
acpi_cpu_idle() at 0xffffffff8032fedf = acpi_cpu_idle+0x2af/frame 0xfffffe1f9e198940
cpu_idle_acpi() at 0xffffffff8076ad1f = cpu_idle_acpi+0x3f/frame 0xfffffe1f9e198960
cpu_idle() at 0xffffffff8076adc5 = cpu_idle+0x95/frame 0xfffffe1f9e198980
sched_idletd() at 0xffffffff804cbbe5 = sched_idletd+0x495/frame 0xfffffe1f9e198a70
fork_exit() at 0xffffffff8046a211 = fork_exit+0x71/frame 0xfffffe1f9e198ab0
fork_trampoline() at 0xffffffff806cb50e = fork_trampoline+0xe/frame 0xfffffe1f9e198ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

This looks like a normal kdb entry followed by a not-so-normal panic from
ddb, but I see no problems in it.

Uptime: 1d4h53m19s
Dumping 12148 out of 131020 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
mps2: Sending StopUnit: path (xpt0:mps2:0:14:ffffffff):  handle 12
mps2: Incrementing SSU count
mps2: Sending StopUnit: path (xpt0:mps2:0:18:ffffffff):  handle 9
mps2: Incrementing SSU count
=====

This is a normal reboot (by /sbin/reboot):

Is the above just a hung dump from reboot, before going near ddb?  That
case should work, but perhaps it needs to be more careful about waiting
for the other CPUs.  Just stopping them is no good since it gives an
even more fragile environment, like panicking or entering ddb.


===
Sending StopUnit: path (xpt0:mps2:0:14:ffffffff):  handle 13
mps2: Incrementing SSU count
mps2: Sending StopUnit: path (xpt0:mps2:0:18:ffffffff):  handle 9
mps2: Incrementing SSU count
mps2: Decrementing SSU count.
mps2: Completing stop unit for (xpt0:mps2:0:18:ffffffff):
mps2: Decrementing SSU count.
mps2: Completing stop unit for (xpt0:mps2:0:14:ffffffff):
===

====
mps2: lagg0: link state changed to DOWN
Sending StopUnit: path (xpt0:mps2:0:14:ffffffff):  handle 12
mps2: Incrementing SSU count
mps2: Sending StopUnit: path (xpt0:mps2:0:18:ffffffff):  handle 9
mps2: Incrementing SSU count
mps2: Decrementing SSU count.
mps2: Completing stop unit for (xpt0:mps2:0:18:ffffffff):
mps2: Decrementing SSU count.
mps2: Completing stop unit for (xpt0:mps2:0:14:ffffffff):
====

Bruce
