On Thu, 22 Mar 2018, Bruce Evans wrote:

On Wed, 21 Mar 2018, Warner Losh wrote:

Log:
 Unlock giant when calling shutdown_nice()
...
This breaks the driver.  Giant is syscons' driver lock, and also the
interrupt handler lock for at least the atkbd keyboard driver, so vt
sometimes holds the lock for.

[That should have been "... lock too".]

...
Actual testing shows that doesn't cause a panic, but it also doesn't
actually unlock for shutdown_nice(), since the lock is acquired twice
and only released once.  syscons has much the same extra lock sprinkling
for event handling as vt:

- intr_event_execute_handlers() acquires Giant and calls atkbdintr()
- atkbdintr() calls sckbdevent()
- sckbdevent() unnecessarily acquires Giant again
- the buggy unlocking drops Giant just once
- shutdown_nice() is called with Giant held
- the buggy unlocking fails to re-acquire Giant
- sckbdevent() releases Giant, leaving it not held
- sckbdevent() returns
- atkbdintr() returns
- intr_event_execute_handlers() releases Giant.  This should panic, but
 it apparently blocks for long enough for init to shut down first.
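
The sequence above can be replayed with a toy userland model.  This is only
a sketch: struct toy_mtx, toy_lock(), toy_unlock() and toy_ctrl_alt_del() are
made-up names, not kernel APIs, and the model tracks nothing except the
recursion depth of a recursive mutex like Giant.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a recursive mutex; only the recursion depth is modelled. */
struct toy_mtx {
	int depth;		/* outstanding acquisitions */
	bool violation;		/* set when an unlock finds the lock not held */
};

static void
toy_lock(struct toy_mtx *m)
{
	assert(m->depth >= 0);
	m->depth++;
}

static void
toy_unlock(struct toy_mtx *m)
{
	if (m->depth == 0) {
		/* A kernel with invariants checking would panic here. */
		m->violation = true;
		return;
	}
	m->depth--;
}

struct toy_result {
	int depth_in_shutdown;	/* depth seen by the shutdown_nice() step */
	bool violation;		/* did the final unlock find the lock unheld? */
};

/* Replay the steps listed above, in order. */
static struct toy_result
toy_ctrl_alt_del(void)
{
	struct toy_mtx giant = { 0, false };
	struct toy_result r;

	toy_lock(&giant);	/* intr_event_execute_handlers() */
	toy_lock(&giant);	/* sckbdevent() acquires again */
	toy_unlock(&giant);	/* the buggy unlocking drops it just once */
	r.depth_in_shutdown = giant.depth;	/* shutdown_nice() runs here */
	/* the buggy unlocking does not re-acquire */
	toy_unlock(&giant);	/* sckbdevent() releases, depth is now 0 */
	toy_unlock(&giant);	/* intr_event_execute_handlers() releases */
	r.violation = giant.violation;
	return (r);
}
```

In this model toy_ctrl_alt_del() reports a depth of 1 at the shutdown_nice()
step (so the lock is still held there) and a violation on the last unlock.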

When I trace the last step, I get a panic which might be either from the
different timing or just a bug in kdb.

Testing with a kernel with other bugs fixed shows that the invariants
violation really is detected, causing a panic instead of a nice shutdown
for Ctrl-Alt-Del, but recursive panics pile up and printf() in -current
is too broken to print anything.  A better kernel prints:

XX ppppppppppppppppppppppppppkernel trap 12 with interrupts disabled
XX panic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic
XX panic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic
XX panic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic
XX panic: kernel trap doesn't have ucrepanic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic

where some messages are lost and some are duplicated (there are 8 CPUs).

I got control by putting a breakpoint after stop_cpus_hard() in vpanic().
(A breakpoint at panic() crashes, probably for multiple CPUs hitting it,
though this is supposed to be fixed in the test version.)  The backtrace
is then:

XX Breakpoint at   vpanic+0x4a:    popl    %ecx
XX db> t
XX Tracing pid 11 tid 100042 td 0xd6a61360
XX vpanic(c0916c91,d6685ab4,d6685ab4,c098f050,c098f040,...) at vpanic+0x4a/frame 0xd6685a8c
XX kassert_panic(c0916c91,c093b956,c0930a24,c09066cf,557) at kassert_panic+0x49/frame 0xd6685aa8
XX witness_unlock(c098f040,8,c09066c6,557) at witness_unlock+0xe7/frame 0xd6685af0
XX __mtx_unlock_flags(c098f050,0,c09066c6,557) at __mtx_unlock_flags+0x65/frame 0xd6685b14
XX intr_event_execute_handlers(c8f466b0) at intr_event_execute_handlers+0xed/frame 0xd6685b40
XX ithread_execute_handlers(c8f466b0,0,80202,d6a61360,c8f466b0,...) at ithread_execute_handlers+0x21/frame 0xd6685b54
XX ithread_loop(d6cda250,d6685ba8,0,d6cda250,c06a2c56,...) at ithread_loop+0x5f/frame 0xd6685b74
XX fork_exit(c06a2c56,d6cda250,d6685ba8) at fork_exit+0x83/frame 0xd6685b94
XX fork_trampoline() at fork_trampoline+0x8

The stack trace is messed up or mis-decoded despite attempts to avoid this
(use i386, don't use clang, and turn off auto-inlining...).   panic() is not
shown, and args passed in registers are not shown.  There are just enough
args to find the panic message.  It is as expected:

XX db> x/s 0xd6685ab4
XX 0xd6685ab4:     V\271\223\300$\012\223\300\317f\220\300W\005
XX db> x/s 0xc0916c91
XX __func__.16466+0x1c41:  lock (%s) %s not locked @ %s:%d
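
The bytes printed for 0xd6685ab4 can be read as little-endian 32-bit words.
The sketch below assumes that this address is the va_list handed to vpanic()
on i386 and that x/s stopped at the first NUL byte.  toy_raw, toy_le32() and
toy_va_word() are made-up names for illustration.

```c
#include <stdint.h>

/*
 * Bytes as shown by "x/s 0xd6685ab4", zero-padded to 16 bytes on the
 * assumption that the dump ended at a NUL.
 */
static const unsigned char toy_raw[16] =
    "V\271\223\300$\012\223\300\317f\220\300W\005";

/* Assemble one little-endian 32-bit word from 4 bytes. */
static uint32_t
toy_le32(const unsigned char *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Return the i'th 32-bit word of the dump, for i in 0..3. */
static uint32_t
toy_va_word(int i)
{
	return (toy_le32(toy_raw + 4 * i));
}
```

Decoded this way, the words are 0xc093b956, 0xc0930a24, 0xc09066cf and 0x557,
which agree with the last 4 args shown for kassert_panic() in the backtrace.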

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"