David wrote:
> On Wed, Jan 30, 2008 at 08:35:13AM -0800, Jürgen Keil wrote:
> > Can anyone else reproduce opensolaris PV domU hangs
> > during domU boot, when the domU is using a root filesystem
> > on an nfs server and the domU is configured with more than
> > one vcpu?
>
> I don't have this set up at the moment, but it definitely worked ~6
> months ago (even 32 way).
Hmm, this could be a generic S-x86 mp architecture bug (?).
Under xVM, this happens:
- mp_startup() is called to start up cpu#1

- in mp_startup(), "(*ap_mlsetup)()" is called,
  which calls xen's xen_psm_post_cpu_start()

- in xen_psm_post_cpu_start() we have this:

        /*
         * Re-distribute interrupts to include the newly added cpu.
         */
        xen_psm_enable_intr(cpun);

  In my setup, this re-binds netfront's interrupt handler
  xnf`xnf_intr() from cpu0 to the new cpu1.

  (This might have changed in snv_77, with the fix for
  6611846 "after boot, all dom0 interrupts are targeting
  CPU 0 in a MP system" - this could explain why it
  did work for you ~6 months ago).

- later on, mp_startup() raises the spl for the new cpu1
  to LOCK_LEVEL, and enables interrupts. But at
  spl == LOCK_LEVEL, xnf_intr should be masked.

- add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi)
  is called. This tries to load & attach the "cpudrv" kernel
  module (while we're still at spl == LOCK_LEVEL on cpu1).
  Loading the module from the NFS root sends packets out of
  the domU, but the replies from the NFS server are never
  seen by xnf`xnf_intr, which is masked.
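
The masking rule that the steps above rely on can be written down as a tiny C model. This is only a sketch and not kernel code: the name intr_deliverable() and the TOY_ constants are made up for illustration. The value 10 is the BSPL reported for cpu1 by ::cpuinfo further down, and 6 is the IPL that ::interrupts lists for xnf`xnf_intr.

```c
#include <stdbool.h>

/* Values read off the mdb output in this mail; the names are hypothetical. */
enum {
	TOY_LOCK_LEVEL = 10,	/* base spl of cpu1 while it is starting up */
	TOY_XNF_IPL = 6		/* IPL of xnf`xnf_intr */
};

/*
 * Toy rule: a cpu running at priority level 'spl' only takes
 * interrupts whose IPL is strictly higher than 'spl'.
 */
bool
intr_deliverable(int ipl, int spl)
{
	return (ipl > spl);
}
```

With these numbers, intr_deliverable(TOY_XNF_IPL, TOY_LOCK_LEVEL) is false, while the same interrupt becomes deliverable once the spl drops to 0.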
When the domU is hung, I see this:
[1]> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc3fff0  1b    0    0  -1   no    no    t-0 ffffff0001005c80 (idle)
                       |
            RUNNING <--+
              READY
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffff0086199ac0  1b    0   10  60   no    no    t-0 ffffff00010cbc80
                       |
            RUNNING <--+
              READY
             EXISTS
             ENABLE

[1]> ::interrupts
IRQ  Vect Evtchn IPL Bus  Trg Type   CPU Share APIC/INT# ISR(s)
256  -    I      15  -    Edg ipi    all -     -         xc_serv
257  -    I      13  -    Edg ipi    all -     -         xc_serv
258  -    I      11  -    Edg ipi    all -     -         poke_cpu
259  -    1      15  -    Edg virq   all -     -         xen_debug_handler
260  -    1      1   -    Edg evtchn 0   -     -         xenbus_intr
261  -    T      14  -    Edg virq   all -     -         cbe_fire
262  -    I      14  -    Edg ipi    all -     -         cbe_fire
263  -    9      6   xpvd Edg evtchn 1   -     -         xnf`xnf_intr
264  -    2      9   xpvd Edg evtchn 0   -     -         xencons`xenconsintr

[1]> ::evtchns
Type        Evtchn IRQ IPL CPU Masked Pending ISR(s)
evtchn      1      260 1   0   0      0       xenbus_intr
evtchn      2      264 9   0   0      1       xencons`xenconsintr
ipi         3      256 15  0   1      0       xc_serv
ipi         4      257 13  0   0      0       xc_serv
ipi         5      258 11  0   0      0       poke_cpu
virq:debug  6      259 15  0   0      0       xen_debug_handler
virq:timer  7      261 14  0   1      1       cbe_fire
ipi         8      262 14  0   0      0       cbe_fire
evtchn      9      263 6   1   1      1       xnf`xnf_intr
ipi         10     258 11  1   0      0       poke_cpu
ipi         11     257 13  1   0      0       xc_serv
ipi         12     262 14  1   0      0       cbe_fire
ipi         13     256 15  1   0      0       xc_serv
virq:timer  14     261 14  1   1      1       cbe_fire
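
To cross-check the two tables, the pending event channels can be compared against the base spl of the cpu they are bound to. The following is a rough sketch under a simplifying assumption: it treats the base spl (the BSPL column of ::cpuinfo) as the only thing that can hold an event back, which is not the whole story in the real kernel. The struct and function names are hypothetical; the numbers are copied from the output above.

```c
#include <stddef.h>

/* One row of ::evtchns, reduced to the fields used here (hypothetical type). */
struct toy_evtchn {
	int evtchn;
	int ipl;
	int cpu;
};

/* Base spl per cpu, from the BSPL column of ::cpuinfo: cpu0 = 0, cpu1 = 10. */
const int toy_base_spl[2] = { 0, 10 };

/* The four rows of ::evtchns that show Pending == 1. */
const struct toy_evtchn toy_pending[] = {
	{  2,  9, 0 },	/* xencons`xenconsintr */
	{  7, 14, 0 },	/* virq:timer, cbe_fire */
	{  9,  6, 1 },	/* xnf`xnf_intr */
	{ 14, 14, 1 },	/* virq:timer, cbe_fire */
};

/*
 * Count the pending channels whose IPL does not exceed the base spl of
 * their cpu. The first such channel number is stored in *first_stuck.
 */
int
toy_count_stuck(const struct toy_evtchn *tab, size_t n, int *first_stuck)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (tab[i].ipl <= toy_base_spl[tab[i].cpu]) {
			if (count == 0)
				*first_stuck = tab[i].evtchn;
			count++;
		}
	}
	return (count);
}
```

Under that assumption, the only pending channel that sits at or below the base spl of its cpu is evtchn 9, the xnf`xnf_intr channel bound to cpu1.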
A possible fix could be to move the add_cpunode2devtree()
call down a few lines in mp_startup(), after the spl0():
diff -r f6814e9b7def usr/src/uts/i86pc/os/mp_startup.c
--- a/usr/src/uts/i86pc/os/mp_startup.c	Wed Jan 30 09:01:17 2008 -0800
+++ b/usr/src/uts/i86pc/os/mp_startup.c	Thu Jan 31 01:00:58 2008 +0100
@@ -1518,13 +1518,15 @@ mp_startup(void)
 	 */
 	curthread->t_preempt = 0;
-	add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi);
+	/* add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi); */
 	/* The base spl should still be at LOCK LEVEL here */
 	ASSERT(cp->cpu_base_spl == ipltospl(LOCK_LEVEL));
 	set_base_spl();	/* Restore the spl to its proper value */
 	(void) spl0();	/* enable interrupts */
+
+	add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi);
 #ifndef __xpv
 	{
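
Why the reordering should help can be illustrated with a small stand-alone model of the two call orders. It is a sketch only: the step names and toy_startup_completes() are invented for this example, the spl is assumed to drop straight to 0 at spl0(), and the module load is assumed to need exactly one interrupt at the IPL of xnf`xnf_intr.

```c
#include <stdbool.h>

/* The two startup steps whose order the patch changes (hypothetical names). */
enum toy_step { TOY_ATTACH_CPUDRV, TOY_SPL0 };

/*
 * Walk a startup sequence on the new cpu. Returns true if the NFS-backed
 * module load happens at a point where the network interrupt can be taken.
 */
bool
toy_startup_completes(const enum toy_step *steps, int nsteps)
{
	int spl = 10;		/* LOCK_LEVEL, the BSPL shown for cpu1 */
	const int xnf_ipl = 6;	/* IPL of xnf`xnf_intr from ::interrupts */

	for (int i = 0; i < nsteps; i++) {
		switch (steps[i]) {
		case TOY_SPL0:
			spl = 0;	/* all interrupt levels are unmasked */
			break;
		case TOY_ATTACH_CPUDRV:
			/* The NFS replies arrive via xnf_intr. */
			if (xnf_ipl <= spl)
				return (false);	/* would wait forever */
			break;
		}
	}
	return (true);
}
```

In this model the original order { TOY_ATTACH_CPUDRV, TOY_SPL0 } returns false and the patched order { TOY_SPL0, TOY_ATTACH_CPUDRV } returns true.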
This message posted from opensolaris.org
_______________________________________________
xen-discuss mailing list