When trying to boot a Solaris Dom0 kernel on a Tecra S1 laptop (Pentium-M cpu;
no PAE available; and the hardware design seems to force using the old
legacy 8259 PICs), using a xen hypervisor compiled with PAE disabled,
the Solaris x86 kernel crashes somewhere inside mach_init() with a
BAD TRAP (#pf page fault) with pc=0 and address 0xf000e6f2.
The crash happens in usr/src/uts/i86pc/os/mp_machdep.c at line 617,
because "pops->psm_softinit == NULL":
612 if (pops->psm_notify_error) {
613 psm_notify_error = mach_notify_error;
614 notify_error = pops->psm_notify_error;
615 }
616
617 (*pops->psm_softinit)();
The cause appears to be in mach_construct_info(), which does not
initialize mach_set[PSM_OWN_SYS_DEFAULT]. When we call
mach_get_platform(PSM_OWN_SYS_DEFAULT) at line 541, the mach_set[]
array contains
Index 0: pointer to mach_ops
Index 1-3: NULL
(that is, mach_set[PSM_OWN_SYS_DEFAULT] is NULL)
522 static void
523 mach_construct_info()
524 {
525 struct psm_sw *swp;
526 int mach_cnt[PSM_OWN_OVERRIDE+1] = {0};
527 int conflict_owner = 0;
528
529 if (psmsw->psw_forw == psmsw)
530 panic("No valid PSM modules found");
531 mutex_enter(&psmsw_lock);
532 for (swp = psmsw->psw_forw; swp != psmsw; swp = swp->psw_forw) {
533 if (!(swp->psw_flag & PSM_MOD_IDENTIFY))
534 continue;
535 mach_set[swp->psw_infop->p_owner] = swp->psw_infop->p_ops;
536 mach_ver[swp->psw_infop->p_owner] = swp->psw_infop->p_version;
537 mach_cnt[swp->psw_infop->p_owner]++;
538 }
539 mutex_exit(&psmsw_lock);
540
541 mach_get_platform(PSM_OWN_SYS_DEFAULT);
Apparently the PSM_MOD_IDENTIFY bit is not set in swp->psw_flag for the
xen platform module, so the code at lines 535 - 537 was skipped.
(swp->psw_flag for the xen platform module had a value of 1
== PSM_MOD_INSTALL).
When mach_get_platform(PSM_OWN_SYS_DEFAULT) is called, random data is
copied from address 0 into the mach_ops structure (on the Tecra S1,
mach_ops.psm_softinit remains set to NULL).
Problem #1: the code shouldn't crash like this; I would expect some kind
of error message explaining why the xen platform module has failed.
==> Apparently the code assumes a "PSM_OWN_SYS_DEFAULT" psm module
never fails the probe, that is, the "PSM_OWN_SYS_DEFAULT" psm module
is always usable. This isn't the case for xpv_psm, it fails psm_probe()
on uppc machines.
==> shouldn't we have a xpv_uppc_psm (PSM_OWN_SYS_DEFAULT) &
xpv_pcplusmp_psm (PSM_OWN_EXCLUSIVE) module;
xpv_uppc_psm is always available, and xpv_pcplusmp_psm only on machines
with APIC ?
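
To illustrate Problem #1, here is a minimal user-space sketch of the failure and of one possible defensive check. It is not the kernel code: every model_* name and MODEL_* constant is a hypothetical stand-in, and apart from the install flag value of 1 reported above, the numeric values are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel definitions (values illustrative). */
#define MODEL_OWN_SYS_DEFAULT	1
#define MODEL_OWN_OVERRIDE	3
#define MODEL_NSLOTS		(MODEL_OWN_OVERRIDE + 1)
#define MODEL_MOD_INSTALL	0x1	/* module is loaded */
#define MODEL_MOD_IDENTIFY	0x2	/* module's probe succeeded */

struct model_ops {
	void (*softinit)(void);
};

struct model_module {
	int owner;		/* slot the module claims in model_set[] */
	int flag;		/* MODEL_MOD_* bits */
	struct model_ops *ops;
};

/* Slot table indexed by owner class, playing the role of mach_set[]. */
static struct model_ops *model_set[MODEL_NSLOTS];

static int model_softinit_calls;

static void
model_softinit(void)
{
	model_softinit_calls++;
}

static struct model_ops model_good_ops = { model_softinit };

/* Record only modules whose probe succeeded; others leave their slot NULL. */
static void
model_construct_info(const struct model_module *mods, size_t n)
{
	size_t i;
	int slot;

	for (slot = 0; slot < MODEL_NSLOTS; slot++)
		model_set[slot] = NULL;
	for (i = 0; i < n; i++) {
		assert(mods[i].owner >= 0 && mods[i].owner < MODEL_NSLOTS);
		if (!(mods[i].flag & MODEL_MOD_IDENTIFY))
			continue;
		model_set[mods[i].owner] = mods[i].ops;
	}
}

/*
 * Call softinit only if the slot is usable. Otherwise print a diagnostic
 * and return -1 instead of jumping through a NULL function pointer.
 */
static int
model_init_platform(int owner)
{
	struct model_ops *ops = model_set[owner];

	if (ops == NULL || ops->softinit == NULL) {
		(void) fprintf(stderr, "no usable PSM module for owner "
		    "class %d (probe failed?)\n", owner);
		return (-1);
	}
	(*ops->softinit)();
	return (0);
}
```

With a module whose flag is only MODEL_MOD_INSTALL, model_init_platform(MODEL_OWN_SYS_DEFAULT) reports the problem and returns -1. Whether the real kernel should panic with a message or fall back to another module at that point is a separate design question.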
Problem #2: why did the xen platform module fail to probe?
usr/src/uts/i86pc/os/mp_implfuncs.c: line 409
397 void
398 psm_install(void)
399 {
400 struct psm_sw *swp, *cswp;
401 struct psm_ops *opsp;
402 char machstring[15];
403 int err;
404
405 mutex_enter(&psmsw_lock);
406 for (swp = psmsw->psw_forw; swp != psmsw; ) {
407 opsp = swp->psw_infop->p_ops;
408 if (opsp->psm_probe) {
409 if ((*opsp->psm_probe)() == PSM_SUCCESS) {
410 swp->psw_flag |= PSM_MOD_IDENTIFY;
411 swp = swp->psw_forw;
412 continue;
413 }
414 }
The root cause is that xpv_psm`xen_psm_probe() tries to probe the APIC
using apic_probe_common(), but the Tecra S1 doesn't have the APIC
enabled, so apic_probe_common() returns -1.
(When booting standard Solaris x86, the uppc psm module is used.)
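
The probe behaviour described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the struct, the flag values, and the two probe functions are hypothetical, and "pic_style_probe" merely stands in for the idea of a module that always succeeds on legacy 8259 hardware, as suggested under Problem #1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model values, not taken from the kernel headers. */
#define PROBE_OK	0
#define PROBE_FAIL	(-1)
#define FLAG_INSTALL	0x1
#define FLAG_IDENTIFY	0x2

struct probe_module {
	const char *name;
	int (*probe)(void);
	int flag;
};

/* 0 models a machine like the Tecra S1, where the APIC is not enabled. */
static int model_apic_enabled = 0;

/* Models a probe that depends on the APIC, like the xen platform module. */
static int
apic_style_probe(void)
{
	return (model_apic_enabled ? PROBE_OK : PROBE_FAIL);
}

/* Models a probe that only needs the legacy 8259 PICs and never fails. */
static int
pic_style_probe(void)
{
	return (PROBE_OK);
}

/*
 * Run every probe and set FLAG_IDENTIFY on success, mirroring the loop
 * excerpt from psm_install(). Returns how many modules were identified.
 */
static size_t
install_model(struct probe_module *mods, size_t n)
{
	size_t i, identified = 0;

	assert(mods != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (mods[i].probe != NULL && (*mods[i].probe)() == PROBE_OK) {
			mods[i].flag |= FLAG_IDENTIFY;
			identified++;
		}
	}
	return (identified);
}
```

With only the APIC-dependent module registered and model_apic_enabled set to 0, install_model() identifies nothing, which is the situation that leads to the crash. Adding a module with an always-succeeding probe guarantees at least one identified module.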
=================
The hypervisor seems to have partial(?) / full(?) support for such uppc
systems. Shouldn't the Solaris i86xpv platform code support such a
system, too?
This message posted from opensolaris.org