This one happened today:

Gentoo linux dom0, xen 3.2.0 hypervisor (32-bit), Intel Quad-core cpu, 8GB 
memory.
This linux box / dom0  is up for 40 days.

OpenSolaris SXCE snv_81 PV domU (32-bit)


Problem: when starting the snv_81 domU, we're stuck for more than 60 seconds
early during bootstrap:

v3.2.0 chgset 'unavailable'
SunOS Release 5.11 Version snv_81 32-bit
Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
WARNING: High-overhead kmem debugging features enabled (kmem_flags = 0xf).  
Performance degradation and large memory overhead possible. See the Solaris 
Tunable Parameters Reference Manual.
features: 
31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc>
mem = 524288K (0x20000000)
root nexus = i86xpv
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
<<<<< stops here, no progress for more than 60 seconds



After some long delay, booting continues.  Well, actually, the box had
set snooping=1 in /etc/system, which resulted in a deadman panic
(because the default snooping_interval is 50 seconds).
Without snooping, the box continues booting after the long delay.


The delay can be seen in /var/adm/message (note the 19:55:33 -> 19:56:46 jump):


Jun 18 19:55:33 solaris-domu genunix: [ID 540533 kern.notice] ^MSunOS Release 
5.11 Version snv_81 32-bit
Jun 18 19:55:33 solaris-domu genunix: [ID 943907 kern.notice] Copyright 
1983-2007 Sun Microsystems, Inc.  All rights reserved.
Jun 18 19:55:33 solaris-domu Use is subject to license terms.
Jun 18 19:55:33 solaris-domu genunix: [ID 289235 kern.warning] WARNING: 
High-overhead kmem debugging features enabled (kmem_flags = 0xf).  Performance 
degradation and large memory overhead possible. See the Solaris Tunable 
Parameters Reference Manual.
Jun 18 19:55:33 solaris-domu unix: [ID 126719 kern.notice] features: 
31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc>
Jun 18 19:55:33 solaris-domu unix: [ID 168242 kern.notice] mem = 524288K 
(0x20000000)
Jun 18 19:55:33 solaris-domu unix: [ID 508459 kern.warning] WARNING: Cannot 
loadpsm pcplusmp
Jun 18 19:55:33 solaris-domu rootnex: [ID 466748 kern.notice] root nexus = 
i86xpv
Jun 18 19:55:33 solaris-domu rootnex: [ID 349649 kern.notice] pseudo0 at root
Jun 18 19:55:33 solaris-domu genunix: [ID 936769 kern.notice] pseudo0 is /pseudo
Jun 18 19:55:33 solaris-domu rootnex: [ID 349649 kern.notice] scsi_vhci0 at root
Jun 18 19:55:33 solaris-domu genunix: [ID 936769 kern.notice] scsi_vhci0 is 
/scsi_vhci
Jun 18 19:55:33 solaris-domu genunix: [ID 765264 kern.info] NOTICE: Time of Day 
clock is Read-Only; set of Date/Time will not persist across reboot.
Jun 18 19:56:46 solaris-domu genunix: [ID 636498 kern.warning] WARNING: cannot 
load platform pm driver acpippm
Jun 18 19:56:46 solaris-domu genunix: [ID 636498 kern.warning] WARNING: cannot 
load platform pm driver ppm
Jun 18 19:56:46 solaris-domu rootnex: [ID 349649 kern.notice] xpvd0 at root
Jun 18 19:56:46 solaris-domu xpvd: [ID 395608 kern.notice] [EMAIL PROTECTED], 
xdf0


Problem appears to be in cyclic_timer:

   571  void
   572  cyclic_timer(void)
   573  {
   574          tm_req_t *req;
   575          timer_tw_t *tw;
   576          hrtime_t curr_tick, curr;
   577
   578          /* If the system is suspended, just return */
   579          if (timer_suspended)
   580                  return;
   581
   582          /* Get the current time */
   583          timer_hrtime = ddi_timer->tick_time = curr = gethrtime();

gethrtime returned 3468552305121315; dividing by nanoseconds and the number
of seconds in a day:

3468552305121315/10^9/(24*60*60)
40.145

(dom0 is up for 40.145 days)

   584          curr_tick = tw_tick(ddi_timer->tick_time);

tw_tick returns 2774841844.


And we have ddi_timer->tick == 0

[0]> ddi_timer::print [0].tick
[0].tick = 0


Now the following while loop seems to loop 2774841844 times; somehow
that just seems to waste quite a lot of time, and neither
timeout_dispatch() nor transfer_req() get called.

   586  restart:
   587          /*
   588           * Check the timer cogs to see if there are timeout requests
   589           * who reach the expiration time. Call timeout_invoke() to 
execute
   590           * the requests, then.
   591           */
   592          while (curr_tick >= ddi_timer->tick) {
   ...
   654                  ddi_timer->tick++;
   655          }


I suspect that on metal this is no problem, because the returned
value from gethrtime() is small.
 
 
This message posted from opensolaris.org
_______________________________________________
xen-discuss mailing list
[email protected]

Reply via email to