These calls are almost all Linux kernel calls. Bug(s) in their kernel PV drivers perhaps.

James
On May 29, 2008, at 9:41 AM, Eric Sproul wrote:

Hi,
After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest seems rather unstable. Yesterday I got two core dumps in /var/xen/dump, and noticed a couple of soft lockups on the guest's console:

BUG: soft lockup detected on CPU#1!

Call Trace:
<IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
[<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
[<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
[<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
[<ffffffff80288753>] _local_bh_enable+0x61/0xc5
[<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
[<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
[<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
<EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
[<ffffffff8044936f>] inet6_hash_connect+0xcb/0x2ea
[<ffffffff88115fb6>] :ipv6:tcp_v6_connect+0x530/0x6f6
[<ffffffff802335d9>] lock_sock+0xa7/0xb2
[<ffffffff80258914>] inet_stream_connect+0x94/0x236
[<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
[<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
[<ffffffff803f6405>] sys_connect+0x7e/0xae
[<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
[<ffffffff8025d2f1>] tracesys+0xa7/0xb2

BUG: soft lockup detected on CPU#0!

Call Trace:
<IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
[<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
[<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
[<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
[<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
[<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
[<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
<EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
[<ffffffff8041e10e>] inet_hash_connect+0xc8/0x41c
[<ffffffff80427780>] tcp_v4_connect+0x372/0x69f
[<ffffffff80230882>] sock_recvmsg+0x101/0x120
[<ffffffff88115c4a>] :ipv6:tcp_v6_connect+0x1c4/0x6f6
[<ffffffff80219c31>] vsnprintf+0x559/0x59e
[<ffffffff802335d9>] lock_sock+0xa7/0xb2
[<ffffffff8025b5fe>] cache_alloc_refill+0x13c/0x4ba
[<ffffffff80258914>] inet_stream_connect+0x94/0x236
[<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
[<ffffffff803f6405>] sys_connect+0x7e/0xae
[<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
[<ffffffff8025d2f1>] tracesys+0xa7/0xb2

I've been googling around for answers, but the Red Hat bug most frequently linked seems to relate to live migration, which I've not done. This guest was installed directly using virt-install.

Right now I cannot get into my guest; it's not responding on the network or on the console. A 'virsh shutdown' appeared to work, but the console remains unresponsive. It looks like the guest is just spinning, based on the Time value in 'xm list':

# xm list
Name                         ID   Mem VCPUs      State   Time(s)
Domain-0                      0 12051     8     r-----    1497.0
zimbra                        5  4096     2     r-----   67931.2
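I assume I could double-check that it really is spinning (and on which physical CPUs) with xentop or by listing its vCPUs, e.g.:

# xentop
# xm vcpu-list zimbra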

The last time this happened I had to reboot the whole server, which seems
drastic.  Is there a better way to regain control over the guest?
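My guess is that 'xm destroy' (or the libvirt equivalent) would at least tear down the stuck domU without touching dom0, e.g.:

# xm destroy zimbra
(or: # virsh destroy zimbra)

but I'd like to know if there's a cleaner option.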

I also need to figure out how to fix the soft lockups. I'm running the latest available mainline CentOS kernel via yum update. My research so far seems to indicate that this occurs when an interrupt takes too long to be serviced. Maybe I need to pin the guest to particular CPUs, instead of letting dom0 dynamically assign them?  Any advice in this area would be appreciated.
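If pinning is the way to go, I assume it would look something like this (the physical CPU numbers below are just placeholders for whatever makes sense on an 8-CPU dom0):

# xm vcpu-pin zimbra 0 2
# xm vcpu-pin zimbra 1 3

or cpus = "2,3" in the guest's config file so it persists across restarts.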

Thanks,
Eric
_______________________________________________
xen-discuss mailing list
[email protected]
