Hello,

For some time now we have observed xenstored crashing on several hosts.
We have seen the following crash twice so far:

> #0  talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
> 116             if ((tc->flags & ~0xF) != TALLOC_MAGIC) { 
> warning: not using untrusted file
> "/root/xen-4.1-4.1.3/xen-4.1.3/tools/xenstore/.gdbinit"
> (gdb) bt
> #0  talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
> #1  0x0000000000407edf in talloc_free (ptr=0xff0000000000) at talloc.c:551
> #2  0x000000000040a348 in tdb_open_ex (name=0x167d620
> "/var/lib/xenstored/tdb.0x16a48b0", 
>     hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized
> out>, mode=<value optimized out>, 
>     log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at 
> tdb.c:1958
> #3  0x000000000040a684 in tdb_open (name=0xff0000000000 <Address 
> 0xff0000000000
> out of bounds>, hash_size=0, 
>     tdb_flags=4254928, open_flags=-1, mode=3974450184) at tdb.c:1773
> #4  0x000000000040a70b in tdb_copy (tdb=0x16c9040, outfile=0x167d620
> "/var/lib/xenstored/tdb.0x16a48b0")
>     at tdb.c:2124
> #5  0x0000000000406c2d in do_transaction_start (conn=0x167e310, in=<value
> optimized out>)
>     at xenstored_transaction.c:164
> #6  0x00000000004045ca in process_message (conn=0x167e310) at
> xenstored_core.c:1214
> #7  consider_message (conn=0x167e310) at xenstored_core.c:1261
> #8  handle_input (conn=0x167e310) at xenstored_core.c:1308
> #9  0x0000000000405170 in main (argc=<value optimized out>, argv=<value
> optimized out>) at xenstored_core.c:1964

> (gdb) frame 2
> #2  0x000000000040a348 in tdb_open_ex (name=0x167d620 
> "/var/lib/xenstored/tdb.0x16a48b0", 
>     hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized 
> out>, mode=<value optimized out>, 
>     log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at 
> tdb.c:1958
> 1958            SAFE_FREE(tdb->locked);
> (gdb) print tdb->locked
> $3 = (struct tdb_lock_type *) 0xff0000000000

Another crash was in vsprintf(); see
<https://forge.univention.org/bugzilla/show_bug.cgi?id=35104#c3> for the
full backtraces.

To me this looks like memory corruption: some unknown code writes to a
random memory location, which happens to belong to the tdb here.
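
One detail that may support the corruption theory: the bad pointer value
0xff0000000000 from the gdb session above has exactly one non-zero byte.
The following is only a small sketch to show the byte layout, assuming
an x86-64 host (little-endian, 8-byte pointers); the helper names are
made up for illustration and are not part of xenstored or tdb. It does
not prove anything about the cause, it only shows that the value is
consistent with a NULL pointer in which a single byte was overwritten.

```python
import struct


def pointer_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit pointer as stored in memory.

    Assumption: little-endian x86-64 layout.
    """
    return struct.pack("<Q", value)


def nonzero_byte_offsets(value: int) -> list:
    """List the byte offsets within the pointer that are not zero."""
    return [i for i, b in enumerate(pointer_bytes(value)) if b != 0]


# Value of tdb->locked as printed by gdb in frame 2.
locked = 0xff0000000000

print(pointer_bytes(locked).hex(" "))   # 00 00 00 00 00 ff 00 00
print(nonzero_byte_offsets(locked))     # [5]
```

If that reading is right, the write that hit tdb->locked was a single
0xff byte at offset 5 of the field, which looks more like a stray
one-byte store than like a stale or freed pointer.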

As far as I know xenstored can't be restarted as - for example - qemu-dm
and blktap2 processes have open file handles to the xenstored unix
socket for IPC, which would need re-opening. As such the host must be
rebooted to fix this situation, as the VMs can no longer be managed and
thus not migrated.

The host is still running xen-4.1.3 (I know that this is quite old), but
I had a look at the changes between that version and master for
tools/xenstore/ myself and didn't see any obvious change which could fix
that.

1. Has anyone observed a similar crash?

2. We've now also enabled "xenstored -T /log --verbose" to log the
messages in the hope of finding the triggering transaction, but until
then is there anything more we can do to track down the problem?

3. The crash happens rarely and the hosts run fine most of the time. The
crash mostly happens around midnight and seems to be guest-triggered, as
the logs on the host don't show any activity such as starting new VMs or
destroying running ones. So far the problem has only shown up on hosts
running Linux VMs; other hosts running Windows VMs have never shown that
crash.

Thank you for your support.

Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
h...@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
