Hello,

for some time we have been observing xenstored crashes on several hosts. We have seen the following crash twice so far:
> #0 talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
> 116 if ((tc->flags & ~0xF) != TALLOC_MAGIC) {
> warning: not using untrusted file "/root/xen-4.1-4.1.3/xen-4.1.3/tools/xenstore/.gdbinit"
> (gdb) bt
> #0 talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
> #1 0x0000000000407edf in talloc_free (ptr=0xff0000000000) at talloc.c:551
> #2 0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0",
>    hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>,
>    log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
> #3 0x000000000040a684 in tdb_open (name=0xff0000000000 <Address 0xff0000000000 out of bounds>, hash_size=0,
>    tdb_flags=4254928, open_flags=-1, mode=3974450184) at tdb.c:1773
> #4 0x000000000040a70b in tdb_copy (tdb=0x16c9040, outfile=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0")
>    at tdb.c:2124
> #5 0x0000000000406c2d in do_transaction_start (conn=0x167e310, in=<value optimized out>)
>    at xenstored_transaction.c:164
> #6 0x00000000004045ca in process_message (conn=0x167e310) at xenstored_core.c:1214
> #7 consider_message (conn=0x167e310) at xenstored_core.c:1261
> #8 handle_input (conn=0x167e310) at xenstored_core.c:1308
> #9 0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964
> (gdb) frame 2
> #2 0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0",
>    hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>,
>    log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
> 1958 SAFE_FREE(tdb->locked);
> (gdb) print tdb->locked
> $3 = (struct tdb_lock_type *) 0xff0000000000

Another one was in vsprintf() - see <https://forge.univention.org/bugzilla/show_bug.cgi?id=35104#c3> for the full back traces.
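One observation on the bad pointer value itself, offered as a rough sketch rather than a diagnosis: assuming an x86-64 (little-endian) host and assuming tdb->locked was NULL before it was overwritten (neither is confirmed by the trace), 0xff0000000000 differs from NULL in exactly one byte. The helper name corrupted_bytes below is made up for illustration.

```python
import struct

def corrupted_bytes(observed, expected=0):
    """List (offset, expected_byte, observed_byte) for every byte that differs
    between two 64-bit values as they would be laid out in little-endian memory.

    Assumption: 8-byte pointers stored little-endian, as on x86-64.
    """
    obs = struct.pack("<Q", observed)   # in-memory bytes of the observed pointer
    exp = struct.pack("<Q", expected)   # in-memory bytes of the assumed original
    return [(i, e, o) for i, (e, o) in enumerate(zip(exp, obs)) if e != o]

# Pointer value printed by gdb for tdb->locked, compared against NULL (assumed).
print(corrupted_bytes(0xff0000000000))
# → [(5, 0, 255)]
```

Under those assumptions the value is what a single stray 0xff byte at offset 5 of the field would produce, which might point to a small out-of-bounds write rather than a whole pointer being replaced. That is only one possible reading of the value.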
To me this looks like memory corruption: some unknown code writes into a random memory location, which here happens to belong to the tdb.

As far as I know, xenstored cannot be restarted because, for example, the qemu-dm and blktap2 processes hold open file handles to the xenstored Unix socket for IPC, and these would need to be re-opened. The host therefore has to be rebooted to recover, as the VMs can no longer be managed and thus cannot be migrated.

The host is still running xen-4.1.3 (I know that this is quite old), but I looked at the changes to tools/xenstore/ between that version and master myself and didn't see any obvious change that would fix this.

1. Has anyone observed a similar crash?
2. We've now also enabled "xenstored -T /log --verbose" to log the messages, in the hope of finding the triggering transaction. Until then, is there anything more we can do to track down the problem?
3. The crash happens rarely, and the hosts run fine most of the time. It mostly happens around midnight and seems to be guest-triggered, as the logs on the host don't show any activity such as starting new VMs or destroying running ones. So far the problem has only shown up on hosts running Linux VMs; hosts running Windows VMs have never shown this crash.

Thank you for your support.
Philipp
-- 
Philipp Hahn
Open Source Software Engineer
Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
h...@univention.de
http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel