[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-12-15 Thread Thomas Huth
** Changed in: qemu Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1711602 Title: --copy-storage-all failing with qemu 2.10 Status in QEMU: Fix

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-09-02 Thread Launchpad Bug Tracker
This bug was fixed in the package qemu - 1:2.10~rc4+dfsg-0ubuntu1 --- qemu (1:2.10~rc4+dfsg-0ubuntu1) artful; urgency=medium * Merge with Upstream 2.10-rc4; This fixes a migration issue (LP: #1711602); Remaining changes: - qemu-kvm to systemd unit - d/qemu-kvm-init:

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread ChristianEhrhardt
Yes, with all the series of [1] on top it finally works. Saw it already being merged on master. Expecting a late rc4 or early release tag and then wrap it all up. Thanks everybody involved! [1]: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg04513.html ** Changed in: qemu

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread ChristianEhrhardt
Yeah seems to be slightly different than the former assert. 2017-08-23 18:41:54.556+: initiating migration bdrv_inactivate_recurse: entry for drive-virtio-disk0 bdrv_inactivate_recurse: entry for #block133 bdrv_inactivate_recurse: entry for #block329 bdrv_inactivate_recurse: entry for

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Dr. David Alan Gilbert
just tested current head - 1eed33994e28d4a0437ba6e944bbc3ec5e4a29a0 - seems to work for me.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Dr. David Alan Gilbert
Looks good here, just retested: here's the top of my git: f89f59fad5119f878aaedf711af90802ddcb99c7 nbd-client: avoid spurious qio_channel_yield() re-entry cf26039a2b50f078b4ad90b88eea5bb28971c0d8 block: Update open_flags after ->inactivate() callback 8ccc527d84ec9a5052cfae19edbc44abb5ac03ae

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Dr. David Alan Gilbert
I need to recheck with that combo - I'd seen that error but only when I'd commented out 'if (!blk->dev && !blk_name(blk)[0]) {' when debugging earlier.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread ChristianEhrhardt
That was rc3 +: - nbd-client-avoid-spurious-qio_channel_yield.patch - the four patches mentioned in comment #43 I could also re-base onto master + patches or rc4 if there is one soon. For now building with David's debug statements applied again to check if we still abort around that assert.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread ChristianEhrhardt
Hmm, it gets further but can still not complete this kind of migration: $ virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://10.22.69.30/system Source: 2017-08-23 16:49:23.022+: initiating migration Unexpected error in bdrv_check_perm() at

Re: [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Eric Blake
On 08/23/2017 09:55 AM, ChristianEhrhardt wrote: > Ok, clarified with Stefanha > It has exactly the same title as a series of 18th August which was related to > a similar issue. > It is about an hour old now on qemu-devel, quoting > > "This fixes the issue reported as >

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread ChristianEhrhardt
Hi Stefan, I was part of the report around the series in "[PATCH for-2.10 0/4] block: Fix non-shared storage migration", but this is happening on rc3 which contains this. AFAIK Fam's series is: dd7fdaad iotests: Add non-shared storage migration case 192 (Fam) 5f7772c4 block-backend: Defer

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread ChristianEhrhardt
Ok, clarified with Stefanha It has exactly the same title as a series of 18th August which was related to a similar issue. It is about an hour old now on qemu-devel, quoting "This fixes the issue reported as https://bugs.launchpad.net/bugs/1711602 Fam Zheng (3): block-backend: Refactor

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Dr. David Alan Gilbert
yes, seems to fix it for me. Thanks Christian for filing this; we probably wouldn't have spotted it before the release without it (which the test Stefan has just added will hopefully cure!).

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Stefan Hajnoczi
Please see Fam's patch series "[PATCH for-2.10 0/4] block: Fix non-shared storage migration" that fixes this issue.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-23 Thread Dr. David Alan Gilbert
OK, yeh that's the same symptom I saw - it's that final failure that causes bdrv_inactivate_all to return a failure and fail the source migration.
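
The failure path described above can be sketched as a toy model. This is not QEMU's implementation: `inactivate_all` and `fake_inactivate` are hypothetical Python stand-ins, the node names are just labels taken from the debug output in this thread, and which node fails is an assumption made for illustration.

```python
from typing import Callable, Iterable


def inactivate_all(nodes: Iterable[str], inactivate: Callable[[str], int]) -> int:
    """Toy model: return the first negative result, or 0 if every node succeeds."""
    for node in nodes:
        ret = inactivate(node)
        if ret < 0:
            return ret  # a single failing node fails the whole step
    return 0


def fake_inactivate(node: str) -> int:
    # Hypothetical stand-in: pretend that only this one node fails.
    return -1 if node == "drive-virtio-disk0" else 0


nodes = ["#block133", "#block329", "drive-virtio-disk0"]
result = inactivate_all(nodes, fake_inactivate)

# In this model the migration only proceeds when the whole step returned 0.
migration_ok = result == 0
print(result, migration_ok)
```

The point of the sketch is only that one late failure is enough to turn an otherwise successful pass into a failed migration on the source.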

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
With the patch from Stefan and your debug applied on source and target I still run into the same issue I'd say. IDs are slightly off, but they are different on every try anyway. Still looks the same for me: bdrv_inactivate_recurse: entry for drive-virtio-disk0 bdrv_inactivate_recurse: entry for

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
I didn't add Stefan's patch yet. Note: the mentioned patch is at: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg04027.html With your debug patch applied I get: 2017-08-22 17:57:04.486+: initiating migration bdrv_inactivate_recurse: entry for drive-virtio-disk0

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
Building with the attached debug patch ...

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
OK, Stefan posted a patch for that assert (see 'nbd-client: avoid spurious qio_channel_yield() re-entry') so now I'm running with the following patch and I'm seeing the bdrv_inactivate return a -1 for drive-virtio-disk0. Christian: Could you see what your source says with this patch? diff --git

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
In 5/5 tries this was on qemu_fill_buffer for my case. But that was on the receiving side, and what you found is closer to the root cause on the source of the migration. I checked on qemu_file_set_error on the source and can confirm your finding that on the source it is from

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
repeated the assert in #26: Program received signal SIGABRT, Aborted. 0x7f02163005f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) where #0 0x7f02163005f7 in __GI_raise (sig=sig@entry=6)

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
The difference with the qemu_file_set_error is I'm looking on the source - because what's happening is the source is erroring so closing the socket, and so the error you're seeing on the destination is real - the socket just EOF'd!
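
To illustrate the point about the destination simply seeing an EOF, here is a minimal sketch using a plain socket pair. It is not QEMU code; the variable names and the payload are made up for the example, and it only shows the general stream-socket behaviour: once the sending side closes, the reader gets whatever was already queued and then an empty read.

```python
import socket

# Toy stand-ins for the two ends of the migration stream (not QEMU code).
source, destination = socket.socketpair()

source.sendall(b"partial migration data")
source.close()  # the source hits an error and closes its end

received = destination.recv(4096)  # data sent before the close still arrives
eof = destination.recv(4096)       # after that, recv() returns b"": end of stream
destination.close()

print(received, eof)
```

From the reader's side an early close like this is indistinguishable from a generic I/O failure, which is why the error on the destination is real but the cause has to be looked for on the source.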

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
For me qemu_file_set_error was always called from qemu_fill_buffer, interesting that it seems different for you. I'll rerun a few times to ensure it really is always from qemu_fill_buffer for me.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
Stack from qemu_fill_buffer to qio_channel_socket_readv #0 qio_channel_socket_readv (ioc=, iov=, niov=, fds=0x0, nfds=0x0, errp=0x0) at ./io/channel-socket.c:477 #1 0x001486ec97e2 in qio_channel_read (ioc=ioc@entry=0x148a73a6c0, buf=buf@entry=\060\nLw", buflen=buflen@entry=28728,

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
Only now read comment #27, thanks David for reproducing with me, it is somewhat relieving that you seem to see the same.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
(4th try) breakpoint on qemu_file_set_error, it's bdrv_inactivate_all that's returning the error. (gdb) list 1155 if (inactivate_disks) { 1156 /* Inactivate before sending QEMU_VM_EOF so that the 1157 * bdrv_invalidate_cache_all() on the other end won't fail. */

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
OK, 3rd try and I've hit the same behaviour as Christian.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
Hmm i just tried to reproduce this and hit (on the source): main_channel_client_handle_migrate_connected: client 0x5607d785f610 connected: 0 seamless 0 qemu-system-x86_64: /root/qemu/io/channel.c:303: qio_channel_yield: Assertion `!ioc->write_coroutine' failed. 2017-08-22 10:50:04.888+:

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
I'll track down the actual read and then add debugging the source at the same time (that should be the best way to track the migration socket on both sides). This might be slightly tricky since I don't know exactly on which offset but I can surely start over 310*10^6 it seems. I'll report back

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
also, you might want to chase it a bit further down, I think we've got: qemu-file-channel.c:channel_get_buffer io/channel-socket.c or io/channel-file.c qio_channel_file_readv it would be good to know what the readv/readmsg is actually returning in the case where it's failing. Dave

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread Dr. David Alan Gilbert
OK, so that looks like a real case of the migration stream failing and getting an IO error; so the question is why: a) Is the source qemu dying first and closing the socket? b) Is libvirt closing the socket for some reason?

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
So TL;DR summary for now: - error triggers in qio_channel_read - file is migration-socket-incoming - reads work a while, but then fail at high f->pos offsets (slightly different ones each time) - slower execution seems to lead to slightly higher offsets that are failing - only happens on

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
(gdb) handle SIGUSR1 nostop noprint pass (gdb) b migration/qemu-file.c:295 (gdb) command p f->pos c end That showed the pos is ever increasing and fails at an offset it never read before. Yet the absolute number was different. $1 = 0 $2 = 8948 $3 = 41423 [...] $11359 = 326387440 $11360 =
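
A small sketch for checking such a gdb log offline, assuming the `$N = value` value-history lines were saved as text. The helper names are hypothetical and the sample contains only the first values quoted above.

```python
import re
from typing import List


def parse_positions(gdb_output: str) -> List[int]:
    """Extract the integers from gdb value-history lines such as '$3 = 41423'."""
    pattern = re.compile(r"^\$\d+ = (\d+)$", re.MULTILINE)
    return [int(match.group(1)) for match in pattern.finditer(gdb_output)]


def is_strictly_increasing(values: List[int]) -> bool:
    """True if every value is larger than the one printed before it."""
    return all(a < b for a, b in zip(values, values[1:]))


sample = "$1 = 0\n$2 = 8948\n$3 = 41423\n"
positions = parse_positions(sample)
print(positions, is_strictly_increasing(positions))
```

Run against a complete log, the last parsed value would be the stream offset at which the read failed, which makes it easy to compare the failing offsets between runs.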

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
So this is failing I/O that iterates over a channel. I was tracking down the len, pending and pos used. I found that this is not completely broken (like no access or general I/O error). It starts at pos 0 and iterates with varying offsets, but works for quite some time. Example: [...] Thread 1

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
Sure, but initially I wanted to see what is going on overall so I let it pop up. Started another debugging session today. First I confirmed with (gdb) catch syscall exit exit_group That this is the "normal" exit along the error message we knew: migrate_set_state(>state,

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-22 Thread ChristianEhrhardt
Via a watchpoint I found that the error is set by qemu_fill_buffer. b qemu_loadvm_state handle SIGUSR1 nostop noprint pass c # on the break check and watch the status (gdb) p f $1 = (QEMUFile *) 0xb9babb3c00 (gdb) p *f $2 = {ops = 0xb9b89880a0 , hooks = 0x0, opaque = 0xb9bbabfe00, bytes_xfer =

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-21 Thread Dr. David Alan Gilbert
oh yeh you want to tell gdb to ignore SIGUSR1, something like: handle SIGUSR1 nostop noprint pass

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-21 Thread ChristianEhrhardt
As expected by David when I trace on process_incoming_migration_co which prints the "readable" error I see the error pop up on "qemu_loadvm_state" It appears as "Thread 4 "CPU 0/KVM" received signal SIGUSR1" and similar which is just the break down of the guest. Diving "into" qemu_loadvm_state

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-21 Thread ChristianEhrhardt
After this I was trying to start closer to the issue, so I put a break on "process_incoming_migration_co" (to skip over much of the initial setup). Once that was hit I added "qemu_kvm_cpu_thread_fn" and "qemu_kvm_wait_io_event". Of course when I try that the other functions do not trigger. Maybe

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-21 Thread ChristianEhrhardt
Since the qemu "lives" in that time I can try to debug what happens. With strace to sniff where things could be I see right before the end: 0.000203 recvmsg(27, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="", iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC},

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-21 Thread ChristianEhrhardt
Hi David, confirming the red-herring on the cpu feature - I had a build without running over the weekend so this was easy to test - and still the migration fails. I have about 7 seconds from kicking off the migration until the sync seems to pass its first phase and then qemu is exiting (at

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread Dr. David Alan Gilbert
The 'host doesn't support requested feature' is probably a red-herring in this case. The fact it's failing with an IO error but nothing else suggests either: a) it's something closing the socket between the two qemu's b) The I/O error is from storage/NBD Assuming it fails on precopy, I'd look

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
** Attachment added: "libvirtd-source.log" https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934825/+files/libvirtd-source.log

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
I've seen something in the logs which I want to eliminate from the list of possibilities: "warning: host doesn't support requested feature: CPUID.8001H:ECX.svm [bit 2]" We had always a patch I questioned to enable svm capability for guests in general, it worked all the time but I'd have

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
Currently I plan to test with the svm/vmx changes disabled as well as a cross test on ppc64 and s390x which might complete the picture.

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
Since this is pretty reproducible here on the setup: - Two systems (actually two lxd containers on one system) - Both running Artful with qemu 2.10-rc3 and libvirt 3.6 - Storage path is not shared but set up equivalent with a manual pre-copy - Migration with post copy is failing, no other options

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
** Attachment added: "libvirtd-target.log" https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934824/+files/libvirtd-target.log

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
To simplify downloading the logs I'm attaching here a full set of: - virsh - source libvirtd - target libvirtd ** Attachment added: "virsh-source.log" https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934823/+files/virsh-source.log

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
I reached out to the people involved in the initial fixes which were related to image locking and qemu-nbd. But this might after all be something completely different. Yet until we know better it might be wiser to reach out to more people. =>

[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

2017-08-18 Thread ChristianEhrhardt
The source log is virsh, I need to ensure we also have a source libvirtd log ...