** Changed in: qemu
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602
Title:
--copy-storage-all failing with qemu 2.10
Status in QEMU:
Fix Released
This bug was fixed in the package qemu - 1:2.10~rc4+dfsg-0ubuntu1
---
qemu (1:2.10~rc4+dfsg-0ubuntu1) artful; urgency=medium
* Merge with Upstream 2.10-rc4; This fixes a migration issue (LP: #1711602);
Remaining changes:
- qemu-kvm to systemd unit
- d/qemu-kvm-init:
Yes, with the full series from [1] on top it finally works.
Saw it already being merged on master.
Expecting a late rc4 or an early release tag, and then we can wrap it all up.
Thanks everybody involved!
[1]: http://lists.nongnu.org/archive/html/qemu-
devel/2017-08/msg04513.html
** Changed in: qemu
Yeah, seems to be slightly different from the former assert.
2017-08-23 18:41:54.556+: initiating migration
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block133
bdrv_inactivate_recurse: entry for #block329
bdrv_inactivate_recurse: entry for
just tested current head - 1eed33994e28d4a0437ba6e944bbc3ec5e4a29a0 -
seems to work for me.
--
Looks good here, just retested:
here's the top of my git:
f89f59fad5119f878aaedf711af90802ddcb99c7 nbd-client: avoid spurious
qio_channel_yield() re-entry
cf26039a2b50f078b4ad90b88eea5bb28971c0d8 block: Update open_flags after
->inactivate() callback
8ccc527d84ec9a5052cfae19edbc44abb5ac03ae
I need to recheck with that combo - I'd seen that error but only when
I'd commented out 'if (!blk->dev && !blk_name(blk)[0]) {' when
debugging earlier.
--
That was rc3 +:
- nbd-client-avoid-spurious-qio_channel_yield.patch
- the four patches mentioned in comment #43
I could also re-base onto master + patches or rc4 if there is one soon.
For now I'm building with David's debug statements applied again to check if we
still abort around that assert.
--
Hmm,
it gets further but can still not complete this kind of migration:
$ virsh migrate --live --copy-storage-all kvmguest-artful-normal
qemu+ssh://10.22.69.30/system
Source:
2017-08-23 16:49:23.022+: initiating migration
Unexpected error in bdrv_check_perm() at
On 08/23/2017 09:55 AM, ChristianEhrhardt wrote:
> Ok, clarified with Stefanha
> It has exactly the same title as a series of 18th August which was related to
> a similar issue.
> It is about an hour old now on qemu-devel, quoting
>
> "This fixes the issue reported as
>
Hi Stefan,
I was part of the report around the series in "[PATCH for-2.10 0/4] block: Fix
non-shared storage migration", but this is happening on rc3 which contains this.
AFAIK Fam's series is:
dd7fdaad iotests: Add non-shared storage migration case 192 (Fam)
5f7772c4 block-backend: Defer
Ok, clarified with Stefanha
It has exactly the same title as a series of 18th August which was related to a
similar issue.
It is about an hour old now on qemu-devel, quoting
"This fixes the issue reported as
https://bugs.launchpad.net/bugs/1711602
Fam Zheng (3):
block-backend: Refactor
yes, seems to fix it for me.
Thanks Christian for filing this; we probably wouldn't have spotted it before
the release without it
(which the test Stefan has just added will hopefully cure!).
--
Please see Fam's patch series "[PATCH for-2.10 0/4] block: Fix non-
shared storage migration" that fixes this issue.
--
OK, yeah, that's the same symptom I saw - it's that final failure that
causes bdrv_inactivate_all to return a failure and fail the source
migration.
--
With the patch from Stefan and your debug applied on source and target, I still
run into the same issue, I'd say.
IDs are slightly off, but they are different on every try anyway.
Still looks the same for me:
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: entry for
I didn't add Stefan's patch yet.
Note: the mentioned patch is at:
http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg04027.html
With your debug patch applied I get:
2017-08-22 17:57:04.486+: initiating migration
bdrv_inactivate_recurse: entry for drive-virtio-disk0
Building with the attached debug patch ...
--
OK, Stefan posted a patch for that assert (see 'nbd-client: avoid spurious
qio_channel_yield() re-entry') so now I'm running with the following patch and
I'm seeing the bdrv_inactivate return a -1 for
drive-virtio-disk0
Christian: Could you see what your source says with this patch?
diff --git
In 5/5 tries this was on qemu_fill_buffer for my case.
But that was on the receiving side, and what you found is closer to the root
cause on the source of the migration.
I checked on qemu_file_set_error on the source and can confirm your finding
that on the source it is from
repeated the assert in #26:
Program received signal SIGABRT, Aborted.
0x7f02163005f7 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) where
#0 0x7f02163005f7 in __GI_raise (sig=sig@entry=6)
The difference with the qemu_file_set_error is that I'm looking at the source
- because what's happening is that the source errors and so closes the
socket, meaning the error you're seeing on the destination is real - the
socket just EOF'd!
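To illustrate the failure mode described here, a minimal sketch (plain Python sockets, not QEMU code; all names are illustrative): when the peer that was writing the migration stream errors out and closes its end, the reader's next read returns an empty result - a clean EOF, not a connection error.

```python
import socket

# Sketch of the scenario above: the "source" streams for a while,
# then errors out and closes the socket; the "destination" then
# sees a clean EOF on its next read.
src, dst = socket.socketpair()

src.sendall(b"migration stream bytes")  # source streams some data...
src.close()                             # ...then errors and closes

data = dst.recv(4096)  # destination drains what was already sent
eof = dst.recv(4096)   # empty bytes: the socket just EOF'd
print(data, eof == b"")
dst.close()
```

This is why the destination's read error is a symptom rather than the root cause: by the time it fires, the source has already failed and torn down the connection.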
--
For me qemu_file_set_error was always called from qemu_fill_buffer, interesting
that it seems different for you.
I'll rerun a few times to ensure it really is always from
qemu_fill_buffer for me.
--
Stack from qemu_fill_buffer to qio_channel_socket_readv
#0 qio_channel_socket_readv (ioc=<optimized out>, iov=<optimized out>,
niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0)
at ./io/channel-socket.c:477
#1 0x001486ec97e2 in qio_channel_read (ioc=ioc@entry=0x148a73a6c0,
buf=buf@entry=\060\nLw", buflen=buflen@entry=28728,
Only now read comment #27, thanks David for reproducing with me, it is
somewhat relieving that you seem to see the same.
--
(4th try) breakpoint on qemu_file_set_error, it's bdrv_inactivate_all
that's returning the error.
(gdb) list
1155        if (inactivate_disks) {
1156            /* Inactivate before sending QEMU_VM_EOF so that the
1157             * bdrv_invalidate_cache_all() on the other end won't fail. */
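The control flow this listing points at can be sketched roughly like so (illustrative Python, not QEMU source; the node names are taken from the bdrv_inactivate_recurse trace earlier in the thread, and the 0/negative return convention is assumed):

```python
# Rough shape of bdrv_inactivate_all(): walk every block node and
# abort on the first one that fails to inactivate. That single error
# is what fails the whole source side of the migration.
def inactivate_all(nodes):
    for name, ret in nodes:    # ret: 0 = ok, negative = failure
        if ret < 0:
            return ret         # one bad node fails completion
    return 0

# Healthy case vs. the reported case where drive-virtio-disk0
# returned -1 during inactivation.
assert inactivate_all([("#block133", 0), ("#block329", 0)]) == 0
assert inactivate_all([("drive-virtio-disk0", -1), ("#block133", 0)]) == -1
print("a single failing node fails bdrv_inactivate_all")
```

So the -1 from one drive is enough to make migration_completion bail out before QEMU_VM_EOF is sent, which matches the observed behaviour.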
OK, 3rd try and I've hit the same behaviour as Christian.
--
Hmm, I just tried to reproduce this and hit (on the source):
main_channel_client_handle_migrate_connected: client 0x5607d785f610 connected:
0 seamless 0
qemu-system-x86_64: /root/qemu/io/channel.c:303: qio_channel_yield: Assertion
`!ioc->write_coroutine' failed.
2017-08-22 10:50:04.888+:
I'll track down the actual read and then add debugging on the source at the same
time (that should be the best way to track the migration socket on both sides).
This might be slightly tricky since I don't know exactly at which offset, but it
seems I can safely start above 310*10^6.
I'll report back
also, you might want to chase it a bit further down; I think we've got:
qemu-file-channel.c:channel_get_buffer
-> io/channel-socket.c qio_channel_socket_readv or io/channel-file.c qio_channel_file_readv
it would be good to know what the readv/readmsg is actually
returning in the case where it's failing.
Dave
OK, so that looks like a real case of the migration stream failing and getting
an IO error; so the question is why:
a) Is the source qemu dying first and closing the socket?
b) Is libvirt closing the socket for some reason?
--
So TL;DR summary for now:
- error triggers in qio_channel_read
- file is migration-socket-incoming
- reads work a while, but then fail at high f->pos offsets (slightly different
ones each time)
- slower execution seems to lead to slightly higher offsets that are failing
- only happens on
(gdb) handle SIGUSR1 nostop noprint pass
(gdb) b migration/qemu-file.c:295
(gdb) command
p f->pos
c
end
That showed that pos is ever-increasing and fails at an offset it never read
before, yet the absolute number was different on each run.
$1 = 0
$2 = 8948
$3 = 41423
[...]
$11359 = 326387440
$11360 =
So this is failing I/O that iterates over a channel.
I was tracking down the len, pending and pos values used.
I found that this is not completely broken (like no access or a general I/O error):
it starts at pos 0 and iterates with varying offsets, and works for quite some
time.
Example:
[...]
Thread 1
Sure, but initially I wanted to see what is going on overall so I let it
pop up.
Started another debugging session today.
First I confirmed with
(gdb) catch syscall exit exit_group
that this is the "normal" exit alongside the error message we knew:
migrate_set_state(&s->state,
Via a watchpoint I found that the error is set by qemu_fill_buffer.
b qemu_loadvm_state
handle SIGUSR1 nostop noprint pass
c
# on the break check and watch the status
(gdb) p f
$1 = (QEMUFile *) 0xb9babb3c00
(gdb) p *f
$2 = {ops = 0xb9b89880a0 , hooks = 0x0, opaque =
0xb9bbabfe00, bytes_xfer =
oh yeah, you want to tell gdb to ignore SIGUSR1, something like:
handle SIGUSR1 nostop noprint pass
--
As expected by David, when I trace process_incoming_migration_co (which prints
the "readable" error) I see the error pop up in "qemu_loadvm_state".
It appears as "Thread 4 "CPU 0/KVM" received signal SIGUSR1" and similar, which
is just the teardown of the guest.
Diving "into" qemu_loadvm_state
After this I was trying to start closer to the issue, so I put a break on
"process_incoming_migration_co" (to skip over much of the initial setup).
Once that was hit I added "qemu_kvm_cpu_thread_fn" and "qemu_kvm_wait_io_event".
Of course when I try that the other functions do not trigger.
Maybe
Since the qemu "lives" in that time I can try to debug what happens.
Using strace to sniff out where things could be, I see right before the end:
0.000203 recvmsg(27, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="",
iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC},
Hi David,
confirming the red herring on the cpu feature - I had a build without it running
over the weekend so this was easy to test - and still the migration fails.
I have about 7 seconds from kicking off the migration until the sync seems to
pass its first phase and then qemu is exiting (at
The 'host doesn't support requested feature' is probably a red herring in this
case.
The fact it's failing with an IO error but nothing else suggests either:
a) it's something closing the socket between the two qemus
b) The I/O error is from storage/NBD
Assuming it fails on precopy, I'd look
** Attachment added: "libvirtd-source.log"
https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934825/+files/libvirtd-source.log
--
I've seen something in the logs which I want to eliminate from the list of
possibilities:
"warning: host doesn't support requested feature: CPUID.8001H:ECX.svm
[bit 2]"
We have always had a patch I questioned that enables the svm capability for guests in
general; it worked all the time but I'd have
Currently I plan to test with the svm/vmx changes disabled as well as a
cross test on ppc64 and s390x which might complete the picture.
--
Since this is pretty reproducible here on the setup:
- Two systems (actually two lxd containers on one system)
- Both running Artful with qemu 2.10-rc3 and libvirt 3.6
- Storage path is not shared but set up equivalent with a manual pre-copy
- Migration with post copy is failing, no other options
** Attachment added: "libvirtd-target.log"
https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934824/+files/libvirtd-target.log
--
To simplify downloading the logs I'm attaching here a full set of:
- virsh
- source libvirtd
- target libvirtd
** Attachment added: "virsh-source.log"
https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934823/+files/virsh-source.log
--
I reached out to the people involved in the initial fixes which were related to
image locking and qemu-nbd. But this might after all be something completely
different.
Yet until we know better it might be wiser to reach out to more people.
=>
The source log is virsh, I need to ensure we also have a source libvirtd
log ...
--