Verification done on kinetic-proposed.

With the fix, the migration status reported during the race window is
'setup' (which is not expected to carry RAM info, so there is no issue)
rather than 'active' (which is expected to carry RAM info, but did not).

(qemu) info migrate

-updates:
...
Migration status: active
total time: 0 ms

-proposed:
...
Migration status: setup
total time: 0 ms
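
The difference between the two outputs above can also be checked
mechanically. The following is only a minimal sketch: `check_migrate_info`
is a hypothetical helper name, and it assumes that RAM statistics appear in
the `info migrate` output as lines containing "ram:" (adjust the pattern if
your QEMU build prints them differently).

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by 'info migrate' (read
# from stdin). Assumption: RAM statistics are lines containing "ram:".
check_migrate_info() {
    awk '
        { sub(/\r$/, "") }                      # tolerate CRLF from the monitor socket
        /^Migration status:/ { status = $3 }    # third field is the status word
        /ram:/               { has_ram = 1 }
        END {
            if (status == "")
                print "unknown"
            else if (status == "active" && !has_ram)
                print "active: missing RAM statistics"
            else
                print status ": ok"
        }
    '
}

# The -updates output from above is flagged; the -proposed output is not.
printf 'Migration status: active\ntotal time: 0 ms\n' | check_migrate_info
printf 'Migration status: setup\ntotal time: 0 ms\n'  | check_migrate_info
```

Saved monitor output can be piped into the function the same way.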

Detailed steps:
==============

$ lxc launch ubuntu:kinetic qemu-k
$ lxc exec qemu-k -- su - ubuntu


Packages from -updates: FAIL
----------------------

$ sudo apt install --yes --no-install-recommends qemu-system-x86 gdb dpkg-dev

$ dpkg -s qemu-system-x86 | grep Version:
Version: 1:7.0+dfsg-7ubuntu2.1

...

Source: get line number for breakpoint.

$ sudo add-apt-repository -ys
$ apt source qemu

$ head -n1 qemu-*/debian/changelog
qemu (1:7.0+dfsg-7ubuntu2.1) kinetic-security; urgency=medium

$ vim qemu-*/migration/migration.c

1073 static void fill_source_migration_info(MigrationInfo *info)
1074 {
...
1100     case MIGRATION_STATUS_SETUP:
...
1103         break;
...

...

Terminal 1)

$ qemu-system-x86_64 -nodefaults -nographic -S -incoming tcp:0:4444

Terminal 2)

$ gdb \
  -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' \
  -iex 'set debuginfod enabled on' \
  -iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
  qemu-system-x86_64

(gdb) b migrate_set_state
...
Breakpoint 1 at 0x47ed10: migrate_set_state. (2 locations)
(gdb) b migration/migration.c:1103
...
Breakpoint 2 at 0x47dba0: file ../../migration/migration.c, line 1103.

(gdb) run -nodefaults -nographic -S -monitor tcp:0:3333,server,wait=off


Terminal 3)

nc 127.0.0.1 3333

(qemu) migrate -d tcp:127.0.0.1:4444

Terminal 2)

Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state
(state=0x555556779618, old_state=0, new_state=1) at
../../migration/migration.c:1763

(gdb) p (MigrationStatus) 0
$1 = MIGRATION_STATUS_NONE
(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP
(gdb) c

Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state
(state=0x555556779618, old_state=1, new_state=4) at
../../migration/migration.c:1763

(gdb) p (MigrationStatus) 1
$3 = MIGRATION_STATUS_SETUP
(gdb) p (MigrationStatus) 4
$4 = MIGRATION_STATUS_ACTIVE

Terminal 3)

(qemu) info migrate

Terminal 2)

Thread 1 "qemu-system-x86" hit Breakpoint 2, fill_source_migration_info
(info=0x555556dc6c60) at ../../migration/migration.c:1103

(gdb) p (MigrationStatus) s.state
$6 = MIGRATION_STATUS_SETUP
(gdb) p info.status
$7 = MIGRATION_STATUS_NONE

(gdb) info threads
  Id   Target Id                                          Frame
* 1    Thread 0x7ffff6c32340 (LWP 2368) "qemu-system-x86" 
fill_source_migration_info (info=0x555556dc6c60) at 
../../migration/migration.c:1103
  2    Thread 0x7ffff65ff6c0 (LWP 2369) "qemu-system-x86" (running)
  3    Thread 0x7ffff5d7c6c0 (LWP 2370) "qemu-system-x86" (running)
  5    Thread 0x7ffff49ff6c0 (LWP 2373) "qemu-system-x86" migrate_set_state 
(state=0x555556779618, old_state=1, new_state=4) at 
../../migration/migration.c:1763

(gdb) thread 5
(gdb) continue &

(gdb) info threads
  Id   Target Id                                          Frame
  1    Thread 0x7ffff6c32340 (LWP 2368) "qemu-system-x86" 
fill_source_migration_info (info=0x555556dc6c60) at 
../../migration/migration.c:1103
  2    Thread 0x7ffff65ff6c0 (LWP 2369) "qemu-system-x86" (running)
  3    Thread 0x7ffff5d7c6c0 (LWP 2370) "qemu-system-x86" (running)
* 5    Thread 0x7ffff49ff6c0 (LWP 2373) "qemu-system-x86" (running)

(gdb) thread 1

(gdb) p (MigrationStatus) s.state
$8 = MIGRATION_STATUS_ACTIVE
(gdb) c

Terminal 3)

...
Migration status: active
total time: 0 ms
(qemu)

Migration status is active, without any RAM statistics.

(qemu) quit
(gdb) quit

Terminal 1)

Ctrl-C

...


Packages from -proposed: PASS
-----------------------

$ sudo add-apt-repository -yp proposed
$ sudo add-apt-repository -ys # didn't work for proposed
$ echo 'deb-src http://archive.ubuntu.com/ubuntu kinetic-proposed main' | \
  sudo tee -a /etc/apt/sources.list

$ sudo apt install --yes --no-install-recommends qemu-system-x86

$ dpkg -s qemu-system-x86 | grep Version:
Version: 1:7.0+dfsg-7ubuntu2.2
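
The version check can also be scripted instead of read by eye. This is a
sketch only: `has_fixed_qemu` is a hypothetical helper name, and the
threshold is the kinetic version verified in this comment (other series
carry different version strings).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given version is at least the
# kinetic-proposed version verified in this comment.
FIXED_VERSION='1:7.0+dfsg-7ubuntu2.2'

has_fixed_qemu() {
    # dpkg handles the epoch ("1:") and Debian revision ordering correctly.
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Example use in the test container (requires the package to be installed):
#   has_fixed_qemu "$(dpkg-query -W -f='${Version}' qemu-system-x86)" && echo PASS
```

`dpkg --compare-versions` is used rather than a plain string comparison
because Debian version ordering is not lexical.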

$ rm -rf qemu-*
$ apt source qemu

$ head -n1 qemu-*/debian/changelog
qemu (1:7.0+dfsg-7ubuntu2.2) kinetic; urgency=medium

$ vim qemu-*/migration/migration.c

1073 static void fill_source_migration_info(MigrationInfo *info)
1074 {
...
1076     int state = qatomic_read(&s->state);
...
1101     case MIGRATION_STATUS_SETUP:
...
1104         break;


Terminal 1)

$ qemu-system-x86_64 -nodefaults -nographic -S -incoming tcp:0:4444

Terminal 2)

$ gdb \
  -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' \
  -iex 'set debuginfod enabled on' \
  -iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
  qemu-system-x86_64

(gdb) b migrate_set_state
...
Breakpoint 1 at 0x47ed20: migrate_set_state. (2 locations)
(gdb) b migration/migration.c:1104
...
Breakpoint 2 at 0x47dbc3: file ../../migration/migration.c, line 1104.

Terminal 3)

$ nc 127.0.0.1 3333
(qemu) migrate -d tcp:127.0.0.1:4444

Terminal 2)

Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state
(state=0x555556779618, old_state=0, new_state=1) at
../../migration/migration.c:1764

(gdb) p (MigrationStatus) 0
$1 = MIGRATION_STATUS_NONE
(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP
(gdb) c

Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state 
(state=0x555556779618, old_state=1, new_state=4) at 
../../migration/migration.c:1764
1764    in ../../migration/migration.c
(gdb) p (MigrationStatus) 1
$3 = MIGRATION_STATUS_SETUP
(gdb) p (MigrationStatus) 4
$4 = MIGRATION_STATUS_ACTIVE

Terminal 3)

(qemu) info migrate

Terminal 2)

Thread 1 "qemu-system-x86" hit Breakpoint 2, fill_source_migration_info
(info=0x555556dc6c60) at ../../migration/migration.c:1141

(gdb) p (MigrationStatus) s.state
$6 = MIGRATION_STATUS_SETUP
(gdb) p info.status
$7 = MIGRATION_STATUS_NONE

(gdb) info threads
  Id   Target Id                                          Frame
* 1    Thread 0x7ffff6c32340 (LWP 7562) "qemu-system-x86" 
fill_source_migration_info (info=0x555556dc6c60) at 
../../migration/migration.c:1141
  2    Thread 0x7ffff65ff6c0 (LWP 7565) "qemu-system-x86" (running)
  3    Thread 0x7ffff5d7c6c0 (LWP 7566) "qemu-system-x86" (running)
  5    Thread 0x7fffa7dff6c0 (LWP 7569) "qemu-system-x86" migrate_set_state 
(state=0x555556779618, old_state=1, new_state=4) at 
../../migration/migration.c:1764

(gdb) thread 5
(gdb) continue &

(gdb) info threads
  Id   Target Id                                          Frame
  1    Thread 0x7ffff6c32340 (LWP 7562) "qemu-system-x86" 
fill_source_migration_info (info=0x555556dc6c60) at 
../../migration/migration.c:1141
  2    Thread 0x7ffff65ff6c0 (LWP 7565) "qemu-system-x86" (running)
  3    Thread 0x7ffff5d7c6c0 (LWP 7566) "qemu-system-x86" (running)
* 5    Thread 0x7fffa7dff6c0 (LWP 7569) "qemu-system-x86" (running)

(gdb) thread 1
(gdb) p (MigrationStatus) s.state
$8 = MIGRATION_STATUS_ACTIVE

(gdb) c

Terminal 3)

The status is still 'setup' (which is not expected to have RAM
statistics), not 'active' (which is expected to have them; their absence
caused the issue).

...
Migration status: setup
total time: 0 ms


** Tags removed: verification-needed-kinetic
** Tags added: verification-done-kinetic

-- 
You received this bug notification because you are a member of SE
("STS") Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1994002

Title:
  [SRU] migration was active, but no RAM info was set

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive ussuri series:
  New
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Focal:
  In Progress
Status in qemu source package in Jammy:
  In Progress
Status in qemu source package in Kinetic:
  Fix Committed

Bug description:
  [Impact]

   * While live-migrating many instances concurrently, libvirt sometimes
  returns `internal error: migration was active, but no RAM info was
  set`.

   * Effects of this bug are mostly observed in large scale clusters
  with a lot of live migration activity.

   * This has second-order effects for consumers of the migration monitor,
  such as libvirt and OpenStack.

  [Test Case]

  Synthetic reproducer with GDB in comment #21.

  Steps to Reproduce:
  1. live evacuate a compute
  2. live migration of one or more instances fails with the above error

  N.B. Due to the nature of this bug, it is difficult to reproduce
  consistently. In an environment where it has been observed, it is
  estimated to occur in approximately 1 of every 1000 migrations.
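
  To put that estimate in perspective, the chance of hitting the bug at
  least once grows quickly with the number of migrations. A
  back-of-the-envelope sketch, assuming each migration fails independently
  with the same probability (an assumption for illustration only, not a
  measured property); `p_at_least_one` is a hypothetical helper name:

```shell
#!/bin/sh
# Probability of at least one failure in n independent attempts, each
# failing with probability p: 1 - (1 - p)^n.
p_at_least_one() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", 1 - (1 - p) ^ n }'
}

# Example: 500 migrations at the estimated 1/1000 failure rate.
p_at_least_one 0.001 500
```

  Under that assumption, evacuating hosts totalling 500 instances would hit
  the error at least once in roughly 39% of attempts, which is consistent
  with the bug being seen mostly on large clusters.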

  [Where problems could occur]
   * In the event of a regression, the migration monitor may report an
  inconsistent state.

  [Original Bug Description]

  While live-migrating many instances concurrently, libvirt sometimes returns
  internal error: migration was active, but no RAM info was set:
  ~~~
  2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver 
[req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: 
af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal 
error: migration was active, but no RAM info was set: libvirt.libvirtError: 
internal error: migration was active, but no RAM info was set
  ~~~

  From upstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=2074205

  [Other Information]
  Related bug: https://bugs.launchpad.net/nova/+bug/1982284
