Since v3,
* Attacked the replay_linux.py bugs and found a bunch of gaps in
  networking that were causing the hangs.
* Fixed several powerpc bugs that were also causing problems on pseries.
* Added ppc test to replay_linux.py now that it's working.
* Found several crash bugs in record/replay vs migration.
* Added snapshot and more stepping tests to reverse_debugging.py
* Addressed comments in auto-snapshot code.
* Added auto-snapshot test case.
* "Solved" x86-64 issues in test cases by switching to q35, which seems
  to have fewer problems.

The last 3 patches I will take in the ppc tree, but they are included
here because powerpc is the only target that survives the record-replay
test with auto-snapshots at the moment.

Thanks,
Nick

Since v2, the fixes became less minor so I renamed the series.
(https://lore.kernel.org/qemu-devel/20240125160835.480488-1-npig...@gmail.com/#r)
* Found several more bugs (patches 5-8).
* Enabled the rr avocado test on pseries and aarch64 virt since they're
  passing here (and on gitlab, e.g.,
  https://gitlab.com/npiggin/qemu/-/jobs/6253787216,
  https://gitlab.com/npiggin/qemu/-/jobs/6253787218).
* Updated replay-dump script per John's feedback.

x86-64 still has issues with the replay and reverse debugging tests.
replay_kernel.py seems to be timing dependent -- after patch 5 I had it
pass 30/30 runs, then the following day 0/30, and I realized I had
several other QEMU instances hogging the CPU, which probably changed
timings. So the first thing I would look at is timers and clocks.
pseries had some rounding issues in time calculations that meant
clocks/timers were not replayed exactly as they were recorded, which
caused hangs.
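
To illustrate the class of rounding problem mentioned above (this is not
the actual pseries code or the fix in this series), here is a minimal
Python sketch of how truncating integer conversions between nanoseconds
and timebase ticks fail to round-trip. The 512 MHz frequency and all
helper names are assumptions chosen for illustration only.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
TB_FREQ_HZ = 512_000_000  # assumed timebase frequency, illustration only


def ns_to_ticks(ns: int) -> int:
    """Convert nanoseconds to timebase ticks, truncating toward zero."""
    return ns * TB_FREQ_HZ // NANOSECONDS_PER_SECOND


def ticks_to_ns(ticks: int) -> int:
    """Convert timebase ticks to nanoseconds, truncating toward zero."""
    return ticks * NANOSECONDS_PER_SECOND // TB_FREQ_HZ


def roundtrip_drift(ns: int) -> int:
    """Nanoseconds lost by converting ns -> ticks -> ns."""
    return ns - ticks_to_ns(ns_to_ticks(ns))


if __name__ == "__main__":
    # 1000 ns is an exact number of ticks, 1001 ns is not.
    for ns in (1000, 1001):
        print(ns, "ns drifts by", roundtrip_drift(ns), "ns")
    # The reverse direction is lossy too: 513 ticks does not survive.
    print("513 ticks ->", ns_to_ticks(ticks_to_ns(513)), "ticks")
```

If one code path derives a deadline through such a conversion and another
path uses the unconverted value, a one-unit difference like this could be
enough for a timer to fire at a different point during replay than it did
during record.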

Thanks,
Nick

Nicholas Piggin (24):
  scripts/replay-dump.py: Update to current rr record format
  scripts/replay-dump.py: rejig decoders in event number order
  tests/avocado: excercise scripts/replay-dump.py in replay tests
  replay: allow runstate shutdown->running when replaying trace
  Revert "replay: stop us hanging in rr_wait_io_event"
  chardev: set record/replay on the base device of a muxed device
  replay: Fix migration use of clock
  replay: Fix migration replay_mutex locking
  virtio-net: Use replay_schedule_bh_event for bhs that affect machine
    state
  virtio-net: Use virtual time for RSC timers
  net: Use virtual time for net announce
  savevm: Fix load_snapshot error path crash
  tests/avocado: replay_linux.py remove the timeout expected guards
  tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not
    flaky
  tests/avocado: reverse_debugging.py add test for x86-64 q35 machine
  tests/avocado: reverse_debugging.py verify addresses between record
    and replay
  tests/avocado: reverse_debugging.py stop VM before sampling icount
  tests/avocado: reverse_debugging reverse-step at the end of the trace
  tests/avocado: reverse_debugging.py add snapshot testing
  replay: simple auto-snapshot mode for record
  tests/avocado: reverse_debugging.py test auto-snapshot mode
  target/ppc: fix timebase register reset state
  spapr: Fix vpa dispatch count for record-replay
  tests/avocado: replay_linux.py add ppc64 pseries test

 docs/system/replay.rst             |   5 +
 include/hw/ppc/spapr_cpu_core.h    |   3 +
 include/sysemu/replay.h            |  16 ++-
 include/sysemu/runstate.h          |   1 +
 accel/tcg/tcg-accel-ops-rr.c       |   2 +-
 chardev/char.c                     |  71 ++++++++----
 hw/net/virtio-net.c                |  17 +--
 hw/ppc/ppc.c                       |  11 +-
 hw/ppc/spapr.c                     |  36 +-----
 hw/ppc/spapr_hcall.c               |  33 ++++++
 hw/ppc/spapr_rtas.c                |   1 +
 migration/migration.c              |  17 ++-
 migration/savevm.c                 |   1 +
 net/announce.c                     |   2 +-
 replay/replay-snapshot.c           |  57 ++++++++++
 replay/replay.c                    |  50 ++++----
 system/runstate.c                  |  31 ++++-
 system/vl.c                        |   9 ++
 target/ppc/machine.c               |   4 +
 qemu-options.hx                    |   9 +-
 scripts/replay-dump.py             | 167 ++++++++++++++++++---------
 tests/avocado/replay_kernel.py     |  11 ++
 tests/avocado/replay_linux.py      |  97 +++++++++++++++-
 tests/avocado/reverse_debugging.py | 176 ++++++++++++++++++++++++-----
 24 files changed, 635 insertions(+), 192 deletions(-)

--
2.42.0