Re: [PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size

2024-10-29 Thread William Roche
On 10/28/24 17:42, David Hildenbrand wrote: On 26.10.24 01:27, William Roche wrote: On 10/23/24 09:28, David Hildenbrand wrote: On 22.10.24 23:35, William Roche wrote: From: William Roche Add the page size information to the hwpoison_page_list elements. As the kernel doesn't always r

Re: [PATCH v1 3/4] system/physmem: Largepage punch hole before reset of memory pages

2024-10-29 Thread William Roche
On 10/28/24 18:01, David Hildenbrand wrote: On 26.10.24 01:27, William Roche wrote: On 10/23/24 09:30, David Hildenbrand wrote: On 22.10.24 23:35, William Roche wrote: From: William Roche When the VM reboots, a memory reset is performed calling qemu_ram_remap() on all hwpoisoned pages

Re: [PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size

2024-10-25 Thread William Roche
On 10/23/24 09:28, David Hildenbrand wrote: On 22.10.24 23:35, William Roche wrote: From: William Roche Add the page size information to the hwpoison_page_list elements. As the kernel doesn't always report the actual poisoned page size, we adjust this size from the backend real page siz

Re: [PATCH v1 3/4] system/physmem: Largepage punch hole before reset of memory pages

2024-10-25 Thread William Roche
On 10/23/24 09:30, David Hildenbrand wrote: On 22.10.24 23:35, William Roche wrote: From: William Roche When the VM reboots, a memory reset is performed calling qemu_ram_remap() on all hwpoisoned pages. While we take into account the recorded page sizes to repair the memory locations, a

Re: [PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size

2024-10-25 Thread William Roche
On 10/23/24 09:28, David Hildenbrand wrote: On 22.10.24 23:35, William Roche wrote: From: William Roche Add the page size information to the hwpoison_page_list elements. As the kernel doesn't always report the actual poisoned page size, we adjust this size from the backend real page

[PATCH v1 4/4] accel/kvm: Report the loss of a large memory page

2024-10-22 Thread William Roche
From: William Roche On HW memory error, we need to report better what the impact of this error is. So when an entire large page is impacted by an error (like the hugetlbfs case), we give a warning message when this page is first hit: Memory error: Loosing a large page (size: X) at QEMU addr Y

[PATCH v1 3/4] system/physmem: Largepage punch hole before reset of memory pages

2024-10-22 Thread William Roche
From: William Roche When the VM reboots, a memory reset is performed calling qemu_ram_remap() on all hwpoisoned pages. While we take into account the recorded page sizes to repair the memory locations, a large page also needs to punch a hole in the backend file to regenerate a usable memory

[PATCH v1 0/4] hugetlbfs memory HW error fixes

2024-10-22 Thread William Roche
From: William Roche This set of patches fixes several problems with hardware memory errors impacting hugetlbfs memory backed VMs. When using hugetlbfs large pages, any large page location being impacted by an HW memory error results in poisoning the entire page, suddenly making a large chunk of

[PATCH v1 1/4] accel/kvm: SIGBUS handler should also deal with si_addr_lsb

2024-10-22 Thread William Roche
From: William Roche The SIGBUS signal siginfo reporting a HW memory error provides a si_addr_lsb field with an indication of the impacted memory page size. This information should be used to track the hwpoisoned page sizes. Signed-off-by: William Roche --- accel/kvm/kvm-all.c| 6
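As a rough illustration of the idea in this patch (not the QEMU code itself): si_addr_lsb gives the least significant valid bit of the reported address, so the affected range can be taken as `1 << si_addr_lsb` bytes, aligned down to that boundary. The helper name `poisoned_range` is hypothetical, and Python is used here only to show the arithmetic.

```python
def poisoned_range(si_addr, si_addr_lsb):
    """Return (start, size) of the memory range covered by a SIGBUS report.

    si_addr_lsb is the least significant valid bit of si_addr, so the
    affected range is 2**si_addr_lsb bytes, aligned on that boundary.
    """
    size = 1 << si_addr_lsb
    start = si_addr & ~(size - 1)
    return start, size


# A 4 KiB page corresponds to lsb 12; a 2 MiB large page to lsb 21.
start, size = poisoned_range(0x7F0000201234, 21)
print(hex(start), size)
```

Keeping the size next to the address is what lets later steps treat a large page as a single unit instead of as one small page.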

[PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size

2024-10-22 Thread William Roche
From: William Roche Add the page size information to the hwpoison_page_list elements. As the kernel doesn't always report the actual poisoned page size, we adjust this size from the backend real page size. We take into account the recorded page size to adjust the size and location of the m
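One possible reading of the adjustment described here, as a sketch under assumptions rather than the patch's implementation: when the size reported with the error is smaller than the page size of the memory backend, the backend page size is used instead, and the recorded address is aligned down to it. `record_poison` and the list of tuples are hypothetical stand-ins for the hwpoison_page_list handling, and page sizes are assumed to be powers of two.

```python
def record_poison(poison_list, ram_addr, reported_size, backend_page_size):
    """Append an (aligned_addr, page_size) entry, at most once per page."""
    # Assumption: the kernel may report a smaller size than the real
    # backend page, so the larger of the two values is kept.
    page_size = max(reported_size, backend_page_size)
    aligned = ram_addr & ~(page_size - 1)
    entry = (aligned, page_size)
    if entry not in poison_list:
        poison_list.append(entry)
    return entry


poisoned = []
record_poison(poisoned, 0x40201000, 4096, 2 * 1024 * 1024)
record_poison(poisoned, 0x40202000, 4096, 2 * 1024 * 1024)
print(poisoned)  # both addresses fall in the same 2 MiB page
```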

Re: [RFC RESEND 0/6] hugetlbfs largepage RAS project

2024-10-10 Thread William Roche
On 10/9/24 17:45, Peter Xu wrote: On Thu, Sep 19, 2024 at 06:52:37PM +0200, William Roche wrote: Hello David, I hope my email from last week answered your questions about:     - retrieving the valid data from the lost hugepage     - the need for smaller pages to replace a failed large page

Re: [RFC RESEND 0/6] hugetlbfs largepage RAS project

2024-09-19 Thread William Roche
Hello David, I hope my email from last week answered your questions about:     - retrieving the valid data from the lost hugepage     - the need for smaller pages to replace a failed large page     - the interaction of memory error and VM migration     - the non-symmetrical access to a poisoned me

Re: [RFC RESEND 0/6] hugetlbfs largepage RAS project

2024-09-12 Thread William Roche
On 9/12/24 00:07, David Hildenbrand wrote: Hi again, This is a Qemu RFC to introduce the possibility to deal with hardware memory errors impacting hugetlbfs memory backed VMs. When using hugetlbfs large pages, any large page location being impacted by an HW memory error results in poisoning th

Re: [RFC RESEND 0/6] hugetlbfs largepage RAS project

2024-09-10 Thread William Roche
On 9/10/24 13:36, David Hildenbrand wrote: On 10.09.24 12:02, William Roche wrote: From: William Roche Hi, Apologies for the noise; resending as I missed CC'ing the maintainers of the changed files Hello, This is a Qemu RFC to introduce the possibility to deal with hardware m

[RFC RESEND 0/6] hugetlbfs largepage RAS project

2024-09-10 Thread William Roche
From: William Roche Apologies for the noise; resending as I missed CC'ing the maintainers of the changed files Hello, This is a Qemu RFC to introduce the possibility to deal with hardware memory errors impacting hugetlbfs memory backed VMs. When using hugetlbfs large pages, any large

[RFC RESEND 1/6] accel/kvm: SIGBUS handler should also deal with si_addr_lsb

2024-09-10 Thread William Roche
From: William Roche The SIGBUS signal siginfo reporting a HW memory error provides a si_addr_lsb field with an indication of the impacted memory page size. This information should be used to track the hwpoisoned page sizes. Signed-off-by: William Roche --- accel/kvm/kvm-all.c| 6

[RFC RESEND 6/6] system/hugetlb_ras: Replay lost BUS_MCEERR_AO signals on VM resume

2024-09-10 Thread William Roche
From: William Roche In case the SIGBUS handler is triggered by a BUS_MCEERR_AO signal and this handler needs to exit to let the VM pause during the memory mapping change, this SIGBUS won't be regenerated when the VM resumes. In this case we take note of this signal before exiting the handl

[RFC RESEND 2/6] accel/kvm: Keep track of the HWPoisonPage sizes

2024-09-10 Thread William Roche
From: William Roche Add the page size information to the hwpoison_page_list elements. Signed-off-by: William Roche --- accel/kvm/kvm-all.c | 11 +++ include/sysemu/kvm.h | 3 ++- include/sysemu/kvm_int.h | 3 ++- target/arm/kvm.c | 5 +++-- target/i386/kvm/kvm.c

[RFC RESEND 3/6] system/physmem: Remap memory pages on reset based on the page size

2024-09-10 Thread William Roche
From: William Roche When the VM reboots, a memory reset is performed calling qemu_ram_remap() on all hwpoisoned pages. We take into account the recorded page size to adjust the size and location of the memory hole. In case of a largepage used, we also need to punch a hole in the backend file to

[RFC RESEND 4/6] system: Introducing hugetlbfs largepage RAS feature

2024-09-10 Thread William Roche
From: William Roche We need to deal with hugetlbfs memory large pages facing HW errors, to increase the probability of surviving a memory poisoning. When an error is detected, the platform kernel marks the entire hugetlbfs large page as "poisoned" and reports the event to all potential u

[RFC RESEND 5/6] system/hugetlb_ras: Handle madvise SIGBUS signal on listener

2024-09-10 Thread William Roche
From: William Roche madvise MADV_HWPOISON can generate a SIGBUS when called, so the listener thread (the caller) needs to deal with this signal. The signal handler recognizes a thread specific variable allowing it to directly exit when generated from this thread. Signed-off-by: William Roche

[RFC 2/6] accel/kvm: Keep track of the HWPoisonPage sizes

2024-09-10 Thread William Roche
From: William Roche Add the page size information to the hwpoison_page_list elements. Signed-off-by: William Roche --- accel/kvm/kvm-all.c | 11 +++ include/sysemu/kvm.h | 3 ++- include/sysemu/kvm_int.h | 3 ++- target/arm/kvm.c | 5 +++-- target/i386/kvm/kvm.c

[RFC 6/6] system/hugetlb_ras: Replay lost BUS_MCEERR_AO signals on VM resume

2024-09-10 Thread William Roche
From: William Roche In case the SIGBUS handler is triggered by a BUS_MCEERR_AO signal and this handler needs to exit to let the VM pause during the memory mapping change, this SIGBUS won't be regenerated when the VM resumes. In this case we take note of this signal before exiting the handl

[RFC 5/6] system/hugetlb_ras: Handle madvise SIGBUS signal on listener

2024-09-10 Thread William Roche
From: William Roche madvise MADV_HWPOISON can generate a SIGBUS when called, so the listener thread (the caller) needs to deal with this signal. The signal handler recognizes a thread specific variable allowing it to directly exit when generated from this thread. Signed-off-by: William Roche

[RFC 0/6] hugetlbfs largepage RAS project

2024-09-10 Thread William Roche
From: William Roche Hello, This is a Qemu RFC to introduce the possibility to deal with hardware memory errors impacting hugetlbfs memory backed VMs. When using hugetlbfs large pages, any large page location being impacted by an HW memory error results in poisoning the entire page, suddenly

[RFC 4/6] system: Introducing hugetlbfs largepage RAS feature

2024-09-10 Thread William Roche
From: William Roche We need to deal with hugetlbfs memory large pages facing HW errors, to increase the probability of surviving a memory poisoning. When an error is detected, the platform kernel marks the entire hugetlbfs large page as "poisoned" and reports the event to all potential u

[RFC 3/6] system/physmem: Remap memory pages on reset based on the page size

2024-09-10 Thread William Roche
From: William Roche When the VM reboots, a memory reset is performed calling qemu_ram_remap() on all hwpoisoned pages. We take into account the recorded page size to adjust the size and location of the memory hole. In case of a largepage used, we also need to punch a hole in the backend file to

[RFC 1/6] accel/kvm: SIGBUS handler should also deal with si_addr_lsb

2024-09-10 Thread William Roche
From: William Roche The SIGBUS signal siginfo reporting a HW memory error provides a si_addr_lsb field with an indication of the impacted memory page size. This information should be used to track the hwpoisoned page sizes. Signed-off-by: William Roche --- accel/kvm/kvm-all.c| 6

[PATCH v1 0/1] Qemu crashes on VM migration after an handled memory error

2024-01-30 Thread William Roche
From: William Roche Problem: A Qemu VM can survive a memory error, as qemu can relay the error to the VM kernel which could also deal with it -- poisoning/off-lining the impacted page. This situation creates a hole in the VM memory address space (an unreadable page or set of pages). A

[PATCH v1 1/1] migration: prevent migration when VM has poisoned memory

2024-01-30 Thread William Roche
From: William Roche A memory page poisoned from the hypervisor level is no longer readable. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a similar stack trace: Program terminated with signal SIGBUS, Bus error. #0

Re: [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM

2023-11-10 Thread William Roche
On 11/8/23 22:45, Peter Xu wrote: On Mon, Nov 06, 2023 at 10:38:14PM +0100, William Roche wrote: But it implies a lot of other changes:     - The source has to flag the error pages to indicate a poison   (new flag in the exchange protocol)     - The destination has to be able to deal

[PATCH v5 1/2] migration: skip poisoned memory pages on "ram saving" phase

2023-11-06 Thread William Roche
From: William Roche A memory page poisoned from the hypervisor level is no longer readable. Thus, it is now treated as a zero-page for the ram saving migration phase. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a
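The zero-page treatment can be pictured with a toy model. This is a sketch under assumptions, not QEMU's migration code: `save_pages`, the `poisoned` set of page indexes and the record tuples are all hypothetical, and a Python `bytes` object stands in for guest RAM (where reading a poisoned page would really raise SIGBUS).

```python
PAGE_SIZE = 4096


def save_pages(ram, poisoned):
    """Yield (page_index, kind, payload) records for a toy RAM image.

    Pages whose index is in `poisoned` are emitted as zero pages without
    their content ever being read.
    """
    for index in range(len(ram) // PAGE_SIZE):
        if index in poisoned:
            # Skip the read entirely; only a zero-page marker is sent.
            yield index, "zero", b""
            continue
        page = ram[index * PAGE_SIZE:(index + 1) * PAGE_SIZE]
        if page.count(0) == PAGE_SIZE:
            yield index, "zero", b""
        else:
            yield index, "data", page


ram = bytes([1]) * PAGE_SIZE + bytes(PAGE_SIZE) + bytes([2]) * PAGE_SIZE
for index, kind, payload in save_pages(ram, {2}):
    print(index, kind, len(payload))
```

In this model the third page holds data but is still sent as a zero page, because its index is in the poisoned set.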

[PATCH v5 2/2] migration: prevent migration when a poisoned page is unknown from the VM

2023-11-06 Thread William Roche
From: William Roche Migrating a poisoned page as a zero-page can only be done when the running guest kernel knows about this poison, so that it marks this page as inaccessible and any access in the VM would fail. But if poison information is not relayed to the VM, the kernel does not prevent

[PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error

2023-11-06 Thread William Roche
From: William Roche Note about ARM specificities: This code has a small part impacting more specifically ARM machines, that's the reason why I added qemu-...@nongnu.org -- see description. A Qemu VM can survive a memory error, as qemu can relay the error to the VM kernel which could also

Re: [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM

2023-11-06 Thread William Roche
On 10/17/23 17:13, Peter Xu wrote: On Tue, Oct 17, 2023 at 02:38:48AM +0200, William Roche wrote: On 10/16/23 18:48, Peter Xu wrote: On Fri, Oct 13, 2023 at 03:08:39PM +, William Roche wrote: diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index 5e95c496bb..e8db6380c1 100644 --- a

Re: [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM

2023-10-16 Thread William Roche
On 10/16/23 18:48, Peter Xu wrote: On Fri, Oct 13, 2023 at 03:08:39PM +, William Roche wrote: diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index 5e95c496bb..e8db6380c1 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -1158,7 +1158,6 @@ void kvm_arch_on_sigbus_vcpu

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-10-13 Thread William Roche
o the kvm_hwpoison_page_add function in kvm_arch_on_sigbus_vcpu with: kvm_hwpoison_page_add(ram_addr, (code == BUS_MCEERR_AR)); Of course we'll have to wait for this above patch to be integrated first. HTH, William. On 9/19/23 00:00, William Roche wrote: > Hi John, > >

[PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM

2023-10-13 Thread William Roche
From: William Roche Migrating a poisoned page as a zero-page can only be done when the running guest kernel knows about this poison, so that it marks this page as inaccessible and any access in the VM would fail. But if poison information is not relayed to the VM, the kernel does not prevent

[PATCH v4 0/2] Qemu crashes on VM migration after an handled memory error

2023-10-13 Thread William Roche
From: William Roche A Qemu VM can survive a memory error, as qemu can relay the error to the VM kernel which could also deal with it -- poisoning/off-lining the impacted page. This situation creates a hole in the VM memory address space that the VM kernel knows about (an unreadable page or set

[PATCH v4 1/2] migration: skip poisoned memory pages on "ram saving" phase

2023-10-13 Thread William Roche
From: William Roche A memory page poisoned from the hypervisor level is no longer readable. Thus, it is now treated as a zero-page for the ram saving migration phase. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-22 Thread William Roche
On 9/22/23 16:30, Yazen Ghannam wrote: On 9/22/23 4:36 AM, William Roche wrote: On 9/21/23 19:41, Yazen Ghannam wrote: [...] Also, during page migration, does the data flow through the CPU core? Sorry for the basic question. I haven't done a lot with virtualization. Yes, in most cases

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-22 Thread William Roche
On 9/21/23 19:41, Yazen Ghannam wrote: On 9/20/23 7:13 AM, Joao Martins wrote: On 18/09/2023 23:00, William Roche wrote: [...] So it looks like the mechanism works fine... unless the VM has migrated between the SRAO error and the first time it really touches the poisoned page to get an SRAR

[PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error

2023-09-20 Thread William Roche
From: William Roche A Qemu VM can survive a memory error, as qemu can relay the error to the VM kernel which could also deal with it -- poisoning/off-lining the impacted page. This situation creates a hole in the VM memory address space that the VM kernel knows about (an unreadable page or set

[PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-20 Thread William Roche
From: William Roche A memory page poisoned from the hypervisor level is no longer readable. Thus, it is now treated as a zero-page for the ram saving migration phase. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-20 Thread William Roche
Thank you Zhijian for your feedback. So I'll try to push this change today. Cheers, William. On 9/20/23 12:04, Zhijian Li (Fujitsu) wrote: On 15/09/2023 19:31, William Roche wrote: On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote: I'm okay with "RDMA isn't touched"

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-18 Thread William Roche
Hi John, I'd like to put the emphasis on the fact that ignoring the SRAO error for a VM is a real problem at least for a specific (rare) case I'm currently working on: The VM migration. Context: - In the case of a poisoned page in the VM address space, the migration can't read it and will skip

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-15 Thread William Roche
On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote: I'm okay with "RDMA isn't touched". BTW, could you share your reproducing program/hacking to poison the page, so that I am able to take a look at the RDMA part later when I'm free. Not sure it's suitable to acknowledge a not touched part. Anyway Acke

[PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error

2023-09-14 Thread William Roche
From: William Roche A Qemu VM can survive a memory error, as qemu can relay the error to the VM kernel which could also deal with it -- poisoning/off-lining the impacted page. This situation creates a hole in the VM memory address space that the VM kernel knows about (an unreadable page or set

[PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-14 Thread William Roche
From: William Roche A memory page poisoned from the hypervisor level is no longer readable. Thus, it is now treated as a zero-page for the ram saving migration phase. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a

Re: [PATCH v3 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-07 Thread William Roche
On 9/7/23 13:12, Gupta, Pankaj wrote: diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 5fce74aac5..4d42d3ed4c 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)   mcg_

Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-06 Thread William Roche
On 9/6/23 17:16, Peter Xu wrote: Just a note.. Probably fine for now to reuse block page size, but IIUC the right thing to do is to fetch it from the signal info (in QEMU's sigbus_handler()) of kernel_siginfo.si_addr_lsb. At least for x86 I think that stores the "shift" of covered poisoned pag

[PATCH 0/1] Qemu crashes on VM migration after an handled memory error

2023-09-06 Thread William Roche
From: William Roche A Qemu VM can survive a memory error, as qemu can relay the error to the VM kernel which could also deal with it -- poisoning/off-lining the impacted page. This situation creates a hole in the VM memory address space that the VM kernel knows about (an unreadable page or set

[PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-06 Thread William Roche
From: William Roche A memory page poisoned from the hypervisor level is no longer readable. Thus, it is now treated as a zero-page for the ram saving migration phase. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a

Re: [PATCH v2 2/2] i386: Fix MCE support for AMD hosts

2023-08-31 Thread William Roche
platforms. Reported-by: William Roche Signed-off-by: John Allen --- target/i386/helper.c | 4 target/i386/kvm/kvm.c | 17 +++-- 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/target/i386/helper.c b/target/i386/helper.c index 533b29cb91..a6523858e0 100644

Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts

2023-08-31 Thread William Roche
d suggest to add a 3rd patch implementing this AMD specific filter: commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06 Author: William Roche Date:   Thu Aug 31 18:54:57 2023 +     i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest     AMD guests can't currently deal with