Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-20 Thread Jue Wang
On Tue, Apr 20, 2021 at 8:48 AM Luck, Tony wrote: > > On Mon, Apr 19, 2021 at 07:03:01PM -0700, Jue Wang wrote: > > On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote: > > > > > This patch suggests to do page table walk to find the error virtual > > > a

Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-19 Thread Jue Wang
On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote: > This patch suggests to do page table walk to find the error virtual > address. If we find multiple virtual addresses in walking, we now can't > determine which one is correct, so we fall back to sending SIGBUS in > kill_me_maybe()

Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-19 Thread Jue Wang
On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote: ... > + * This function is intended to handle "Action Required" MCEs on already > + * hardware poisoned pages. They could happen, for example, when > + * memory_failure() failed to unmap the error page at the first call, or > + * when

Re: [PATCH 4/4] x86/mce: Avoid infinite loop for copy from user recovery

2021-04-19 Thread Jue Wang
On Thu, 25 Mar 2021 17:02:35 -0700, Tony Luck wrote: ... > But there are places in the kernel where the code assumes that this > EFAULT return was simply because of a page fault. The code takes some > action to fix that, and then retries the access. This results in a second > machine check. What

Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

2021-04-19 Thread Jue Wang
On Thu, 8 Apr 2021 10:08:52 -0700, Tony Luck wrote: > KVM apparently passes a machine check into the guest. Though it seems > to be missing the MCG_STATUS information to tell the guest whether this > is an "Action Required" machine check, or an "Action Optional" (i.e. > whether the poison was

Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

2021-04-14 Thread Jue Wang
On Wed, Apr 14, 2021 at 6:10 AM Borislav Petkov wrote: > > On Tue, Apr 13, 2021 at 10:47:21PM -0700, Jue Wang wrote: > > This path is when EPT #PF finds accesses to a hwpoisoned page and > > sends SIGBUS to user space (KVM exits into user space) with the same > > semantic

Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

2021-04-13 Thread Jue Wang
On Tue, 13 Apr 2021 12:07:22 +0200, Petkov, Borislav wrote: >> KVM apparently passes a machine check into the guest. > Ah, there it is: > static void kvm_send_hwpoison_signal(unsigned long address, struct > task_struct *tsk) > { > send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address,

Re: [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-12 Thread Jue Wang
I believe the mutex type patch has its own value in protecting memory_failure from other inherent races, e.g., races around split_huge_page where concurrent MCE happens to different 4k pages under the same THP.[1] This realistically can happen given the physical locality clustering effect of

Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp

2021-03-11 Thread Jue Wang
plit_huge_page() is called, then passed in to hwpoison_user_mappings(). > > > > Sorry, we don't have a proper patch for that right now, but I expect > > you can see what needs to be done. But something we found on the way, > > we do have a patch for: add_to_kill() uses page_ad