Re: [PATCH v5 0/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-26 Thread Jane Chu
On 6/24/2023 11:25 PM, Markus Elfring wrote: Change from v4: … I suggest to omit the cover letter for a single patch. Will any patch series evolve for your proposed changes? No. The thought was to put descriptions unsuitable for commit header in the cover letter. thanks, jane Regards,

[PATCH v5 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-15 Thread Jane Chu
to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. If user level block IO syscalls fail due to poison, the errno will be converted to EIO to maintain block API consistency. Signed-off-by: Jane Chu --- drivers/dax/super.c | 5 - drivers

[PATCH v5 0/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-15 Thread Jane Chu
from leaking out to block read(2). Suggested by Matthew. Jane Chu (1): dax: enable dax fault handler to report VM_FAULT_HWPOISON drivers/dax/super.c | 5 - drivers/nvdimm/pmem.c| 2 +- drivers/s390/block/dcssblk.c | 3 ++- fs/dax.c | 11

Re: [PATCH v4 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-14 Thread Jane Chu
On 6/8/2023 8:16 PM, Dan Williams wrote: [..] +static inline int dax_mem2blk_err(int err) +{ + return (err == -EHWPOISON) ? -EIO : err; +} I think it is worth a comment on this function to indicate where this helper is *not* used. I.e. it's easy to grep for where the error code is
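The helper quoted in this reply can be tried outside the kernel. What follows is a minimal userspace sketch, not the patch itself: the fallback value 133 for EHWPOISON is the Linux errno value and is only used when <errno.h> does not define it, and blk_read_result() is a hypothetical caller added purely for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Fallback for hosts whose <errno.h> lacks EHWPOISON (133 is the Linux value). */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Same shape as the quoted helper: hide -EHWPOISON behind -EIO, pass the rest through. */
static inline int dax_mem2blk_err(int err)
{
	return (err == -EHWPOISON) ? -EIO : err;
}

/*
 * Hypothetical block-style caller (not from the patch): whatever the
 * backend reports, a block I/O user should only ever see -EIO for poison.
 */
static int blk_read_result(int backend_err)
{
	return dax_mem2blk_err(backend_err);
}
```

The conversion is kept in one small helper so that, as the review comment asks, it stays easy to see which paths translate the error and which deliberately do not.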

Re: [PATCH v4 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-30 Thread Jane Chu
Ping... Is there any further concern? -jane On 5/8/2023 10:47 PM, Jane Chu wrote: When multiple processes mmap() a dax file, then at some point, a process issues a 'load' and consumes a hwpoison, the process receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb set for the poison

[PATCH v4 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-08 Thread Jane Chu
to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. If user level block IO syscalls fail due to poison, the errno will be converted to EIO to maintain block API consistency. Signed-off-by: Jane Chu --- drivers/dax/super.c | 5 - drivers

[PATCH v4 0/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-08 Thread Jane Chu
Change from v3: Prevent leaking EHWPOISON to user level block IO calls such as zero_range_range, and truncate. Suggested by Dan. Change from v2: Convert EHWPOISON to EIO to prevent EHWPOISON errno from leaking out to block read(2). Suggested by Matthew. Jane Chu (1): dax: enable dax fault

Re: [PATCH v3] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-08 Thread Jane Chu
On 5/4/2023 7:32 PM, Dan Williams wrote: Jane Chu wrote: When multiple processes mmap() a dax file, then at some point, a process issues a 'load' and consumes a hwpoison, the process receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb set for the poison scope. Soon after, any other

[PATCH v3] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-04 Thread Jane Chu
to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. Change from v2: Convert -EHWPOISON to -EIO to prevent EHWPOISON errno from leaking out to block read(2) - suggested by Matthew. Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 2 +- fs/dax.c

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-27 Thread Jane Chu
On 4/27/2023 4:48 PM, Matthew Wilcox wrote: On Thu, Apr 27, 2023 at 04:36:58PM -0700, Jane Chu wrote: This change results in EHWPOISON leaking to userspace in the case of read(2), that's not a return code that block I/O applications have ever had to contend with before. Just as badblocks cause

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-27 Thread Jane Chu
Hi, Dan, On 4/27/2023 2:36 PM, Dan Williams wrote: Jane Chu wrote: When the dax fault handler fails to provision the fault page due to hwpoison, it returns VM_FAULT_SIGBUS, which leads to a SIGBUS being delivered to userspace with .si_code BUS_ADRERR. Channel dax backend driver's detection on hwpoison

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-27 Thread Jane Chu
Hi, Dan, On 4/27/2023 2:36 PM, Dan Williams wrote: Jane Chu wrote: When the dax fault handler fails to provision the fault page due to hwpoison, it returns VM_FAULT_SIGBUS, which leads to a SIGBUS being delivered to userspace with .si_code BUS_ADRERR. Channel dax backend driver's detection on hwpoison

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-18 Thread Jane Chu
Ping, any comment? thanks, -jane On 4/6/2023 4:01 PM, Jane Chu wrote: When the dax fault handler fails to provision the fault page due to hwpoison, it returns VM_FAULT_SIGBUS, which leads to a SIGBUS being delivered to userspace with .si_code BUS_ADRERR. Channel dax backend driver's detection on hwpoison

[PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-06 Thread Jane Chu
-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 2 +- fs/dax.c | 2 +- include/linux/mm.h| 2 ++ 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index ceea55f621cc..46e094e56159 100644 --- a/drivers/nvdimm/pmem.c +++ b

Re: [PATCH] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-06 Thread Jane Chu
On 4/6/2023 12:32 PM, Matthew Wilcox wrote: On Thu, Apr 06, 2023 at 11:55:56AM -0600, Jane Chu wrote: static vm_fault_t dax_fault_return(int error) { if (error == 0) return VM_FAULT_NOPAGE; - return vmf_error(error); + else if (error == -ENOMEM
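The quoted hunk is cut off, so the exact branches of the patch are not visible here. As a rough illustration of the idea in the subject line, the sketch below maps an errno to a fault code in userspace. Everything in it is an assumption: the SKETCH_FAULT_* values are stand-ins rather than the kernel's vm_fault_t bits, and the -EHWPOISON branch is inferred from the patch title, not copied from the diff.

```c
#include <assert.h>
#include <errno.h>

/* Fallback for hosts whose <errno.h> lacks EHWPOISON (133 is the Linux value). */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Stand-in fault codes for illustration only; not the kernel's vm_fault_t bits. */
enum sketch_fault {
	SKETCH_FAULT_NOPAGE,
	SKETCH_FAULT_OOM,
	SKETCH_FAULT_HWPOISON,
	SKETCH_FAULT_SIGBUS,
};

/* Hypothetical analogue of a fault-return helper: pick a fault code per errno. */
static enum sketch_fault sketch_dax_fault_return(int error)
{
	if (error == 0)
		return SKETCH_FAULT_NOPAGE;
	else if (error == -ENOMEM)
		return SKETCH_FAULT_OOM;
	else if (error == -EHWPOISON)
		return SKETCH_FAULT_HWPOISON;	/* assumed branch, per the patch title */
	return SKETCH_FAULT_SIGBUS;		/* every other error */
}
```

The point of a dedicated poison code is that userspace can then tell a poison hit apart from a generic bus error, which the BUS_ADRERR versus BUS_MCEERR_AR discussion elsewhere in this thread turns on.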

[PATCH] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-06 Thread Jane Chu
-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 2 +- fs/dax.c | 14 -- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index ceea55f621cc..46e094e56159 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c

[PATCH v8] x86/mce: retrieve poison range from hardware

2022-08-26 Thread Jane Chu
rg/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Reviewed-by: Ingo Molnar Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/c

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-26 Thread Jane Chu
On 8/26/2022 11:09 AM, Borislav Petkov wrote: > On Fri, Aug 26, 2022 at 10:54:31AM -0700, Dan Williams wrote: >> How about: >> >> --- >> >> When memory poison consumption machine checks fire, >> mce-notifier-handlers like nfit_handle_mce() record the impacted >> physical address range. > > ...

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-25 Thread Jane Chu
On 8/23/2022 9:51 AM, Borislav Petkov wrote: > On Tue, Aug 02, 2022 at 01:50:53PM -0600, Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-23 Thread Jane Chu
PEI error granularities are managed. So I think it is appropriate for > this to go through the x86 tree via the typical path for mce related > topics. + Huang, Ying. x86 maintainers, Please let me know if you need another revision. thanks, -jane On 8/8/2022 4:30 PM, Dan Williams wr

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-08 Thread Jane Chu
On 8/3/2022 1:53 AM, Ingo Molnar wrote: > > * Jane Chu wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > > s/Commit/commit Maintainers, Would you prefer a v8, or take care of the comment upon accepting the patch? > >

[PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-02 Thread Jane Chu
hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Reviewed-by: Ingo Molnar Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) dif

Re: [PATCH v6] x86/mce: retrieve poison range from hardware

2022-08-02 Thread Jane Chu
On 8/2/2022 3:59 AM, Ingo Molnar wrote: > > * Jane Chu wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL <<

[PATCH v6] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/

Re: [PATCH v5] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
On 8/1/2022 2:20 PM, Dan Williams wrote: > Jane Chu wrote: >> On 8/1/2022 9:44 AM, Dan Williams wrote: >>> Jane Chu wrote: >>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>>> poison granularity") that changed nfit_

Re: [PATCH v5] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
On 8/1/2022 9:44 AM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR

Re: [PATCH v5] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
On 8/1/2022 8:58 AM, Luck, Tony wrote: >> struct mce m; >> +int lsb = PAGE_SHIFT; > > Some maintainers like to order local declaration lines from longest to > shortest > >> + /* >> + * Even if the ->validation_bits are set for address mask, >> + * to be extra safe,

[PATCH v5] x86/mce: retrieve poison range from hardware

2022-07-30 Thread Jane Chu
hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-28 Thread Jane Chu
On 7/28/2022 11:46 AM, Dan Williams wrote: > Jane Chu wrote: >> On 7/27/2022 1:01 PM, Dan Williams wrote: >>> Jane Chu wrote: >>>> On 7/27/2022 12:30 PM, Jane Chu wrote: >>>>> On 7/27/2022 12:24 PM, Jane Chu wrote: >>>>>> On

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-28 Thread Jane Chu
On 7/27/2022 1:01 PM, Dan Williams wrote: > Jane Chu wrote: >> On 7/27/2022 12:30 PM, Jane Chu wrote: >>> On 7/27/2022 12:24 PM, Jane Chu wrote: >>>> On 7/27/2022 11:56 AM, Dan Williams wrote: >>>>> Jane Chu wrote: >>>>>> With

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
On 7/27/2022 12:30 PM, Jane Chu wrote: > On 7/27/2022 12:24 PM, Jane Chu wrote: >> On 7/27/2022 11:56 AM, Dan Williams wrote: >>> Jane Chu wrote: >>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>>> poison granularity"

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
On 7/27/2022 12:24 PM, Jane Chu wrote: > On 7/27/2022 11:56 AM, Dan Williams wrote: >> Jane Chu wrote: >>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>> poison granularity") that changed nfit_handle_mce() callback

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
On 7/27/2022 11:56 AM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR

[PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/

Re: [PATCH v3] x86/mce: retrieve poison range from hardware

2022-07-18 Thread Jane Chu
On 7/18/2022 12:22 PM, Luck, Tony wrote: >> It appears the kernel is trusting that ->physical_addr_mask is non-zero >> in other paths. So this is at least equally broken in the presence of a >> broken BIOS. The impact is potentially larger though with this change, >> so it might be a good

[PATCH v3] x86/mce: retrieve poison range from hardware

2022-07-17 Thread Jane Chu
hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/apei.c

Re: [PATCH v2] x86/mce: retrieve poison range from hardware whenever supported

2022-07-16 Thread Jane Chu
On 7/15/2022 9:50 PM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR

[PATCH v2] x86/mce: retrieve poison range from hardware whenever supported

2022-07-15 Thread Jane Chu
hardware whenever support is available. v1: https://lkml.org/lkml/2022/7/15/1040 Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 717192915f28..a4d589363

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-15 Thread Jane Chu
On 7/15/2022 12:17 PM, Dan Williams wrote: > [ add Tony ] > > Jane Chu wrote: >> On 7/14/2022 6:19 PM, Dan Williams wrote: >>> Jane Chu wrote: >>>> I meant to say there would be 8 calls to the nfit_handle_mce() callback, >>>> one call for each poi

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-15 Thread Jane Chu
On 7/14/2022 5:58 PM, Dan Williams wrote: [..] >>> > However, the ARS engine likely can return the precise error ranges so I > think the fix is to just use the address range indicated by 1UL << > MCI_MISC_ADDR_LSB(mce->misc) to filter the results from a short ARS > scrub request to
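To make the granularity discussion concrete, here is a small userspace sketch of how a reported range follows from the LSB field. It is an illustration under stated assumptions: SKETCH_MISC_ADDR_LSB() is assumed to mirror the kernel's MCI_MISC_ADDR_LSB() (low 6 bits of the MISC register), and struct sketch_range and sketch_poison_range() are hypothetical names that do not appear in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror MCI_MISC_ADDR_LSB(): the low 6 bits carry the address LSB. */
#define SKETCH_MISC_ADDR_LSB(misc) ((unsigned int)((misc) & 0x3f))

/* Hypothetical container for an aligned range. */
struct sketch_range {
	uint64_t start;	/* first byte of the aligned range */
	uint64_t len;	/* range length in bytes, always a power of two */
};

/* Round the faulting address down to the granularity given by the LSB field. */
static struct sketch_range sketch_poison_range(uint64_t addr, uint64_t misc)
{
	unsigned int lsb = SKETCH_MISC_ADDR_LSB(misc);
	uint64_t len = 1ULL << lsb;
	struct sketch_range r = { addr & ~(len - 1), len };

	return r;
}
```

With an LSB of 12 the range for address 0x1234 is the whole 4 KiB page starting at 0x1000, while an LSB of 6 narrows it to the 64 bytes starting at 0x1200. That difference is why a coarse LSB can make a bad-range report spill over into clean blocks.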

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-15 Thread Jane Chu
On 7/14/2022 6:19 PM, Dan Williams wrote: > Jane Chu wrote: >> I meant to say there would be 8 calls to the nfit_handle_mce() callback, >> one call for each poison with accurate address. >> >> Also, short ARS would find 2 poisons. >> >> I attached the co

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-14 Thread Jane Chu
On 7/13/2022 5:24 PM, Dan Williams wrote: > Jane Chu wrote: >> On 7/12/2022 5:48 PM, Dan Williams wrote: >>> Jane Chu wrote: >>>> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison >>>> granularity") changed nfit_handle

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-13 Thread Jane Chu
On 7/12/2022 5:48 PM, Dan Williams wrote: > Jane Chu wrote: >> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison >> granularity") changed nfit_handle_mce() callback to report badrange for >> each poison at an alignment indicated by 1

[PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-11 Thread Jane Chu
t happens to be the badblock granularity, b. ndctl inject-error cannot inject more than one poison to a 512-byte block, c. architecture agnostic Fixes: 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison granularity") Signed-off-by: Jane Chu --- drivers/acpi/nfit/mce.c | 4

Re: [PATCH v2] pmem: fix a name collision

2022-06-30 Thread Jane Chu
On 6/30/2022 11:29 AM, Christoph Hellwig wrote: > Looks good: > > Reviewed-by: Christoph Hellwig Thank you! -jane

[PATCH v2] pmem: fix a name collision

2022-06-30 Thread Jane Chu
ffset; 51 } 52 Fixes: 9409c9b6709e (pmem: refactor pmem_clear_poison()) Reported-by: kernel test robot Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 629

Re: [PATCH] pmem: fix a name collision

2022-06-30 Thread Jane Chu
On 6/30/2022 11:04 AM, Christoph Hellwig wrote: > On Thu, Jun 30, 2022 at 11:51:55AM -0600, Jane Chu wrote: >> -static phys_addr_t to_phys(struct pmem_device *pmem, phys_addr_t offset) >> +static phys_addr_t _to_phys(struct pmem_device *pmem, phys_addr_t offset) >

[PATCH] pmem: fix a name collision

2022-06-30 Thread Jane Chu
ffset; 51 } 52 Fixes: 9409c9b6709e (pmem: refactor pmem_clear_poison()) Reported-by: kernel test robot Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 629d10fcf5

Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-04-05 Thread Jane Chu
On 3/30/2022 9:18 AM, Darrick J. Wong wrote: > On Wed, Mar 30, 2022 at 08:49:29AM -0700, Christoph Hellwig wrote: >> On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote: >>> As the code I pasted before, pmem driver will subtract its ->data_offset, >>> which is byte-based. And the

Re: Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
nel.org > Subject: Re: Phantom PMEM poison issue > > On Sat, Jan 22, 2022 at 12:40:18AM +0000, Jane Chu wrote: >> On 1/21/2022 4:31 PM, Jane Chu wrote: >>> On baremetal Intel platform with DCPMEM installed and configured to >>> provision daxfs, say a poison

Re: Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
On 1/21/2022 5:27 PM, Luck, Tony wrote: > On Sat, Jan 22, 2022 at 12:40:18AM +0000, Jane Chu wrote: >> On 1/21/2022 4:31 PM, Jane Chu wrote: >>> On baremetal Intel platform with DCPMEM installed and configured to >>> provision daxfs, say a poison was consumed b

Re: Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
On 1/21/2022 4:31 PM, Jane Chu wrote: > On baremetal Intel platform with DCPMEM installed and configured to > provision daxfs, say a poison was consumed by a load from a user thread, > and then daxfs takes action and clears the poison, confirmed by "ndctl > -NM".

Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
On a baremetal Intel platform with DCPMEM installed and configured to provision daxfs, say a poison was consumed by a load from a user thread, and then daxfs takes action and clears the poison, confirmed by "ndctl -NM". Now, depending on luck, after some time (from a few seconds to 5+ hours)

Re: [PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation

2021-11-04 Thread Jane Chu
On 11/4/2021 10:55 AM, Christoph Hellwig wrote: > On Tue, Sep 14, 2021 at 05:31:32PM -0600, Jane Chu wrote: >> +static int pmem_dax_clear_poison(struct dax_device *dax_dev, pgoff_t pgoff, >> +size_t nr_pages) >> +{ >> +unsigned

Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-23 Thread Jane Chu
On 9/15/2021 9:15 AM, Darrick J. Wong wrote: On Wed, Sep 15, 2021 at 12:22:05AM -0700, Jane Chu wrote: Hi, Dan, On 9/14/2021 9:44 PM, Dan Williams wrote: On Tue, Sep 14, 2021 at 4:32 PM Jane Chu wrote: If pwrite(2) encounters poison in a pmem range, it fails with EIO. This is unnecessary

Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-23 Thread Jane Chu
On 9/15/2021 1:27 PM, Dan Williams wrote: I'm also thinking about the MOVDIR64B instruction and how it might be used to clear poison on the fly with a single 'store'. Of course, that means we need to figure out how to narrow down the error blast radius first. It turns out the MOVDIR64B error

Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-15 Thread Jane Chu
Hi, Dan, On 9/14/2021 9:44 PM, Dan Williams wrote: On Tue, Sep 14, 2021 at 4:32 PM Jane Chu wrote: If pwrite(2) encounters poison in a pmem range, it fails with EIO. This is unnecessary if hardware is capable of clearing the poison. Though not all dax backend hardware has the capability

[PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation

2021-09-14 Thread Jane Chu
Provide pmem_dax_clear_poison() to struct dax_operations.clear_poison. Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 17 + 1 file changed, 17 insertions(+) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 1e0615b8565e..307a53aa3432 100644 --- a/drivers

[PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-14 Thread Jane Chu
to, first, speed up repairing by means of it; second, maintain backend continuity instead of fragmenting it in search for clean blocks. Jane Chu (3): dax: introduce dax_operation dax_clear_poison dax: introduce dax_clear_poison to dax pwrite operation libnvdimm/pmem: Provide pmem_dax_clear_poison

[PATCH 2/3] dax: introduce dax_clear_poison to dax pwrite operation

2021-09-14 Thread Jane Chu
When pwrite(2) encounters poison in a dax range, it fails with EIO. But if the backend hardware of the dax device is capable of clearing poison, try that and resume the write. Signed-off-by: Jane Chu --- fs/dax.c | 9 + 1 file changed, 9 insertions(+) diff --git a/fs/dax.c b/fs/dax.c

[PATCH 1/3] dax: introduce dax_operation dax_clear_poison

2021-09-14 Thread Jane Chu
. Signed-off-by: Jane Chu --- drivers/dax/super.c | 13 + include/linux/dax.h | 6 ++ 2 files changed, 19 insertions(+) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 44736cbd446e..935d496fa7db 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -373,6 +373,19

[PATCH 2/3] dax: introduce dax clear poison to page aligned dax pwrite operation

2021-09-14 Thread Jane Chu
Currently, when pwrite(2) is issued to a dax range that contains poison, the pwrite(2) fails with EIO. Well, if the hardware backend of the dax device is capable of clearing poison, try that and resume the write. Signed-off-by: Jane Chu --- fs/dax.c | 9 + 1 file changed, 9 insertions

[PATCH] mm/memory-failure: unecessary amount of unmapping

2021-04-19 Thread Jane Chu
It appears that unmap_mapping_range() actually takes a 'size' as its third argument rather than a location; the current calling convention causes an unnecessary amount of unmapping to occur. Fixes: 6100e34b2526e ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages") Signed-of
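The size-versus-location mix-up described in this patch can be shown with simple arithmetic. The sketch below is not the kernel function: sketch_pages_unmapped() is a hypothetical helper that only counts how many pages a (start, length) pair covers, assuming the third argument is a byte length.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of pages touched by the byte range
 * [start, start + len), for pages of (1 << page_shift) bytes.
 */
static uint64_t sketch_pages_unmapped(uint64_t start, uint64_t len,
				      unsigned int page_shift)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = start >> page_shift;
	last = (start + len - 1) >> page_shift;
	return last - first + 1;
}
```

For a single 4 KiB page at offset 0x200000, passing the size (0x1000) covers 1 page. Passing the end location (0x201000) where a size is expected covers 513 pages, which is the excess unmapping the patch description refers to.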

Re: [RFC PATCH v3 0/9] fsdax: introduce fs query to support reflink

2021-01-08 Thread Jane Chu
Hi, Shiyang, On 12/18/2020 1:13 AM, Ruan Shiyang wrote: So I tried the patchset with pmem error injection, the SIGBUS payload does not look right - ** SIGBUS(7): ** ** si_addr(0x(nil)), si_lsb(0xC), si_code(0x4, BUS_MCEERR_AR) ** I expect the payload looks like ** si_addr(0x7f3672e0),

Re: [RFC PATCH v3 0/9] fsdax: introduce fs query to support reflink

2020-12-16 Thread Jane Chu
Hi, Shiyang, On 12/15/2020 4:14 AM, Shiyang Ruan wrote: The call trace is like this: memory_failure() pgmap->ops->memory_failure() => pmem_pgmap_memory_failure() gendisk->fops->corrupted_range() => - pmem_corrupted_range() -

Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range()

2020-12-15 Thread Jane Chu
On 12/15/2020 4:14 AM, Shiyang Ruan wrote: #ifdef CONFIG_SYSFS +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off, + size_t len, void *data); int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); void

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-15 Thread Jane Chu
On 12/15/2020 3:58 AM, Ruan Shiyang wrote: Hi Jane On 2020/12/15 4:58 AM, Jane Chu wrote: Hi, Shiyang, On 11/22/2020 4:41 PM, Shiyang Ruan wrote: This patchset is an attempt to resolve the problem of tracking shared pages for fsdax. Change from v1: - Introduce ->block_lost() for block dev

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-14 Thread Jane Chu
Hi, Shiyang, On 11/22/2020 4:41 PM, Shiyang Ruan wrote: This patchset is an attempt to resolve the problem of tracking shared pages for fsdax. Change from v1: - Introduce ->block_lost() for block device - Support mapped device - Add 'not available' warning for realtime device in XFS -

[PATCH v2 2/3] libnvdimm/security: the 'security' attr never show 'overwrite' state

2020-08-03 Thread Jane Chu
te the actual state when multiple bits are set in the flags. Signed-off-by: Jane Chu Reviewed-by: Dave Jiang --- drivers/nvdimm/dimm_devs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index b7b77e8..5d72026 100644

[PATCH v2 3/3] libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr

2020-08-03 Thread Jane Chu
097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support") Signed-off-by: Jane Chu Reviewed-by: Dave Jiang --- drivers/nvdimm/security.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/

[PATCH v2 1/3] libnvdimm/security: fix a typo

2020-08-03 Thread Jane Chu
: Introduce a 'frozen' attribute") Signed-off-by: Jane Chu Reviewed-by: Dave Jiang --- drivers/nvdimm/security.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c index 4cef69b..8f3971c 100644 --- a/drivers/nvdimm/secur

Re: [PATCH 1/2] libnvdimm/security: 'security' attr never show 'overwrite' state

2020-08-03 Thread Jane Chu
Hi, Dave, On 8/3/2020 1:41 PM, Dave Jiang wrote: On 7/24/2020 9:09 AM, Jane Chu wrote: Since commit d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute"), when issue   # ndctl sanitize-dimm nmem0 --overwrite then immediately check the 'security' attribute,   # cat /s

[PATCH 2/2] libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr

2020-07-24 Thread Jane Chu
097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support") Signed-off-by: Jane Chu --- drivers/nvdimm/security.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c index 8f3971c..4b8015

[PATCH 1/2] libnvdimm/security: 'security' attr never show 'overwrite' state

2020-07-24 Thread Jane Chu
order should be reversed. The commit also has a typo: in one occasion, 'nvdimm->sec.ext_state' assignment is replaced with 'nvdimm->sec.flags' assignment for the NVDIMM_MASTER type. Cc: Dan Williams Fixes: d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute"

[PATCH 2/2] libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr

2020-07-23 Thread Jane Chu
097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support") Signed-off-by: Jane Chu --- drivers/nvdimm/security.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c index 8f3971c..4b8015

[PATCH 1/2] libnvdimm/security: 'security' attr never show 'overwrite' state

2020-07-23 Thread Jane Chu
ment is replaced with 'nvdimm->sec.flags' assignment for the NVDIMM_MASTER type. Cc: Dan Williams Fixes: d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute") Signed-off-by: Jane Chu --- drivers/nvdimm/dimm_devs.c | 4 ++-- drivers/nvdimm/security.c | 2 +- 2 files changed,

Re: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-10-08 Thread Jane Chu
Hi, Naoya, What is the status of the patches? Is there anything I need to do on my end? Regards, -jane On 8/6/2019 10:25 AM, Jane Chu wrote: Change in v4: - remove trailing white space Changes in v3: - move **tk cleanup to its own patch Changes in v2: - move 'tk' allocations

Re: kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-21 Thread jane . chu
Hi, Bruce, Dan, This patch took care of the panic issue. thanks, -jane On 8/21/19 7:12 AM, J. Bruce Fields wrote: Probably just needs the following. I've been slow to get some bugfixes upstream, sorry--I'll go send a pull request now --b. commit 78e70e780b28 Author: He Zhe Date: Tue

Re: kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-21 Thread jane . chu
Hi, Dan, On 8/20/19 8:48 PM, Dan Williams wrote: On Tue, Aug 20, 2019 at 6:39 PM wrote: Hi, Apologies if there is a better channel for reporting this issue; if so, please let me know. I just saw the regression below in the 5.3-rc5 kernel, but not in 5.2-rc7 or earlier kernels. Is the error stable

kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-20 Thread jane . chu
Hi, Apologies if there is a better channel for reporting this issue; if so, please let me know. I just saw the regression below in the 5.3-rc5 kernel, but not in 5.2-rc7 or earlier kernels. [ 3533.659787] mce: Uncorrected hardware memory error in user-access at 383e202000 [ 3533.659903] Memory failure:

[PATCH v4 2/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-08-06 Thread Jane Chu
mory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu Suggested-by: Naoya Horiguc

Re: [PATCH v3 1/2] mm/memory-failure.c clean up around tk pre-allocation

2019-08-06 Thread Jane Chu
Hi, Naoya, Thanks a lot! v4 on the way. :) -jane On 8/1/2019 2:06 AM, Naoya Horiguchi wrote: On Thu, Jul 25, 2019 at 04:01:40PM -0600, Jane Chu wrote: add_to_kill() expects the first 'tk' to be pre-allocated and makes subsequent allocations as needed, which makes the code a bit difficult

[PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-08-06 Thread Jane Chu
if "tk->addr == -EFAULT", since the code returns early. Incorporated Naoya's suggestion, also, skip VMAs where "tk->size_shift == 0" for zone device page, and deliver SIGBUS when "tk->size_shift != 0" so the payload is helpful; - added Suggested-b

[PATCH v4 1/2] mm/memory-failure.c clean up around tk pre-allocation

2019-08-06 Thread Jane Chu
add_to_kill() expects the first 'tk' to be pre-allocated and makes subsequent allocations as needed, which makes the code a bit difficult to read. Move all the allocation internal to add_to_kill() and drop the **tk argument. Signed-off-by: Jane Chu --- mm/memory-failure.c | 40

[PATCH v3 1/2] mm/memory-failure.c clean up around tk pre-allocation

2019-07-25 Thread Jane Chu
add_to_kill() expects the first 'tk' to be pre-allocated and makes subsequent allocations as needed, which makes the code a bit difficult to read. Move all the allocation internal to add_to_kill() and drop the **tk argument. Signed-off-by: Jane Chu --- mm/memory-failure.c | 40

[PATCH v3 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-07-25 Thread Jane Chu
ince the code returns early. Incorporated Naoya's suggestion, also, skip VMAs where "tk->size_shift == 0" for zone device page, and deliver SIGBUS when "tk->size_shift != 0" so the payload is helpful; - added Suggested-by: Naoya Horiguchi Jane Chu (2): mm/

[PATCH v3 2/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-25 Thread Jane Chu
mory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu Suggested-by: Naoya Horiguc

Re: [PATCH v2 0/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-07-24 Thread Jane Chu
On 7/24/2019 3:52 PM, Dan Williams wrote: On Wed, Jul 24, 2019 at 3:35 PM Jane Chu wrote: Changes in v2: - move 'tk' allocations internal to add_to_kill(), suggested by Dan; Oh, sorry if it wasn't clear, this should move to its own patch that only does the cleanup, and then the follow

Re: [PATCH v2 1/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-24 Thread Jane Chu
On 7/24/2019 4:43 PM, Naoya Horiguchi wrote: On Wed, Jul 24, 2019 at 04:33:23PM -0600, Jane Chu wrote: Mmap /dev/dax more than once, then read the poison location using address from one of the mappings. The other mappings due to not having the page mapped in will cause SIGKILLs delivered

[PATCH v2 1/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-24 Thread Jane Chu
mory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu Suggested-by: Naoya Horiguc

[PATCH v2 0/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-07-24 Thread Jane Chu
gestion, also, skip VMAs where "tk->size_shift == 0" for zone device page, and deliver SIGBUS when "tk->size_shift != 0" so the payload is helpful; - added Suggested-by: Naoya Horiguchi Jane Chu (1): mm/memory-failure: Poison read receives SIGKILL instead of S

Re: [PATCH] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-24 Thread jane . chu
Thank you all for your comments! I've incorporated them, tested, and have a v2 ready for review. Thanks! -jane On 7/23/19 11:48 PM, Naoya Horiguchi wrote: Hi Jane, Dan, On Tue, Jul 23, 2019 at 06:34:35PM -0700, Dan Williams wrote: On Tue, Jul 23, 2019 at 4:49 PM Jane Chu wrote: Mmap /dev

[PATCH] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-23 Thread Jane Chu
mory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu --- mm/memory-failure.c | 16 +

Re: [PATCH 0/6] libnvdimm: Fix async operations and locking

2019-06-18 Thread Jane Chu
+++ drivers/nvdimm/pfn_devs.c | 24 +++--- drivers/nvdimm/pmem.c |4 + drivers/nvdimm/region.c | 24 +++--- drivers/nvdimm/region_devs.c| 12 ++- include/linux/device.h |6 ++ 14 files changed, 308 insertions(+), 135 deletions(-)

Re: [PATCH] mm, memory-failure: clarify error message

2019-05-20 Thread Jane Chu
On 5/16/2019 9:48 PM, Anshuman Khandual wrote: On 05/17/2019 09:38 AM, Jane Chu wrote: Some user who install SIGBUS handler that does longjmp out What the longjmp about ? Are you referring to the mechanism of catching the signal which was registered ? Yes. thanks, -jane

[PATCH v2] mm, memory-failure: clarify error message

2019-05-20 Thread Jane Chu
y. Signed-off-by: Jane Chu --- mm/memory-failure.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fc8b517..c4f4bcd 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -216,7 +216,7 @@ static int kill_proc(struct t

Re: [PATCH] mm, memory-failure: clarify error message

2019-05-20 Thread Jane Chu
Thanks Vishal and Naoya! -jane On 5/20/2019 3:21 AM, Naoya Horiguchi wrote: On Fri, May 17, 2019 at 10:18:02AM +0530, Anshuman Khandual wrote: On 05/17/2019 09:38 AM, Jane Chu wrote: Some user who install SIGBUS handler that does longjmp out What the longjmp about ? Are you referring

[PATCH] mm, memory-failure: clarify error message

2019-05-16 Thread Jane Chu
y. Signed-off-by: Jane Chu --- mm/memory-failure.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fc8b517..14de5e2 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -216,10 +216,9 @@ static int kill_proc(struct t

Re: [PATCH v2 0/6] mm/devm_memremap_pages: Fix page release race

2019-05-16 Thread Jane Chu
On 5/16/2019 2:51 PM, Dan Williams wrote: On Thu, May 16, 2019 at 9:45 AM Jane Chu wrote: Hi, I'm able to reproduce the panic below by running two sets of ndctl commands that actually serve a legitimate purpose in parallel (unlike the brute force experiment earlier), each set in an indefinite

Re: [PATCH v2 0/6] mm/devm_memremap_pages: Fix page release race

2019-05-16 Thread jane . chu
Apologies for resending in plain text. -jane On 5/16/19 9:45 AM, Jane Chu wrote: Hi, I'm able to reproduce the panic below by running two sets of ndctl commands that actually serve a legitimate purpose in parallel (unlike the brute force experiment earlier), each set in an indefinite loop
