ional change in this commit.
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2cf7acc286de..685ab9a77966 100644
--- a/mm/memory-failure.c
+++ b
Add the documentation for soft offline behaviors / costs, and what
the new enable_soft_offline sysctl is for.
Acked-by: Oscar Salvador
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/sysctl/vm.rst | 32 +
1 file changed, 32 insertions
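The cover letter names the control /proc/sys/vm/enable_soft_offline, and the commit message says it defaults to 1 to keep the existing kernel behavior. Below is a minimal userspace sketch for reading and writing that file. It assumes the file holds a single 0 or 1 followed by a newline, and that writing requires root. The helper names are hypothetical and are not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Path as named in the series' cover letter. */
#define ENABLE_SOFT_OFFLINE_PATH "/proc/sys/vm/enable_soft_offline"

/*
 * Hypothetical helper: parse the sysctl text. Assumes only "0" or "1",
 * optionally followed by a newline, is valid. Returns 0 or -EINVAL.
 */
int parse_enable_soft_offline(const char *buf, int *value)
{
	if ((buf[0] != '0' && buf[0] != '1') ||
	    (buf[1] != '\0' && strcmp(buf + 1, "\n") != 0))
		return -EINVAL;
	*value = buf[0] - '0';
	return 0;
}

/* Read the current setting. Returns 0 on success or a negative errno. */
int read_enable_soft_offline(int *value)
{
	char buf[8] = {0};
	FILE *f = fopen(ENABLE_SOFT_OFFLINE_PATH, "r");

	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);
	return parse_enable_soft_offline(buf, value);
}

/* Write 0 or 1 (assumed to need root). Returns 0 or a negative errno. */
int write_enable_soft_offline(int value)
{
	FILE *f;

	if (value != 0 && value != 1)
		return -EINVAL;
	f = fopen(ENABLE_SOFT_OFFLINE_PATH, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -EIO;
	}
	if (fclose(f) != 0)
		return -errno;
	return 0;
}
```

Keeping the parser separate from the file I/O lets the validation logic be checked without root or a kernel that carries the series.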
.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 228
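The selftest emulates corrected errors on a hugepage with MADV_SOFT_OFFLINE. The following is a minimal sketch of that single step, not the actual hugetlb-soft-offline.c code. It maps one anonymous HugeTLB page, faults it in, and calls madvise(MADV_SOFT_OFFLINE) on it. It assumes the caller supplies the hugepage size (for example 2 MiB on x86-64), that the HugeTLB pool has a free page, that the process has CAP_SYS_ADMIN, and that the kernel is built with CONFIG_MEMORY_FAILURE. The fallback value 101 comes from the kernel uapi headers, for libc headers that do not define MADV_SOFT_OFFLINE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101 /* value from the kernel uapi headers */
#endif

/* Round len up to whole hugepages; hpage_size must be a power of two. */
size_t hugepage_align(size_t len, size_t hpage_size)
{
	return (len + hpage_size - 1) & ~(hpage_size - 1);
}

/*
 * Hypothetical helper: map one anonymous HugeTLB page, touch it so it is
 * backed by memory, then ask the kernel to soft offline it. Returns 0 if
 * madvise() succeeded, otherwise the errno from mmap() or madvise().
 */
int soft_offline_one_hugepage(size_t hpage_size)
{
	size_t len = hugepage_align(1, hpage_size);
	char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	int err = 0;

	if (addr == MAP_FAILED)
		return errno;
	memset(addr, 0xab, len); /* fault the hugepage in */
	if (madvise(addr, len, MADV_SOFT_OFFLINE) != 0)
		err = errno;
	munmap(addr, len);
	return err;
}
```

With enable_soft_offline=1, a call such as soft_offline_one_hugepage(2UL << 20) would be expected to return 0. With enable_soft_offline=0, the (truncated) patch description says the request fails with EOPNOTSUPP.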
will fail with EOPNOTSUPP.
Acked-by: Miaohe Lin
Acked-by: David Rientjes
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 22 --
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 685ab9a77966..d55fdeed0cfc 10064
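When soft offline is disabled, the request is expected to fail with EOPNOTSUPP. Below is a small hypothetical helper that turns madvise()'s errno into a readable reason. It assumes EPERM for a caller without CAP_SYS_ADMIN and EINVAL for a kernel without CONFIG_MEMORY_FAILURE. The series also documents -EOPNOTSUPP for events filtered by hwpoison_filter(), so the error code alone may not be conclusive.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: explain a MADV_SOFT_OFFLINE outcome, given
 * madvise()'s return value and the errno it left behind.
 */
const char *describe_soft_offline_result(int ret, int err)
{
	if (ret == 0)
		return "soft offlined";
	switch (err) {
	case EOPNOTSUPP:
		return "refused: enable_soft_offline=0, or event filtered";
	case EPERM:
		return "refused: caller lacks CAP_SYS_ADMIN";
	case EINVAL:
		return "unsupported: kernel lacks CONFIG_MEMORY_FAILURE?";
	default:
		return strerror(err);
	}
}
```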
t_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (4):
mm/memory-failure: refactor log format in soft offline code
mm/memory-failure: userspace controls soft-offlining pages
s
On Thu, Jun 27, 2024 at 8:29 PM Miaohe Lin wrote:
>
> On 2024/6/26 13:08, Jiaqi Yan wrote:
> > Add regression and new tests when hugepage has correctable memory
> > errors, and how userspace wants to deal with it:
> > * if enable_soft_offline=1, mapped hugepage i
On Thu, Jun 27, 2024 at 8:27 PM David Rientjes wrote:
>
> On Wed, 26 Jun 2024, Jiaqi Yan wrote:
>
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 6f5ac334efba..1559e773537f 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 228 ++
tools/testing
Add the documentation for soft offline behaviors / costs, and what
the new enable_soft_offline sysctl is for.
Acked-by: Oscar Salvador
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/sysctl/vm.rst | 32 +
1 file changed, 32 insertions
will fail with EOPNOTSUPP.
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6f5ac334efba..1559e773537f 100644
--- a/mm/memory-failure.
ional change in this commit.
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d3c830e817e3..6f5ac334efba 100644
--- a/mm/memory-failure.c
+++ b
c/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (4):
mm/memory-failure: refactor log format in soft offline code
mm/memory-failure: userspace controls soft-off
On Tue, Jun 25, 2024 at 6:54 PM Miaohe Lin wrote:
>
> On 2024/6/26 7:57, Jiaqi Yan wrote:
> > On Tue, Jun 25, 2024 at 12:05 AM Miaohe Lin wrote:
> >>
> >> On 2024/6/25 0:33, Jiaqi Yan wrote:
> >>> Add regression and new tests when hugepage has correctabl
On Tue, Jun 25, 2024 at 5:02 PM Randy Dunlap wrote:
>
> Hi--
>
> On 6/24/24 9:33 AM, Jiaqi Yan wrote:
> > Add the documentation for soft offline behaviors / costs, and what
> > the new enable_soft_offline sysctl is for.
> >
> > Acked-by: Oscar Salvad
On Tue, Jun 25, 2024 at 12:05 AM Miaohe Lin wrote:
>
> On 2024/6/25 0:33, Jiaqi Yan wrote:
> > Add regression and new tests when hugepage has correctable memory
> ...
> > diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > b/tools/testing/selftest
On Mon, Jun 24, 2024 at 11:41 PM Miaohe Lin wrote:
>
> On 2024/6/25 0:33, Jiaqi Yan wrote:
> > Logs from soft_offline_page and soft_offline_in_use_page have
> > different formats than the majority of the memory failure code:
> >
> > "Memory failure: 0x${pfn}: ${l
Add the documentation for soft offline behaviors / costs, and what
the new enable_soft_offline sysctl is for.
Acked-by: Oscar Salvador
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/sysctl/vm.rst | 32 +
1 file changed, 32 insertions(+)
diff --git a
.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 227 ++
tools/testing
will fail with EOPNOTSUPP.
Acked-by: Miaohe Lin
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2a097af7da0e..0013d338569b 100644
--- a/mm/memory-failure.
ional change in this commit.
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d3c830e817e3..2a097af7da0e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2
pages, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6
On Sun, Jun 23, 2024 at 8:41 PM Miaohe Lin wrote:
>
> On 2024/6/21 2:48, Jiaqi Yan wrote:
> > Correctable memory errors are very common on servers with large
> ...
> >
> > /*
> > @@ -2749,8 +2760,9 @@ static int soft_offline_in_use_page(struct page *page)
Thanks for your comment, Andi.
On Thu, Jun 20, 2024 at 3:53 PM Andi Kleen wrote:
>
> Jiaqi Yan writes:
>
> > Correctable memory errors are very common on servers with large
> > amount of memory, and are corrected by ECC, but with two
> > pain points to users:
> >
On Thu, Jun 20, 2024 at 10:08 PM Muhammad Usama Anjum
wrote:
>
> On 6/20/24 11:48 PM, Jiaqi Yan wrote:
> > Add regression and new tests when hugepage has correctable memory
> > errors, and how userspace wants to deal with it:
> > * if enable_soft_offline=1, mapped h
ional change in this commit.
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d3c830e817e3..2a097af7da0e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2
.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 229 ++
tools/testing
Add the documentation for soft offline behaviors / costs, and what
the new enable_soft_offline sysctl is for.
Acked-by: Oscar Salvador
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/sysctl/vm.rst | 32 +
1 file changed, 32 insertions(+)
diff --git a
will fail with EOPNOTSUPP.
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2a097af7da0e..623aa93aff5a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory
ne.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (4):
mm/memory-failure: refactor log format in soft offline code
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm
On Tue, Jun 18, 2024 at 10:20 PM Oscar Salvador wrote:
>
> On Mon, Jun 17, 2024 at 05:05:45PM +, Jiaqi Yan wrote:
> > Add the documentation for soft offline behaviors / costs, and what
> > the new enable_soft_offline sysctl is for.
> >
> > Signed-off-by: Jiaqi Y
On Mon, Jun 17, 2024 at 8:01 PM Miaohe Lin wrote:
>
> On 2024/6/18 7:17, Jiaqi Yan wrote:
> > On Mon, Jun 17, 2024 at 12:13 PM Andrew Morton
> > wrote:
> >>
> >> On Mon, 17 Jun 2024 17:05:43 + Jiaqi Yan wrote:
> >>
> >>> Correc
On Tue, Jun 18, 2024 at 10:13 PM Oscar Salvador wrote:
>
> On Wed, Jun 19, 2024 at 07:03:46AM +0200, Oscar Salvador wrote:
> > On Mon, Jun 17, 2024 at 05:05:43PM +0000, Jiaqi Yan wrote:
> > > + if (!sysctl_enable_soft_offline) {
> > > + pr_info(&
On Tue, Jun 18, 2024 at 10:03 PM Oscar Salvador wrote:
>
> On Mon, Jun 17, 2024 at 05:05:43PM +, Jiaqi Yan wrote:
> > - * Returns 0 on success
> > - * -EOPNOTSUPP for hwpoison_filter() filtered the error event
> > + * Returns 0 on success,
> >
On Mon, Jun 17, 2024 at 12:13 PM Andrew Morton
wrote:
>
> On Mon, 17 Jun 2024 17:05:43 + Jiaqi Yan wrote:
>
> > Correctable memory errors are very common on servers with large
> > amount of memory, and are corrected by ECC. Soft offline is kernel's
> > additio
Add the documentation for soft offline behaviors / costs, and what
the new enable_soft_offline sysctl is for.
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/sysctl/vm.rst | 33 +
1 file changed, 33 insertions(+)
diff --git a/Documentation/admin-guide/sysctl
.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 229 ++
tools/testing
will fail with EOPNOTSUPP.
Signed-off-by: Jiaqi Yan
---
mm/memory-failure.c | 22 --
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d3c830e817e3..9eb216ed0b86 100644
--- a/mm/memory-failure.c
+++ b/mm/memory
ol API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (3):
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm: test enable_soft_offline behaviors
docs: mm: add enable_soft_offline sysctl
Documentation/admin-guide/sysctl/vm
On Mon, Jun 17, 2024 at 3:34 AM Lance Yang wrote:
>
> On Mon, Jun 17, 2024 at 4:16 PM Miaohe Lin wrote:
> >
> > On 2024/6/17 15:51, Oscar Salvador wrote:
> > > On 2024-06-17 09:31, Miaohe Lin wrote:
> > >
> > >> IMHO, it might not be suitable to use EAGAIN. Because it means
> > >> "Resource tempo
Thanks for your questions, David!
On Tue, Jun 11, 2024 at 5:25 PM David Rientjes wrote:
>
> On Tue, 11 Jun 2024, Jiaqi Yan wrote:
>
> > @@ -267,6 +268,20 @@ used::
> > These are informational only. They do not mean that anything is wrong
> > with your system. To
On Thu, Jun 13, 2024 at 8:50 PM Miaohe Lin wrote:
>
> On 2024/6/12 5:55, Jiaqi Yan wrote:
> > Add regression and new tests when hugepage has correctable memory
> > errors, and how userspace wants to deal with it:
> > * if enable_soft_offline=0, mapped hugepage i
On Thu, Jun 13, 2024 at 8:28 PM Miaohe Lin wrote:
>
> On 2024/6/12 5:55, Jiaqi Yan wrote:
> > Correctable memory errors are very common on servers with large
> > amount of memory, and are corrected by ECC. Soft offline is kernel's
> > additional recovery han
On Fri, Jun 14, 2024 at 1:35 AM Lance Yang wrote:
>
> Hi Jiaqi,
>
> On Wed, Jun 12, 2024 at 5:56 AM Jiaqi Yan wrote:
> >
> > Correctable memory errors are very common on servers with large
> > amount of memory, and are corrected by ECC. Soft offline is kernel'
.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 258 ++
tools/testing
Add the documentation for what the enable_soft_offline sysctl is for.
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/sysctl/vm.rst | 15 +++
1 file changed, 15 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst
b/Documentation/admin-guide/sysctl/vm.rst
index
s raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default,
enable_soft_offline is 1 to preserve the existing behavior in the kernel.
Signed-off-by: Jiaqi Yan
---
mm/memory-fail
s, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc
On Tue, Jun 11, 2024 at 10:55 AM Jane Chu wrote:
>
> On 6/10/2024 3:55 PM, Jiaqi Yan wrote:
>
> > Thanks for your feedback, Jane!
> >
> > On Mon, Jun 10, 2024 at 12:41 PM Jane Chu wrote:
> >> On 6/7/2024 3:22 PM, Jiaqi Yan wrote:
> >>
> >>
Thanks for your feedback, Jane!
On Mon, Jun 10, 2024 at 12:41 PM Jane Chu wrote:
>
> On 6/7/2024 3:22 PM, Jiaqi Yan wrote:
>
> > On Tue, Jun 4, 2024 at 12:19 AM Miaohe Lin wrote:
> >> On 2024/6/1 5:34, Jiaqi Yan wrote:
> >>> Correctable memory errors a
+CC Jane.
On Fri, May 31, 2024 at 2:34 PM Jiaqi Yan wrote:
>
> Correctable memory errors are very common on servers with large
> amount of memory, and are corrected by ECC. Soft offline is kernel's
> additional recovery handling for memory pages having (excessive)
> cor
On Tue, Jun 4, 2024 at 12:19 AM Miaohe Lin wrote:
>
> On 2024/6/1 5:34, Jiaqi Yan wrote:
> > Correctable memory errors are very common on servers with large
> > amount of memory, and are corrected by ECC, but with two
> > pain points to users:
> > 1. Correction usual
Add the documentation for what the softoffline_corrected_errors
sysfs interface is for.
Signed-off-by: Jiaqi Yan
---
Documentation/admin-guide/mm/hugetlbpage.rst | 15 ++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst
b
the tests.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
Signed-off-by: Jiaqi Yan
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 262
he control is per hugepage size, and is kept
in the corresponding hstate. By default, softoffline_corrected_errors is
1 to preserve the existing behavior in the kernel.
Signed-off-by: Jiaqi Yan
---
include/linux/hugetlb.h | 17 +
mm/hugetlb.c            | 34 ++
mseal read-only elf memory segment").
Jiaqi Yan (3):
mm/memory-failure: userspace controls soft-offlining hugetlb pages
selftest/mm: test softoffline_corrected_errors behaviors
docs: hugetlbpage.rst: add softoffline_corrected_errors
Documentation/admin-guide/mm/hugetlbpage.rst |
Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > dropping some addresses though.
> >
> > Do you know how much an admin could rely on such addresses? How frequently
> > would MCEs normally be generated in a sane system?
>
> I'm not sure about h
atus));
> + KSFT_PRINT_MSG(status, PREFIX "HugeTLB read HWPOISON test...%s\n",
> + status_to_str(status));
> close(fd);
> - if (status == TEST_FAILED)
> - return -1;
>
> fd = create_hugetlbfs_file(&file_stat);
> - if (fd < 0)
> - goto create_failure;
> - printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
> + ksft_print_msg(PREFIX "HugeTLB seek then read HWPOISON test...\n");
> status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
> wr_chunk_sizes[i], true);
> - printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
> - status_to_str(status));
> + KSFT_PRINT_MSG(status, PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
> + status_to_str(status));
> close(fd);
> - if (status == TEST_FAILED)
> - return -1;
> }
>
> - return 0;
> -
> -create_failure:
> - printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
> - return -1;
> + ksft_finished();
> }
> --
> 2.42.0
>
>
This version looks good to me. Maybe someone else needs to take another
look; in the meantime, adding mine:
Reviewed-by: Jiaqi Yan
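The quoted patch moves the read-HWPOISON selftest to TAP-style output using the kselftest helpers (ksft_print_msg(), ksft_finished()). The standalone stand-in below only illustrates the shape of the TAP result lines involved: "ok N", "not ok N", and "ok N # SKIP". It is not the kselftest.h API. The enum mirrors the test's status values by assumption; only TEST_FAILED appears in the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror the selftest's status values. */
enum test_status { TEST_PASSED, TEST_FAILED, TEST_SKIPPED };

/*
 * Hypothetical helper: format one TAP result line for test number n.
 * A pass is "ok", a failure is "not ok", and a skip is "ok ... # SKIP".
 * Returns snprintf()'s result.
 */
int format_tap_result(char *buf, size_t len, int n,
		      enum test_status status, const char *desc)
{
	switch (status) {
	case TEST_PASSED:
		return snprintf(buf, len, "ok %d %s", n, desc);
	case TEST_FAILED:
		return snprintf(buf, len, "not ok %d %s", n, desc);
	default:
		return snprintf(buf, len, "ok %d # SKIP %s", n, desc);
	}
}
```

In real TAP output, these lines follow a "TAP version 13" header and a "1..N" plan, and diagnostic text such as that printed by ksft_print_msg() is prefixed with "# ".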
On Sun, Jan 14, 2024 at 10:32 PM Muhammad Usama Anjum
wrote:
>
> On 1/13/24 6:08 AM, Jiaqi Yan wrote:
> > On Thu, Jan 11, 2024 at 11:21 PM Muhammad Usama Anjum
> > wrote:
> >>
> >> Conform the layout, informational and status messages to TAP. No
> >>
On Thu, Jan 11, 2024 at 11:21 PM Muhammad Usama Anjum
wrote:
>
> Conform the layout, informational and status messages to TAP. No
> functional change is intended other than the layout of output messages.
>
> Signed-off-by: Muhammad Usama Anjum
> ---
> Tested this by reverting the patch a08c7193e4