On Wed, Aug 24, 2022 at 02:52:51PM -0700, Dan Williams wrote:
> Shiyang Ruan wrote:
> > This new function is a variant of mf_generic_kill_procs that accepts a
> > file, offset pair instead of a struct to support multiple files sharing
> > a DAX mapping. It is intended to be called by the file
On Tue, Apr 19, 2022 at 12:50:43PM +0800, Shiyang Ruan wrote:
> This new function is a variant of mf_generic_kill_procs that accepts a
> file, offset pair instead of a struct to support multiple files sharing
> a DAX mapping. It is intended to be called by the file systems as part
> of the
On Tue, Apr 19, 2022 at 12:50:41PM +0800, Shiyang Ruan wrote:
> When memory-failure occurs, we call this function, which is implemented
> by each kind of device. For the fsdax case, the pmem device driver
> implements it. The pmem device driver will find out the filesystem in
> which the corrupted page
On Tue, Apr 19, 2022 at 12:50:40PM +0800, Shiyang Ruan wrote:
> memory_failure_dev_pagemap code is a bit complex before introducing the
> RMAP feature for fsdax, so some helper functions need to be factored out
> to simplify this code.
>
> Signed-off-by: Shiyang Ruan
> Reviewed-by: Darrick J. Wong
>
Hi everyone,
On Thu, Apr 21, 2022 at 02:35:02PM +1000, Dave Chinner wrote:
> On Wed, Apr 20, 2022 at 07:20:07PM -0700, Dan Williams wrote:
> > [ add Andrew and Naoya ]
> >
> > On Wed, Apr 20, 2022 at 6:48 PM Shiyang Ruan
> > wrote:
> > >
> > > Hi Dave,
> > >
> > > On 2022/4/21 9:20, Dave
On Tue, Apr 20, 2021 at 08:42:53AM -0700, Luck, Tony wrote:
> On Mon, Apr 19, 2021 at 06:49:15PM -0700, Jue Wang wrote:
> > On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
> > ...
> > > + * This function is intended to handle "Action Required" MCEs on already
> > > + * hardware poisoned
On Tue, Apr 20, 2021 at 12:16:57PM +0200, Borislav Petkov wrote:
> On Tue, Apr 20, 2021 at 07:46:26AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > If you have any other suggestion, please let me know.
>
> Looks almost ok...
>
> > From: Tony Luck
> > Date: Tue, 20 Apr 2
On Mon, Apr 19, 2021 at 06:49:15PM -0700, Jue Wang wrote:
> On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
> ...
> > + * This function is intended to handle "Action Required" MCEs on already
> > + * hardware poisoned pages. They could happen, for example, when
> > + * memory_failure()
On Mon, Apr 19, 2021 at 07:05:38PM +0200, Borislav Petkov wrote:
> On Tue, Apr 13, 2021 at 07:43:18AM +0900, Naoya Horiguchi wrote:
> > From: Tony Luck
> >
> > There can be races when multiple CPUs consume poison from the same
> > page. The first into memory_failure() atomically sets the
On Mon, Apr 19, 2021 at 06:28:21PM -0600, Jane Chu wrote:
> It appears that unmap_mapping_range() actually takes a 'size' as its
> third argument rather than a location; the current calling fashion
> causes an unnecessary amount of unmapping to occur.
>
> Fixes: 6100e34b2526e ("mm, memory_failure:
On Sat, Apr 17, 2021 at 01:47:51PM +0800, Aili Yao wrote:
> On Tue, 13 Apr 2021 07:43:17 +0900
> Naoya Horiguchi wrote:
>
> > Hi,
> >
> > I wrote this patchset to materialize what I think is the current
> > allowable solution mentioned by the previous discussion [1].
> > I simply borrowed
On Tue, Apr 06, 2021 at 10:41:23AM +0800, Aili Yao wrote:
> When we call get_user_pages() to pin a user page in memory, there may be
> a hwpoison page; currently, we just handle the normal case where the
> memory recovery job is correctly finished, and we will not return the
> hwpoison page to callers, but
On Fri, Apr 02, 2021 at 03:11:20PM +, Luck, Tony wrote:
> >> Combined with my "mutex" patch (to get rid of races where 2nd process
> >> returns
> >> early, but first process is still looking for mappings to unmap and tasks
> >> to signal) this patch moves forward a bit. But I think it needs
On Wed, Mar 31, 2021 at 07:07:39AM +0100, Matthew Wilcox wrote:
> On Wed, Mar 31, 2021 at 01:52:59AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > If we successfully unmapped but failed in truncate_error_page() for example,
> > the processes mapping the page would get -EFAULT as expe
On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote:
> On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也)
> wrote:
> > On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:
> > > On 26.03.21 15:09, David Hildenbrand wrote:
> > > >
On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:
> On 26.03.21 15:09, David Hildenbrand wrote:
> > On 22.03.21 12:33, Aili Yao wrote:
> > > When we do coredump for user process signal, this may be one SIGBUS signal
> > > with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this
On Fri, Mar 12, 2021 at 11:48:31PM +, Luck, Tony wrote:
> >> will memory_failure() find it and unmap it? if succeed, then the current
> >> will be
> >> signaled with correct vaddr and shift?
> >
> > That's a very good question. I didn't see a SIGBUS when I first wrote this
> > code,
> >
On Wed, Mar 10, 2021 at 02:10:42PM +0800, Aili Yao wrote:
> On Fri, 5 Mar 2021 15:55:25 +
> "Luck, Tony" wrote:
>
> > > From the walk, it seems we have got the virtual address, can we just send
> > > a SIGBUS with it?
> >
> > If the walk wins the race and the pte for the poisoned page is
On Tue, Mar 09, 2021 at 12:01:40PM -0800, Luck, Tony wrote:
> On Tue, Mar 09, 2021 at 08:28:24AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote:
> > > When the page is already poisoned, another memory_failure() call in the
On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote:
> When the page is already poisoned, another memory_failure() call on the
> same page now returns 0, meaning OK. For nested memory MCE handling, this
> behavior may lead to MCE looping. Example:
>
> 1.When LCME is enabled, and there are two
On Tue, Mar 09, 2021 at 10:04:21AM +0800, Aili Yao wrote:
> On Mon, 8 Mar 2021 14:55:04 -0800
> "Luck, Tony" wrote:
>
> > There can be races when multiple CPUs consume poison from the same
> > page. The first into memory_failure() atomically sets the HWPoison
> > page flag and begins hunting for
On Mon, Mar 08, 2021 at 02:55:04PM -0800, Luck, Tony wrote:
> There can be races when multiple CPUs consume poison from the same
> page. The first into memory_failure() atomically sets the HWPoison
> page flag and begins hunting for tasks that map this page. Eventually
> it invalidates those
On Mon, Mar 08, 2021 at 06:54:02PM +, Luck, Tony wrote:
> >> So it should be safe to grab and hold a mutex. See patch below.
> >
> > The mutex approach looks simpler and safer, so I'm fine with it.
>
> Thanks. Is that an "Acked-by:"?
Not yet, I intended to add it after the full patch is
On Fri, Mar 05, 2021 at 02:11:43PM -0800, Luck, Tony wrote:
> This whole page table walking patch is trying to work around the
> races caused by multiple calls to memory_failure() for the same
> page.
>
> Maybe better to just avoid the races. The comment right above
> memory_failure says:
>
>
On Thu, Feb 25, 2021 at 10:15:42AM -0800, Luck, Tony wrote:
> On Thu, Feb 25, 2021 at 12:38:06PM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Thank you for shedding light on this, this race looks worrisome to me.
> > We call try_to_unmap() inside memory_failure(), where we find af
On Thu, Feb 25, 2021 at 12:39:30PM +0100, Oscar Salvador wrote:
> On Thu, Feb 25, 2021 at 11:28:18AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Hi Aili,
> >
> > I agree that this set_mce_nospec() is not expected to be called for
> > "already hwpoisoned" page b
On Thu, Feb 25, 2021 at 11:43:29AM +0800, Aili Yao wrote:
> On Wed, 24 Feb 2021 11:31:55 +0100 Oscar Salvador wrote:
...
>
> > >
> > > 3.The kill_me_maybe will check the return:
> > >
> > > 1244 static void kill_me_maybe(struct callback_head *cb)
> > > 1245 {
> > >
> > > 1254 if
Hi Aili,
On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote:
> When one page is already hwpoisoned by an AO action, the process may not
> be killed; the process mapping this page may make a syscall involving
> this page and end up triggering a VM_FAULT_HWPOISON fault. If it's in
> kernel mode it may
On Wed, Jan 13, 2021 at 10:18:09PM -0800, Dan Williams wrote:
> On Wed, Jan 13, 2021 at 5:50 PM HORIGUCHI NAOYA(堀口 直也)
> wrote:
> >
> > On Wed, Jan 13, 2021 at 04:43:32PM -0800, Dan Williams wrote:
> > > The conversion to move pfn_to_online_page() internal to
>
On Wed, Jan 13, 2021 at 04:43:37PM -0800, Dan Williams wrote:
> Given 'struct dev_pagemap' spans both data pages and metadata pages be
> careful to consult the altmap if present to delineate metadata. In fact
> the pfn_first() helper already identifies the first valid data pfn, so
> export that
On Wed, Jan 13, 2021 at 04:43:32PM -0800, Dan Williams wrote:
> The conversion to move pfn_to_online_page() internal to
> soft_offline_page() missed that the get_user_pages() reference taken by
> the madvise() path needs to be dropped when pfn_to_online_page() fails.
> Note the direct sysfs-path
On Fri, Jan 08, 2021 at 09:52:02AM +0100, Oscar Salvador wrote:
> Format %pG expects a lower case 'p' in order to print the flags.
> Fix it.
>
> Reported-by: Dan Carpenter
> Signed-off-by: Oscar Salvador
> Fixes: 8295d535e2aa ("mm,hwpoison: refactor get_any_page")
Thank you!
Acked-by: Naoya
On Tue, Jan 05, 2021 at 03:10:35PM +0800, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也)
> wrote:
> >
> > On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we ca
On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> do a retry, because the race window is small.
>
> Signed-off-by: Muchun Song
> ---
> mm/hugetlb.c | 16 +++-
> 1 file changed, 11 insertions(+), 5
On Wed, Dec 09, 2020 at 10:28:18AM +0100, Oscar Salvador wrote:
> Currently, we return -EIO when we fail to migrate the page.
>
> Migration failures are rather transient, as they can happen due to
> several reasons, e.g.: a high page refcount bump, mapping->migrate_page
> failing, etc.
> All meaning
On Mon, Dec 07, 2020 at 10:48:18AM +0100, Oscar Salvador wrote:
> madvise_inject_error() uses get_user_pages_fast to translate the
> address we specified to a page.
> After [1], we drop the extra reference count for memory_failure() path.
> That commit says that memory_failure wanted to keep the
On Mon, Dec 07, 2020 at 06:22:00PM -0800, Andrew Morton wrote:
> On Mon, 7 Dec 2020 10:48:18 +0100 Oscar Salvador wrote:
>
> > madvise_inject_error() uses get_user_pages_fast to translate the
> > address we specified to a page.
> > After [1], we drop the extra reference count for
On Sat, Dec 05, 2020 at 04:34:23PM +0100, Oscar Salvador wrote:
> On Fri, Dec 04, 2020 at 06:25:31PM +0100, Vlastimil Babka wrote:
> > OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but
> > it's
> > already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
On Thu, Nov 19, 2020 at 11:57:16AM +0100, Oscar Salvador wrote:
> get_hwpoison_page already drains pcplists, previously disabling
> them when trying to grab a refcount.
> We do not need shake_page to take care of it anymore.
>
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Thu, Nov 19, 2020 at 11:57:15AM +0100, Oscar Salvador wrote:
> Currently, we have a sort of retry mechanism to make sure pages in
> pcp-lists are spilled to the buddy system, so we can handle those.
>
> We can save ourselves these extra checks with the new disable-pcplist
> mechanism that is available
On Thu, Nov 19, 2020 at 11:57:10AM +0100, Oscar Salvador wrote:
> When we want to grab a refcount via get_any_page, we call
> __get_any_page that calls get_hwpoison_page to get the
> actual refcount.
> get_any_page is only there because we have a sort of retry
> mechanism in case the page we met
On Thu, Nov 19, 2020 at 11:57:11AM +0100, Oscar Salvador wrote:
> pfn parameter is no longer needed, drop it.
>
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Thu, Nov 05, 2020 at 11:50:58AM -0800, Mike Kravetz wrote:
> Qian Cai reported the following BUG in [1]
>
> [ 6147.019063][T45242] LTP: starting move_pages12
> [ 6147.475680][T64921] BUG: unable to handle page fault for address:
> ffe0
> ...
> [ 6147.525866][T64921] RIP:
On Tue, Oct 13, 2020 at 04:10:59PM -0700, Mike Kravetz wrote:
> Due to pmd sharing, the huge PTE pointer returned by huge_pte_alloc
> may not be valid. This can happen if a call to huge_pmd_unshare for
> the same pmd is made in another thread.
>
> To address this issue, add a rw_semaphore
On Tue, Oct 13, 2020 at 04:44:47PM +0200, Oscar Salvador wrote:
> memory_failure and soft_offline_path paths now drain pcplists by calling
> get_hwpoison_page.
>
> memory_failure flags the page as HWPoison beforehand, so that page can
> no longer go into a pcplist, and soft_offline_page only flags a
On Tue, Oct 13, 2020 at 04:44:46PM +0200, Oscar Salvador wrote:
> Currently, free hugetlb pages get dissolved, but we also need to make
> sure to take the poisoned subpage off the buddy freelists, so no one
> stumbles upon it (see previous patch for more information).
>
> Signed-off-by: Oscar Salvador
On Thu, Sep 17, 2020 at 10:10:43AM +0200, Oscar Salvador wrote:
> The crux of the matter is that historically we left poisoned pages in the
> buddy system because we have some checks in place when allocating a page
> that act as a gatekeeper for poisoned pages. Unfortunately, we do have
> other users
On Thu, Sep 17, 2020 at 10:10:47AM +0200, Oscar Salvador wrote:
> A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
> Currently, we bail out with an error if we encounter such a page, meaning
> that we do not handle pcppages from either the hard-offline or the
> soft-offline path.
On Tue, Sep 22, 2020 at 03:56:50PM +0200, Oscar Salvador wrote:
> Aristeu Rozanski reported that a customer test case started
> to report -EBUSY after the hwpoison rework patchset.
>
> There is a race window between spotting a free page and taking it off
> its buddy freelist, so it might be that
On Tue, Sep 22, 2020 at 03:56:47PM +0200, Oscar Salvador wrote:
> Currently, there is an inconsistency when calling soft-offline from
> different paths on a page that is already poisoned.
>
> 1) madvise:
>
> madvise_inject_error skips any poisoned page and continues
> the loop.
>
On Tue, Sep 22, 2020 at 03:56:46PM +0200, Oscar Salvador wrote:
> Merging soft_offline_huge_page and __soft_offline_page lets us get rid
> of quite some duplicated code, and makes the code much easier to follow.
>
> Now, __soft_offline_page will handle both normal and hugetlb pages.
>
>
On Tue, Sep 22, 2020 at 03:56:45PM +0200, Oscar Salvador wrote:
> This patch changes the way we set and handle in-use poisoned pages. Until
> now, poisoned pages were released to the buddy allocator, trusting that
> the checks that take place at allocation time would act as a safety net
> and would
On Tue, Sep 22, 2020 at 03:56:44PM +0200, Oscar Salvador wrote:
> When trying to soft-offline a free page, we need to first take it off the
> buddy allocator.
> Once we know it is out of reach, we can safely flag it as poisoned.
>
> take_page_off_buddy will be used to take a page meant to be
On Tue, Sep 22, 2020 at 03:56:43PM +0200, Oscar Salvador wrote:
> Place the THP's page handling in a helper and use it from both hard and
> soft-offline machinery, so we get rid of some duplicated code.
>
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Tue, Sep 22, 2020 at 03:56:42PM +0200, Oscar Salvador wrote:
> After commit 4e41a30c6d50 ("mm: hwpoison: adjust for new thp
> refcounting"), put_hwpoison_page got reduced to a put_page. Let us just
> use put_page instead.
>
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Tue, Sep 22, 2020 at 03:56:40PM +0200, Oscar Salvador wrote:
> Since get_hwpoison_page is only used in memory-failure code now, let us
> un-export it and make it private to that code.
>
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Tue, Sep 22, 2020 at 03:56:41PM +0200, Oscar Salvador wrote:
> Make a proper if-else condition for {hard,soft}-offline.
>
> [akpm: refactor comment]
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Thu, Sep 17, 2020 at 03:40:06PM +0200, Oscar Salvador wrote:
> On Thu, Sep 17, 2020 at 03:09:52PM +0200, Oscar Salvador wrote:
> > static bool page_handle_poison(struct page *page, bool
> > hugepage_or_freepage, bool release)
> > {
> > if (release) {
> > put_page(page);
On Thu, Sep 17, 2020 at 10:10:42AM +0200, Oscar Salvador wrote:
> This patchset includes some fixups (patch#1,patch#2 and patch#3)
> and some cleanups (patch#4-7).
>
> Patch#1 is a fix to take off HWPoison pages off a buddy freelist since
> it can lead us to having HWPoison pages back in the game
On Tue, Sep 15, 2020 at 05:22:22PM -0400, Aristeu Rozanski wrote:
> Hi Oscar, Naoya,
>
> On Mon, Sep 14, 2020 at 12:15:54PM +0200, Oscar Salvador wrote:
> > The important bit of this patchset is patch#1, which is a fix to take off
> > HWPoison pages off a buddy freelist since it can lead us to
On Tue, Sep 08, 2020 at 09:56:22AM +0200, Oscar Salvador wrote:
> The crux of the matter is that historically we left poisoned pages
> in the buddy system because we have some checks in place when
> allocating a page that act as a gatekeeper for poisoned pages.
> Unfortunately, we do have other users
On Wed, Sep 02, 2020 at 11:45:08AM +0200, Oscar Salvador wrote:
> Make a proper if-else condition for {hard,soft}-offline.
>
> Signed-off-by: Oscar Salvador
Acked-by: Naoya Horiguchi
On Wed, Sep 02, 2020 at 11:45:07AM +0200, Oscar Salvador wrote:
> The crux of the matter is that historically we left poisoned pages
> in the buddy system because we have some checks in place when
> allocating a page that act as a gatekeeper for poisoned pages.
> Unfortunately, we do have other users
On Sun, Aug 30, 2020 at 03:44:18PM -0400, Qian Cai wrote:
> On Sun, Aug 30, 2020 at 04:10:53PM +0800, Muchun Song wrote:
> > When we fail to isolate the page, we should not return 0, because we
> > do not set HWPoison on any page.
> >
> > Signed-off-by: Muchun Song
>
> This seems solve the
Hi,
On Thu, Aug 27, 2020 at 09:32:05AM -0700, Tony Luck wrote:
> For discussion ... I'm 100% sure the patch below is the wrong way to
> fix this ... for one thing it doesn't provide the virtual address of
> the error to the user signal handler. For another it just looks like
> a hack. I'm just
On Tue, Aug 18, 2020 at 04:26:47PM +0800, Xianting Tian wrote:
> There is no need to calculate pgoff in each iteration of
> for_each_process(), so move it to before for_each_process(), which can
> save some CPU cycles.
>
> Signed-off-by: Xianting Tian
Looks good to me. Thank you.
On Mon, Aug 10, 2020 at 11:45:36PM -0400, Qian Cai wrote:
>
>
> > On Aug 10, 2020, at 11:11 PM, HORIGUCHI NAOYA(堀口 直也)
> > wrote:
> >
> > I'm still not sure why the test succeeded by reverting these because
> > current mainline kernel provides similar me
On Mon, Aug 10, 2020 at 11:22:55AM -0400, Qian Cai wrote:
> On Thu, Aug 06, 2020 at 06:49:11PM +, nao.horigu...@gmail.com wrote:
> > Hi,
> >
> > This patchset is the latest version of soft offline rework patchset
> > targetted for v5.9.
> >
> > Since v5, I dropped some patches which tweak
On Mon, Aug 03, 2020 at 09:49:42PM -0400, Qian Cai wrote:
> On Tue, Aug 04, 2020 at 01:16:45AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Aug 03, 2020 at 03:07:09PM -0400, Qian Cai wrote:
> > > On Fri, Jul 31, 2020 at 12:20:56PM +, nao.horigu...@gmail.com wrote:
>
On Mon, Aug 03, 2020 at 11:19:09AM -0400, Qian Cai wrote:
> On Mon, Aug 03, 2020 at 01:36:58PM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Hello,
> >
> > On Mon, Aug 03, 2020 at 08:39:55AM -0400, Qian Cai wrote:
> > > On Fri, Jul 31, 2020 at 12:20:56PM +,
On Mon, Aug 03, 2020 at 03:07:09PM -0400, Qian Cai wrote:
> On Fri, Jul 31, 2020 at 12:20:56PM +, nao.horigu...@gmail.com wrote:
> > This patchset is the latest version of soft offline rework patchset
> > targetted for v5.9.
> >
> > Main focus of this series is to stabilize soft offline.
Hello,
On Mon, Aug 03, 2020 at 08:39:55AM -0400, Qian Cai wrote:
> On Fri, Jul 31, 2020 at 12:20:56PM +, nao.horigu...@gmail.com wrote:
> > This patchset is the latest version of soft offline rework patchset
> > targetted for v5.9.
> >
> > Main focus of this series is to stabilize soft
On Thu, Jul 16, 2020 at 02:38:06PM +0200, Oscar Salvador wrote:
> This patch changes the way we set and handle in-use poisoned pages.
> Until now, poisoned pages were released to the buddy allocator, trusting
> that the checks that take place prior to delivering the page to its end
> user would act
On Wed, Jul 01, 2020 at 10:22:07AM +0200, Oscar Salvador wrote:
> On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> > > Even after applied the compling fix,
> > >
> > > https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> > >
> > > madvise(MADV_SOFT_OFFLINE) will fail with
On Mon, Jun 29, 2020 at 12:29:25PM +0200, Oscar Salvador wrote:
> On Wed, 2020-06-24 at 15:01 +, nao.horigu...@gmail.com wrote:
> > I rebased soft-offline rework patchset [1][2] onto the latest
> > mmotm. The
> > rebasing required some non-trivial changes to adjust, but mainly that
> > was
>
On Mon, Jun 29, 2020 at 05:22:40PM +1000, Stephen Rothwell wrote:
> Hi Naoya,
>
> On Sun, 28 Jun 2020 15:54:09 +0900 Naoya Horiguchi
> wrote:
> >
> > Andrew, could you append this diff on top of this series on mmotm?
>
> I have added that patch to linux-next today.
Thank you!
- Naoya
On Wed, Jun 24, 2020 at 03:49:47PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 22:36:18 + HORIGUCHI NAOYA(堀口 直也)
> wrote:
>
> > On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> > > On Wed, 24 Jun 2020 15:01:22 + nao.horigu...@gmail.com
On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 15:01:22 + nao.horigu...@gmail.com wrote:
>
> > I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> > rebasing required some non-trivial changes to adjust, but mainly that was
> >
Hi Dmitry,
On Thu, Jun 11, 2020 at 07:43:19PM +0300, Dmitry Yakunin wrote:
> Hello!
>
> We are faced with similar problems with hwpoisoned pages
> on one of our production clusters after kernel update to stable 4.19.
> Application that does a lot of memory allocations sometimes caught SIGBUS
>
On Sat, May 30, 2020 at 09:08:43AM +0200, Pankaj Gupta wrote:
> > Some processes don't want to be killed early, but in the "Action
> > Required" case, those may also be killed by BUS_MCEERR_AO when sharing
> > memory with others which are accessing the failed memory.
> > And sending SIGBUS with
On Fri, May 29, 2020 at 01:56:25PM +0800, wetp wrote:
> On 2020/5/29 10:12 AM, HORIGUCHI NAOYA(堀口 直也) wrote:
...
> > > > > @@ -225,8 +225,9 @@ static int kill_proc(struct to_kill *tk, unsigned
> > > > > long pfn, int flags)
> > > > >
On Thu, May 28, 2020 at 02:50:09PM +0800, wetp wrote:
>
> On 2020/5/28 10:22 AM, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Hi Zhang,
> >
> > Sorry for my late response.
> >
> > On Tue, May 26, 2020 at 03:06:41PM +0800, Wetp Zhang wrote:
> > > From: Zhang Y
Hi Zhang,
Sorry for my late response.
On Tue, May 26, 2020 at 03:06:41PM +0800, Wetp Zhang wrote:
> From: Zhang Yi
>
> If a process doesn't need early-kill, it may not care about
> BUS_MCEERR_AO. Let the process be killed when it really accesses the
> corrupted memory.
>
> Signed-off-by: Zhang Yi
On Mon, May 18, 2020 at 12:12:36PM +0530, Anshuman Khandual wrote:
> This adds the following two new VM events which will help in validating PMD
> based THP migration without split. Statistics reported through these events
> will help in performance debugging.
>
> 1. THP_PMD_MIGRATION_SUCCESS
>
On Thu, May 14, 2020 at 10:46:33PM -0400, Qian Cai wrote:
>
>
> > On Oct 20, 2019, at 11:16 PM, Naoya Horiguchi
> > wrote:
> >
> > On Fri, Oct 18, 2019 at 07:56:09AM -0400, Qian Cai wrote:
> >>
> >>
> >>On Oct 18, 2019, at 2:35 AM, Naoya Horiguchi
> >>wrote:
> >>
> >>
> >>