Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Naoya Horiguchi
On Tue, Oct 22, 2019 at 12:33:25PM +0200, Oscar Salvador wrote: > On Tue, Oct 22, 2019 at 12:24:57PM +0200, Michal Hocko wrote: > > Yes, that makes a perfect sense. What I am saying that the migration > > (aka trying to recover) is the main and only difference. The soft > > offline should poison

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Naoya Horiguchi
On Tue, Oct 22, 2019 at 11:58:52AM +0200, Oscar Salvador wrote: > On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote: > > Hmm, that might be a misunderstanding on my end. I thought that it is > > the MCE handler to say whether the failure is recoverable or not. If yes > > then we can

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Oscar Salvador
On Tue, Oct 22, 2019 at 12:24:57PM +0200, Michal Hocko wrote: > Yes, that makes a perfect sense. What I am saying that the migration > (aka trying to recover) is the main and only difference. The soft > offline should poison page tables when not able to migrate as well > IIUC. Yeah, I see your

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Michal Hocko
On Tue 22-10-19 11:58:52, Oscar Salvador wrote: > On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote: > > Hmm, that might be a misunderstanding on my end. I thought that it is > > the MCE handler to say whether the failure is recoverable or not. If yes > > then we can touch the content

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Oscar Salvador
On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote: > Hmm, that might be a misunderstanding on my end. I thought that it is > the MCE handler to say whether the failure is recoverable or not. If yes > then we can touch the content of the memory (that would imply the > migration). Other

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Michal Hocko
On Tue 22-10-19 10:35:17, Oscar Salvador wrote: > On Tue, Oct 22, 2019 at 10:26:11AM +0200, Michal Hocko wrote: > > On Tue 22-10-19 09:46:20, Oscar Salvador wrote: > > [...] > > > So, opposite to hard-offline, in soft-offline we do not fiddle with pages > > > unless we are sure the page is not

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Oscar Salvador
On Tue, Oct 22, 2019 at 10:26:11AM +0200, Michal Hocko wrote: > On Tue 22-10-19 09:46:20, Oscar Salvador wrote: > [...] > > So, opposite to hard-offline, in soft-offline we do not fiddle with pages > > unless we are sure the page is not reachable anymore by any means. > > I have to say I do not

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Michal Hocko
On Tue 22-10-19 09:46:20, Oscar Salvador wrote: [...] > So, opposite to hard-offline, in soft-offline we do not fiddle with pages > unless we are sure the page is not reachable anymore by any means. I have to say I do not follow. Is there any _real_ reason for soft-offline to behave differently

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Oscar Salvador
On Mon, Oct 21, 2019 at 07:45:33AM +, Naoya Horiguchi wrote:
> > +extern bool take_page_off_buddy(struct page *page);
> > +
> > +static void page_handle_poison(struct page *page)
>
> hwpoison is a separate idea from page poisoning, so maybe I think
> it's better to be named like

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-22 Thread Oscar Salvador
On Mon, Oct 21, 2019 at 05:41:58PM +0200, Michal Hocko wrote: > On Mon 21-10-19 14:58:49, Oscar Salvador wrote: > > Nothing prevents the page to be allocated in the meantime. > > We would just bail out and return -EBUSY to userspace. > > Since we do not do __anything__ to the page until we are

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-21 Thread Michal Hocko
On Mon 21-10-19 14:58:49, Oscar Salvador wrote:
> On Fri, Oct 18, 2019 at 02:06:15PM +0200, Michal Hocko wrote:
> > On Thu 17-10-19 16:21:17, Oscar Salvador wrote:
> > [...]
> > > +bool take_page_off_buddy(struct page *page)
> > > + {
> > > + struct zone *zone = page_zone(page);
> > > + unsigned

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-21 Thread Oscar Salvador
On Fri, Oct 18, 2019 at 02:06:15PM +0200, Michal Hocko wrote:
> On Thu 17-10-19 16:21:17, Oscar Salvador wrote:
> [...]
> > +bool take_page_off_buddy(struct page *page)
> > + {
> > + struct zone *zone = page_zone(page);
> > + unsigned long pfn = page_to_pfn(page);
> > + unsigned long flags;

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-21 Thread Naoya Horiguchi
On Thu, Oct 17, 2019 at 04:21:17PM +0200, Oscar Salvador wrote: > When trying to soft-offline a free page, we need to first take it off > the buddy allocator. > Once we know it is out of reach, we can safely flag it as poisoned. > > take_page_off_buddy will be used to take a page meant to be

Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-18 Thread Michal Hocko
On Thu 17-10-19 16:21:17, Oscar Salvador wrote:
[...]
> +bool take_page_off_buddy(struct page *page)
> + {
> + struct zone *zone = page_zone(page);
> + unsigned long pfn = page_to_pfn(page);
> + unsigned long flags;
> + unsigned int order;
> + bool ret = false;
> +
> +

[RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

2019-10-17 Thread Oscar Salvador
When trying to soft-offline a free page, we need to first take it off the buddy allocator. Once we know it is out of reach, we can safely flag it as poisoned. take_page_off_buddy will be used to take a page meant to be poisoned off the buddy allocator. take_page_off_buddy calls