On Wed, Mar 11, 2020 at 06:14:17PM +0100, David Hildenbrand wrote:
> virtio-mem wants to allow offlining memory blocks of which some parts
> were unplugged (allocated via alloc_contig_range()), in particular to later
> offline and remove completely unplugged memory blocks. The important part
> is that PageOffline() has to remain set until the section is offline, so
> these pages will never get accessed (e.g., when dumping). The pages should
> not be handed back to the buddy (which would require clearing PageOffline()
> and result in issues if offlining fails and the pages are suddenly in the
> buddy).
> 
> Let's make that possible by allowing any PageOffline() page to be isolated
> when offlining. This way, we can reach the memory hotplug notifier
> MEM_GOING_OFFLINE, where the driver can signal that it is fine with
> offlining this page by dropping its reference count. PageOffline() pages
> with a reference count of 0 can then be skipped when offlining the
> pages (as if they were free, although they are not in the buddy).
> 
> Anybody who uses PageOffline() pages and does not agree to offline them
> (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
> decrement the reference count, so offlining fails when trying to
> migrate such an unmovable page. So there should be no observable change.
> The same applies to balloon compaction users (movable PageOffline() pages):
> the pages will simply be migrated.
> 
> Note 1: If offlining fails, a driver has to increment the reference
>       count again in MEM_CANCEL_OFFLINE.
> 
> Note 2: A driver that makes use of this has to be aware that re-onlining
>       the memory block has to be handled by hooking into onlining code
>       (online_page_callback_t), setting PageOffline() on the pages again
>       and not giving them to the buddy.
> 
> Reviewed-by: Alexander Duyck <alexander.h.du...@linux.intel.com>
> Acked-by: Michal Hocko <mho...@suse.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Juergen Gross <jgr...@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> Cc: Pavel Tatashin <pavel.tatas...@microsoft.com>
> Cc: Alexander Duyck <alexander.h.du...@linux.intel.com>
> Cc: Vlastimil Babka <vba...@suse.cz>
> Cc: Johannes Weiner <han...@cmpxchg.org>
> Cc: Anthony Yznaga <anthony.yzn...@oracle.com>
> Cc: Michal Hocko <mho...@suse.com>
> Cc: Oscar Salvador <osalva...@suse.de>
> Cc: Mel Gorman <mgor...@techsingularity.net>
> Cc: Mike Rapoport <r...@linux.ibm.com>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Anshuman Khandual <anshuman.khand...@arm.com>
> Cc: Qian Cai <c...@lca.pw>
> Cc: Pingfan Liu <kernelf...@gmail.com>
> Signed-off-by: David Hildenbrand <da...@redhat.com>
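For readers following the protocol described in the commit message, here is a minimal userspace sketch of the reference count handshake. It is a model only, not kernel code: `struct model_page`, `model_driver_notify()`, `model_can_skip()` and the `MODEL_*` actions are hypothetical names made up for illustration, and the model assumes every PageOffline() page belongs to the one driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	bool offline;	/* models PageOffline() */
	int refcount;	/* models page_count() */
};

/* Named after the kernel's notifier actions; the values are arbitrary. */
enum model_action { MODEL_GOING_OFFLINE, MODEL_CANCEL_OFFLINE };

/*
 * Driver side: drop the reference on each unplugged page when offlining
 * starts, and take it back if offlining is cancelled.
 * Assumption: all offline pages in the array are owned by this driver.
 */
static void model_driver_notify(struct model_page *pages, size_t n,
				enum model_action action)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].offline)
			continue;
		if (action == MODEL_GOING_OFFLINE)
			pages[i].refcount--;	/* driver agrees to offlining */
		else
			pages[i].refcount++;	/* offlining failed, re-take */
	}
}

/* Core side: a PageOffline() page may be skipped only at refcount 0. */
static bool model_can_skip(const struct model_page *page)
{
	return page->offline && page->refcount == 0;
}
```

A driver that never drops its reference leaves `model_can_skip()` false, which corresponds to the "no observable change" case for the existing balloon drivers.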

Andrew, could you please ack merging this through the vhost tree
together with the rest of the patches?



> ---
>  include/linux/page-flags.h | 10 +++++++++
>  mm/memory_hotplug.c        | 44 +++++++++++++++++++++++++++++---------
>  mm/page_alloc.c            | 24 +++++++++++++++++++++
>  mm/page_isolation.c        |  9 ++++++++
>  4 files changed, 77 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 49c2697046b9..fd6d4670ccc3 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -772,6 +772,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   * not onlined when onlining the section).
>   * The content of these pages is effectively stale. Such pages should not
>   * be touched (read/write/dump/save) except by their owner.
> + *
> + * If a driver wants to allow to offline unmovable PageOffline() pages without
> + * putting them back to the buddy, it can do so via the memory notifier by
> + * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
> + * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
> + * pages (now with a reference count of zero) are treated like free pages,
> + * allowing the containing memory block to get offlined. A driver that
> + * relies on this feature is aware that re-onlining the memory block will
> + * require to re-set the pages PageOffline() and not giving them to the
> + * buddy via online_page_callback_t.
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1a00b5a37ef6..ab1c31e67fd1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1221,11 +1221,17 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
>  
>  /*
>   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
> - * non-lru movable pages and hugepages). We scan pfn because it's much
> - * easier than scanning over linked list. This function returns the pfn
> - * of the first found movable page if it's found, otherwise 0.
> + * non-lru movable pages and hugepages). Will skip over most unmovable
> + * pages (esp., pages that can be skipped when offlining), but bail out on
> + * definitely unmovable pages.
> + *
> + * Returns:
> + *   0 in case a movable page is found and movable_pfn was updated.
> + *   -ENOENT in case no movable page was found.
> + *   -EBUSY in case a definitely unmovable page was found.
>   */
> -static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> +static int scan_movable_pages(unsigned long start, unsigned long end,
> +                           unsigned long *movable_pfn)
>  {
>       unsigned long pfn;
>  
> @@ -1237,18 +1243,30 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>                       continue;
>               page = pfn_to_page(pfn);
>               if (PageLRU(page))
> -                     return pfn;
> +                     goto found;
>               if (__PageMovable(page))
> -                     return pfn;
> +                     goto found;
> +
> +             /*
> +              * PageOffline() pages that are not marked __PageMovable() and
> +              * have a reference count > 0 (after MEM_GOING_OFFLINE) are
> +              * definitely unmovable. If their reference count would be 0,
> +              * they could at least be skipped when offlining memory.
> +              */
> +             if (PageOffline(page) && page_count(page))
> +                     return -EBUSY;
>  
>               if (!PageHuge(page))
>                       continue;
>               head = compound_head(page);
>               if (page_huge_active(head))
> -                     return pfn;
> +                     goto found;
>               skip = compound_nr(head) - (page - head);
>               pfn += skip - 1;
>       }
> +     return -ENOENT;
> +found:
> +     *movable_pfn = pfn;
>       return 0;
>  }
>  
> @@ -1515,7 +1533,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>       }
>  
>       do {
> -             for (pfn = start_pfn; pfn;) {
> +             pfn = start_pfn;
> +             do {
>                       if (signal_pending(current)) {
>                               ret = -EINTR;
>                               reason = "signal backoff";
> @@ -1525,14 +1544,19 @@ static int __ref __offline_pages(unsigned long start_pfn,
>                       cond_resched();
>                       lru_add_drain_all();
>  
> -                     pfn = scan_movable_pages(pfn, end_pfn);
> -                     if (pfn) {
> +                     ret = scan_movable_pages(pfn, end_pfn, &pfn);
> +                     if (!ret) {
>                               /*
>                                * TODO: fatal migration failures should bail
>                                * out
>                                */
>                               do_migrate_range(pfn, end_pfn);
>                       }
> +             } while (!ret);
> +
> +             if (ret != -ENOENT) {
> +                     reason = "unmovable page";
> +                     goto failed_removal_isolated;
>               }
>  
>               /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8d7be3f33e26..baa60222215f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8366,6 +8366,19 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>               if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
>                       continue;
>  
> +             /*
> +              * We treat all PageOffline() pages as movable when offlining
> +              * to give drivers a chance to decrement their reference count
> +              * in MEM_GOING_OFFLINE in order to indicate that these pages
> +              * can be offlined as there are no direct references anymore.
> +              * For actually unmovable PageOffline() where the driver does
> +              * not support this, we will fail later when trying to actually
> +              * move these pages that still have a reference count > 0.
> +              * (false negatives in this function only)
> +              */
> +             if ((flags & MEMORY_OFFLINE) && PageOffline(page))
> +                     continue;
> +
>               if (__PageMovable(page) || PageLRU(page))
>                       continue;
>  
> @@ -8786,6 +8799,17 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>                       offlined_pages++;
>                       continue;
>               }
> +             /*
> +              * At this point all remaining PageOffline() pages have a
> +              * reference count of 0 and can simply be skipped.
> +              */
> +             if (PageOffline(page)) {
> +                     BUG_ON(page_count(page));
> +                     BUG_ON(PageBuddy(page));
> +                     pfn++;
> +                     offlined_pages++;
> +                     continue;
> +             }
>  
>               BUG_ON(page_count(page));
>               BUG_ON(!PageBuddy(page));
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 2c11a38d6e87..f6d07c5f0d34 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -151,6 +151,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   *                   a bit mask)
>   *                   MEMORY_OFFLINE - isolate to offline (!allocate) memory
>   *                                    e.g., skip over PageHWPoison() pages
> + *                                    and PageOffline() pages.
>   *                   REPORT_FAILURE - report details about the failure to
>   *                   isolate the range
>   *
> @@ -259,6 +260,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>               else if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
>                       /* A HWPoisoned page cannot be also PageBuddy */
>                       pfn++;
> +             else if ((flags & MEMORY_OFFLINE) && PageOffline(page) &&
> +                      !page_count(page))
> +                     /*
> +                      * The responsible driver agreed to skip PageOffline()
> +                      * pages when offlining memory by dropping its
> +                      * reference in MEM_GOING_OFFLINE.
> +                      */
> +                     pfn++;
>               else
>                       break;
>       }
> -- 
> 2.24.1
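
The three-way return convention of scan_movable_pages() and the retry loop in __offline_pages() from the patch above can be modelled in userspace roughly as follows. This is a sketch under simplifying assumptions, not the kernel implementation: `struct scan_page`, `model_scan_movable()` and `model_offline_range()` are hypothetical names, hugepage handling is left out, and "migration" is reduced to clearing a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct scan_page {
	bool movable;	/* models PageLRU() or __PageMovable() */
	bool offline;	/* models PageOffline() */
	int refcount;	/* models page_count() */
};

/*
 * Returns 0 if a movable page was found (*movable_idx is updated),
 * -ENOENT if no movable page is left, and -EBUSY if a PageOffline()
 * page that still holds a reference was hit.
 */
static int model_scan_movable(const struct scan_page *pages, size_t start,
			      size_t end, size_t *movable_idx)
{
	for (size_t i = start; i < end; i++) {
		if (pages[i].movable) {
			*movable_idx = i;
			return 0;
		}
		if (pages[i].offline && pages[i].refcount)
			return -EBUSY;
	}
	return -ENOENT;
}

/* Caller loop: migrate until nothing movable is left or a page is pinned. */
static int model_offline_range(struct scan_page *pages, size_t n)
{
	size_t idx = 0;
	int ret;

	do {
		ret = model_scan_movable(pages, idx, n, &idx);
		if (!ret)
			pages[idx].movable = false; /* stands in for do_migrate_range() */
	} while (!ret);

	/* -ENOENT is the success case: nothing left to migrate. */
	return ret == -ENOENT ? 0 : ret;
}
```

Separating "nothing movable left" (-ENOENT) from "definitely unmovable" (-EBUSY) is what lets the caller fail early instead of retrying forever on a page the driver refuses to release.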

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
