Re: [PATCH RFC v3 30/35] arm64: mte: ptrace: Handle pages with missing tag storage

2024-02-01 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> A page can end up mapped in a MTE enabled VMA without the corresponding tag
> storage block reserved. Tag accesses made by ptrace in this case can lead
> to the wrong tags being read or memory corruption for the process that is
> using the tag storage memory as data.
> 
> Reserve tag storage by treating ptrace accesses like a fault.
> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch, issue reported by Peter Collingbourne.
> 
>  arch/arm64/kernel/mte.c | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index faf09da3400a..b1fa02dad4fd 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -412,10 +412,13 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
>   while (len) {
>   struct vm_area_struct *vma;
>   unsigned long tags, offset;
> + unsigned int fault_flags;
> + struct page *page;
> + vm_fault_t ret;
>   void *maddr;
> - struct page *page = get_user_page_vma_remote(mm, addr,
> -  gup_flags, &vma);
>  
> +get_page:
> + page = get_user_page_vma_remote(mm, addr, gup_flags, &vma);

But if a valid page is returned here on the first GUP attempt, will there
still be a subsequent handle_mm_fault() on the same vma and addr ?

>   if (IS_ERR(page)) {
>   err = PTR_ERR(page);
>   break;
> @@ -433,6 +436,25 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
>   put_page(page);
>   break;
>   }
> +
> + if (tag_storage_enabled() && !page_tag_storage_reserved(page)) {

Should not '!page' be checked here as well ?

> + fault_flags = FAULT_FLAG_DEFAULT | \
> +   FAULT_FLAG_USER | \
> +   FAULT_FLAG_REMOTE | \
> +   FAULT_FLAG_ALLOW_RETRY | \
> +   FAULT_FLAG_RETRY_NOWAIT;
> + if (write)
> + fault_flags |= FAULT_FLAG_WRITE;
> +
> + put_page(page);
> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> + if (ret & VM_FAULT_ERROR) {
> + err = -EFAULT;
> + break;
> + }
> + goto get_page;
> + }
> +
>   WARN_ON_ONCE(!page_mte_tagged(page));
>  
>   /* limit access to the end of the page */



Re: [PATCH RFC v3 31/35] khugepaged: arm64: Don't collapse MTE enabled VMAs

2024-02-01 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> copy_user_highpage() will do memory allocation if there are saved tags for
> the destination page, and the page is missing tag storage.
> 
> After commit a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
> rcu_read_unlock()s"), collapse_huge_page() calls
> __collapse_huge_page_copy() -> .. -> copy_user_highpage() with the RCU lock
> held, which means that copy_user_highpage() can only allocate memory using
> GFP_ATOMIC or equivalent.
> 
> Get around this by refusing to collapse pages into a transparent huge page
> if the VMA is MTE-enabled.

Makes sense when copy_user_highpage() will allocate memory for tag storage.

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch. I think an agreement on whether copy*_user_highpage() should
> always be allowed to sleep, or should not be allowed, would be useful.

This is a good question ! Even after preventing the collapse of MTE VMAs here,
there still might be more paths where a sleeping (i.e. memory allocating)
copy*_user_highpage() becomes problematic ?

> 
>  arch/arm64/include/asm/pgtable.h| 3 +++
>  arch/arm64/kernel/mte_tag_storage.c | 5 +
>  include/linux/khugepaged.h  | 5 +
>  mm/khugepaged.c | 4 
>  4 files changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 87ae59436162..d0473538c926 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1120,6 +1120,9 @@ static inline bool arch_alloc_cma(gfp_t gfp_mask)
>   return true;
>  }
>  
> +bool arch_hugepage_vma_revalidate(struct vm_area_struct *vma, unsigned long address);
> +#define arch_hugepage_vma_revalidate arch_hugepage_vma_revalidate
> +
>  #endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
>  #endif /* CONFIG_ARM64_MTE */
>  
> diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> index ac7b9c9c585c..a99959b70573 100644
> --- a/arch/arm64/kernel/mte_tag_storage.c
> +++ b/arch/arm64/kernel/mte_tag_storage.c
> @@ -636,3 +636,8 @@ void arch_alloc_page(struct page *page, int order, gfp_t gfp)
>   if (tag_storage_enabled() && alloc_requires_tag_storage(gfp))
>   reserve_tag_storage(page, order, gfp);
>  }
> +
> +bool arch_hugepage_vma_revalidate(struct vm_area_struct *vma, unsigned long address)
> +{
> + return !(vma->vm_flags & VM_MTE);
> +}
> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> index f68865e19b0b..461e4322dff2 100644
> --- a/include/linux/khugepaged.h
> +++ b/include/linux/khugepaged.h
> @@ -38,6 +38,11 @@ static inline void khugepaged_exit(struct mm_struct *mm)
>   if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
>   __khugepaged_exit(mm);
>  }
> +
> +#ifndef arch_hugepage_vma_revalidate
> +#define arch_hugepage_vma_revalidate(vma, address) 1

Please replace s/1/true as arch_hugepage_vma_revalidate() returns bool ?

> +#endif

Right, the above construct is much better than a __HAVE_ARCH_ based one.

> +
>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>  static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
>  {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 2b219acb528e..cb9a9ddb4d86 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -935,6 +935,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>*/
>   if (expect_anon && (!(*vmap)->anon_vma || !vma_is_anonymous(*vmap)))
>   return SCAN_PAGE_ANON;
> +
> + if (!arch_hugepage_vma_revalidate(vma, address))
> + return SCAN_VMA_CHECK;
> +
>   return SCAN_SUCCEED;
>  }
>  

Otherwise this LGTM.



Re: [PATCH RFC v3 13/35] mm: memory: Introduce fault-on-access mechanism for pages

2024-01-31 Thread Anshuman Khandual
On 1/25/24 22:12, Alexandru Elisei wrote:
> Introduce a mechanism that allows an architecture to trigger a page fault,
> and add the infrastructure to handle that fault accordingly. To make use
> of this, an arch is expected to mark the table entry as PAGE_NONE (which
> will cause a fault next time it is accessed) and to implement an
> arch-specific method (like a software bit) for recognizing that the fault
> needs to be handled by the arch code.
> 
> arm64 will make use of this approach to reserve tag storage for pages which are
> mapped in an MTE enabled VMA, but the storage needed to store tags isn't
> reserved (for example, because of an mprotect(PROT_MTE) call on a VMA with
> existing pages).

Just to summarize -

So the platform will create NUMA-balancing-like page faults - by marking existing
mappings with PAGE_NONE permission; when the subsequent fault happens, such cases
are identified via a software bit in the page table entry, and the fault is then
routed to the platform code itself for special purpose page fault handling, where
the page might come from some reserved area instead.

Some questions -

- How often is PAGE_NONE to be marked for applicable MTE VMA based mappings ?

- Is it periodic like NUMA balancing, or just one time for tag storage ?

- How is this going to interact with NUMA balancing, given both use PAGE_NONE ?

- How to differentiate these mappings from standard pte_protnone() ? (sketch below)
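
To make the last question concrete - a minimal sketch, not from the series, of
one way the differentiation could work, assuming arm64 sets a hypothetical
software PTE bit (here called PTE_TAG_STORAGE_NONE) alongside PAGE_NONE:

/* hypothetical software bit; arm64 keeps bits 55-58 for SW use */
#define PTE_TAG_STORAGE_NONE    (1UL << 58)

static inline bool arch_fault_on_access_pte(pte_t pte)
{
        /* both users mark the entry PAGE_NONE; the SW bit disambiguates */
        return pte_protnone(pte) && (pte_val(pte) & PTE_TAG_STORAGE_NONE);
}

NUMA balancing would then have to skip entries for which
arch_fault_on_access_pte() returns true, otherwise the two users of PAGE_NONE
would trip over each other.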

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch. Split from patch #19 ("mm: mprotect: Introduce 
> PAGE_FAULT_ON_ACCESS
> for mprotect(PROT_MTE)") (David Hildenbrand).
> 
>  include/linux/huge_mm.h |  4 ++--
>  include/linux/pgtable.h | 47 +++--
>  mm/Kconfig  |  3 +++
>  mm/huge_memory.c| 36 +
>  mm/memory.c | 51 ++---
>  5 files changed, 109 insertions(+), 32 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5adb86af35fc..4678a0a5e6a8 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -346,7 +346,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
>  struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
>   pud_t *pud, int flags, struct dev_pagemap **pgmap);
>  
> -vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
> +vm_fault_t handle_huge_pmd_protnone(struct vm_fault *vmf);
>  
>  extern struct page *huge_zero_page;
>  extern unsigned long huge_zero_pfn;
> @@ -476,7 +476,7 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
>   return NULL;
>  }
>  
> -static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> +static inline vm_fault_t handle_huge_pmd_protnone(struct vm_fault *vmf)
>  {
>   return 0;
>  }
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 2d0f04042f62..81a21be855a2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1455,7 +1455,7 @@ static inline int pud_trans_unstable(pud_t *pud)
>   return 0;
>  }
>  
> -#ifndef CONFIG_NUMA_BALANCING
> +#if !defined(CONFIG_NUMA_BALANCING) && !defined(CONFIG_ARCH_HAS_FAULT_ON_ACCESS)
>  /*
>   * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It is
>   * perfectly valid to indicate "no" in that case, which is why our default
> @@ -1477,7 +1477,50 @@ static inline int pmd_protnone(pmd_t pmd)
>  {
>   return 0;
>  }
> -#endif /* CONFIG_NUMA_BALANCING */
> +#endif /* !CONFIG_NUMA_BALANCING && !CONFIG_ARCH_HAS_FAULT_ON_ACCESS */
> +
> +#ifndef CONFIG_ARCH_HAS_FAULT_ON_ACCESS
> +static inline bool arch_fault_on_access_pte(pte_t pte)
> +{
> + return false;
> +}
> +
> +static inline bool arch_fault_on_access_pmd(pmd_t pmd)
> +{
> + return false;
> +}
> +
> +/*
> + * The function is called with the fault lock held and an elevated reference on
> + * the folio.
> + *
> + * Rules that an arch implementation of the function must follow:
> + *
> + * 1. The function must return with the elevated reference dropped.
> + *
> + * 2. If the return value contains VM_FAULT_RETRY or VM_FAULT_COMPLETED then:
> + *
> + * - if FAULT_FLAG_RETRY_NOWAIT is not set, the function must return with the
> + *   correct fault lock released, which can be accomplished with
> + *   release_fault_lock(vmf). Note that release_fault_lock() doesn't check if
> + *   FAULT_FLAG_RETRY_NOWAIT is set before releasing the mmap_lock.
> + *
> + * - if FAULT_FLAG_RETRY_NOWAIT is set, then the function must not release the
> + *   mmap_lock. The flag should be set only if the mmap_lock is held.
> + *
> + * 3. If the return value contains neither of the above, the function must not
> + * release the fault lock; the generic fault handler will take care of releasing
> + * the correct lock.
> + */
> +static inline vm_fault_t arch_handle_folio_fault_on_access(struct folio 
> 

Re: [PATCH RFC v3 12/35] mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()

2024-01-31 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> arm64 uses arch_swap_restore() to restore saved tags before the page is
> swapped in and it's called in atomic context (with the ptl lock held).
> 
> Introduce arch_swap_prepare_to_restore() that will allow an architecture to
> perform extra work during swap in and outside of a critical section.
> This will be used by arm64 to allocate a buffer in memory where to
> temporarily save tags if tag storage is not available for the page being
> swapped in.

Just wondering - will tag storage always be unavailable for tagged pages
being swapped in ? OR are there cases where the allocation might not even be
required ? Does this prepare phase need to be outside the critical section
only because there might be memory allocations ?



Re: [PATCH RFC v3 09/35] mm: cma: Introduce cma_remove_mem()

2024-01-31 Thread Anshuman Khandual
On 1/30/24 17:03, Alexandru Elisei wrote:
> Hi,
> 
> I really appreciate the feedback you have given me so far. I believe the
> commit message isn't clear enough and there has been a confusion.
> 
> A CMA user adds a CMA area to the cma_areas array with
> cma_declare_contiguous_nid() or cma_init_reserved_mem().
> init_cma_reserved_pageblock() then iterates over the array and activates
> all cma areas.

Agreed.

> 
> The function cma_remove_mem() is intended to be used to remove a cma area
> from the cma_areas array **before** the area has been activated.

Understood.

> 
> Usecase: a driver (in this case, the arm64 dynamic tag storage code)
> manages several cma areas. The driver successfully adds the first area to
> the cma_areas array. When the driver tries to add the second area, the
> function fails. Without cma_remove_mem(), the driver has no way to prevent
> the first area from being freed to the page allocator. cma_remove_mem() is
> about providing a means to do cleanup in case of error.
> 
> Does that make more sense now?

How do we ensure that cma_remove_mem() gets called by the driver before the
core_initcall() ---> cma_init_reserved_areas() ---> cma_activate_area() chain
happens ? Otherwise cma_remove_mem() will miss out on clearing cma->count, and
the given area will proceed to get activated as always.
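
For reference, a sketch of the timing constraint, assuming the driver can run
from early_initcall() - early initcalls run before the leveled initcalls, so a
removal done there still precedes core_initcall(cma_init_reserved_areas). The
function and variable names below are hypothetical:

static struct cma *tag_storage_cma;     /* set when the area was registered */
static bool tag_storage_setup_failed;

static int __init tag_storage_check_areas(void)
{
        /* runs before core_initcall(cma_init_reserved_areas) */
        if (tag_storage_setup_failed && tag_storage_cma)
                cma_remove_mem(&tag_storage_cma);
        return 0;
}
early_initcall(tag_storage_check_areas);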


> 
> On Tue, Jan 30, 2024 at 11:20:56AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> Memory is added to CMA with cma_declare_contiguous_nid() and
>>> cma_init_reserved_mem(). This memory is then put on the MIGRATE_CMA list in
>>> cma_init_reserved_areas(), where the page allocator can make use of it.
>>
>> cma_declare_contiguous_nid() reserves memory in memblock and marks the memory
> 
> You forgot about about cma_init_reserved_mem() which does the same thing,
> but yes, you are right.

Agreed, missed that. There are some direct cma_init_reserved_mem() calls as 
well.

> 
>> for subsequent CMA usage, whereas cma_init_reserved_areas() activates
>> these memory areas through init_cma_reserved_pageblock(). The standard page
>> allocator only receives this memory via free_reserved_page() - only if
> 
> I don't think that's correct. init_cma_reserved_pageblock() clears the
> PG_reserved page flag, sets the migratetype to MIGRATE_CMA and then frees
> the page. After that, the page is available to the standard page allocator
> to use for allocation. Otherwise, what would be the point of the
> MIGRATE_CMA migratetype?

Understood and agreed.

> 
>> the page block activation fails.
> 
> For the sake of having a complete picture, I'll add that that only happens
> if cma->reserve_pages_on_error is false. If the CMA user sets the field to
> 'true' (with cma_reserve_pages_on_error()), then the pages in the CMA
> region are kept PG_reserved if activation fails.

Why can't you use cma_reserve_pages_on_error() ?

> 
>>
>>>
>>> If a device manages multiple CMA areas, and there's an error when one of
>>> the areas is added to CMA, there is no mechanism for the device to prevent
>>
>> What kind of error ? init_cma_reserved_pageblock() fails ? But that will
>> not happen until cma_init_reserved_areas().
> 
> I think I haven't been clear enough. When I say that "an area is added
> to CMA", I mean that the memory region is added to cma_areas array, via
> cma_declare_contiguous_nid() or cma_init_reserved_mem(). There are several
> ways in which either function can fail.

Okay.

> 
>>
>>> the rest of the areas, which were added before the error occurred, from
>>> being later added to the MIGRATE_CMA list.
>>
>> Why is this mechanism required ? cma_init_reserved_areas() scans over all
>> CMA areas and try and activate each of them sequentially. Why is not this
>> sufficient ?
> 
> This patch is about removing a struct cma from the cma_areas array after it
> has been added to the array, with cma_declare_contiguous_nid() or
> cma_init_reserved_mem(), to prevent the area from being activated in
> cma_init_reserved_areas(). Sorry for the confusion.
> 
> I'll add a check in cma_remove_mem() to fail if the cma area has been
> activated, and a comment to the function to explain its usage.

That will be a good check.

> 
>>
>>>
>>> Add cma_remove_mem() which allows a previously reserved CMA area to be
>>> removed and thus it cannot be used by the page allocator.
>>
>> Successfully activated CMA areas do not get used by the buddy allocator.
> 
> I don't believe that is correct, see above.
Apologies, it's my bad.

> 
>>
>>>
>>> Signed-off-by: Alexandru Elisei 
>

Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-30 Thread Anshuman Khandual



On 1/30/24 17:04, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Jan 30, 2024 at 03:25:20PM +0530, Anshuman Khandual wrote:
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
>>> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
>>> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
>>> set in vma_alloc_zeroed_movable_folio().
>>>
>>> Expand this to be more generic by adding an arch hook that modifies the gfp
>>> flags for an allocation when the VMA is known.
>>>
>>> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
>>> is also set; from that point of view, the current behaviour is unchanged,
>>> even though the arm64 flag is set in more places.  When arm64 will have
>>> support to reuse the tag storage for data allocation, the uses of the
>>> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
>>> to reserve the corresponding tag storage for the pages being allocated.
>> Right but how will pushing the __GFP_ZEROTAGS addition into gfp_t flags further
>> down via a new arch callback, i.e. arch_calc_vma_gfp(), while still maintaining
>> (vma->vm_flags & VM_MTE) conditionality improve the current scenario ? Because
> I'm afraid I don't follow you.

I was just asking whether the overall scope of the __GFP_ZEROTAGS flag is being
increased to cover more core MM paths through this patch. I think you have
already answered that below.

> 
>> the page allocator could still have analyzed the alloc flags for __GFP_ZEROTAGS
>> for any additional stuff.
>>
>> OR does this just add some new core MM paths to get __GFP_ZEROTAGS, which was
>> not the case earlier, via this callback ?
> Before this patch: vma_alloc_zeroed_movable_folio() sets __GFP_ZEROTAGS.
> After this patch: vma_alloc_folio() sets __GFP_ZEROTAGS.

Understood.

> 
> This patch is about adding __GFP_ZEROTAGS for more callers.

Right, I guess that is the real motivation for this patch. But just wondering -
does this cover all possible anon fault paths for converting a given VMA's
VM_MTE flag into the page alloc flag __GFP_ZEROTAGS ? Aren't there any other
files besides mm/shmem.c which need to be changed to include arch_calc_vma_gfp() ?
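
For what it's worth, the composition pattern the patch relies on looks roughly
like this in the callers it touches (a paraphrase, not a verbatim hunk; the
wrapper name is invented):

static struct folio *alloc_movable_folio(struct vm_area_struct *vma,
                                         unsigned long addr)
{
        gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_ZERO;

        /* the arch hook may only add flags, e.g. __GFP_ZEROTAGS for VM_MTE */
        gfp |= arch_calc_vma_gfp(vma, gfp);
        return vma_alloc_folio(gfp, 0, vma, addr, false);
}

Any anon fault path that allocates without going through such a call site would
miss the VM_MTE conversion, which is exactly the concern above.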



Re: [PATCH RFC v3 08/35] mm: cma: Introduce cma_alloc_range()

2024-01-30 Thread Anshuman Khandual



On 1/30/24 17:05, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Jan 30, 2024 at 10:50:00AM +0530, Anshuman Khandual wrote:
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> Today, cma_alloc() is used to allocate a contiguous memory region. The
>>> function allows the caller to specify the number of pages to allocate, but
>>> not the starting address. cma_alloc() will walk over the entire CMA region
>>> trying to allocate the first available range of the specified size.
>>>
>>> Introduce cma_alloc_range(), which makes CMA more versatile by allowing the
>>> caller to specify a particular range in the CMA region, defined by the
>>> start pfn and the size.
>>>
>>> arm64 will make use of this function when tag storage management will be
>>> implemented: cma_alloc_range() will be used to reserve the tag storage
>>> associated with a tagged page.
>> Basically, you would like to pass on a preferred start address and the
>> allocation could just fail if a contig range is not available from such
>> a starting address ?
>>
>> Then why not just change cma_alloc() to take a new argument 'start_pfn'.
>> Why create a new but almost similar allocator ?
> I tried doing that, and I gave up because:
> 
> - It made cma_alloc() even more complex and hard to follow.
> 
> - What value should 'start_pfn' be to tell cma_alloc() that it should be
>   ignored? Or, to put it another way, what pfn number is invalid on **all**
>   platforms that Linux supports?
> 
> I can give it another go if we can come up with an invalid value for
> 'start_pfn'.

Something negative might work. How about -1/-1UL ? A quick search gives
some instances such as ...

git grep "pfn == -1"

mm/mm_init.c:   if (*start_pfn == -1UL)
mm/vmscan.c:if (pfn == -1)
mm/vmscan.c:if (pfn == -1)
mm/vmscan.c:if (pfn == -1)
tools/testing/selftests/mm/hugepage-vmemmap.c:  if (pfn == -1UL) {

Could not -1UL be abstracted as a common macro MM_INVALID_PFN, to be used in
such scenarios including here ?
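
A sketch of that suggestion, purely illustrative - a common macro plus a
'start_pfn' parameter on cma_alloc(), with MM_INVALID_PFN meaning "no preferred
start":

#define MM_INVALID_PFN  (~0UL)

/* hypothetical extended signature */
struct page *cma_alloc(struct cma *cma, unsigned long start_pfn,
                       unsigned long count, unsigned int align, bool no_warn);

/*
 * A caller with no placement constraint would then pass:
 *      page = cma_alloc(cma, MM_INVALID_PFN, count, align, false);
 */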

> 
>> But then I am wondering why this could not be done in the arm64 platform
>> code itself operating on a CMA area reserved just for tag storage. Unless
>> this new allocator has other usage beyond MTE, this could be implemented
>> in the platform itself.
> I had the same idea in the previous iteration, David Hildenbrand suggested
> this approach [1].
> 
> [1] 
> https://lore.kernel.org/linux-fsdevel/2aafd53f-af1f-45f3-a08c-d11962254...@redhat.com/

There are two different cma_alloc() proposals here - including the next
patch, i.e. "mm: cma: Fast track allocating memory when the pages are free" -

1) Augment cma_alloc() or add cma_alloc_range() with a start_pfn parameter
2) Speed up cma_alloc() for small allocation requests when pages are free

The second one, if separated out from this series, could be considered on
its own, as it will help all existing cma_alloc() callers. The first one
definitely needs a use case, as provided in this series.



Re: [PATCH RFC v3 07/35] mm: cma: Add CMA_RELEASE_{SUCCESS,FAIL} events

2024-01-30 Thread Anshuman Khandual



On 1/29/24 17:23, Alexandru Elisei wrote:
> Hi,
> 
> On Mon, Jan 29, 2024 at 03:01:24PM +0530, Anshuman Khandual wrote:
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> Similar to the two events that relate to CMA allocations, add the
>>> CMA_RELEASE_SUCCESS and CMA_RELEASE_FAIL events that count when CMA pages
>>> are freed.
>> How is this going to be beneficial towards analyzing CMA alloc/release
>> behaviour - particularly with respect to this series ? OR is this just being
>> added from a parity perspective with the CMA alloc side counters ? Regardless,
>> this CMA change too could be discussed separately.
> Added for parity and because it's useful for this series (see my reply to
> the previous patch where I discuss how I've used the counters).

As mentioned earlier, a new CONFIG_CMA_SYSFS element 'cma->nr_freed_pages'
could be instrumented in cma_release()'s success path for this purpose.
But again, the failure path is not of much value, as it could only happen
when there is an invalid input from the caller, i.e. when the
cma_pages_valid() check fails.



Re: [PATCH RFC v3 06/35] mm: cma: Make CMA_ALLOC_SUCCESS/FAIL count the number of pages

2024-01-30 Thread Anshuman Khandual



On 1/30/24 17:28, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Jan 30, 2024 at 10:22:11AM +0530, Anshuman Khandual wrote:
>>
>> On 1/29/24 17:21, Alexandru Elisei wrote:
>>> Hi,
>>>
>>> On Mon, Jan 29, 2024 at 02:54:20PM +0530, Anshuman Khandual wrote:
>>>>
>>>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>>>> The CMA_ALLOC_SUCCESS, respectively CMA_ALLOC_FAIL, are increased by one
>>>>> after each cma_alloc() function call. This is done even though cma_alloc()
>>>>> can allocate an arbitrary number of CMA pages. When looking at
>>>>> /proc/vmstat, the number of successful (or failed) cma_alloc() calls
>>>>> doesn't tell much with regards to how many CMA pages were allocated via
>>>>> cma_alloc() versus via the page allocator (regular allocation request or
>>>>> PCP lists refill).
>>>>>
>>>>> This can also be rather confusing to a user who isn't familiar with the
>>>>> code, since the unit of measurement for nr_free_cma is the number of 
>>>>> pages,
>>>>> but cma_alloc_success and cma_alloc_fail count the number of cma_alloc()
>>>>> function calls.
>>>>>
>>>>> Let's make this consistent, and arguably more useful, by having
>>>>> CMA_ALLOC_SUCCESS count the number of successfully allocated CMA pages, 
>>>>> and
>>>>> CMA_ALLOC_FAIL count the number of pages the cma_alloc() failed to
>>>>> allocate.
>>>>>
>>>>> For users that wish to track the number of cma_alloc() calls, there are
>>>>> tracepoints for that already implemented.
>>>>>
>>>>> Signed-off-by: Alexandru Elisei 
>>>>> ---
>>>>>  mm/cma.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/mm/cma.c b/mm/cma.c
>>>>> index f49c95f8ee37..dbf7fe8cb1bd 100644
>>>>> --- a/mm/cma.c
>>>>> +++ b/mm/cma.c
>>>>> @@ -517,10 +517,10 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>>>>>   pr_debug("%s(): returned %p\n", __func__, page);
>>>>>  out:
>>>>>   if (page) {
>>>>> - count_vm_event(CMA_ALLOC_SUCCESS);
>>>>> + count_vm_events(CMA_ALLOC_SUCCESS, count);
>>>>>   cma_sysfs_account_success_pages(cma, count);
>>>>>   } else {
>>>>> - count_vm_event(CMA_ALLOC_FAIL);
>>>>> + count_vm_events(CMA_ALLOC_FAIL, count);
>>>>>   if (cma)
>>>>>   cma_sysfs_account_fail_pages(cma, count);
>>>>>   }
>> Without getting into the merits of this patch - which is actually trying to do
>> a semantics change to /proc/vmstat, wondering how is this even related to this
>> particular series ? If required this could be debated on its own separately.
>>> Having the number of CMA pages allocated and the number of CMA pages freed
>>> allows someone to infer how many tagged pages are in use at a given time:
>> That should not be done in CMA which is a generic multi-purpose allocator.

> Ah, ok. Let me rephrase that: Having the number of CMA pages allocated, the
> number of failed CMA page allocations and the number of freed CMA pages
> allows someone to infer how many CMA pages are in use at a given time.
> That's valuable information for software designers and system
> administrators, as it allows them to tune the number of CMA pages available
> in a system.
> 
> Or put another way: what would you consider to be more useful?  Knowing the
> number of cma_alloc()/cma_release() calls, or knowing the number of pages
> that cma_alloc()/cma_release() allocated or freed?

There is still value in knowing how many times cma_alloc() succeeded or failed,
regardless of the cumulative number of pages involved over time. Actually the
count helps to understand how cma_alloc() performed overall as an allocator.

But on the cma_release() path there is no chance of failure apart from the case
when the caller itself provides a wrong input. So there are no corresponding
CMA_RELEASE_SUCCESS/CMA_RELEASE_FAIL vmstat counters in there - for a reason !

Coming back to CMA based pages being allocated and freed, there is already an
interface via sysfs (CONFIG_CMA_SYSFS) which gets updated in cma_alloc() path
via cma_sysfs_account_success_pages() and cma_sysfs_account_fail_pages().

#ls /sys/kernel/mm/cma/
alloc_pages_fail alloc_pages_success

Why could these counters not meet your requirements ? Also 'struct cma' can
be updated to add an element 'nr_pages_freed' to be tracked in cma_release(),
providing a freed pages count as well.
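
A sketch of that 'nr_pages_freed' idea, mirroring the existing
cma_sysfs_account_success_pages()/cma_sysfs_account_fail_pages() helpers (the
field and helper names are made up here):

#ifdef CONFIG_CMA_SYSFS
/* called from cma_release() after cma_clear_bitmap() */
static void cma_sysfs_account_freed_pages(struct cma *cma, unsigned long count)
{
        atomic64_add(count, &cma->nr_pages_freed);      /* new struct cma field */
}
#endif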

There are additional debugfs based elements (CONFIG_CMA_DEBUGFS) available.

#ls /sys/kernel/debug/cma/
alloc  base_pfn  bitmap  count  free  maxchunk  order_per_bit  used



Re: [PATCH RFC v3 04/35] mm: page_alloc: Partially revert "mm: page_alloc: remove stale CMA guard code"

2024-01-30 Thread Anshuman Khandual



On 1/30/24 17:27, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Jan 30, 2024 at 10:04:02AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 1/29/24 17:16, Alexandru Elisei wrote:
>>> Hi,
>>>
>>> On Mon, Jan 29, 2024 at 02:31:23PM +0530, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>>>> The patch f945116e4e19 ("mm: page_alloc: remove stale CMA guard code")
>>>>> removed the CMA filter when allocating from the MIGRATE_MOVABLE pcp list
>>>>> because CMA is always allowed when __GFP_MOVABLE is set.
>>>>>
>>>>> With the introduction of the arch_alloc_cma() function, the above is not
>>>>> true anymore, so bring back the filter.
>>>>
>>>> This makes sense as arch_alloc_cma() now might prevent ALLOC_CMA being
>>>> assigned to alloc_flags in gfp_to_alloc_flags_cma().
>>>
>>> Can I add your Reviewed-by tag then?
>>
>> I think all these changes need to be reviewed in their entirety,
>> even though some patches do look good on their own. For example
>> this patch depends on whether [PATCH 03/35] is acceptable or not.
>>
>> I would suggest separating out the CMA patches, which could be debated
>> and merged regardless of this series.
> 
> Ah, I see, makes sense. Since basically all the core mm changes are there
> to enable dynamic tag storage for arm64, I'll hold on until the series
> stabilises before separating the core mm from the arm64 patches.

Fair enough, but at least could you please separate out this particular
patch right away and send it across.

mm: cma: Don't append newline when generating CMA area name



Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-30 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> set in vma_alloc_zeroed_movable_folio().
> 
> Expand this to be more generic by adding an arch hook that modifies the gfp
> flags for an allocation when the VMA is known.
> 
> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> is also set; from that point of view, the current behaviour is unchanged,
> even though the arm64 flag is set in more places.  When arm64 will have
> support to reuse the tag storage for data allocation, the uses of the
> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> to reserve the corresponding tag storage for the pages being allocated.

Right but how will pushing the __GFP_ZEROTAGS addition into gfp_t flags further
down via a new arch callback, i.e. arch_calc_vma_gfp(), while still maintaining
(vma->vm_flags & VM_MTE) conditionality improve the current scenario ? Because
the page allocator could still have analyzed the alloc flags for __GFP_ZEROTAGS
for any additional stuff.

OR does this just add some new core MM paths to get __GFP_ZEROTAGS, which was
not the case earlier, via this callback ?

> 
> The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> the caller; this has been done to keep an architecture from modifying the
> flags already set by the core memory management code; this is similar to
> how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> implemented. This can be revisited in the future if there's a need to do
> so.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  arch/arm64/include/asm/page.h|  5 ++---
>  arch/arm64/include/asm/pgtable.h |  3 +++
>  arch/arm64/mm/fault.c| 19 ++-
>  include/linux/pgtable.h  |  7 +++
>  mm/mempolicy.c   |  1 +
>  mm/shmem.c   |  5 -
>  6 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 2312e6ee595f..88bab032a493 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -29,9 +29,8 @@ void copy_user_highpage(struct page *to, struct page *from,
>  void copy_highpage(struct page *to, struct page *from);
>  #define __HAVE_ARCH_COPY_HIGHPAGE
>  
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> - unsigned long vaddr);
> -#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>  
>  void tag_clear_highpage(struct page *to);
>  #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 79ce70fbb751..08f0904dbfc2 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1071,6 +1071,9 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  
>  #endif /* CONFIG_ARM64_MTE */
>  
> +#define __HAVE_ARCH_CALC_VMA_GFP
> +gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp);
> +
>  /*
>   * On AArch64, the cache coherency is handled via the set_pte_at() function.
>   */
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 55f6455a8284..4d3f0a870ad8 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -937,22 +937,15 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr,
>  NOKPROBE_SYMBOL(do_debug_exception);
>  
>  /*
> - * Used during anonymous page fault handling.
> + * If this is called during anonymous page fault handling, and the page is
> + * mapped with PROT_MTE, initialise the tags at the point of tag zeroing as this
> + * is usually faster than separate DC ZVA and STGM.
>   */
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> - unsigned long vaddr)
> +gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
>  {
> - gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO;
> -
> - /*
> -  * If the page is mapped with PROT_MTE, initialise the tags at the
> -  * point of allocation and page zeroing as this is usually faster than
> -  * separate DC ZVA and STGM.
> -  */
>   if (vma->vm_flags & VM_MTE)
> - flags |= __GFP_ZEROTAGS;
> -
> - return vma_alloc_folio(flags, 0, vma, vaddr, false);
> + return __GFP_ZEROTAGS;
> + return 0;
>  }
>  
>  void tag_clear_highpage(struct page *page)
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5ddec6b5305..98f81ca08cbe 100644
> --- a/include/linux/pgtable.h
> +++ 

Re: [PATCH RFC v3 10/35] mm: cma: Fast track allocating memory when the pages are free

2024-01-30 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> If the pages to be allocated are free, take them directly off the buddy
> allocator, instead of going through alloc_contig_range() and avoiding
> costly calls to lru_cache_disable().
> 
> Only allocations of the same size as the CMA region order are considered,
> to avoid taking the zone spinlock for too long.
> 
> Signed-off-by: Alexandru Elisei 

This patch seems to be improving standard cma_alloc() as well as
the previously added new allocator, i.e. cma_alloc_range(), via a
new helper, cma_alloc_pages_fastpath().

Should not any standard cma_alloc() improvement be discussed as
an independent patch, separately, irrespective of this series ? OR
is it somehow related to this series in a way I might be missing ?

> ---
> 
> Changes since rfc v2:
> 
> * New patch. Reworked from the rfc v2 patch #26 ("arm64: mte: Fast track
> reserving tag storage when the block is free") (David Hildenbrand).
> 
>  include/linux/page-flags.h | 15 --
>  mm/Kconfig |  5 +
>  mm/cma.c   | 42 ++
>  mm/memory-failure.c|  8 
>  mm/page_alloc.c| 23 -
>  5 files changed, 73 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 735cddc13d20..b7237bce7446 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -575,11 +575,22 @@ TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
>  #define MAGIC_HWPOISON   0x48575053U /* HWPS */
>  extern void SetPageHWPoisonTakenOff(struct page *page);
>  extern void ClearPageHWPoisonTakenOff(struct page *page);
> -extern bool take_page_off_buddy(struct page *page);
> -extern bool put_page_back_buddy(struct page *page);
> +extern bool PageHWPoisonTakenOff(struct page *page);
>  #else
>  PAGEFLAG_FALSE(HWPoison, hwpoison)
> +TESTSCFLAG_FALSE(HWPoison, hwpoison)
>  #define __PG_HWPOISON 0
> +static inline void SetPageHWPoisonTakenOff(struct page *page) { }
> +static inline void ClearPageHWPoisonTakenOff(struct page *page) { }
> +static inline bool PageHWPoisonTakenOff(struct page *page)
> +{
> +  return false;
> +}
> +#endif
> +
> +#ifdef CONFIG_WANTS_TAKE_PAGE_OFF_BUDDY
> +extern bool take_page_off_buddy(struct page *page, bool poison);
> +extern bool put_page_back_buddy(struct page *page, bool unpoison);
>  #endif
>  
>  #if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ffc3a2ba3a8c..341cf53898db 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -745,12 +745,16 @@ config DEFAULT_MMAP_MIN_ADDR
>  config ARCH_SUPPORTS_MEMORY_FAILURE
>   bool
>  
> +config WANTS_TAKE_PAGE_OFF_BUDDY
> + bool
> +
>  config MEMORY_FAILURE
>   depends on MMU
>   depends on ARCH_SUPPORTS_MEMORY_FAILURE
>   bool "Enable recovery from hardware memory errors"
>   select MEMORY_ISOLATION
>   select RAS
> + select WANTS_TAKE_PAGE_OFF_BUDDY
>   help
> Enables code to recover from some memory failures on systems
> with MCA recovery. This allows a system to continue running
> @@ -891,6 +895,7 @@ config CMA
>   depends on MMU
>   select MIGRATION
>   select MEMORY_ISOLATION
> + select WANTS_TAKE_PAGE_OFF_BUDDY
>   help
> This enables the Contiguous Memory Allocator which allows other
> subsystems to allocate big physically-contiguous blocks of memory.
> diff --git a/mm/cma.c b/mm/cma.c
> index 2881bab12b01..15663f95d77b 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -444,6 +444,34 @@ static void cma_debug_show_areas(struct cma *cma)
>  static inline void cma_debug_show_areas(struct cma *cma) { }
>  #endif
>  
> +/* Called with the cma mutex held. */
> +static int cma_alloc_pages_fastpath(struct cma *cma, unsigned long start,
> + unsigned long end)
> +{
> + bool success = false;
> + unsigned long i, j;
> +
> + /* Avoid contention on the zone lock. */
> + if (end - start != 1 << cma->order_per_bit)
> + return -EINVAL;
> +
> + for (i = start; i < end; i++) {
> + if (!is_free_buddy_page(pfn_to_page(i)))
> + break;
> + success = take_page_off_buddy(pfn_to_page(i), false);
> + if (!success)
> + break;
> + }
> +
> + if (success)
> + return 0;
> +
> + for (j = start; j < i; j++)
> + put_page_back_buddy(pfn_to_page(j), false);
> +
> + return -EBUSY;
> +}
> +
>  /**
>   * cma_alloc_range() - allocate pages in a specific range
>   * @cma:   Contiguous memory region for which the allocation is performed.
> @@ -493,7 +521,11 @@ int cma_alloc_range(struct cma *cma, unsigned long start, unsigned long count,
>  
>   for (i = 0; i < tries; i++) {
>   mutex_lock(&cma_mutex);
> - err = alloc_contig_range(start, start + count, MIGRATE_CMA, gfp);
> +   

Re: [PATCH RFC v3 09/35] mm: cma: Introduce cma_remove_mem()

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> Memory is added to CMA with cma_declare_contiguous_nid() and
> cma_init_reserved_mem(). This memory is then put on the MIGRATE_CMA list in
> cma_init_reserved_areas(), where the page allocator can make use of it.

cma_declare_contiguous_nid() reserves memory in memblock and marks the memory
for subsequent CMA usage, whereas cma_init_reserved_areas() activates
these memory areas through init_cma_reserved_pageblock(). The standard page
allocator only receives this memory via free_reserved_page() - only if
the page block activation fails.

> 
> If a device manages multiple CMA areas, and there's an error when one of
> the areas is added to CMA, there is no mechanism for the device to prevent

What kind of error ? init_cma_reserved_pageblock() fails ? But that will
not happen until cma_init_reserved_areas().

> the rest of the areas, which were added before the error occurred, from
> being later added to the MIGRATE_CMA list.

Why is this mechanism required ? cma_init_reserved_areas() scans over all
CMA areas and try and activate each of them sequentially. Why is not this
sufficient ?

> 
> Add cma_remove_mem() which allows a previously reserved CMA area to be
> removed and thus it cannot be used by the page allocator.

Successfully activated CMA areas do not get used by the buddy allocator.

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch.
> 
>  include/linux/cma.h |  1 +
>  mm/cma.c| 30 +-
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index e32559da6942..787cbec1702e 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -48,6 +48,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>   unsigned int order_per_bit,
>   const char *name,
>   struct cma **res_cma);
> +extern void cma_remove_mem(struct cma **res_cma);
>  extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> bool no_warn);
> +extern int cma_alloc_range(struct cma *cma, unsigned long start, unsigned long count,
> diff --git a/mm/cma.c b/mm/cma.c
> index 4a0f68b9443b..2881bab12b01 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -147,8 +147,12 @@ static int __init cma_init_reserved_areas(void)
>  {
>   int i;
>  
> - for (i = 0; i < cma_area_count; i++)
> + for (i = 0; i < cma_area_count; i++) {
> + /* Region was removed. */
> + if (!cma_areas[i].count)
> + continue;

Skip previously added CMA area (now zeroed out) ?

>   cma_activate_area(&cma_areas[i]);
> + }
>  
>   return 0;
>  }

cma_init_reserved_areas() gets called via core_initcall(). Somehow the
platform/device needs to call cma_remove_mem() before core_initcall()
gets called ? This might be time sensitive.

> @@ -216,6 +220,30 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>   return 0;
>  }
>  
> +/**
> + * cma_remove_mem() - remove cma area
> + * @res_cma: Pointer to the cma region.
> + *
> + * This function removes a cma region created with cma_init_reserved_mem(). The
> + * ->count is set to 0.
> + */
> +void __init cma_remove_mem(struct cma **res_cma)
> +{
> + struct cma *cma;
> +
> + if (WARN_ON_ONCE(!res_cma || !(*res_cma)))
> + return;
> +
> + cma = *res_cma;
> + if (WARN_ON_ONCE(!cma->count))
> + return;
> +
> + totalcma_pages -= cma->count;
> + cma->count = 0;
> +
> + *res_cma = NULL;
> +}
> +
>  /**
>   * cma_declare_contiguous_nid() - reserve custom contiguous area
>   * @base: Base address of the reserved area optional, use 0 for any

But first please do explain what errors the device or platform might see on
a previously marked CMA area, such that removing it along the way becomes
necessary to prevent its activation via cma_init_reserved_areas().
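
For context, the cleanup pattern the patch seems to aim at looks roughly like
this (a hypothetical driver managing two tag storage areas; the base/size/area
names are invented for illustration):

static phys_addr_t base0, size0, base1, size1;
static struct cma *area0, *area1;

static int __init tag_storage_register_areas(void)
{
        int ret;

        ret = cma_init_reserved_mem(base0, size0, 0, "tag-storage-0", &area0);
        if (ret)
                return ret;

        ret = cma_init_reserved_mem(base1, size1, 0, "tag-storage-1", &area1);
        if (ret) {
                /* area0 alone is useless; keep it from being activated */
                cma_remove_mem(&area0);
                return ret;
        }
        return 0;
}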



Re: [PATCH RFC v3 08/35] mm: cma: Introduce cma_alloc_range()

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> Today, cma_alloc() is used to allocate a contiguous memory region. The
> function allows the caller to specify the number of pages to allocate, but
> not the starting address. cma_alloc() will walk over the entire CMA region
> trying to allocate the first available range of the specified size.
> 
> Introduce cma_alloc_range(), which makes CMA more versatile by allowing the
> caller to specify a particular range in the CMA region, defined by the
> start pfn and the size.
> 
> arm64 will make use of this function when tag storage management will be
> implemented: cma_alloc_range() will be used to reserve the tag storage
> associated with a tagged page.

Basically, you would like to pass on a preferred start address and the
allocation could just fail if a contig range is not available from such
a starting address ?

Then why not just change cma_alloc() to take a new argument 'start_pfn'.
Why create a new but almost similar allocator ?

But then I am wondering why this could not be done in the arm64 platform
code itself operating on a CMA area reserved just for tag storage. Unless
this new allocator has other usage beyond MTE, this could be implemented
in the platform itself.

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch.
> 
>  include/linux/cma.h|  2 +
>  include/trace/events/cma.h | 59 ++
>  mm/cma.c   | 86 ++
>  3 files changed, 147 insertions(+)
> 
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index 63873b93deaa..e32559da6942 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -50,6 +50,8 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>   struct cma **res_cma);
>  extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> bool no_warn);
> +extern int cma_alloc_range(struct cma *cma, unsigned long start, unsigned long count,
> +unsigned tries, gfp_t gfp);
>  extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
>  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
>  
> diff --git a/include/trace/events/cma.h b/include/trace/events/cma.h
> index 25103e67737c..a89af313a572 100644
> --- a/include/trace/events/cma.h
> +++ b/include/trace/events/cma.h
> @@ -36,6 +36,65 @@ TRACE_EVENT(cma_release,
> __entry->count)
>  );
>  
> +TRACE_EVENT(cma_alloc_range_start,
> +
> + TP_PROTO(const char *name, unsigned long start, unsigned long count,
> +  unsigned tries),
> +
> + TP_ARGS(name, start, count, tries),
> +
> + TP_STRUCT__entry(
> + __string(name, name)
> + __field(unsigned long, start)
> + __field(unsigned long, count)
> + __field(unsigned, tries)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, name);
> + __entry->start = start;
> + __entry->count = count;
> + __entry->tries = tries;
> + ),
> +
> + TP_printk("name=%s start=%lx count=%lu tries=%u",
> +   __get_str(name),
> +   __entry->start,
> +   __entry->count,
> +   __entry->tries)
> +);
> +
> +TRACE_EVENT(cma_alloc_range_finish,
> +
> + TP_PROTO(const char *name, unsigned long start, unsigned long count,
> +  unsigned attempts, int err),
> +
> + TP_ARGS(name, start, count, attempts, err),
> +
> + TP_STRUCT__entry(
> + __string(name, name)
> + __field(unsigned long, start)
> + __field(unsigned long, count)
> + __field(unsigned, attempts)
> + __field(int, err)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, name);
> + __entry->start = start;
> + __entry->count = count;
> + __entry->attempts = attempts;
> + __entry->err = err;
> + ),
> +
> + TP_printk("name=%s start=%lx count=%lu attempts=%u err=%d",
> +   __get_str(name),
> +   __entry->start,
> +   __entry->count,
> +   __entry->attempts,
> +   __entry->err)
> +);
> +
>  TRACE_EVENT(cma_alloc_start,
>  
>   TP_PROTO(const char *name, unsigned long count, unsigned int align),
> diff --git a/mm/cma.c b/mm/cma.c
> index 543bb6b3be8e..4a0f68b9443b 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -416,6 +416,92 @@ static void cma_debug_show_areas(struct cma *cma)
>  static inline void cma_debug_show_areas(struct cma *cma) { }
>  #endif
>  
> +/**
> + * cma_alloc_range() - allocate pages in a specific range
> + * @cma:   Contiguous memory region for which the allocation is performed.
> + * @start: Starting pfn of the allocation.
> + * @count: Requested number 

Re: [PATCH RFC v3 06/35] mm: cma: Make CMA_ALLOC_SUCCESS/FAIL count the number of pages

2024-01-29 Thread Anshuman Khandual



On 1/29/24 17:21, Alexandru Elisei wrote:
> Hi,
> 
> On Mon, Jan 29, 2024 at 02:54:20PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> The CMA_ALLOC_SUCCESS, respectively CMA_ALLOC_FAIL, are increased by one
>>> after each cma_alloc() function call. This is done even though cma_alloc()
>>> can allocate an arbitrary number of CMA pages. When looking at
>>> /proc/vmstat, the number of successful (or failed) cma_alloc() calls
>>> doesn't tell much with regards to how many CMA pages were allocated via
>>> cma_alloc() versus via the page allocator (regular allocation request or
>>> PCP lists refill).
>>>
>>> This can also be rather confusing to a user who isn't familiar with the
>>> code, since the unit of measurement for nr_free_cma is the number of pages,
>>> but cma_alloc_success and cma_alloc_fail count the number of cma_alloc()
>>> function calls.
>>>
>>> Let's make this consistent, and arguably more useful, by having
>>> CMA_ALLOC_SUCCESS count the number of successfully allocated CMA pages, and
>>> CMA_ALLOC_FAIL count the number of pages the cma_alloc() failed to
>>> allocate.
>>>
>>> For users that wish to track the number of cma_alloc() calls, there are
>>> tracepoints for that already implemented.
>>>
>>> Signed-off-by: Alexandru Elisei 
>>> ---
>>>  mm/cma.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/cma.c b/mm/cma.c
>>> index f49c95f8ee37..dbf7fe8cb1bd 100644
>>> --- a/mm/cma.c
>>> +++ b/mm/cma.c
>>> @@ -517,10 +517,10 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>>> pr_debug("%s(): returned %p\n", __func__, page);
>>>  out:
>>> if (page) {
>>> -   count_vm_event(CMA_ALLOC_SUCCESS);
>>> +   count_vm_events(CMA_ALLOC_SUCCESS, count);
>>> cma_sysfs_account_success_pages(cma, count);
>>> } else {
>>> -   count_vm_event(CMA_ALLOC_FAIL);
>>> +   count_vm_events(CMA_ALLOC_FAIL, count);
>>> if (cma)
>>> cma_sysfs_account_fail_pages(cma, count);
>>> }
>>
>> Without getting into the merits of this patch - which is actually trying to do
>> a semantics change to /proc/vmstat, wondering how is this even related to this
>> particular series ? If required this could be debated on its own separately.
> 
> Having the number of CMA pages allocated and the number of CMA pages freed
> allows someone to infer how many tagged pages are in use at a given time:

That should not be done in CMA which is a generic multi-purpose allocator.

> (allocated CMA pages - CMA pages allocated by drivers* - CMA pages
> released) * 32. That is valuable information for software and hardware
> designers.
> 
> Besides that, for every iteration of the series, this has proven invaluable
> for discovering bugs with freeing and/or reserving tag storage pages.

I am afraid that might not be enough justification for getting something
merged mainline.

> 
> *that would require userspace reading cma_alloc_success and
> cma_release_success before any tagged allocations are performed.

While assuming that no other non-memory-tagged CMA based allocation and free
call happens in the meantime ? That would be on really thin ice.

I suppose arm64 tagged-memory-specific allocation and free counters need to
be created on the caller side, including in arch_free_pages_prepare().
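
Something as simple as the sketch below would do, on the arch side (the
counter and helper names are hypothetical, not from the series):

static atomic_long_t mte_tag_storage_pages;     /* pages currently reserved */

static inline void mte_account_tag_storage(int order)
{
        atomic_long_add(1UL << order, &mte_tag_storage_pages);
}

/* e.g. called from arch_free_pages_prepare() */
static inline void mte_unaccount_tag_storage(int order)
{
        atomic_long_sub(1UL << order, &mte_tag_storage_pages);
}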



Re: [PATCH RFC v3 04/35] mm: page_alloc: Partially revert "mm: page_alloc: remove stale CMA guard code"

2024-01-29 Thread Anshuman Khandual



On 1/29/24 17:16, Alexandru Elisei wrote:
> Hi,
> 
> On Mon, Jan 29, 2024 at 02:31:23PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> The patch f945116e4e19 ("mm: page_alloc: remove stale CMA guard code")
>>> removed the CMA filter when allocating from the MIGRATE_MOVABLE pcp list
>>> because CMA is always allowed when __GFP_MOVABLE is set.
>>>
>>> With the introduction of the arch_alloc_cma() function, the above is not
>>> true anymore, so bring back the filter.
>>
>> This makes sense as arch_alloc_cma() now might prevent ALLOC_CMA being
>> assigned to alloc_flags in gfp_to_alloc_flags_cma().
> 
> Can I add your Reviewed-by tag then?

I think all these changes need to be reviewed in their entirety,
even though some patches do look good on their own. For example
this patch depends on whether [PATCH 03/35] is acceptable or not.

I would suggest separating out the CMA patches, which could be debated
and merged regardless of this series.



Re: [PATCH RFC v3 01/35] mm: page_alloc: Add gfp_flags parameter to arch_alloc_page()

2024-01-29 Thread Anshuman Khandual



On 1/29/24 17:11, Alexandru Elisei wrote:
> Hi,
> 
> On Mon, Jan 29, 2024 at 11:18:59AM +0530, Anshuman Khandual wrote:
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> Extend the usefulness of arch_alloc_page() by adding the gfp_flags
>>> parameter.
Although the change here is harmless in itself, it will definitely benefit
from some additional context explaining the rationale, taking into account
why and how arch_alloc_page() got added particularly for the s390 platform,
and how it's going to be used in the present proposal.
> arm64 will use it to reserve tag storage if the caller requested a tagged
> page. Right now that means that __GFP_ZEROTAGS is set in the gfp mask, but
> I'll rename it to __GFP_TAGGED in patch #18 ("arm64: mte: Rename
> __GFP_ZEROTAGS to __GFP_TAGGED") [1].
> 
> [1] 
> https://lore.kernel.org/lkml/20240125164256.4147-19-alexandru.eli...@arm.com/

Makes sense, but please do update the commit message explaining how the
new gfp mask argument will be used to detect tagged page allocation
requests, which further require tag storage allocation.
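
Something along these lines - this mirrors the arch/arm64/kernel/mte_tag_storage.c
hunk quoted earlier in these replies, so it is how the series itself ends up
using the argument (the __GFP_TAGGED rename only lands in patch #18):

void arch_alloc_page(struct page *page, int order, gfp_t gfp)
{
        /* a tagged allocation also needs its tag storage reserved */
        if (tag_storage_enabled() && alloc_requires_tag_storage(gfp))
                reserve_tag_storage(page, order, gfp);
}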



Re: [PATCH RFC v3 07/35] mm: cma: Add CMA_RELEASE_{SUCCESS,FAIL} events

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> Similar to the two events that relate to CMA allocations, add the
> CMA_RELEASE_SUCCESS and CMA_RELEASE_FAIL events that count when CMA pages
> are freed.

How is this going to be beneficial towards analyzing CMA alloc/release
behaviour - particularly with respect to this series ? OR is this just being
added from a parity perspective with the CMA alloc side counters ? Regardless,
this CMA change too could be discussed separately.

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch.
> 
>  include/linux/vm_event_item.h | 2 ++
>  mm/cma.c  | 6 +-
>  mm/vmstat.c   | 2 ++
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 747943bc8cc2..aba5c5bf8127 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -83,6 +83,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  #ifdef CONFIG_CMA
>   CMA_ALLOC_SUCCESS,
>   CMA_ALLOC_FAIL,
> + CMA_RELEASE_SUCCESS,
> + CMA_RELEASE_FAIL,
>  #endif
>   UNEVICTABLE_PGCULLED,   /* culled to noreclaim list */
>   UNEVICTABLE_PGSCANNED,  /* scanned for reclaimability */
> diff --git a/mm/cma.c b/mm/cma.c
> index dbf7fe8cb1bd..543bb6b3be8e 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -562,8 +562,10 @@ bool cma_release(struct cma *cma, const struct page 
> *pages,
>  {
>   unsigned long pfn;
>  
> - if (!cma_pages_valid(cma, pages, count))
> + if (!cma_pages_valid(cma, pages, count)) {
> + count_vm_events(CMA_RELEASE_FAIL, count);
>   return false;
> + }
>  
>   pr_debug("%s(page %p, count %lu)\n", __func__, (void *)pages, count);
>  
> @@ -575,6 +577,8 @@ bool cma_release(struct cma *cma, const struct page 
> *pages,
>   cma_clear_bitmap(cma, pfn, count);
>   trace_cma_release(cma->name, pfn, pages, count);
>  
> + count_vm_events(CMA_RELEASE_SUCCESS, count);
> +
>   return true;
>  }
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index db79935e4a54..eebfd5c6c723 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1340,6 +1340,8 @@ const char * const vmstat_text[] = {
>  #ifdef CONFIG_CMA
>   "cma_alloc_success",
>   "cma_alloc_fail",
> + "cma_release_success",
> + "cma_release_fail",
>  #endif
>   "unevictable_pgs_culled",
>   "unevictable_pgs_scanned",



Re: [PATCH RFC v3 06/35] mm: cma: Make CMA_ALLOC_SUCCESS/FAIL count the number of pages

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> The CMA_ALLOC_SUCCESS, respectively CMA_ALLOC_FAIL, are increased by one
> after each cma_alloc() function call. This is done even though cma_alloc()
> can allocate an arbitrary number of CMA pages. When looking at
> /proc/vmstat, the number of successful (or failed) cma_alloc() calls
> doesn't tell much with regards to how many CMA pages were allocated via
> cma_alloc() versus via the page allocator (regular allocation request or
> PCP lists refill).
> 
> This can also be rather confusing to a user who isn't familiar with the
> code, since the unit of measurement for nr_free_cma is the number of pages,
> but cma_alloc_success and cma_alloc_fail count the number of cma_alloc()
> function calls.
> 
> Let's make this consistent, and arguably more useful, by having
> CMA_ALLOC_SUCCESS count the number of successfully allocated CMA pages, and
> CMA_ALLOC_FAIL count the number of pages the cma_alloc() failed to
> allocate.
> 
> For users that wish to track the number of cma_alloc() calls, there are
> tracepoints for that already implemented.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  mm/cma.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index f49c95f8ee37..dbf7fe8cb1bd 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -517,10 +517,10 @@ struct page *cma_alloc(struct cma *cma, unsigned long 
> count,
>   pr_debug("%s(): returned %p\n", __func__, page);
>  out:
>   if (page) {
> - count_vm_event(CMA_ALLOC_SUCCESS);
> + count_vm_events(CMA_ALLOC_SUCCESS, count);
>   cma_sysfs_account_success_pages(cma, count);
>   } else {
> - count_vm_event(CMA_ALLOC_FAIL);
> + count_vm_events(CMA_ALLOC_FAIL, count);
>   if (cma)
>   cma_sysfs_account_fail_pages(cma, count);
>   }

Without getting into the merits of this patch - which is actually trying to
change the semantics of /proc/vmstat - wondering how is this even related to
this particular series ? If required this could be debated on its own separately.
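
For example, after this change the counters in /proc/vmstat would be read in
page units rather than in call counts (hypothetical values shown):

  $ grep cma_alloc /proc/vmstat
  cma_alloc_success 5120
  cma_alloc_fail 256

i.e. 5120 pages allocated via cma_alloc() in total, not 5120 successful calls.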



Re: [PATCH RFC v3 05/35] mm: cma: Don't append newline when generating CMA area name

2024-01-29 Thread Anshuman Khandual


On 1/25/24 22:12, Alexandru Elisei wrote:
> cma->name is displayed in several CMA messages. When the name is generated
> by the CMA code, don't append a newline to avoid breaking the text across
> two lines.

An example of such mis-formatted CMA output from dmesg could be added
here in the commit message to demonstrate the problem better.
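
For instance, assuming the pr_err("CMA area %s could not be activated\n",
cma->name) message in cma_activate_area(), a generated name of "cma0\n"
would break the line like this (hypothetical output):

  CMA area cma0
   could not be activated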

> 
> Signed-off-by: Alexandru Elisei 
> ---

Regardless, LGTM.

Reviewed-by: Anshuman Khandual 

> 
> Changes since rfc v2:
> 
> * New patch. This is a fix, and can be merged independently of the other
> patches.

Right, need not be part of this series. Hence please send it separately to
the MM list.

> 
>  mm/cma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 7c09c47e530b..f49c95f8ee37 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -204,7 +204,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, 
> phys_addr_t size,
>   if (name)
>   snprintf(cma->name, CMA_MAX_NAME, name);
>   else
> - snprintf(cma->name, CMA_MAX_NAME,  "cma%d\n", cma_area_count);
> + snprintf(cma->name, CMA_MAX_NAME,  "cma%d", cma_area_count);
>  
>   cma->base_pfn = PFN_DOWN(base);
>   cma->count = size >> PAGE_SHIFT;



Re: [PATCH RFC v3 04/35] mm: page_alloc: Partially revert "mm: page_alloc: remove stale CMA guard code"

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> The patch f945116e4e19 ("mm: page_alloc: remove stale CMA guard code")
> removed the CMA filter when allocating from the MIGRATE_MOVABLE pcp list
> because CMA is always allowed when __GFP_MOVABLE is set.
> 
> With the introduction of the arch_alloc_cma() function, the above is not
> true anymore, so bring back the filter.

This makes sense as arch_alloc_cma() now might prevent ALLOC_CMA being
assigned to alloc_flags in gfp_to_alloc_flags_cma().

> 
> This is a partial revert because the stale comment remains removed.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  mm/page_alloc.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a96d47a6393e..0fa34bcfb1af 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2897,10 +2897,17 @@ struct page *rmqueue(struct zone *preferred_zone,
>   WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>  
>   if (likely(pcp_allowed_order(order))) {
> - page = rmqueue_pcplist(preferred_zone, zone, order,
> -migratetype, alloc_flags);
> - if (likely(page))
> - goto out;
> + /*
> +  * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> +  * we need to skip it when CMA area isn't allowed.
> +  */
> + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> + migratetype != MIGRATE_MOVABLE) {
> + page = rmqueue_pcplist(preferred_zone, zone, order,
> + migratetype, alloc_flags);
> + if (likely(page))
> + goto out;
> + }
>   }
>  
>   page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,



Re: [PATCH RFC v3 03/35] mm: page_alloc: Add an arch hook to filter MIGRATE_CMA allocations

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> As an architecture might have specific requirements around the allocation
> of CMA pages, add an arch hook that can disable allocations from
> MIGRATE_CMA, if the allocation was otherwise allowed.
> 
> This will be used by arm64, which will put tag storage pages on the
> MIGRATE_CMA list, and tag storage pages cannot be tagged. The filter will
> be used to deny using MIGRATE_CMA for __GFP_TAGGED allocations.

Just wondering how such allocations would be blocked for direct
alloc_contig_range() requests, which do not go through this path ?

> 
> Signed-off-by: Alexandru Elisei 
> ---
>  include/linux/pgtable.h | 7 +++
>  mm/page_alloc.c | 3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 6d98d5fdd697..c5ddec6b5305 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -905,6 +905,13 @@ static inline void arch_do_swap_page(struct mm_struct 
> *mm,
>  static inline void arch_free_pages_prepare(struct page *page, int order) { }
>  #endif
>  
> +#ifndef __HAVE_ARCH_ALLOC_CMA

Same as the last patch, i.e. __HAVE_ARCH_ALLOC_CMA could be avoided via a
direct '#ifndef arch_alloc_cma' check instead.
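
A minimal sketch of that pattern (hypothetical code; the exact __GFP_TAGGED
check is an assumption based on how this series describes the arm64 use case):

  /* arch/arm64/include/asm/pgtable.h (sketch) */
  static inline bool arch_alloc_cma(gfp_t gfp_mask)
  {
          /* deny MIGRATE_CMA for tagged allocations, per this series */
          if (gfp_mask & __GFP_TAGGED)
                  return false;
          return true;
  }
  #define arch_alloc_cma arch_alloc_cma

  /* include/linux/pgtable.h (sketch) */
  #ifndef arch_alloc_cma
  static inline bool arch_alloc_cma(gfp_t gfp)
  {
          return true;
  }
  #endif

The '#define arch_alloc_cma arch_alloc_cma' makes the override visible to the
preprocessor, so the generic fallback compiles out without needing a separate
__HAVE_ARCH_ symbol.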

> +static inline bool arch_alloc_cma(gfp_t gfp)
> +{
> + return true;
> +}
> +#endif
> +
>  #ifndef __HAVE_ARCH_UNMAP_ONE
>  /*
>   * Some architectures support metadata associated with a page. When a
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 27282a1c82fe..a96d47a6393e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3157,7 +3157,8 @@ static inline unsigned int gfp_to_alloc_flags_cma(gfp_t 
> gfp_mask,
> unsigned int alloc_flags)
>  {
>  #ifdef CONFIG_CMA
> - if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> + if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE &&
> + arch_alloc_cma(gfp_mask))
>   alloc_flags |= ALLOC_CMA;
>  #endif
>   return alloc_flags;



Re: [PATCH RFC v3 02/35] mm: page_alloc: Add an arch hook early in free_pages_prepare()

2024-01-29 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> The arm64 MTE code uses the PG_arch_2 page flag, which it renames to
> PG_mte_tagged, to track if a page has been mapped with tagging enabled.
> That flag is cleared by free_pages_prepare() by doing:
> 
>   page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> 
> When tag storage management is added, tag storage will be reserved for a
> page if and only if the page is mapped as tagged (the page flag
> PG_mte_tagged is set). When a page is freed, likewise, the code will have
> to look at the the page flags to determine if the page has tag storage
> reserved, which should also be freed.
> 
> For this purpose, add an arch_free_pages_prepare() hook that is called
> before that page flags are cleared. The function arch_free_page() has also
> been considered for this purpose, but it is called after the flags are
> cleared.

arch_free_pages_prepare() makes sense as a prologue to arch_free_page().  

s/arch_free_pages_prepare/arch_free_page_prepare/ to match similar functions.

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * Expanded commit message (David Hildenbrand).
> 
>  include/linux/pgtable.h | 4 
>  mm/page_alloc.c | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index f6d0e3513948..6d98d5fdd697 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -901,6 +901,10 @@ static inline void arch_do_swap_page(struct mm_struct 
> *mm,
>  }
>  #endif
>  
> +#ifndef __HAVE_ARCH_FREE_PAGES_PREPARE

I guess new __HAVE_ARCH_ constructs are not being added lately. Instead
something like '#ifndef arch_free_pages_prepare' might be better suited.

> +static inline void arch_free_pages_prepare(struct page *page, int order) { }
> +#endif
> +
>  #ifndef __HAVE_ARCH_UNMAP_ONE
>  /*
>   * Some architectures support metadata associated with a page. When a
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2c140abe5ee6..27282a1c82fe 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1092,6 +1092,7 @@ static __always_inline bool free_pages_prepare(struct 
> page *page,
>  
>   trace_mm_page_free(page, order);
>   kmsan_free_page(page, order);
> + arch_free_pages_prepare(page, order);
>  
>   if (memcg_kmem_online() && PageMemcgKmem(page))
>   __memcg_kmem_uncharge_page(page, order);



Re: [PATCH RFC v3 01/35] mm: page_alloc: Add gfp_flags parameter to arch_alloc_page()

2024-01-28 Thread Anshuman Khandual


On 1/25/24 22:12, Alexandru Elisei wrote:
> Extend the usefulness of arch_alloc_page() by adding the gfp_flags
> parameter.

Although the change here is harmless in itself, it would definitely benefit
from some additional context explaining the rationale - taking into account
why and how arch_alloc_page() got added for the s390 platform in particular,
and how it is going to be used in the present proposal.

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch.
> 
>  arch/s390/include/asm/page.h | 2 +-
>  arch/s390/mm/page-states.c   | 2 +-
>  include/linux/gfp.h  | 2 +-
>  mm/page_alloc.c  | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 73b9c3bf377f..859f0958c574 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -163,7 +163,7 @@ static inline int page_reset_referenced(unsigned long 
> addr)
>  
>  struct page;
>  void arch_free_page(struct page *page, int order);
> -void arch_alloc_page(struct page *page, int order);
> +void arch_alloc_page(struct page *page, int order, gfp_t gfp_flags);
>  
>  static inline int devmem_is_allowed(unsigned long pfn)
>  {
> diff --git a/arch/s390/mm/page-states.c b/arch/s390/mm/page-states.c
> index 01f9b39e65f5..b986c8b158e3 100644
> --- a/arch/s390/mm/page-states.c
> +++ b/arch/s390/mm/page-states.c
> @@ -21,7 +21,7 @@ void arch_free_page(struct page *page, int order)
>   __set_page_unused(page_to_virt(page), 1UL << order);
>  }
>  
> -void arch_alloc_page(struct page *page, int order)
> +void arch_alloc_page(struct page *page, int order, gfp_t gfp_flags)
>  {
>   if (!cmma_flag)
>   return;
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index de292a007138..9e8aa3d144db 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -172,7 +172,7 @@ static inline struct zonelist *node_zonelist(int nid, 
> gfp_t flags)
>  static inline void arch_free_page(struct page *page, int order) { }
>  #endif
>  #ifndef HAVE_ARCH_ALLOC_PAGE
> -static inline void arch_alloc_page(struct page *page, int order) { }
> +static inline void arch_alloc_page(struct page *page, int order, gfp_t 
> gfp_flags) { }
>  #endif
>  
>  struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 150d4f23b010..2c140abe5ee6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1485,7 +1485,7 @@ inline void post_alloc_hook(struct page *page, unsigned 
> int order,
>   set_page_private(page, 0);
>   set_page_refcounted(page);
>  
> - arch_alloc_page(page, order);
> + arch_alloc_page(page, order, gfp_flags);
>   debug_pagealloc_map_pages(page, 1 << order);
>  
>   /*

Otherwise LGTM.



Re: [PATCH V2] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER

2021-04-18 Thread Anshuman Khandual


On 4/12/21 2:17 PM, David Hildenbrand wrote:
> On 12.04.21 10:06, Anshuman Khandual wrote:
>> + linuxppc-...@lists.ozlabs.org
>> + linux-i...@vger.kernel.org
>>
>> On 4/12/21 9:18 AM, Anshuman Khandual wrote:
>>> pageblock_order must always be less than MAX_ORDER, otherwise it might lead
>>> to an warning during boot. A similar problem got fixed on arm64 platform
>>> with the commit 79cc2ed5a716 ("arm64/mm: Drop THP conditionality from
>>> FORCE_MAX_ZONEORDER"). Assert the above condition before HUGETLB_PAGE_ORDER
>>> gets assigned as pageblock_order. This will help detect the problem earlier
>>> on platforms where HUGETLB_PAGE_SIZE_VARIABLE is enabled.
>>>
>>> Cc: David Hildenbrand 
>>> Cc: Andrew Morton 
>>> Cc: linux...@kvack.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> Changes in V2:
>>>
>>> - Changed WARN_ON() to BUILD_BUG_ON() per David
>>>
>>> Changes in V1:
>>>
>>> https://patchwork.kernel.org/project/linux-mm/patch/1617947717-2424-1-git-send-email-anshuman.khand...@arm.com/
>>>
>>>   mm/page_alloc.c | 11 +--
>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index cfc72873961d..19283bff4bec 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -6875,10 +6875,17 @@ void __init set_pageblock_order(void)
>>>   if (pageblock_order)
>>>   return;
>>>   -    if (HPAGE_SHIFT > PAGE_SHIFT)
>>> +    if (HPAGE_SHIFT > PAGE_SHIFT) {
>>> +    /*
>>> + * pageblock_order must always be less than
>>> + * MAX_ORDER. So does HUGETLB_PAGE_ORDER if
>>> + * that is being assigned here.
>>> + */
>>> +    BUILD_BUG_ON(HUGETLB_PAGE_ORDER >= MAX_ORDER);
>>
>> Unfortunately the build test fails on both platforms (powerpc and ia64)
>> which subscribe to HUGETLB_PAGE_SIZE_VARIABLE and where this check would make
>> sense. I somehow overlooked the cross-compile build failure that actually
>> detected this problem.
>>
>> But wondering why this assert does not hold true, and how these platforms
>> do not see the warning during boot (or do they ?) at mm/vmscan.c:1092 like
>> arm64 did.
>>
>> static int __fragmentation_index(unsigned int order, struct contig_page_info 
>> *info)
>> {
>>  unsigned long requested = 1UL << order;
>>
>>  if (WARN_ON_ONCE(order >= MAX_ORDER))
>>  return 0;
>> 
>>
>> Can pageblock_order really exceed MAX_ORDER - 1 ?
> 
> Ehm, for now I was under the impression that such configurations wouldn't 
> exist.
> 
> And originally, HUGETLB_PAGE_SIZE_VARIABLE was introduced to handle hugepage 
> sizes that all *smaller* than MAX_ORDER - 1: See d9c234005227 ("Do not depend 
> on MAX_ORDER when grouping pages by mobility")

Right.

> 
> 
> However, looking into init_cma_reserved_pageblock():
> 
> if (pageblock_order >= MAX_ORDER) {
>     i = pageblock_nr_pages;
>     ...
> }
> 
> 
> But it's kind of weird, isn't it? Let's assume we have MAX_ORDER - 1 
> correspond to 4 MiB and pageblock_order correspond to 8 MiB.
> 
> Sure, we'd be grouping pages in 8 MiB chunks, however, we cannot even 
> allocate 8 MiB chunks via the buddy. So only alloc_contig_range() could 
> really grab them (IOW: gigantic pages).

Right.

> 
> Further, we have code like deferred_free_range(), where we end up calling 
> __free_pages_core()->...->__free_one_page() with pageblock_order. Wouldn't we 
> end up setting the buddy order to something > MAX_ORDER -1 on that path?

Agreed.

> 
> Having pageblock_order > MAX_ORDER feels wrong and looks shaky.
> 
Agreed, definitely does not look right. Let's see what other folks
might have to say on this.

+ Christoph Lameter 


Re: [PATCH -next v2 1/2] mm/debug_vm_pgtable: Move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE

2021-04-18 Thread Anshuman Khandual



On 4/9/21 9:35 AM, Anshuman Khandual wrote:
> 
> On 4/6/21 10:18 AM, Shixin Liu wrote:
>> v1->v2:
>> Modified the commit message.
> 
> Please avoid putting the change log in the commit message; it should be after
> the '---' below the SOB statement.
> 
>>
>> The functions {pmd/pud}_set_huge and {pmd/pud}_clear_huge ars not dependent 
>> on THP.
> 
> typo ^ s/ars/are
> 
> Also there is a checkpatch.pl warning.
> 
> WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
> line)
> #10: 
> The functions {pmd/pud}_set_huge and {pmd/pud}_clear_huge ars not dependent 
> on THP.
> 
> total: 0 errors, 1 warnings, 121 lines checked
> 
> As I had mentioned in the earlier version, the commit message should be
> something like ...
> 
> 
> The functions {pmd/pud}_set_huge and {pmd/pud}_clear_huge are not dependent
> on THP. Hence move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE.
> 
> 
>>
>> Signed-off-by: Shixin Liu 
>> ---
>>  mm/debug_vm_pgtable.c | 91 +++
>>  1 file changed, 39 insertions(+), 52 deletions(-)
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 05efe98a9ac2..d3cf178621d9 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -242,29 +242,6 @@ static void __init pmd_leaf_tests(unsigned long pfn, 
>> pgprot_t prot)
>>  WARN_ON(!pmd_leaf(pmd));
>>  }
>>  
>> -#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> -static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
>> prot)
>> -{
>> -pmd_t pmd;
>> -
>> -if (!arch_vmap_pmd_supported(prot))
>> -return;
>> -
>> -pr_debug("Validating PMD huge\n");
>> -/*
>> - * X86 defined pmd_set_huge() verifies that the given
>> - * PMD is not a populated non-leaf entry.
>> - */
>> -WRITE_ONCE(*pmdp, __pmd(0));
>> -WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
>> -WARN_ON(!pmd_clear_huge(pmdp));
>> -pmd = READ_ONCE(*pmdp);
>> -WARN_ON(!pmd_none(pmd));
>> -}
>> -#else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>> -static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
>> prot) { }
>> -#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>> -
>>  static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>>  {
>>  pmd_t pmd = pfn_pmd(pfn, prot);
>> @@ -379,30 +356,6 @@ static void __init pud_leaf_tests(unsigned long pfn, 
>> pgprot_t prot)
>>  pud = pud_mkhuge(pud);
>>  WARN_ON(!pud_leaf(pud));
>>  }
>> -
>> -#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> -static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
>> prot)
>> -{
>> -pud_t pud;
>> -
>> -if (!arch_vmap_pud_supported(prot))
>> -return;
>> -
>> -pr_debug("Validating PUD huge\n");
>> -/*
>> - * X86 defined pud_set_huge() verifies that the given
>> - * PUD is not a populated non-leaf entry.
>> - */
>> -WRITE_ONCE(*pudp, __pud(0));
>> -WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
>> -WARN_ON(!pud_clear_huge(pudp));
>> -pud = READ_ONCE(*pudp);
>> -WARN_ON(!pud_none(pud));
>> -}
>> -#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
>> -static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
>> prot) { }
>> -#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
>> -
>>  #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>  static void __init pud_basic_tests(struct mm_struct *mm, unsigned long pfn, 
>> int idx) { }
>>  static void __init pud_advanced_tests(struct mm_struct *mm,
>> @@ -412,9 +365,6 @@ static void __init pud_advanced_tests(struct mm_struct 
>> *mm,
>>  {
>>  }
>>  static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>> -static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
>> prot)
>> -{
>> -}
>>  #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>  #else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
>>  static void __init pmd_basic_tests(unsigned long pfn, int idx) { }
>> @@ -433,14 +383,51 @@ static void __init pud_advanced_tests(struct mm_struct 
>> *mm,
>>  }
>>  static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>>  static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>> +

[PATCH V2] mm: Define default value for FIRST_USER_ADDRESS

2021-04-15 Thread Anshuman Khandual
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the
same code all over. Instead just define a generic default value (i.e 0UL)
for FIRST_USER_ADDRESS and let the platforms override when required. This
makes it much cleaner with reduced code.

The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h>
when the given platform overrides its value via <asm/pgtable.h>.

Cc: Richard Henderson 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Guo Ren 
Cc: Brian Cain 
Cc: Geert Uytterhoeven 
Cc: Michal Simek 
Cc: Thomas Bogendoerfer 
Cc: Ley Foon Tan 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: "James E.J. Bottomley" 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Heiko Carstens 
Cc: Yoshinori Sato 
Cc: "David S. Miller" 
Cc: Jeff Dike 
Cc: Thomas Gleixner 
Cc: Chris Zankel 
Cc: Andrew Morton 
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
This applies on v5.12-rc7 and has been boot tested on arm64 platform.
But has been cross compiled on multiple other platforms.

Changes in V2:

- Dropped ARCH_HAS_FIRST_USER_ADDRESS construct

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1618368899-20311-1-git-send-email-anshuman.khand...@arm.com/

 arch/alpha/include/asm/pgtable.h | 1 -
 arch/arc/include/asm/pgtable.h   | 6 --
 arch/arm64/include/asm/pgtable.h | 2 --
 arch/csky/include/asm/pgtable.h  | 1 -
 arch/hexagon/include/asm/pgtable.h   | 3 ---
 arch/ia64/include/asm/pgtable.h  | 1 -
 arch/m68k/include/asm/pgtable_mm.h   | 1 -
 arch/microblaze/include/asm/pgtable.h| 2 --
 arch/mips/include/asm/pgtable-32.h   | 1 -
 arch/mips/include/asm/pgtable-64.h   | 1 -
 arch/nios2/include/asm/pgtable.h | 2 --
 arch/openrisc/include/asm/pgtable.h  | 1 -
 arch/parisc/include/asm/pgtable.h| 2 --
 arch/powerpc/include/asm/book3s/pgtable.h| 1 -
 arch/powerpc/include/asm/nohash/32/pgtable.h | 1 -
 arch/powerpc/include/asm/nohash/64/pgtable.h | 2 --
 arch/riscv/include/asm/pgtable.h | 2 --
 arch/s390/include/asm/pgtable.h  | 2 --
 arch/sh/include/asm/pgtable.h| 2 --
 arch/sparc/include/asm/pgtable_32.h  | 1 -
 arch/sparc/include/asm/pgtable_64.h  | 3 ---
 arch/um/include/asm/pgtable-2level.h | 1 -
 arch/um/include/asm/pgtable-3level.h | 1 -
 arch/x86/include/asm/pgtable_types.h | 2 --
 arch/xtensa/include/asm/pgtable.h| 1 -
 include/linux/pgtable.h  | 9 +
 26 files changed, 9 insertions(+), 43 deletions(-)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 8d856c62e22a..1a2fb0dc905b 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -46,7 +46,6 @@ struct vm_area_struct;
 #define PTRS_PER_PMD   (1UL << (PAGE_SHIFT-3))
 #define PTRS_PER_PGD   (1UL << (PAGE_SHIFT-3))
 #define USER_PTRS_PER_PGD  (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0UL
 
 /* Number of pointers that fit on a page:  this will go away. */
 #define PTRS_PER_PAGE  (1UL << (PAGE_SHIFT-3))
diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 163641726a2b..a9fabfb70664 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -228,12 +228,6 @@
  */
 #defineUSER_PTRS_PER_PGD   (TASK_SIZE / PGDIR_SIZE)
 
-/*
- * No special requirements for lowest virtual address we permit any user space
- * mapping to be mapped at.
- */
-#define FIRST_USER_ADDRESS  0UL
-
 
 /****************************************************************
  * Bucket load of VM Helpers
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 47027796c2f9..f6ab8b64967e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -26,8 +26,6 @@
 
 #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
-#define FIRST_USER_ADDRESS 0UL
-
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 0d60367b6bfa..151607ed5158 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -14,7 +14,6 @@
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
 #define USER_PTRS_PER_PGD  (PAGE_OFFSET/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0UL
 
 /*
  * C-SKY is two-level paging structure:
diff --git a/arch/hexagon/include/asm/pgtable.h 
b/arch/hexagon/include/asm/pgtable.h
index dbb22b80b8c4..e4979508cddf 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -155,9 +155,6 @@ extern unsigned long _dflt_cache_att;
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];

Re: [PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS

2021-04-14 Thread Anshuman Khandual
On 4/14/21 11:40 AM, Christophe Leroy wrote:
> 
> 
> Le 14/04/2021 à 07:59, Anshuman Khandual a écrit :
>>
>>
>> On 4/14/21 10:52 AM, Christophe Leroy wrote:
>>>
>>>
>>> Le 14/04/2021 à 04:54, Anshuman Khandual a écrit :
>>>> Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the
>>>> same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS
>>>> for those platforms which would override generic default FIRST_USER_ADDRESS
>>>> value 0UL. This makes it much cleaner with reduced code.
>>>>
>>>> Cc: linux-al...@vger.kernel.org
>>>> Cc: linux-snps-...@lists.infradead.org
>>>> Cc: linux-arm-ker...@lists.infradead.org
>>>> Cc: linux-c...@vger.kernel.org
>>>> Cc: linux-hexa...@vger.kernel.org
>>>> Cc: linux-i...@vger.kernel.org
>>>> Cc: linux-m...@lists.linux-m68k.org
>>>> Cc: linux-m...@vger.kernel.org
>>>> Cc: openr...@lists.librecores.org
>>>> Cc: linux-par...@vger.kernel.org
>>>> Cc: linuxppc-...@lists.ozlabs.org
>>>> Cc: linux-ri...@lists.infradead.org
>>>> Cc: linux-s...@vger.kernel.org
>>>> Cc: linux...@vger.kernel.org
>>>> Cc: sparcli...@vger.kernel.org
>>>> Cc: linux...@lists.infradead.org
>>>> Cc: linux-xte...@linux-xtensa.org
>>>> Cc: x...@kernel.org
>>>> Cc: linux...@kvack.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Anshuman Khandual 
>>>> ---
>>>>    arch/alpha/include/asm/pgtable.h | 1 -
>>>>    arch/arc/include/asm/pgtable.h   | 6 --
>>>>    arch/arm/Kconfig | 1 +
>>>>    arch/arm64/include/asm/pgtable.h | 2 --
>>>>    arch/csky/include/asm/pgtable.h  | 1 -
>>>>    arch/hexagon/include/asm/pgtable.h   | 3 ---
>>>>    arch/ia64/include/asm/pgtable.h  | 1 -
>>>>    arch/m68k/include/asm/pgtable_mm.h   | 1 -
>>>>    arch/microblaze/include/asm/pgtable.h    | 2 --
>>>>    arch/mips/include/asm/pgtable-32.h   | 1 -
>>>>    arch/mips/include/asm/pgtable-64.h   | 1 -
>>>>    arch/nds32/Kconfig   | 1 +
>>>>    arch/nios2/include/asm/pgtable.h | 2 --
>>>>    arch/openrisc/include/asm/pgtable.h  | 1 -
>>>>    arch/parisc/include/asm/pgtable.h    | 2 --
>>>>    arch/powerpc/include/asm/book3s/pgtable.h    | 1 -
>>>>    arch/powerpc/include/asm/nohash/32/pgtable.h | 1 -
>>>>    arch/powerpc/include/asm/nohash/64/pgtable.h | 2 --
>>>>    arch/riscv/include/asm/pgtable.h | 2 --
>>>>    arch/s390/include/asm/pgtable.h  | 2 --
>>>>    arch/sh/include/asm/pgtable.h    | 2 --
>>>>    arch/sparc/include/asm/pgtable_32.h  | 1 -
>>>>    arch/sparc/include/asm/pgtable_64.h  | 3 ---
>>>>    arch/um/include/asm/pgtable-2level.h | 1 -
>>>>    arch/um/include/asm/pgtable-3level.h | 1 -
>>>>    arch/x86/include/asm/pgtable_types.h | 2 --
>>>>    arch/xtensa/include/asm/pgtable.h    | 1 -
>>>>    include/linux/mm.h   | 4 
>>>>    mm/Kconfig   | 4 
>>>>    29 files changed, 10 insertions(+), 43 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 8ba434287387..47098ccd715e 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -46,6 +46,10 @@ extern int sysctl_page_lock_unfairness;
>>>>      void init_mm_internals(void);
>>>>    +#ifndef ARCH_HAS_FIRST_USER_ADDRESS
>>>
>>> I guess you didn't test it . :)
>>
>> In fact I did :) Though just booted it on arm64 and cross compiled on
>> multiple others platforms.

I guess for all platforms, ARCH_HAS_FIRST_USER_ADDRESS would have just
evaluated as false, hence falling back on the generic definition. So this
never complained anywhere during the build, or during boot on arm64.

>>
>>>
>>> should be #ifndef CONFIG_ARCH_HAS_FIRST_USER_ADDRESS
>>
>> Right, meant that instead.
>>
>>>
>>>> +#define FIRST_USER_ADDRESS    0UL
>>>> +#endif
>>>
>>> But why do we need a config option at all for that ?
>>>

Re: [PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS

2021-04-14 Thread Anshuman Khandual



On 4/14/21 10:52 AM, Christophe Leroy wrote:
> 
> 
> Le 14/04/2021 à 04:54, Anshuman Khandual a écrit :
>> Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the
>> same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS
>> for those platforms which would override generic default FIRST_USER_ADDRESS
>> value 0UL. This makes it much cleaner with reduced code.
>>
>> Cc: linux-al...@vger.kernel.org
>> Cc: linux-snps-...@lists.infradead.org
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-c...@vger.kernel.org
>> Cc: linux-hexa...@vger.kernel.org
>> Cc: linux-i...@vger.kernel.org
>> Cc: linux-m...@lists.linux-m68k.org
>> Cc: linux-m...@vger.kernel.org
>> Cc: openr...@lists.librecores.org
>> Cc: linux-par...@vger.kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: linux-ri...@lists.infradead.org
>> Cc: linux-s...@vger.kernel.org
>> Cc: linux...@vger.kernel.org
>> Cc: sparcli...@vger.kernel.org
>> Cc: linux...@lists.infradead.org
>> Cc: linux-xte...@linux-xtensa.org
>> Cc: x...@kernel.org
>> Cc: linux...@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual 
>> ---
>>   arch/alpha/include/asm/pgtable.h | 1 -
>>   arch/arc/include/asm/pgtable.h   | 6 --
>>   arch/arm/Kconfig | 1 +
>>   arch/arm64/include/asm/pgtable.h | 2 --
>>   arch/csky/include/asm/pgtable.h  | 1 -
>>   arch/hexagon/include/asm/pgtable.h   | 3 ---
>>   arch/ia64/include/asm/pgtable.h  | 1 -
>>   arch/m68k/include/asm/pgtable_mm.h   | 1 -
>>   arch/microblaze/include/asm/pgtable.h    | 2 --
>>   arch/mips/include/asm/pgtable-32.h   | 1 -
>>   arch/mips/include/asm/pgtable-64.h   | 1 -
>>   arch/nds32/Kconfig   | 1 +
>>   arch/nios2/include/asm/pgtable.h | 2 --
>>   arch/openrisc/include/asm/pgtable.h  | 1 -
>>   arch/parisc/include/asm/pgtable.h    | 2 --
>>   arch/powerpc/include/asm/book3s/pgtable.h    | 1 -
>>   arch/powerpc/include/asm/nohash/32/pgtable.h | 1 -
>>   arch/powerpc/include/asm/nohash/64/pgtable.h | 2 --
>>   arch/riscv/include/asm/pgtable.h | 2 --
>>   arch/s390/include/asm/pgtable.h  | 2 --
>>   arch/sh/include/asm/pgtable.h    | 2 --
>>   arch/sparc/include/asm/pgtable_32.h  | 1 -
>>   arch/sparc/include/asm/pgtable_64.h  | 3 ---
>>   arch/um/include/asm/pgtable-2level.h | 1 -
>>   arch/um/include/asm/pgtable-3level.h | 1 -
>>   arch/x86/include/asm/pgtable_types.h | 2 --
>>   arch/xtensa/include/asm/pgtable.h    | 1 -
>>   include/linux/mm.h   | 4 
>>   mm/Kconfig   | 4 
>>   29 files changed, 10 insertions(+), 43 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 8ba434287387..47098ccd715e 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -46,6 +46,10 @@ extern int sysctl_page_lock_unfairness;
>>     void init_mm_internals(void);
>>   +#ifndef ARCH_HAS_FIRST_USER_ADDRESS
> 
> I guess you didn't test it . :)

In fact I did :) Though just booted it on arm64 and cross compiled on
multiple others platforms.

> 
> should be #ifndef CONFIG_ARCH_HAS_FIRST_USER_ADDRESS

Right, meant that instead.

> 
>> +#define FIRST_USER_ADDRESS    0UL
>> +#endif
> 
> But why do we need a config option at all for that ?
> 
> Why not just:
> 
> #ifndef FIRST_USER_ADDRESS
> #define FIRST_USER_ADDRESS    0UL
> #endif

This sounds simpler. But just wondering, would not there be any possibility
of build problems due to the header inclusion order between arch and generic code ?
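
A minimal sketch of that fallback - presumably what the V2 patch above ends
up doing in include/linux/pgtable.h:

  #ifndef FIRST_USER_ADDRESS
  #define FIRST_USER_ADDRESS	0UL
  #endif

Since <linux/pgtable.h> includes <asm/pgtable.h> before this point, an
architecture's own #define is already visible when the fallback is evaluated,
so the inclusion order should not cause build problems in practice.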


[PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS

2021-04-13 Thread Anshuman Khandual
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the
same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS
for those platforms which would override generic default FIRST_USER_ADDRESS
value 0UL. This makes it much cleaner with reduced code.

Cc: linux-al...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-hexa...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: openr...@lists.librecores.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-xte...@linux-xtensa.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/alpha/include/asm/pgtable.h | 1 -
 arch/arc/include/asm/pgtable.h   | 6 --
 arch/arm/Kconfig | 1 +
 arch/arm64/include/asm/pgtable.h | 2 --
 arch/csky/include/asm/pgtable.h  | 1 -
 arch/hexagon/include/asm/pgtable.h   | 3 ---
 arch/ia64/include/asm/pgtable.h  | 1 -
 arch/m68k/include/asm/pgtable_mm.h   | 1 -
 arch/microblaze/include/asm/pgtable.h| 2 --
 arch/mips/include/asm/pgtable-32.h   | 1 -
 arch/mips/include/asm/pgtable-64.h   | 1 -
 arch/nds32/Kconfig   | 1 +
 arch/nios2/include/asm/pgtable.h | 2 --
 arch/openrisc/include/asm/pgtable.h  | 1 -
 arch/parisc/include/asm/pgtable.h| 2 --
 arch/powerpc/include/asm/book3s/pgtable.h| 1 -
 arch/powerpc/include/asm/nohash/32/pgtable.h | 1 -
 arch/powerpc/include/asm/nohash/64/pgtable.h | 2 --
 arch/riscv/include/asm/pgtable.h | 2 --
 arch/s390/include/asm/pgtable.h  | 2 --
 arch/sh/include/asm/pgtable.h| 2 --
 arch/sparc/include/asm/pgtable_32.h  | 1 -
 arch/sparc/include/asm/pgtable_64.h  | 3 ---
 arch/um/include/asm/pgtable-2level.h | 1 -
 arch/um/include/asm/pgtable-3level.h | 1 -
 arch/x86/include/asm/pgtable_types.h | 2 --
 arch/xtensa/include/asm/pgtable.h| 1 -
 include/linux/mm.h   | 4 
 mm/Kconfig   | 4 
 29 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 8d856c62e22a..1a2fb0dc905b 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -46,7 +46,6 @@ struct vm_area_struct;
 #define PTRS_PER_PMD   (1UL << (PAGE_SHIFT-3))
 #define PTRS_PER_PGD   (1UL << (PAGE_SHIFT-3))
 #define USER_PTRS_PER_PGD  (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0UL
 
 /* Number of pointers that fit on a page:  this will go away. */
 #define PTRS_PER_PAGE  (1UL << (PAGE_SHIFT-3))
diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 163641726a2b..a9fabfb70664 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -228,12 +228,6 @@
  */
 #defineUSER_PTRS_PER_PGD   (TASK_SIZE / PGDIR_SIZE)
 
-/*
- * No special requirements for lowest virtual address we permit any user space
- * mapping to be mapped at.
- */
-#define FIRST_USER_ADDRESS  0UL
-
 
 /****************************************************************
  * Bucket load of VM Helpers
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5da96f5df48f..ad086e6d7155 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
select ARCH_HAS_ELF_RANDOMIZE
+   select ARCH_HAS_FIRST_USER_ADDRESS
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 47027796c2f9..f6ab8b64967e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -26,8 +26,6 @@
 
 #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
-#define FIRST_USER_ADDRESS 0UL
-
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 0d60367b6bfa..151607ed5158 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -14,7 +14,6 @@
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
 #define USER_PTRS_PER_PGD  (PAGE_OFFSET/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0UL
 
 /*
  * C-SKY is two-level paging structure:
diff --git a/arch/hexagon/include/asm/pgtable.h 
b/arch/hexagon/inc

Re: [PATCH V2 4/6] mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION

2021-04-12 Thread Anshuman Khandual
On 4/12/21 5:29 PM, Oscar Salvador wrote:
> On Thu, Apr 01, 2021 at 12:14:06PM +0530, Anshuman Khandual wrote:
>> ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
>> platforms that subscribe them. Drop these reduntant definitions and instead
>> just select them appropriately.
>>
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Michael Ellerman 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: "H. Peter Anvin" 
>> Cc: Andrew Morton 
>> Cc: x...@kernel.org
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: linux...@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Acked-by: Catalin Marinas  (arm64)
>> Signed-off-by: Anshuman Khandual 
> 
> Hi Anshuman, 
> 
> X86 needs fixing, see below:
> 
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 503d8b2e8676..10702ef1eb57 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -60,8 +60,10 @@ config X86
>>  select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
>>  select ARCH_32BIT_OFF_T if X86_32
>>  select ARCH_CLOCKSOURCE_INIT
>> +select ARCH_ENABLE_HUGEPAGE_MIGRATION if x86_64 && HUGETLB_PAGE && 
>> MIGRATION
>>  select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM)
>>  select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
>> +select ARCH_ENABLE_THP_MIGRATION if x86_64 && TRANSPARENT_HUGEPAGE
> 
> you need s/x86_64/X86_64/, otherwise we are left with no migration :-)

Ahh, right. I guess this would not have been detected during a build test.
As the series is in the mmotm tree, wondering if Andrew could help fix these
typos in this patch.
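
For reference, the corrected select statements (only the case of x86_64
changes) would read:

	select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
	select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE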


Re: [PATCH V2] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER

2021-04-12 Thread Anshuman Khandual
+ linuxppc-...@lists.ozlabs.org
+ linux-i...@vger.kernel.org

On 4/12/21 9:18 AM, Anshuman Khandual wrote:
> pageblock_order must always be less than MAX_ORDER, otherwise it might lead
> to a warning during boot. A similar problem got fixed on the arm64 platform
> with the commit 79cc2ed5a716 ("arm64/mm: Drop THP conditionality from
> FORCE_MAX_ZONEORDER"). Assert the above condition before HUGETLB_PAGE_ORDER
> gets assigned as pageblock_order. This will help detect the problem earlier
> on platforms where HUGETLB_PAGE_SIZE_VARIABLE is enabled.
> 
> Cc: David Hildenbrand 
> Cc: Andrew Morton 
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
> Changes in V2:
> 
> - Changed WARN_ON() to BUILD_BUG_ON() per David
> 
> Changes in V1:
> 
> https://patchwork.kernel.org/project/linux-mm/patch/1617947717-2424-1-git-send-email-anshuman.khand...@arm.com/
> 
>  mm/page_alloc.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cfc72873961d..19283bff4bec 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6875,10 +6875,17 @@ void __init set_pageblock_order(void)
>   if (pageblock_order)
>   return;
>  
> - if (HPAGE_SHIFT > PAGE_SHIFT)
> + if (HPAGE_SHIFT > PAGE_SHIFT) {
> + /*
> +  * pageblock_order must always be less than
> +  * MAX_ORDER. So does HUGETLB_PAGE_ORDER if
> +  * that is being assigned here.
> +  */
> + BUILD_BUG_ON(HUGETLB_PAGE_ORDER >= MAX_ORDER);

Unfortunately the build test fails on both platforms (powerpc and ia64)
which subscribe to HUGETLB_PAGE_SIZE_VARIABLE and where this check would make
sense. I somehow overlooked the cross-compile build failure that actually
detected this problem.

But wondering why this assert does not hold true, and how these platforms
do not see the warning during boot (or do they ?) at mm/vmscan.c:1092 like
arm64 did.

static int __fragmentation_index(unsigned int order, struct contig_page_info 
*info)
{
unsigned long requested = 1UL << order;

if (WARN_ON_ONCE(order >= MAX_ORDER))
return 0;


Can pageblock_order really exceed MAX_ORDER - 1 ?
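
As a concrete (hypothetical) arithmetic example of how it can: with 64K base
pages, PAGE_SHIFT = 16, a 512M huge page gives HUGETLB_PAGE_ORDER = 29 - 16 =
13, while a default MAX_ORDER of 11 only allows buddy allocations up to order
10 - so pageblock_order would end up larger than MAX_ORDER - 1.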

>   order = HUGETLB_PAGE_ORDER;
> - else
> + } else {
>   order = MAX_ORDER - 1;
> + }
>  
>   /*
>* Assume the largest contiguous order of interest is a huge page.
> 


[PATCH V2] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER

2021-04-11 Thread Anshuman Khandual
pageblock_order must always be less than MAX_ORDER, otherwise it might lead
to a warning during boot. A similar problem got fixed on the arm64 platform
with the commit 79cc2ed5a716 ("arm64/mm: Drop THP conditionality from
FORCE_MAX_ZONEORDER"). Assert the above condition before HUGETLB_PAGE_ORDER
gets assigned as pageblock_order. This will help detect the problem earlier
on platforms where HUGETLB_PAGE_SIZE_VARIABLE is enabled.

Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
Changes in V2:

- Changed WARN_ON() to BUILD_BUG_ON() per David

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1617947717-2424-1-git-send-email-anshuman.khand...@arm.com/

 mm/page_alloc.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfc72873961d..19283bff4bec 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6875,10 +6875,17 @@ void __init set_pageblock_order(void)
if (pageblock_order)
return;
 
-   if (HPAGE_SHIFT > PAGE_SHIFT)
+   if (HPAGE_SHIFT > PAGE_SHIFT) {
+   /*
+* pageblock_order must always be less than
+* MAX_ORDER. So does HUGETLB_PAGE_ORDER if
+* that is being assigned here.
+*/
+   BUILD_BUG_ON(HUGETLB_PAGE_ORDER >= MAX_ORDER);
order = HUGETLB_PAGE_ORDER;
-   else
+   } else {
order = MAX_ORDER - 1;
+   }
 
/*
 * Assume the largest contiguous order of interest is a huge page.
-- 
2.20.1



Re: [PATCH -next] coresight: trbe: Fix return value check in arm_trbe_register_coresight_cpu()

2021-04-09 Thread Anshuman Khandual



On 4/9/21 6:52 AM, Wei Yongjun wrote:
> In case of error, the function devm_kasprintf() returns NULL
> pointer not ERR_PTR(). The IS_ERR() test in the return value
> check should be replaced with NULL test.

Right.

> 
> Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")

Again, don't think this warrants a "Fixes:" tag, as there was no
functional problem to be fixed here.

> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c 
> b/drivers/hwtracing/coresight/coresight-trbe.c
> index 5ce239875c98..176868496879 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -871,7 +871,7 @@ static void arm_trbe_register_coresight_cpu(struct 
> trbe_drvdata *drvdata, int cp
>  
>   dev = >drvdata->pdev->dev;
>   desc.name = devm_kasprintf(dev, GFP_KERNEL, "trbe%d", cpu);
> - if (IS_ERR(desc.name))
> + if (!desc.name)
>   goto cpu_clear;
>  
>   desc.type = CORESIGHT_DEV_TYPE_SINK;
>

LGTM.


Re: [PATCH -next] coresight: core: Make symbol 'csdev_sink' static

2021-04-09 Thread Anshuman Khandual



On 4/9/21 7:02 AM, Wei Yongjun wrote:
> The sparse tool complains as follows:
> 
> drivers/hwtracing/coresight/coresight-core.c:26:1: warning:
>  symbol '__pcpu_scope_csdev_sink' was not declared. Should it be static?
> 
> This symbol is not used outside of coresight-core.c, so this
> commit marks it static.

commit ? It is not in the tree yet. s/commit/change instead.

> 
> Fixes: 2cd87a7b293d ("coresight: core: Add support for dedicated percpu 
> sinks")

There is no functional problem that this patch is proposing to fix, and hence
the "Fixes:" tag is not warranted. Suzuki/Mathieu ?

> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/hwtracing/coresight/coresight-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
> b/drivers/hwtracing/coresight/coresight-core.c
> index 3e779e1619ed..6c68d34d956e 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -23,7 +23,7 @@
>  #include "coresight-priv.h"
>  
>  static DEFINE_MUTEX(coresight_mutex);
> -DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
> +static DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
>  
>  /**
>   * struct coresight_node - elements of a path, from source to sink
> 

Otherwise LGTM, as csdev_sink is no longer used outside the coresight-core.c
file after the introduction of the coresight_[set|get]_percpu_sink() helpers.


Re: [PATCH] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER

2021-04-09 Thread Anshuman Khandual



On 4/9/21 1:54 PM, David Hildenbrand wrote:
> On 09.04.21 07:55, Anshuman Khandual wrote:
>> pageblock_order must always be less than MAX_ORDER, otherwise it might lead
>> to a warning during boot. A similar problem got fixed on the arm64 platform
>> with the commit 79cc2ed5a716 ("arm64/mm: Drop THP conditionality from
>> FORCE_MAX_ZONEORDER"). Assert the above condition before HUGETLB_PAGE_ORDER
>> gets assigned as pageblock_order. This will help detect the problem earlier
>> on platforms where HUGETLB_PAGE_SIZE_VARIABLE is enabled.
>>
>> Cc: Andrew Morton 
>> Cc: linux...@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual 
>> ---
>>   mm/page_alloc.c | 11 +--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 604dcd69397b..81b7460e1228 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7068,10 +7068,17 @@ void __init set_pageblock_order(void)
>>   if (pageblock_order)
>>   return;
>>   -    if (HPAGE_SHIFT > PAGE_SHIFT)
>> +    if (HPAGE_SHIFT > PAGE_SHIFT) {
>> +    /*
>> + * pageblock_order must always be less than
>> + * MAX_ORDER. So does HUGETLB_PAGE_ORDER if
>> + * that is being assigned here.
>> + */
>> +    WARN_ON(HUGETLB_PAGE_ORDER >= MAX_ORDER);
> 
> Can't that be a BUILD_BUG_ON() ?

Yes, it can be. It is probably more appropriate as well, given that both
the arguments here are compile time constants. Okay, will change.

> 
>>   order = HUGETLB_PAGE_ORDER;
>> -    else
>> +    } else {
>>   order = MAX_ORDER - 1;
>> +    }
>>     /*
>>    * Assume the largest contiguous order of interest is a huge page.
>>
> 
> 


[PATCH] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER

2021-04-08 Thread Anshuman Khandual
pageblock_order must always be less than MAX_ORDER, otherwise it might lead
to a warning during boot. A similar problem got fixed on the arm64 platform
with the commit 79cc2ed5a716 ("arm64/mm: Drop THP conditionality from
FORCE_MAX_ZONEORDER"). Assert the above condition before HUGETLB_PAGE_ORDER
gets assigned as pageblock_order. This will help detect the problem earlier
on platforms where HUGETLB_PAGE_SIZE_VARIABLE is enabled.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 mm/page_alloc.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 604dcd69397b..81b7460e1228 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7068,10 +7068,17 @@ void __init set_pageblock_order(void)
if (pageblock_order)
return;
 
-   if (HPAGE_SHIFT > PAGE_SHIFT)
+   if (HPAGE_SHIFT > PAGE_SHIFT) {
+   /*
+* pageblock_order must always be less than
+* MAX_ORDER. So does HUGETLB_PAGE_ORDER if
+* that is being assigned here.
+*/
+   WARN_ON(HUGETLB_PAGE_ORDER >= MAX_ORDER);
order = HUGETLB_PAGE_ORDER;
-   else
+   } else {
order = MAX_ORDER - 1;
+   }
 
/*
 * Assume the largest contiguous order of interest is a huge page.
-- 
2.20.1



Re: [PATCH -next v2 1/2] mm/debug_vm_pgtable: Move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE

2021-04-08 Thread Anshuman Khandual
* PMD is not a populated non-leaf entry.
> +  */
> + WRITE_ONCE(*pmdp, __pmd(0));
> + WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
> + WARN_ON(!pmd_clear_huge(pmdp));
> + pmd = READ_ONCE(*pmdp);
> + WARN_ON(!pmd_none(pmd));
>  }
> +
>  static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
> prot)
>  {
> + pud_t pud;
> +
> + if (!arch_vmap_pud_supported(prot))
> + return;
> +
> + pr_debug("Validating PUD huge\n");
> + /*
> +  * X86 defined pud_set_huge() verifies that the given
> +  * PUD is not a populated non-leaf entry.
> +  */
> + WRITE_ONCE(*pudp, __pud(0));
> + WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
> + WARN_ON(!pud_clear_huge(pudp));
> + pud = READ_ONCE(*pudp);
> + WARN_ON(!pud_none(pud));
>  }
> -static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { }
> -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
> prot) { }
> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
> prot) { }
> +#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>  
>  static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
>  {
> 

With changes to the commit message as suggested earlier.

Reviewed-by: Anshuman Khandual 


Re: [PATCH -next v2 2/2] mm/debug_vm_pgtable: Remove redundant pfn_{pmd/pte}() and fix one comment mistake

2021-04-08 Thread Anshuman Khandual


On 4/6/21 10:19 AM, Shixin Liu wrote:
> v1->v2:
> Remove redundant pfn_pte() and fold two patch to one.

The change log should always be placed after the '---' below the SOB statement,
so that git am ignores it. Please avoid adding it to the commit message.
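
A sketch of the expected layout (hypothetical patch footer; address elided):

  Signed-off-by: Shixin Liu <...>
  ---
  v1->v2:
  Remove redundant pfn_pte() and fold two patches into one.

   mm/debug_vm_pgtable.c | 11 ++-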

> 
> Remove redundant pfn_{pmd/pte}() and fix one comment mistake.
> 
> Signed-off-by: Shixin Liu 
> ---
>  mm/debug_vm_pgtable.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index d3cf178621d9..e2f35db8ba69 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -91,7 +91,7 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
> unsigned long pfn, unsigned long vaddr,
> pgprot_t prot)
>  {
> - pte_t pte = pfn_pte(pfn, prot);
> + pte_t pte;
>  

Right.

>   /*
>* Architectures optimize set_pte_at by avoiding TLB flush.
> @@ -185,7 +185,7 @@ static void __init pmd_advanced_tests(struct mm_struct 
> *mm,
> unsigned long pfn, unsigned long vaddr,
> pgprot_t prot, pgtable_t pgtable)
>  {
> - pmd_t pmd = pfn_pmd(pfn, prot);
> + pmd_t pmd;


Right.

>  
>   if (!has_transparent_hugepage())
>   return;
> @@ -300,7 +300,7 @@ static void __init pud_advanced_tests(struct mm_struct 
> *mm,
> unsigned long pfn, unsigned long vaddr,
> pgprot_t prot)
>  {
> - pud_t pud = pfn_pud(pfn, prot);
> + pud_t pud;
>  
>   if (!has_transparent_hugepage())
>   return;
> @@ -309,6 +309,7 @@ static void __init pud_advanced_tests(struct mm_struct 
> *mm,
>   /* Align the address wrt HPAGE_PUD_SIZE */
>   vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
>  
> + pud = pfn_pud(pfn, prot);

Is this change intended to make pud_advanced_tests() similar to the other
advanced tests ? Please update the commit message as well.

>   set_pud_at(mm, vaddr, pudp, pud);
>   pudp_set_wrprotect(mm, vaddr, pudp);
>   pud = READ_ONCE(*pudp);
> @@ -742,12 +743,12 @@ static void __init pmd_swap_soft_dirty_tests(unsigned 
> long pfn, pgprot_t prot)
>   WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
>   WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
>  }
> -#else  /* !CONFIG_ARCH_HAS_PTE_DEVMAP */
> +#else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
>  static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { }
>  static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t 
> prot)
>  {
>  }
> -#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
>  static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
>  {
> 
With changes to the commit message as suggested earlier.

Reviewed-by: Anshuman Khandual 


Re: [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-07 Thread Anshuman Khandual
Adding James here.

+ James Morse 

On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> Hi,
> 
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1. 

That would be really great for the arm64 platform, as it will save CPU cycles
on many generic MM paths, given that our pfn_valid() has been expensive.

> 
> The idea is to mark NOMAP pages as reserved in the memory map and restore

Though I am not really sure, would that possibly be problematic for UEFI/EFI
use cases, as they might have just treated these as normal struct pages till now ?

> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.

Right, that would be better as the current semantics is not ideal.

> 
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
> 
> The patches are only boot tested on qemu-system-aarch64 so I'd really
> appreciate memory stress tests on real hardware.

Did some preliminary memory stress tests on a guest with portions of memory
marked as MEMBLOCK_NOMAP and did not find any obvious problem. But this might
require some testing on a real UEFI environment, with firmware using
MEMBLOCK_NOMAP memory, to make sure that changing these struct pages to
PageReserved() is safe.


> 
> If this actually works we'll be one step closer to drop custom pfn_valid()
> on arm64 altogether.

Right, planning to rework and respin the RFC originally sent last month.

https://patchwork.kernel.org/project/linux-mm/patch/1615174073-10520-1-git-send-email-anshuman.khand...@arm.com/


Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages

2021-04-07 Thread Anshuman Khandual



On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
> 
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.

This would definitely need updating the comment for the MEMBLOCK_NOMAP
definition in include/linux/memblock.h, just to make the semantics clear,
though arm64 is currently the only user of MEMBLOCK_NOMAP.
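
A hypothetical sketch of such a comment update (the wording is an assumption,
not taken from the patch):

  /*
   * MEMBLOCK_NOMAP: the region is not added to the kernel linear map.
   * Its struct pages are still initialized and marked PageReserved(),
   * so pfn_valid() keeps returning true for these pfns.
   */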

> 
> Signed-off-by: Mike Rapoport 
> ---
>  mm/memblock.c | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..6b7ea9d86310 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2002,6 +2002,26 @@ static unsigned long __init 
> __free_memory_core(phys_addr_t start,
>   return end_pfn - start_pfn;
>  }
>  
> +static void __init memmap_init_reserved_pages(void)
> +{
> + struct memblock_region *region;
> + phys_addr_t start, end;
> + u64 i;
> +
> + /* initialize struct pages for the reserved regions */
> + for_each_reserved_mem_range(i, , )
> + reserve_bootmem_region(start, end);
> +
> + /* and also treat struct pages for the NOMAP regions as PageReserved */
> + for_each_mem_region(region) {
> + if (memblock_is_nomap(region)) {
> + start = region->base;
> + end = start + region->size;
> + reserve_bootmem_region(start, end);
> + }
> + }
> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>   unsigned long count = 0;
> @@ -2010,8 +2030,7 @@ static unsigned long __init 
> free_low_memory_core_early(void)
>  
>   memblock_clear_hotplug(0, -1);
>  
> - for_each_reserved_mem_range(i, , )
> - reserve_bootmem_region(start, end);
> + memmap_init_reserved_pages();
>  
>   /*
>* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> 


Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()

2021-04-07 Thread Anshuman Khandual


On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The intended semantics of pfn_valid() is to verify whether there is a
> struct page for the pfn in question and nothing else.

Should there be a comment affirming this semantics interpretation, above the
generic pfn_valid() in include/linux/mmzone.h ?
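
A hypothetical sketch of such a comment (the wording is an assumption):

  /*
   * pfn_valid() only tells whether a struct page exists for this pfn;
   * it implies nothing about whether the page is mapped in the linear
   * map or is usable as normal memory.
   */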

> 
> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> linear map vs those that require ioremap() to access them.
> 
> Introduce a dedicated pfn_is_memory() to perform such check and use it
> where appropriate.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/arm64/include/asm/memory.h | 2 +-
>  arch/arm64/include/asm/page.h   | 1 +
>  arch/arm64/kvm/mmu.c| 2 +-
>  arch/arm64/mm/init.c| 6 ++
>  arch/arm64/mm/ioremap.c | 4 ++--
>  arch/arm64/mm/mmu.c | 2 +-
>  6 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0aabc3be9a75..7e77fdf71b9d 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>  
>  #define virt_addr_valid(addr)({  
> \
>   __typeof__(addr) __addr = __tag_reset(addr);\
> - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  \
> + __is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));  \
>  })
>  
>  void dump_mem_limit(void);
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..32b485bcc6ff 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>  typedef struct page *pgtable_t;
>  
>  extern int pfn_valid(unsigned long);
> +extern int pfn_is_memory(unsigned long);
>  
>  #include 
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8711894db8c2..ad2ea65a3937 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>  
>  static bool kvm_is_device_pfn(unsigned long pfn)
>  {
> - return !pfn_valid(pfn);
> + return !pfn_is_memory(pfn);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 3685e12aba9b..258b1905ed4a 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> +int pfn_is_memory(unsigned long pfn)
> +{
> + return memblock_is_map_memory(PFN_PHYS(pfn));
> +}
> +EXPORT_SYMBOL(pfn_is_memory);
> +

Should this not be generic though ? There is nothing platform or arm64
specific in here. Also, since pfn_is_memory() just indicates that the
pfn is linear mapped, should it not be renamed as pfn_is_linear_memory()
instead ? Regardless, it's fine either way.

>  static phys_addr_t memory_limit = PHYS_ADDR_MAX;
>  
>  /*
> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> index b5e83c46b23e..82a369b22ef5 100644
> --- a/arch/arm64/mm/ioremap.c
> +++ b/arch/arm64/mm/ioremap.c
> @@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t 
> phys_addr, size_t size,
>   /*
>* Don't allow RAM to be mapped.
>*/
> - if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr
> + if (WARN_ON(pfn_is_memory(__phys_to_pfn(phys_addr
>   return NULL;
>  
>   area = get_vm_area_caller(size, VM_IOREMAP, caller);
> @@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
>  void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
>  {
>   /* For normal memory we already have a cacheable mapping. */
> - if (pfn_valid(__phys_to_pfn(phys_addr)))
> + if (pfn_is_memory(__phys_to_pfn(phys_addr)))
>   return (void __iomem *)__phys_to_virt(phys_addr);
>  
>   return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 5d9550fdb9cf..038d20fe163f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> unsigned long size, pgprot_t vma_prot)
>  {
> - if (!pfn_valid(pfn))
> + if (!pfn_is_memory(pfn))
>   return pgprot_noncached(vma_prot);
>   else if (file->f_flags & O_SYNC)
>   return pgprot_writecombine(vma_prot);
> 


Re: [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-07 Thread Anshuman Khandual


On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The arm64's version of pfn_valid() differs from the generic because of two
> reasons:
> 
> * Parts of the memory map are freed during boot. This makes it necessary to
>   verify that there is actual physical memory that corresponds to a pfn
>   which is done by querying memblock.
> 
> * There are NOMAP memory regions. These regions are not mapped in the
>   linear map and until the previous commit the struct pages representing
>   these areas had default values.
> 
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
> 
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.

But what about the parts of the memory map that are freed during boot (mentioned
above) ? Would they not still make CONFIG_HOLES_IN_ZONE applicable and hence
pfn_valid_within() ?

> 
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().

Just to understand this better: pfn_valid() will now return true for all
MEMBLOCK_NOMAP based memory, but that is okay as core MM would still ignore
such pages as unusable memory, since they are PageReserved().

> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/arm64/Kconfig   | 3 ---
>  arch/arm64/mm/init.c | 4 ++--
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>   def_bool y
>   depends on NUMA
>  
> -config HOLES_IN_ZONE
> - def_bool y
> -
>  source "kernel/Kconfig.hz"
>  
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 258b1905ed4a..bb6dd406b1f0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>  
>   /*
>* ZONE_DEVICE memory does not have the memblock entries.
> -  * memblock_is_map_memory() check for ZONE_DEVICE based
> +  * memblock_is_memory() check for ZONE_DEVICE based
>* addresses will always fail. Even the normal hotplugged
>* memory will never have MEMBLOCK_NOMAP flag set in their
>* memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>   return pfn_section_valid(ms, pfn);
>  }
>  #endif
> - return memblock_is_map_memory(addr);
> + return memblock_is_memory(addr);
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> 


Re: [PATCH -next 3/3] mm/debug_vm_pgtable: Remove useless pfn_pmd()

2021-04-01 Thread Anshuman Khandual



On 4/1/21 7:53 AM, Shixin Liu wrote:
> The call to pfn_pmd() here is redundant.
> 
> Signed-off-by: Shixin Liu 
> ---
>  mm/debug_vm_pgtable.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index c379bbe42c2a..9f4c4a114229 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -196,7 +196,6 @@ static void __init pmd_advanced_tests(struct mm_struct 
> *mm,
>  
>   pgtable_trans_huge_deposit(mm, pmdp, pgtable);
>  
> - pmd = pfn_pmd(pfn, prot);
>   set_pmd_at(mm, vaddr, pmdp, pmd);
>   pmdp_set_wrprotect(mm, vaddr, pmdp);
>   pmd = READ_ONCE(*pmdp);
> 

Instead drop the first pfn_pmd(), as that pmd would not be required
when THP is not enabled. Also, please fold this with the first patch.


Re: [PATCH -next 1/3] mm/debug_vm_pgtable: Fix one comment mistake

2021-04-01 Thread Anshuman Khandual


On 4/1/21 7:53 AM, Shixin Liu wrote:
> The branch condition should be CONFIG_TRANSPARENT_HUGEPAGE instead of
> CONFIG_ARCH_HAS_PTE_DEVMAP.
> 
> Signed-off-by: Shixin Liu 
> ---
>  mm/debug_vm_pgtable.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 05efe98a9ac2..a5c71a94e804 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -755,12 +755,12 @@ static void __init pmd_swap_soft_dirty_tests(unsigned 
> long pfn, pgprot_t prot)
>   WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
>   WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
>  }
> -#else  /* !CONFIG_ARCH_HAS_PTE_DEVMAP */
> +#else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
>  static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { }
>  static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t 
> prot)
>  {
>  }
> -#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
>  static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
>  {
> 

LGTM, thanks for catching this. But it does not need a patch of its
own; instead it should be folded with other potential clean ups.


Re: [PATCH -next 2/3] mm/debug_vm_pgtable: Move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE

2021-04-01 Thread Anshuman Khandual


On 4/1/21 7:53 AM, Shixin Liu wrote:
> The functions {pmd/pud}_set_huge and {pmd/pud}_clear_huge is not depend on 
> THP.

s/is not depend/are not dependent/

> But now if we want to test these functions, we have to enable THP. So move
> {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE.

Please drop the second sentence here.

> 
> Signed-off-by: Shixin Liu 
> ---
>  mm/debug_vm_pgtable.c | 91 +++
>  1 file changed, 39 insertions(+), 52 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index a5c71a94e804..c379bbe42c2a 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -242,29 +242,6 @@ static void __init pmd_leaf_tests(unsigned long pfn, 
> pgprot_t prot)
>   WARN_ON(!pmd_leaf(pmd));
>  }
>  
> -#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> -static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
> prot)
> -{
> - pmd_t pmd;
> -
> - if (!arch_vmap_pmd_supported(prot))
> - return;
> -
> - pr_debug("Validating PMD huge\n");
> - /*
> -  * X86 defined pmd_set_huge() verifies that the given
> -  * PMD is not a populated non-leaf entry.
> -  */
> - WRITE_ONCE(*pmdp, __pmd(0));
> - WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
> - WARN_ON(!pmd_clear_huge(pmdp));
> - pmd = READ_ONCE(*pmdp);
> - WARN_ON(!pmd_none(pmd));
> -}
> -#else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
> -static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
> prot) { }
> -#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
> -
>  static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>  {
>   pmd_t pmd = pfn_pmd(pfn, prot);
> @@ -379,30 +356,6 @@ static void __init pud_leaf_tests(unsigned long pfn, 
> pgprot_t prot)
>   pud = pud_mkhuge(pud);
>   WARN_ON(!pud_leaf(pud));
>  }
> -
> -#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> -static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
> prot)
> -{
> - pud_t pud;
> -
> - if (!arch_vmap_pud_supported(prot))
> - return;
> -
> - pr_debug("Validating PUD huge\n");
> - /*
> -  * X86 defined pud_set_huge() verifies that the given
> -  * PUD is not a populated non-leaf entry.
> -  */
> - WRITE_ONCE(*pudp, __pud(0));
> - WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
> - WARN_ON(!pud_clear_huge(pudp));
> - pud = READ_ONCE(*pudp);
> - WARN_ON(!pud_none(pud));
> -}
> -#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> -static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
> prot) { }
> -#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> -
>  #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>  static void __init pud_basic_tests(struct mm_struct *mm, unsigned long pfn, 
> int idx) { }
>  static void __init pud_advanced_tests(struct mm_struct *mm,
> @@ -412,9 +365,6 @@ static void __init pud_advanced_tests(struct mm_struct 
> *mm,
>  {
>  }
>  static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
> -static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
> prot)
> -{
> -}
>  #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>  #else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
>  static void __init pmd_basic_tests(unsigned long pfn, int idx) { }
> @@ -433,14 +383,51 @@ static void __init pud_advanced_tests(struct mm_struct 
> *mm,
>  }
>  static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>  static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>  static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
> prot)
>  {
> + pmd_t pmd;
> +
> + if (!arch_vmap_pmd_supported(prot))
> + return;
> +
> + pr_debug("Validating PMD huge\n");
> + /*
> +  * X86 defined pmd_set_huge() verifies that the given
> +  * PMD is not a populated non-leaf entry.
> +  */
> + WRITE_ONCE(*pmdp, __pmd(0));
> + WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
> + WARN_ON(!pmd_clear_huge(pmdp));
> + pmd = READ_ONCE(*pmdp);
> + WARN_ON(!pmd_none(pmd));
>  }
> +
>  static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
> prot)
>  {
> + pud_t pud;
> +
> + if (!arch_vmap_pud_supported(prot))
> + return;
> +
> + pr_debug("Validating PUD huge\n");
> + /*
> +  * X86 defined pud_set_huge() verifies that the given
> +  * PUD is not a populated non-leaf entry.
> +  */
> + WRITE_ONCE(*pudp, __pud(0));
> + WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
> + WARN_ON(!pud_clear_huge(pudp));
> + pud = READ_ONCE(*pudp);
> + WARN_ON(!pud_none(pud));
>  }
> -static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { }
> -#endif /* 

[PATCH V2 RESEND] mm/memtest: Add ARCH_USE_MEMTEST

2021-04-01 Thread Anshuman Khandual
early_memtest() does not get called from all architectures. Hence enabling
CONFIG_MEMTEST and providing a valid memtest=[1..N] kernel command line
option might not trigger the memory pattern tests as would be expected in
normal circumstances. This situation is misleading.

The change here prevents the above mentioned problem by introducing a
new config option ARCH_USE_MEMTEST that should be selected on platforms
that call early_memtest(), in order to enable the config CONFIG_MEMTEST.
Conversely CONFIG_MEMTEST cannot be enabled on platforms where it would
not be tested anyway.

Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Thomas Bogendoerfer 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas  (arm64)
Reviewed-by: Max Filippov 
Signed-off-by: Anshuman Khandual 
---
This patch applies on v5.12-rc5 and has been tested on the arm64 platform,
but it has just been build tested on all other platforms.

Changes in V2:

https://patchwork.kernel.org/project/linux-mm/patch/1614573126-7740-1-git-send-email-anshuman.khand...@arm.com/

- Added ARCH_USE_MEMTEST in the sorted alphabetical order on platforms

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1612498242-31579-1-git-send-email-anshuman.khand...@arm.com/

 arch/arm/Kconfig | 1 +
 arch/arm64/Kconfig   | 1 +
 arch/mips/Kconfig| 1 +
 arch/powerpc/Kconfig | 1 +
 arch/x86/Kconfig | 1 +
 arch/xtensa/Kconfig  | 1 +
 lib/Kconfig.debug| 9 -
 7 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5da96f5df48f..49878877df88 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -33,6 +33,7 @@ config ARM
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
+   select ARCH_USE_MEMTEST
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_LD_ORPHAN_WARN
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..63c380587a77 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -67,6 +67,7 @@ config ARM64
select ARCH_KEEP_MEMBLOCK
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_GNU_PROPERTY
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index d89efba3d8a4..93a4f502f962 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -14,6 +14,7 @@ config MIPS
select ARCH_SUPPORTS_UPROBES
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..3778ad17f56a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -149,6 +149,7 @@ config PPC
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC32 || PPC_BOOK3S_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS  if PPC_QUEUED_SPINLOCKS
select ARCH_USE_QUEUED_SPINLOCKSif PPC_QUEUED_SPINLOCKS
select ARCH_WANT_IPC_PARSE_VERSION
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..2cb76fd5258e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,6 +100,7 @@ config X86
select ARCH_SUPPORTS_LTO_CLANG  if X86_64
select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
select ARCH_USE_BUILTIN_BSWAP
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 9ad6b7b82707..524413aabbc4 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -7,6 +7,7 @@ config XTENSA
select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU
select ARCH_HAS_DMA_SET_UNCACHED if MMU
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_FRAME_POINTERS
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2779c29d9981..a3fd69e6f6af 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2515,11 +2515,18 @@ config TEST_FPU
 
 endif # RUNTIME_TESTING_MENU
 
+config ARCH_USE_MEMTEST
+   bool
+   help
+ An architecture
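
The hunk above introduces the gate; the essential shape of the change
is the following (a minimal sketch, with the help text wording being an
assumption):

    config ARCH_USE_MEMTEST
            bool
            help
              An architecture should select this when it uses early_memtest(),
              so that CONFIG_MEMTEST becomes selectable there and nowhere else.

    config MEMTEST
            bool "Memtest"
            depends on ARCH_USE_MEMTEST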

[PATCH V2 4/6] mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION

2021-04-01 Thread Anshuman Khandual
ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
platforms that subscribe them. Drop these redundant definitions and instead
just select them appropriately.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas  (arm64)
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig | 10 ++
 arch/powerpc/platforms/Kconfig.cputype |  5 +
 arch/x86/Kconfig   | 10 ++
 3 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 48355c5519c3..cd012af0a4b7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,8 +11,10 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
+   select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -1905,14 +1907,6 @@ config SYSVIPC_COMPAT
def_bool y
depends on COMPAT && SYSVIPC
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-   def_bool y
-   depends on TRANSPARENT_HUGEPAGE
-
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index cec1017813f8..4465b71b2bff 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -96,6 +96,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_NUMA_BALANCING
@@ -420,10 +421,6 @@ config PPC_PKEY
depends on PPC_BOOK3S_64
depends on PPC_MEM_KEYS || PPC_KUAP || PPC_KUEP
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
-
 
 config PPC_MMU_NOHASH
def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 503d8b2e8676..10702ef1eb57 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -60,8 +60,10 @@ config X86
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_32BIT_OFF_T if X86_32
select ARCH_CLOCKSOURCE_INIT
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM)
select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
+   select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
select ARCH_HAS_ACPI_TABLE_UPGRADE  if ACPI
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
@@ -2433,14 +2435,6 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
depends on X86_64 || X86_PAE
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on X86_64 && HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-   def_bool y
-   depends on X86_64 && TRANSPARENT_HUGEPAGE
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
-- 
2.20.1



[PATCH V2 3/6] mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]

2021-04-01 Thread Anshuman Khandual
ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate definitions
on platforms that subscribe them. Instead, just make them generic options
which can be selected on applicable platforms.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas  (arm64)
Acked-by: Heiko Carstens  (s390)
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig   |  8 ++--
 arch/ia64/Kconfig|  8 ++--
 arch/powerpc/Kconfig |  8 ++--
 arch/s390/Kconfig|  8 ++--
 arch/sh/Kconfig  |  2 ++
 arch/sh/mm/Kconfig   |  8 
 arch/x86/Kconfig | 10 ++
 mm/Kconfig   |  6 ++
 8 files changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c86b28ef6ac0..48355c5519c3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,8 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -305,12 +307,6 @@ config ZONE_DMA32
bool "Support DMA32 zone" if EXPERT
default y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config SMP
def_bool y
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..96ce53ad5c9d 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -13,6 +13,8 @@ config IA64
select ARCH_MIGHT_HAVE_PC_SERIO
select ACPI
select ACPI_NUMA if NUMA
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_SUPPORTS_ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
@@ -250,12 +252,6 @@ config HOTPLUG_CPU
  can be controlled through /sys/devices/system/cpu/cpu#.
  Say N if you want to disable CPU hotplug.
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config SCHED_SMT
bool "SMT scheduler support"
depends on SMP
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a74c211e55b1..02a05a24659d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -118,6 +118,8 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_32BIT_OFF_T if PPC32
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
@@ -515,12 +517,6 @@ config ARCH_CPU_PROBE_RELEASE
def_bool y
depends on HOTPLUG_CPU
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config PPC64_SUPPORTS_MEMORY_FAILURE
bool "Add support for memory hwpoison"
depends on PPC_BOOK3S_64
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c1ff874e6c2e..f8b356550daa 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -60,6 +60,8 @@ config S390
imply IMA_SECURE_AND_OR_TRUSTED_BOOT
select ARCH_32BIT_USTAT_F_TINODE
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_DEVMEM_IS_ALLOWED
@@ -626,12 +628,6 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
def_bool y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y if SPARSEMEM
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
 
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index a54b0c5de37b..68129537e350 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -2,6 +2,8 @@
 config SUPERH
def_bool y
select ARCH_32BIT_OFF_T
+   select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM && MMU
+   select ARCH_ENABLE_MEMORY_HOTREMOVE if SPARSEMEM && MMU
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select ARCH_HAS_BINFMT_FLAT if !MMU
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 77aa2f802d8d..d551a9cac41e 

[PATCH V2 0/6] mm: some config cleanups

2021-04-01 Thread Anshuman Khandual
This series contains config cleanup patches which reduce code duplication
across platforms and also improve maintainability. There is no functional
change intended with this series. This has been boot tested on arm64 but
only build tested on some other platforms.

This applies on 5.12-rc5

Changes in V2:

- Rebased on 5.12-rc5
- Added tags from previous version

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/1615278790-18053-1-git-send-email-anshuman.khand...@arm.com/

Anshuman Khandual (6):
  mm: Generalize ARCH_HAS_CACHE_LINE_SIZE
  mm: Generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
  mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
  mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
  mm: Drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
  mm: Drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE

 arch/arc/Kconfig   |  9 ++--
 arch/arm/Kconfig   | 10 ++---
 arch/arm64/Kconfig | 30 ++
 arch/ia64/Kconfig  |  8 ++-
 arch/mips/Kconfig  |  6 +-
 arch/parisc/Kconfig|  5 +
 arch/powerpc/Kconfig   | 11 ++
 arch/powerpc/platforms/Kconfig.cputype | 16 +-
 arch/riscv/Kconfig |  5 +
 arch/s390/Kconfig  | 12 +++
 arch/sh/Kconfig|  7 +++---
 arch/sh/mm/Kconfig |  8 ---
 arch/x86/Kconfig   | 29 ++---
 fs/Kconfig |  5 -
 mm/Kconfig |  9 
 15 files changed, 48 insertions(+), 122 deletions(-)

-- 
2.20.1



Re: [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes

2021-03-10 Thread Anshuman Khandual


On 3/8/21 2:25 PM, Mike Rapoport wrote:
> Hi Anshuman,
> 
> On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might have contained memmap holes caused by memblock areas
>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>> for struct page backing. This scenario could be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>> improved to accommodate such platforms. This reduces overall code footprint
>> and also improves maintainability.
> 
> I wonder whether arm64 would still need to free parts of its memmap after

free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled.
I am not sure whether there still might be some platforms or boards which would
benefit from this. Hence let's just keep this unchanged for now.

> the section size was reduced. Maybe the pain of arm64::pfn_valid() is not
> worth the memory savings anymore?

The arm64 pfn_valid() special case exists primarily because of MEMBLOCK_NOMAP
tagged memory areas, which are reserved by the firmware.

> 
>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>> turn had expanded its scope to new platforms like arc and m68k. Rather lets
>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
> 
> The whole point of 4f5b0c178996 was to let arc and m68k to free unused
> memory map with FLATMEM so they won't need DISCONTIGMEM or SPARSEMEM. So
> whatever implementation there will be for arm/arm64, please keep arc and
> m68k functionally intact.

Okay. Will protect free_unused_memmap() on HAVE_EARLY_SECTION_MEMMAP_HOLES
config as well.

diff --git a/mm/memblock.c b/mm/memblock.c
index d9fa2e62ab7a..11b624e94127 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1927,8 +1927,11 @@ static void __init free_unused_memmap(void)
unsigned long start, end, prev_end = 0;
int i;
 
-   if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
-   IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+   if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+   return;
+
+   if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
+   !IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID))
return;


Re: [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes

2021-03-10 Thread Anshuman Khandual



On 3/8/21 2:07 PM, David Hildenbrand wrote:
> On 08.03.21 04:27, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might have contained memmap holes caused by memblock areas
>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>> for struct page backing. This scenario could be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>> improved to accommodate such platforms. This reduces overall code footprint
>> and also improves maintainability.
>>
>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>> turn had expanded its scope to new platforms like arc and m68k. Rather lets
>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
>>
>> While here, it exports the symbol memblock_is_map_memory() to build drivers
>> that depend on pfn_valid() but does not have the required visibility. After
>> this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
>> arm and arm64 platforms.
>>
>> Cc: Russell King 
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Andrew Morton 
>> Cc: Mike Rapoport 
>> Cc: David Hildenbrand 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux...@kvack.org
>> Suggested-by: David Hildenbrand 
>> Signed-off-by: Anshuman Khandual 
>> ---
>> This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
>> has been lightly tested on the arm64 platform. The idea to represent this
>> unique situation on the arm and arm64 platforms with a config option was
>> proposed by David H during an earlier discussion [2]. This still does not
>> build on arm platform due to pfn_valid() resolution errors. Nonetheless
>> wanted to get some early feedback whether the overall approach here, is
>> acceptable or not.
> 
> It might make sense to keep the arm variant for now. The arm64 variant is 
> where the magic happens and where we missed updates when working on the 
> generic variant.

Sure, will drop the changes on arm.

> 
> The generic variant really only applies to 64bit targets where we have 
> SPARSEMEM. See x86 as an example.

Okay.

> 
> [...]
> 
>>   /*
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 47946cec7584..93532994113f 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct 
>> mem_section *ms, unsigned long pfn)
>>   }
>>   #endif
>>   +bool memblock_is_map_memory(phys_addr_t addr);
>> +
>>   #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>>   static inline int pfn_valid(unsigned long pfn)
>> +{
>> +    phys_addr_t addr = PFN_PHYS(pfn);
>> +
>> +    /*
>> + * Ensure the upper PAGE_SHIFT bits are clear in the
>> + * pfn. Else it might lead to false positives when
>> + * some of the upper bits are set, but the lower bits
>> + * match a valid pfn.
>> + */
>> +    if (PHYS_PFN(addr) != pfn)
>> +    return 0;
> 
> I think this should be fine for other archs as well.
> 
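
The false positive this guard prevents is easy to demonstrate in
userspace; a minimal sketch (PAGE_SHIFT and the pfn values below are
illustrative assumptions, not taken from the patch):

    #include <stdio.h>

    #define PAGE_SHIFT 12
    typedef unsigned long long phys_addr_t;

    int main(void)
    {
            unsigned long long valid_pfn = 0x80000ULL;
            /* a bogus pfn with one of the upper PAGE_SHIFT bits set */
            unsigned long long bogus_pfn = valid_pfn | (1ULL << 52);

            /* PFN_PHYS() shifts the upper bits out of the 64-bit value ... */
            phys_addr_t addr = (phys_addr_t)bogus_pfn << PAGE_SHIFT;

            /* ... so the address now aliases the valid pfn's address ... */
            printf("PHYS_PFN(addr) = %#llx\n", addr >> PAGE_SHIFT);

            /* ... and only the PHYS_PFN(addr) != pfn check catches it */
            if ((addr >> PAGE_SHIFT) != bogus_pfn)
                    printf("bogus pfn %#llx rejected\n", bogus_pfn);
            return 0;
    }

Without the check, the bogus pfn would be looked up at the aliased
address and could pass the memblock query meant for a valid pfn.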
>> +
>> +#ifdef CONFIG_SPARSEMEM
> 
> Why do we need the ifdef now? If that's to cover the arm case, then please 
> consider the arm64 case only for now.

Yes, it is not needed.

> 
>>   {
>>   struct mem_section *ms;
>>   @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>>    * Traditionally early sections always returned pfn_valid() for
>>    * the entire section-sized span.
>>    */
>> -    return early_section(ms) || pfn_section_valid(ms, pfn);
>> +    if (early_section(ms))
>> +    return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
>> +    memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>> +
>> +    return pfn_section_valid(ms, pfn);
>> +}
>> +#endif
>> +    return 1;
>>   }
>>   #endif
>>   diff --git a/mm/Kconfig b/mm/Kconfig
>> index 24c045b24b95..0ec20f661b3f 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>>   config ARCH_KEEP_MEMBLOCK
>>   bool
>>   +config HAVE_EARLY_SECTION_MEMMAP_HOLES
>> +    depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
>> +    def_bool n
>>

Re: [PATCH V2] mm/memtest: Add ARCH_USE_MEMTEST

2021-03-09 Thread Anshuman Khandual



On 3/1/21 10:02 AM, Anshuman Khandual wrote:
> early_memtest() does not get called from all architectures. Hence enabling
> CONFIG_MEMTEST and providing a valid memtest=[1..N] kernel command line
> option might not trigger the memory pattern tests as would be expected in
> normal circumstances. This situation is misleading.
> 
> The change here prevents the above mentioned problem after introducing a
> new config option ARCH_USE_MEMTEST that should be subscribed on platforms
> that call early_memtest(), in order to enable the config CONFIG_MEMTEST.
> Conversely CONFIG_MEMTEST cannot be enabled on platforms where it would
> not be tested anyway.
> 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Thomas Bogendoerfer 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Chris Zankel 
> Cc: Max Filippov 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-m...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-xte...@linux-xtensa.org
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Max Filippov 
> Signed-off-by: Anshuman Khandual 
> ---
> This patch applies on v5.12-rc1 and has been tested on arm64 platform.
> But it has been just build tested on all other platforms.
> 
> Changes in V2:
> 
> - Added ARCH_USE_MEMTEST in the sorted alphabetical order on platforms

Gentle ping, any updates or objections ?


[PATCH V2] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-09 Thread Anshuman Khandual
From: James Morse 

As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K page sizes in particular) instead of just enabled or disabled
values. This changes the __enable_mmu() function to handle the complete
acceptable range of values (depending on whether the field is signed or
unsigned), now represented with the ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX]
pair. While here, also fix similar situations in the EFI stub and KVM.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: James Morse 
Cc: Suzuki K Poulose 
Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvm...@lists.cs.columbia.edu
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Marc Zyngier 
Signed-off-by: James Morse 
Signed-off-by: Anshuman Khandual 
---
Changes in V2:

- Changes back to switch construct in kvm_set_ipa_limit() per Marc

Changes in V1:

https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=442817

 arch/arm64/include/asm/sysreg.h   | 20 ++--
 arch/arm64/kernel/head.S  |  6 --
 arch/arm64/kvm/reset.c| 10 ++
 drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
 4 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index dfd4edb..d4a5fca9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -796,6 +796,11 @@
 #define ID_AA64MMFR0_PARANGE_480x5
 #define ID_AA64MMFR0_PARANGE_520x6
 
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE0x1
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7
+
 #ifdef CONFIG_ARM64_PA_BITS_52
 #define ID_AA64MMFR0_PARANGE_MAX   ID_AA64MMFR0_PARANGE_52
 #else
@@ -961,14 +966,17 @@
 #define ID_PFR1_PROGMOD_SHIFT  0
 
 #if defined(CONFIG_ARM64_4K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #elif defined(CONFIG_ARM64_16K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0xF
 #elif defined(CONFIG_ARM64_64K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #endif
 
 #define MVFR2_FPMISC_SHIFT 4
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b..8b469f1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
 SYM_FUNC_START(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
ubfxx2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
-   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
-   b.ne__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
+   b.lt__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
+   b.gt__no_granule_support
update_early_cpu_boot_status 0, x2, x3
adrpx2, idmap_pg_dir
phys_to_ttbr x1, x1
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 47f3f03..e81c7ec 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -311,16 +311,18 @@ int kvm_set_ipa_limit(void)
}
 
switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
-   default:
-   case 1:
+   case ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE:
kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
return -EINVAL;
-   case 0:
+   case ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT:
kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
break;
-   case 2:
+   case ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN ... 
ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX:
kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
break;
+   default:
+   kvm_err("Unsupported value for TGRAN_2, giving up\n");
+   return -EINVAL;
}
 
kvm_ipa_limit = id_aa64mm
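
For reference, the range handling boils down to a bounds check on the
extracted 4-bit ID register field. A minimal sketch of the idea (the
helper and its parameters are illustrative, not the kernel's
definitions; a signed field encodes "not supported" as 0xF, i.e. -1,
which an unsigned extract plus a [0x0..0x7] range test rejects as
well, since 15 falls outside the range):

    #include <stdbool.h>
    #include <stdint.h>

    static bool tgran_supported(uint64_t mmfr0, unsigned int shift,
                                unsigned int min, unsigned int max)
    {
            /* extract the 4-bit translation granule field */
            unsigned int field = (mmfr0 >> shift) & 0xf;

            return field >= min && field <= max;
    }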

Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-09 Thread Anshuman Khandual



On 3/9/21 7:35 PM, Will Deacon wrote:
> On Mon, Mar 08, 2021 at 02:42:00PM +, Marc Zyngier wrote:
>> On Fri, 05 Mar 2021 14:36:09 +,
>> Anshuman Khandual  wrote:
>>> -   switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
>>> -   default:
>>> -   case 1:
>>> +   tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0, tgran_2_shift);
>>> +   if (tgran_2 == ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE) {
>>> kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
>>> return -EINVAL;
>>> -   case 0:
>>> +   } else if (tgran_2 == ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT) {
>>> kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
>>> -   break;
>>> -   case 2:
>>> +   } else if (tgran_2 >= ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN &&
>>> +  tgran_2 <= ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX) {
>>> kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
>>> -   break;
>>> +   } else {
>>> +   kvm_err("Unsupported value, giving up\n");
>>> +   return -EINVAL;
>>
>> nit: this doesn't say *what* value is unsupported, and I really
>> preferred the switch-case version, such as this:
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 1f22b36a0eff..d267e4b1aec6 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -312,15 +312,18 @@ int kvm_set_ipa_limit(void)
>>  
>>  switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
>>  default:
>> -case 1:
>> +case ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE:
>>  kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
>>  return -EINVAL;
>> -case 0:
>> +case ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT:
>>  kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
>>  break;
>> -case 2:
>> +case ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN ... 
>> ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX:
>>  kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
>>  break;
>> +default:
>> +kvm_err("Unsupported value for TGRAN_2, giving up\n");
>> +return -EINVAL;
>>  }
>>  
>>  kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
>>
>>
>> Otherwise:
>>
>> Acked-by: Marc Zyngier 
> 
> Anshuman -- please can you spin a v2 with the switch syntax as suggested
> above by Marc?

Sure, will do.



Re: [PATCH 0/6] mm: some config cleanups

2021-03-09 Thread Anshuman Khandual


On 3/9/21 2:03 PM, Anshuman Khandual wrote:
> This series contains config cleanup patches which reduces code duplication
> across platforms and also improves maintainability. There is no functional
> change intended with this series. This has been boot tested on arm64 but
> only build tested on some other platforms.
> 
> This applies on 5.12-rc2
> 
> Cc: x...@kernel.org
> Cc: linux-i...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux...@vger.kernel.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> 
> Anshuman Khandual (6):
>   mm: Generalize ARCH_HAS_CACHE_LINE_SIZE
>   mm: Generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
>   mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
>   mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
>   mm: Drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
>   mm: Drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE

Again the same thing happened.

https://patchwork.kernel.org/project/linux-mm/list/?series=444393
https://lore.kernel.org/linux-mm/1615278790-18053-1-git-send-email-anshuman.khand...@arm.com/

From past experience, this problem might just be related to having many
entries on the CC list. But this time I even dropped the --cc-cover
parameter, which would have expanded the CC list on each individual
patch further, like last time.
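
For reference, the parameter in question works roughly like this (a
usage sketch; the address is a placeholder):

    git send-email --cc-cover --to=linux...@kvack.org *.patch

--cc-cover copies the Cc: list found in the cover letter onto every
patch in the series, which is what inflates the per-patch recipient
count.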

If it helps, I have hosted these six patches on v5.12-rc2

https://gitlab.arm.com/linux-arm/linux-anshuman/-/commits/mm/mm_config_cleanups/v1/

- Anshuman


[PATCH 4/6] mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION

2021-03-09 Thread Anshuman Khandual
ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
platforms that subscribe them. Drop these redundant definitions and instead
just select them appropriately.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig | 10 ++
 arch/powerpc/platforms/Kconfig.cputype |  5 +
 arch/x86/Kconfig   | 10 ++
 3 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 67e904b0f32a..c0e75f62f08c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,8 +11,10 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
+   select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -1903,14 +1905,6 @@ config SYSVIPC_COMPAT
def_bool y
depends on COMPAT && SYSVIPC
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-   def_bool y
-   depends on TRANSPARENT_HUGEPAGE
-
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index cec1017813f8..4465b71b2bff 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -96,6 +96,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_NUMA_BALANCING
@@ -420,10 +421,6 @@ config PPC_PKEY
depends on PPC_BOOK3S_64
depends on PPC_MEM_KEYS || PPC_KUAP || PPC_KUEP
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
-
 
 config PPC_MMU_NOHASH
def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 503d8b2e8676..10702ef1eb57 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -60,8 +60,10 @@ config X86
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_32BIT_OFF_T if X86_32
select ARCH_CLOCKSOURCE_INIT
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM)
select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
+   select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
select ARCH_HAS_ACPI_TABLE_UPGRADE  if ACPI
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
@@ -2433,14 +2435,6 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
depends on X86_64 || X86_PAE
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on X86_64 && HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-   def_bool y
-   depends on X86_64 && TRANSPARENT_HUGEPAGE
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
-- 
2.20.1



[PATCH 3/6] mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]

2021-03-09 Thread Anshuman Khandual
ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate definitions
on platforms that subscribe them. Instead, just make them generic options
which can be selected on applicable platforms.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig   |  8 ++--
 arch/ia64/Kconfig|  8 ++--
 arch/powerpc/Kconfig |  8 ++--
 arch/s390/Kconfig|  8 ++--
 arch/sh/Kconfig  |  2 ++
 arch/sh/mm/Kconfig   |  8 
 arch/x86/Kconfig | 10 ++
 mm/Kconfig   |  6 ++
 8 files changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 68fe3b5bf17a..67e904b0f32a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,8 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -305,12 +307,6 @@ config ZONE_DMA32
bool "Support DMA32 zone" if EXPERT
default y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config SMP
def_bool y
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..96ce53ad5c9d 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -13,6 +13,8 @@ config IA64
select ARCH_MIGHT_HAVE_PC_SERIO
select ACPI
select ACPI_NUMA if NUMA
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_SUPPORTS_ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
@@ -250,12 +252,6 @@ config HOTPLUG_CPU
  can be controlled through /sys/devices/system/cpu/cpu#.
  Say N if you want to disable CPU hotplug.
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config SCHED_SMT
bool "SMT scheduler support"
depends on SMP
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a74c211e55b1..02a05a24659d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -118,6 +118,8 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_32BIT_OFF_T if PPC32
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
@@ -515,12 +517,6 @@ config ARCH_CPU_PROBE_RELEASE
def_bool y
depends on HOTPLUG_CPU
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config PPC64_SUPPORTS_MEMORY_FAILURE
bool "Add support for memory hwpoison"
depends on PPC_BOOK3S_64
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c1ff874e6c2e..f8b356550daa 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -60,6 +60,8 @@ config S390
imply IMA_SECURE_AND_OR_TRUSTED_BOOT
select ARCH_32BIT_USTAT_F_TINODE
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_DEVMEM_IS_ALLOWED
@@ -626,12 +628,6 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
def_bool y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y if SPARSEMEM
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
 
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index a54b0c5de37b..68129537e350 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -2,6 +2,8 @@
 config SUPERH
def_bool y
select ARCH_32BIT_OFF_T
+   select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM && MMU
+   select ARCH_ENABLE_MEMORY_HOTREMOVE if SPARSEMEM && MMU
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select ARCH_HAS_BINFMT_FLAT if !MMU
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 77aa2f802d8d..d551a9cac41e 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@

[PATCH 0/6] mm: some config cleanups

2021-03-09 Thread Anshuman Khandual
This series contains config cleanup patches which reduce code duplication
across platforms and also improve maintainability. There is no functional
change intended with this series. This has been boot tested on arm64 but
only build tested on some other platforms.

This applies on 5.12-rc2

Cc: x...@kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (6):
  mm: Generalize ARCH_HAS_CACHE_LINE_SIZE
  mm: Generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
  mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
  mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
  mm: Drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
  mm: Drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE

 arch/arc/Kconfig   |  9 ++--
 arch/arm/Kconfig   | 10 ++---
 arch/arm64/Kconfig | 30 ++
 arch/ia64/Kconfig  |  8 ++-
 arch/mips/Kconfig  |  6 +-
 arch/parisc/Kconfig|  5 +
 arch/powerpc/Kconfig   | 11 ++
 arch/powerpc/platforms/Kconfig.cputype | 16 +-
 arch/riscv/Kconfig |  5 +
 arch/s390/Kconfig  | 12 +++
 arch/sh/Kconfig|  7 +++---
 arch/sh/mm/Kconfig |  8 ---
 arch/x86/Kconfig   | 29 ++---
 fs/Kconfig |  5 -
 mm/Kconfig |  9 
 15 files changed, 48 insertions(+), 122 deletions(-)

-- 
2.20.1



Re: [PATCH 0/6] mm: some config cleanups

2021-03-08 Thread Anshuman Khandual
On 3/8/21 12:11 PM, Anshuman Khandual wrote:
> This series contains config cleanup patches which reduces code duplication
> across platforms and also improves maintainability. There is no functional
> change intended with this series. This has been boot tested on arm64 but
> only build tested on some other platforms.
> 
> This applies on 5.12-rc2
> 
> Cc: x...@kernel.org
> Cc: linux-i...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux...@vger.kernel.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> 
> Anshuman Khandual (6):
>   mm: Generalize ARCH_HAS_CACHE_LINE_SIZE
>   mm: Generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
>   mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
>   mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
>   mm: Drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
>   mm: Drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE

Seems like there was a problem during email delivery, because some patches
might not have hit the mailing list, although git send-email never
really reported any problem. Not sure what happened here.

https://patchwork.kernel.org/project/linux-mm/list/?series=443619
https://lore.kernel.org/linux-mm/1615185706-24342-1-git-send-email-anshuman.khand...@arm.com/

Will probably resend the series.

- Anshuman


[PATCH 4/6] mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION

2021-03-07 Thread Anshuman Khandual
ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
platforms that subscribe them. Drop these redundant definitions and instead
just select them appropriately.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig | 10 ++
 arch/powerpc/platforms/Kconfig.cputype |  5 +
 arch/x86/Kconfig   | 10 ++
 3 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 67e904b0f32a..c0e75f62f08c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,8 +11,10 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
+   select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -1903,14 +1905,6 @@ config SYSVIPC_COMPAT
def_bool y
depends on COMPAT && SYSVIPC
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-   def_bool y
-   depends on TRANSPARENT_HUGEPAGE
-
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index cec1017813f8..4465b71b2bff 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -96,6 +96,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+   select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_NUMA_BALANCING
@@ -420,10 +421,6 @@ config PPC_PKEY
depends on PPC_BOOK3S_64
depends on PPC_MEM_KEYS || PPC_KUAP || PPC_KUEP
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
-
 
 config PPC_MMU_NOHASH
def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 503d8b2e8676..10702ef1eb57 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -60,8 +60,10 @@ config X86
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_32BIT_OFF_T if X86_32
select ARCH_CLOCKSOURCE_INIT
+	select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM)
select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
+	select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
select ARCH_HAS_ACPI_TABLE_UPGRADE  if ACPI
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
@@ -2433,14 +2435,6 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
depends on X86_64 || X86_PAE
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-   def_bool y
-   depends on X86_64 && HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-   def_bool y
-   depends on X86_64 && TRANSPARENT_HUGEPAGE
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
-- 
2.20.1



[PATCH 3/6] mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]

2021-03-07 Thread Anshuman Khandual
ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate definitions
on platforms that subscribe them. Instead, just make them generic options
which can be selected on applicable platforms.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig   |  8 ++--
 arch/ia64/Kconfig|  8 ++--
 arch/powerpc/Kconfig |  8 ++--
 arch/s390/Kconfig|  8 ++--
 arch/sh/Kconfig  |  2 ++
 arch/sh/mm/Kconfig   |  8 
 arch/x86/Kconfig | 10 ++
 mm/Kconfig   |  6 ++
 8 files changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 68fe3b5bf17a..67e904b0f32a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,8 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -305,12 +307,6 @@ config ZONE_DMA32
bool "Support DMA32 zone" if EXPERT
default y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config SMP
def_bool y
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..96ce53ad5c9d 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -13,6 +13,8 @@ config IA64
select ARCH_MIGHT_HAVE_PC_SERIO
select ACPI
select ACPI_NUMA if NUMA
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_SUPPORTS_ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
@@ -250,12 +252,6 @@ config HOTPLUG_CPU
  can be controlled through /sys/devices/system/cpu/cpu#.
  Say N if you want to disable CPU hotplug.
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config SCHED_SMT
bool "SMT scheduler support"
depends on SMP
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a74c211e55b1..02a05a24659d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -118,6 +118,8 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_32BIT_OFF_T if PPC32
+   select ARCH_ENABLE_MEMORY_HOTPLUG
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
@@ -515,12 +517,6 @@ config ARCH_CPU_PROBE_RELEASE
def_bool y
depends on HOTPLUG_CPU
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config PPC64_SUPPORTS_MEMORY_FAILURE
bool "Add support for memory hwpoison"
depends on PPC_BOOK3S_64
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c1ff874e6c2e..f8b356550daa 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -60,6 +60,8 @@ config S390
imply IMA_SECURE_AND_OR_TRUSTED_BOOT
select ARCH_32BIT_USTAT_F_TINODE
select ARCH_BINFMT_ELF_STATE
+   select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM
+   select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_DEVMEM_IS_ALLOWED
@@ -626,12 +628,6 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
def_bool y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-   def_bool y if SPARSEMEM
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-   def_bool y
-
 config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
 
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index a54b0c5de37b..68129537e350 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -2,6 +2,8 @@
 config SUPERH
def_bool y
select ARCH_32BIT_OFF_T
+   select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM && MMU
+   select ARCH_ENABLE_MEMORY_HOTREMOVE if SPARSEMEM && MMU
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select ARCH_HAS_BINFMT_FLAT if !MMU
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 77aa2f802d8d..d551a9cac41e 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@
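
(The message is truncated here; the arch/sh/mm/Kconfig hunk and the
mm/Kconfig hunk are missing from the archive. A minimal sketch of what the
new generic declarations in mm/Kconfig presumably look like, inferred from
the diffstat and the arch-side selects rather than quoted from the patch:)

config ARCH_ENABLE_MEMORY_HOTPLUG
	bool

config ARCH_ENABLE_MEMORY_HOTREMOVE
	bool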

[PATCH 0/6] mm: some config cleanups

2021-03-07 Thread Anshuman Khandual
This series contains config cleanup patches which reduce code duplication
across platforms and also improve maintainability. There is no functional
change intended with this series. This has been boot tested on arm64 but
only build tested on some other platforms.

This applies on 5.12-rc2

Cc: x...@kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (6):
  mm: Generalize ARCH_HAS_CACHE_LINE_SIZE
  mm: Generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
  mm: Generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
  mm: Drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
  mm: Drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
  mm: Drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE

 arch/arc/Kconfig   |  9 ++--
 arch/arm/Kconfig   | 10 ++---
 arch/arm64/Kconfig | 30 ++
 arch/ia64/Kconfig  |  8 ++-
 arch/mips/Kconfig  |  6 +-
 arch/parisc/Kconfig|  5 +
 arch/powerpc/Kconfig   | 11 ++
 arch/powerpc/platforms/Kconfig.cputype | 16 +-
 arch/riscv/Kconfig |  5 +
 arch/s390/Kconfig  | 12 +++
 arch/sh/Kconfig|  7 +++---
 arch/sh/mm/Kconfig |  8 ---
 arch/x86/Kconfig   | 29 ++---
 fs/Kconfig |  5 -
 mm/Kconfig |  9 
 15 files changed, 48 insertions(+), 122 deletions(-)

-- 
2.20.1



[RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes

2021-03-07 Thread Anshuman Khandual
Platforms like arm and arm64 have redefined pfn_valid() because their early
memory sections might have contained memmap holes caused by memblock areas
tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
for struct page backing. This scenario could be captured with a new option
CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
improved to accommodate such platforms. This reduces overall code footprint
and also improves maintainability.

Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
turn had expanded its scope to new platforms like arc and m68k. Rather, let's
restrict the scope of free_unused_memmap() back to the arm and arm64 platforms
using this new config option, i.e. CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES.

While here, it exports the symbol memblock_is_map_memory() to build drivers
that depend on pfn_valid() but do not have the required visibility. After
this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
arm and arm64 platforms.
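
For illustration, a minimal sketch of how a generic pfn_valid() could
accommodate such platforms (an assumption for discussion based on the
description above; the actual include/linux/mmzone.h hunk is not visible
in this excerpt):

static inline int pfn_valid(unsigned long pfn)
{
	struct mem_section *ms;

	/* Reject pfns whose upper bits do not round-trip */
	if (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
		return 0;

	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;

	ms = __pfn_to_section(pfn);
	if (!valid_section(ms))
		return 0;

	/*
	 * Early sections may contain memmap holes caused by
	 * MEMBLOCK_NOMAP areas, so fall back to a memblock
	 * lookup for them.
	 */
	if (IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
	    early_section(ms))
		return memblock_is_map_memory(PFN_PHYS(pfn));

	return pfn_section_valid(ms, pfn);
}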

Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: David Hildenbrand 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Suggested-by: David Hildenbrand 
Signed-off-by: Anshuman Khandual 
---
This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
has been lightly tested on the arm64 platform. The idea to represent this
unique situation on the arm and arm64 platforms with a config option was
proposed by David H during an earlier discussion [2]. This still does not
build on the arm platform due to pfn_valid() resolution errors. Nonetheless,
wanted to get some early feedback on whether the overall approach here is
acceptable or not.

[1] https://patchwork.kernel.org/project/linux-mm/list/?series=442433 
[2] 
https://lore.kernel.org/linux-arm-kernel/4b282848-d2d7-6156-4726-ce974b2df...@redhat.com/

 arch/arm/Kconfig  |  2 +-
 arch/arm/include/asm/page.h   |  4 
 arch/arm/mm/init.c| 13 ---
 arch/arm64/Kconfig|  2 +-
 arch/arm64/include/asm/page.h |  2 --
 arch/arm64/mm/init.c  | 41 ---
 include/linux/mmzone.h| 24 +++-
 mm/Kconfig| 10 +
 mm/memblock.c |  3 ++-
 9 files changed, 37 insertions(+), 64 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 853aab5ab327..8b1d3089baa6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -71,7 +71,6 @@ config ARM
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
select HAVE_ARCH_MMAP_RND_BITS if MMU
-   select HAVE_ARCH_PFN_VALID
select HAVE_ARCH_SECCOMP
select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
@@ -84,6 +83,7 @@ config ARM
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
+   select HAVE_EARLY_SECTION_MEMMAP_HOLES
	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
select HAVE_EXIT_THREAD
select HAVE_FAST_GUP if ARM_LPAE
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 11b058a72a5b..7e3189083bd7 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -153,10 +153,6 @@ extern void copy_page(void *to, const void *from);
 
 typedef struct page *pgtable_t;
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
-#endif
-
 #include 
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 828a2561b229..9131ef4e599e 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -121,19 +121,6 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
free_area_init(max_zone_pfn);
 }
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-int pfn_valid(unsigned long pfn)
-{
-   phys_addr_t addr = __pfn_to_phys(pfn);
-
-   if (__phys_to_pfn(addr) != pfn)
-   return 0;
-
-   return memblock_is_map_memory(addr);
-}
-EXPORT_SYMBOL(pfn_valid);
-#endif
-
 static bool arm_memblock_steal_permitted = true;
 
 phys_addr_t __init arm_memblock_steal(phys_addr_t size, phys_addr_t align)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1f212b47a48a..2ee48bdf9dc1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -144,7 +144,6 @@ config ARM64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
-   select HAVE_ARCH_PFN_VALID
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_

Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-07 Thread Anshuman Khandual



On 3/5/21 8:21 PM, Mark Rutland wrote:
> On Fri, Mar 05, 2021 at 08:06:09PM +0530, Anshuman Khandual wrote:
>> From: James Morse 
>>
>> As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
>> might contain a range of values to describe supported translation granules
>> (4K and 16K pages sizes in particular) instead of just enabled or disabled
>> values. This changes __enable_mmu() function to handle complete acceptable
>> range of values (depending on whether the field is signed or unsigned) now
>> represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
>> also fix similar situations in EFI stub and KVM as well.
>>
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Marc Zyngier 
>> Cc: James Morse 
>> Cc: Suzuki K Poulose 
>> Cc: Ard Biesheuvel 
>> Cc: Mark Rutland 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: kvm...@lists.cs.columbia.edu
>> Cc: linux-...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: James Morse 
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/arm64/include/asm/sysreg.h   | 20 ++--
>>  arch/arm64/kernel/head.S  |  6 --
>>  arch/arm64/kvm/reset.c| 23 ---
>>  drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
>>  4 files changed, 31 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index dfd4edb..d4a5fca9 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -796,6 +796,11 @@
>>  #define ID_AA64MMFR0_PARANGE_48 0x5
>>  #define ID_AA64MMFR0_PARANGE_52 0x6
>>  
>> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT  0x0
>> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE 0x1
>> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN  0x2
>> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX  0x7
>
> The TGRAN2 fields doesn't quite follow the usual ID scheme rules, so how
> do we deteremine the max value? Does the ARM ARM say anything in
> particular about them, like we do for some of the PMU ID fields?

Did not find anything in the ARM ARM regarding what scheme the TGRAN2 fields
actually follow. I had arrived at the more restrictive 0x7 value, like the
usual signed fields, as the TGRAN4 fields definitely do not follow the
unsigned ID scheme. Would restricting the max value to 0x3 (i.e. LPA2) be a
better option instead?
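
For reference, the stage-2 granule field encodings in question (per DDI
0487G.a as I read it; worth double checking against the spec):

  TGranX_2: 0b0000 follows the corresponding stage-1 TGranX field
            0b0001 not supported at stage-2
            0b0010 supported at stage-2
            0b0011 supported at stage-2 with LPA2 (4K/16K granules only)

All larger values are reserved, which is what makes the choice between a
0x2..0x7 range and a 0x2..0x3 range a judgement call.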

> 
> Otherwise, this patch looks correct to me.
> 
> Thanks,
> Mark.
> 


[PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-05 Thread Anshuman Khandual
From: James Morse 

As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K pages sizes in particular) instead of just enabled or disabled
values. This changes __enable_mmu() function to handle complete acceptable
range of values (depending on whether the field is signed or unsigned) now
represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
also fix similar situations in EFI stub and KVM as well.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: James Morse 
Cc: Suzuki K Poulose 
Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvm...@lists.cs.columbia.edu
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: James Morse 
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/include/asm/sysreg.h   | 20 ++--
 arch/arm64/kernel/head.S  |  6 --
 arch/arm64/kvm/reset.c| 23 ---
 drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
 4 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index dfd4edb..d4a5fca9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -796,6 +796,11 @@
 #define ID_AA64MMFR0_PARANGE_48 0x5
 #define ID_AA64MMFR0_PARANGE_52 0x6
 
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE 0x1
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7
+
 #ifdef CONFIG_ARM64_PA_BITS_52
 #define ID_AA64MMFR0_PARANGE_MAX   ID_AA64MMFR0_PARANGE_52
 #else
@@ -961,14 +966,17 @@
 #define ID_PFR1_PROGMOD_SHIFT  0
 
 #if defined(CONFIG_ARM64_4K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #elif defined(CONFIG_ARM64_16K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0xF
 #elif defined(CONFIG_ARM64_64K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #endif
 
 #define MVFR2_FPMISC_SHIFT 4
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b..8b469f1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
 SYM_FUNC_START(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
	ubfx	x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
-   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
-   b.ne__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
+   b.lt__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
+   b.gt__no_granule_support
update_early_cpu_boot_status 0, x2, x3
	adrp	x2, idmap_pg_dir
phys_to_ttbr x1, x1
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 47f3f03..fe72bfb 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -286,7 +286,7 @@ u32 get_kvm_ipa_limit(void)
 
 int kvm_set_ipa_limit(void)
 {
-   unsigned int parange, tgran_2;
+   unsigned int parange, tgran_2_shift, tgran_2;
u64 mmfr0;
 
mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
@@ -300,27 +300,28 @@ int kvm_set_ipa_limit(void)
switch (PAGE_SIZE) {
default:
case SZ_4K:
-   tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT;
+   tgran_2_shift = ID_AA64MMFR0_TGRAN4_2_SHIFT;
break;
case SZ_16K:
-   tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT;
+   tgran_2_shift = ID_AA64MMFR0_TGRAN16_2_SHIFT;
break;
case SZ_64K:
-   tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT;
+   tgran_2_shift = ID_AA64MMFR0_TGRAN64_2_SHIFT;
break;
}
 
-   switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
-   default:
-   case 1:
+   tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0, tgran_2_shift);
+   if (tgran_2

Re: [PATCH V3 0/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-03-04 Thread Anshuman Khandual


On 3/5/21 10:54 AM, Anshuman Khandual wrote:
> This series fixes pfn_valid() for ZONE_DEVICE based memory and also improves
> its performance for normal hotplug memory. While here, it also reorganizes
> pfn_valid() on CONFIG_SPARSEMEM. This series is based on v5.12-rc1.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Ard Biesheuvel 
> Cc: Mark Rutland 
> Cc: James Morse 
> Cc: Robin Murphy 
> Cc: Jérôme Glisse 
> Cc: Dan Williams 
> Cc: David Hildenbrand 
> Cc: Mike Rapoport 
> Cc: Veronika Kabatova 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> 
> Changes in V3:
> 
> - Validate the pfn before fetching mem_section with __pfn_to_section() in 
> [PATCH 2/2]

Hello Veronika,

Could you please help recreate the earlier failure [1], but with this
series applied on v5.12-rc1? Thank you.

[1] 
https://lore.kernel.org/linux-arm-kernel/cki.8d1cb60fec.k6njmef...@redhat.com/

- Anshuman


[PATCH V3 2/2] arm64/mm: Reorganize pfn_valid()

2021-03-04 Thread Anshuman Khandual
There are multiple instances of pfn_to_section_nr() and __pfn_to_section()
when CONFIG_SPARSEMEM is enabled. This can be optimized if the memory section
is fetched earlier. This replaces the open coded PFN and ADDR conversion
with PFN_PHYS() and PHYS_PFN() helpers. While there, also add a comment.
This does not cause any functional change.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Ard Biesheuvel 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: David Hildenbrand 
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/mm/init.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 5920c527845a..3685e12aba9b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -219,16 +219,26 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 int pfn_valid(unsigned long pfn)
 {
-   phys_addr_t addr = pfn << PAGE_SHIFT;
+   phys_addr_t addr = PFN_PHYS(pfn);
 
-   if ((addr >> PAGE_SHIFT) != pfn)
+   /*
+* Ensure the upper PAGE_SHIFT bits are clear in the
+* pfn. Else it might lead to false positives when
+* some of the upper bits are set, but the lower bits
+* match a valid pfn.
+*/
+   if (PHYS_PFN(addr) != pfn)
return 0;
 
 #ifdef CONFIG_SPARSEMEM
+{
+   struct mem_section *ms;
+
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
 
-   if (!valid_section(__pfn_to_section(pfn)))
+   ms = __pfn_to_section(pfn);
+   if (!valid_section(ms))
return 0;
 
/*
@@ -240,8 +250,9 @@ int pfn_valid(unsigned long pfn)
 * memory sections covering all of hotplug memory including
 * both normal and ZONE_DEVICE based.
 */
-   if (!early_section(__pfn_to_section(pfn)))
-   return pfn_section_valid(__pfn_to_section(pfn), pfn);
+   if (!early_section(ms))
+   return pfn_section_valid(ms, pfn);
+}
 #endif
return memblock_is_map_memory(addr);
 }
-- 
2.20.1



[PATCH V3 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-03-04 Thread Anshuman Khandual
pfn_valid() validates a pfn but basically it checks for a valid struct page
backing for that pfn. It should always return positive for memory ranges
backed with struct page mapping. But currently pfn_valid() fails for all
ZONE_DEVICE based memory types even though they have struct page mapping.

pfn_valid() asserts that there is a memblock entry for a given pfn without
MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
that they do not have memblock entries. Hence memblock_is_map_memory() will
invariably fail via memblock_search() for a ZONE_DEVICE based address. This
eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
into the system via memremap_pages() called from a driver, their respective
memory sections will not have SECTION_IS_EARLY set.

Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
for firmware reserved memory regions. memblock_is_map_memory() can just be
skipped as it's always going to be positive, and that will be an optimization
for the normal hotplug memory. Like ZONE_DEVICE based memory, all normal
hotplugged memory too will not have SECTION_IS_EARLY set for their sections.

Skipping memblock_is_map_memory() for all non early memory sections would
fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
performance for normal hotplug memory as well.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Ard Biesheuvel 
Cc: Robin Murphy 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: David Hildenbrand 
Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/mm/init.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0ace5e68efba..5920c527845a 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -230,6 +230,18 @@ int pfn_valid(unsigned long pfn)
 
if (!valid_section(__pfn_to_section(pfn)))
return 0;
+
+   /*
+* ZONE_DEVICE memory does not have the memblock entries.
+* memblock_is_map_memory() check for ZONE_DEVICE based
+* addresses will always fail. Even the normal hotplugged
+* memory will never have MEMBLOCK_NOMAP flag set in their
+* memblock entries. Skip memblock search for all non early
+* memory sections covering all of hotplug memory including
+* both normal and ZONE_DEVICE based.
+*/
+   if (!early_section(__pfn_to_section(pfn)))
+   return pfn_section_valid(__pfn_to_section(pfn), pfn);
 #endif
return memblock_is_map_memory(addr);
 }
-- 
2.20.1



[PATCH V3 0/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-03-04 Thread Anshuman Khandual
This series fixes pfn_valid() for ZONE_DEVICE based memory and also improves
its performance for normal hotplug memory. While here, it also reorganizes
pfn_valid() on CONFIG_SPARSEMEM. This series is based on v5.12-rc1.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: James Morse 
Cc: Robin Murphy 
Cc: Jérôme Glisse 
Cc: Dan Williams 
Cc: David Hildenbrand 
Cc: Mike Rapoport 
Cc: Veronika Kabatova 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Changes in V3:

- Validate the pfn before fetching mem_section with __pfn_to_section() in 
[PATCH 2/2]

Changes in V2:

https://lore.kernel.org/linux-mm/1612239114-28428-1-git-send-email-anshuman.khand...@arm.com/

- Dropped pfn_valid() bifurcation based on CONFIG_SPARSEMEM
- Used PFN_PHYS() and PHYS_PFN() instead of __pfn_to_phys() and __phys_to_pfn()
- Moved __pfn_to_section() inside #ifdef CONFIG_SPARSEMEM with a { } construct

Changes in V1:

https://lore.kernel.org/linux-mm/1611905986-20155-1-git-send-email-anshuman.khand...@arm.com/

- Test pfn_section_valid() for non boot memory

Changes in RFC:

https://lore.kernel.org/linux-arm-kernel/1608621144-4001-1-git-send-email-anshuman.khand...@arm.com/

Anshuman Khandual (2):
  arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  arm64/mm: Reorganize pfn_valid()

 arch/arm64/mm/init.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

-- 
2.20.1



Re: [PATCH V2 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-03-04 Thread Anshuman Khandual



On 3/4/21 3:06 PM, Will Deacon wrote:
> On Thu, Mar 04, 2021 at 09:12:31AM +0100, David Hildenbrand wrote:
>> On 04.03.21 04:31, Anshuman Khandual wrote:
>>> On 3/4/21 2:54 AM, Will Deacon wrote:
>>>> On Wed, Mar 03, 2021 at 07:04:33PM +, Catalin Marinas wrote:
>>>>> On Thu, Feb 11, 2021 at 01:35:56PM +0100, David Hildenbrand wrote:
>>>>>> On 11.02.21 13:10, Anshuman Khandual wrote:
>>>>>>> On 2/11/21 5:23 PM, Will Deacon wrote:
>>>>>>>> ... and dropped. These patches appear to be responsible for a boot
>>>>>>>> regression reported by CKI:
>>>>>>>
>>>>>>> Ahh, boot regression ? These patches only change the behaviour
>>>>>>> for non boot memory only.
>>>>>>>
>>>>>>>> https://lore.kernel.org/r/cki.8d1cb60fec.k6njmef...@redhat.com
>>>>>>>
>>>>>>> Will look into the logs and see if there is something pointing to
>>>>>>> the problem.
>>>>>>
>>>>>> It's strange. One thing I can imagine is a mis-detection of early 
>>>>>> sections.
>>>>>> However, I don't see that happening:
>>>>>>
>>>>>> In sparse_init_nid(), we:
>>>>>> 1. Initialize the memmap
>>>>>> 2. Set SECTION_IS_EARLY | SECTION_HAS_MEM_MAP via
>>>>>> sparse_init_one_section()
>>>>>>
>>>>>> Only hotplugged sections (DIMMs, dax/kmem) set SECTION_HAS_MEM_MAP 
>>>>>> without
>>>>>> SECTION_IS_EARLY - which is correct, because these are not early.
>>>>>>
>>>>>> So once we know that we have valid_section() -- SECTION_HAS_MEM_MAP is 
>>>>>> set
>>>>>> -- early_section() should be correct.
>>>>>>
>>>>>> Even if someone would be doing a pfn_valid() after
>>>>>> memblocks_present()->memory_present() but before
>>>>>> sparse_init_nid(), we should be fine (!valid_section() -> return 0).
>>>>>
>>>>> I couldn't figure out how this could fail with Anshuman's patches.
>>>>> Will's suspicion is that some invalid/null pointer gets dereferenced
>>>>> before being initialised but the only case I see is somewhere in
>>>>> pfn_section_valid() (ms->usage) if valid_section() && !early_section().
>>>>>
>>>>> Assuming that we do get a valid_section(ms) && !early_section(ms), is
>>>>> there a case where ms->usage is not initialised? I guess races with
>>>>> section_deactivate() are not possible this early.
>>>>>
>>>>> Another situation could be that pfn_valid() returns true when no memory
>>>>> is mapped for that pfn.
>>>>
>>>> The case I wondered about was __pfn_to_section() with a bogus pfn, since
>>>> with patch 2/2 we call that *before* checking that pfn_to_section_nr() is
>>>> sane.
>>>
>>> Right, that is problematic. __pfn_to_section() should not be called without
>>> first validating pfn_to_section_nr(), as it could cause out-of-bound access
>>> on mem_section buffer. Will fix that order but as there is no test scenario
>>> which is definitive for this reported regression, how should we ensure that
>>> it fixes the problem ?
>>
>> Oh, right, I missed that in patch #2. (and when comparing to generic
>> pfn_valid()).
>>
>> I thought bisecting pointed at patch #1, that's why I didn't even have
>> another look at patch #2. Makes sense.
> 
> I don't think we ever bisected it beyond these two patches, so it could
> be either of them. Anshuman -- please work with Veronika on this, as she
> has access to the problematic machine and was really helpful in debugging
> this last time.

Sure, will respin the patch series with a fix for [PATCH 2/2] as discussed
and then follow up with Veronika to recreate the problem.


[PATCH V3] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-03-04 Thread Anshuman Khandual
HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
platform subscribing it. Instead just make it generic.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Andrew Morton 
Cc: Christoph Hellwig 
Cc: Christophe Leroy 
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Christoph Hellwig 
Signed-off-by: Anshuman Khandual 
---
This change was originally suggested in an earlier discussion. This
applies on v5.12-rc1 and has been build tested on all applicable
platforms i.e ia64 and powerpc.

https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/

Changes in V3:

- Dropped the bool description that enabled user selection
- Dropped the dependency on HUGETLB_PAGE for HUGETLB_PAGE_SIZE_VARIABLE

Changes in V2:

https://patchwork.kernel.org/project/linux-mm/patch/1614661987-23881-1-git-send-email-anshuman.khand...@arm.com/

- Added a description for HUGETLB_PAGE_SIZE_VARIABLE
- Added HUGETLB_PAGE dependency while selecting HUGETLB_PAGE_SIZE_VARIABLE

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1614577853-7452-1-git-send-email-anshuman.khand...@arm.com/

 arch/ia64/Kconfig| 6 +-
 arch/powerpc/Kconfig | 6 +-
 mm/Kconfig   | 7 +++
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..dccf5bfebf48 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -32,6 +32,7 @@ config IA64
select TTY
select HAVE_ARCH_TRACEHOOK
select HAVE_VIRT_CPU_ACCOUNTING
+   select HUGETLB_PAGE_SIZE_VARIABLE if HUGETLB_PAGE
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
@@ -82,11 +83,6 @@ config STACKTRACE_SUPPORT
 config GENERIC_LOCKBREAK
def_bool n
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE
-   default y
-
 config GENERIC_CALIBRATE_DELAY
bool
default y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..11fea95a1f2c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -231,6 +231,7 @@ config PPC
	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_PAGE_SIZE
select HAVE_REGS_AND_STACK_ACCESS_API
@@ -415,11 +416,6 @@ config HIGHMEM
 
 source "kernel/Kconfig.hz"
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE && PPC_BOOK3S_64
-   default y
-
 config MATH_EMULATION
bool "Math emulation"
depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
diff --git a/mm/Kconfig b/mm/Kconfig
index 24c045b24b95..4413a69e7850 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -274,6 +274,13 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 config ARCH_ENABLE_THP_MIGRATION
bool
 
+config HUGETLB_PAGE_SIZE_VARIABLE
+   def_bool n
+   help
+	  Allows the pageblock_order value to be dynamic instead of just
+	  standard HUGETLB_PAGE_ORDER when there are multiple HugeTLB page
+	  sizes available on a platform.
+
 config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-- 
2.20.1



Re: [PATCH V2] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-03-03 Thread Anshuman Khandual



On 3/2/21 10:43 AM, Anshuman Khandual wrote:
> HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
> platform subscribing it. Instead just make it generic.
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Andrew Morton 
> Cc: Christoph Hellwig 
> Cc: linux-i...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Anshuman Khandual 
> ---
> This change was originally suggested in an earlier discussion. This
> applies on v5.12-rc1 and has been build tested on all applicable
> platforms i.e ia64 and powerpc.
> 
> https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/
> 
> Changes in V2:
> 
> - Added a description for HUGETLB_PAGE_SIZE_VARIABLE
> - Added HUGETLB_PAGE dependency while selecting HUGETLB_PAGE_SIZE_VARIABLE
> 
> Changes in V1:
> 
> https://patchwork.kernel.org/project/linux-mm/patch/1614577853-7452-1-git-send-email-anshuman.khand...@arm.com/
> 
>  arch/ia64/Kconfig| 6 +-
>  arch/powerpc/Kconfig | 6 +-
>  mm/Kconfig   | 9 +
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index 2ad7a8d29fcc..dccf5bfebf48 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -32,6 +32,7 @@ config IA64
>   select TTY
>   select HAVE_ARCH_TRACEHOOK
>   select HAVE_VIRT_CPU_ACCOUNTING
> + select HUGETLB_PAGE_SIZE_VARIABLE if HUGETLB_PAGE
>   select VIRT_TO_BUS
>   select GENERIC_IRQ_PROBE
>   select GENERIC_PENDING_IRQ if SMP
> @@ -82,11 +83,6 @@ config STACKTRACE_SUPPORT
>  config GENERIC_LOCKBREAK
>   def_bool n
>  
> -config HUGETLB_PAGE_SIZE_VARIABLE
> - bool
> - depends on HUGETLB_PAGE
> - default y
> -
>  config GENERIC_CALIBRATE_DELAY
>   bool
>   default y
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 3778ad17f56a..3fdec3e53256 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -232,6 +232,7 @@ config PPC
> 	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
>   select HAVE_PERF_REGS
>   select HAVE_PERF_USER_STACK_DUMP
> + select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
>   select MMU_GATHER_RCU_TABLE_FREE
>   select MMU_GATHER_PAGE_SIZE
>   select HAVE_REGS_AND_STACK_ACCESS_API
> @@ -416,11 +417,6 @@ config HIGHMEM
>  
>  source "kernel/Kconfig.hz"
>  
> -config HUGETLB_PAGE_SIZE_VARIABLE
> - bool
> - depends on HUGETLB_PAGE && PPC_BOOK3S_64
> - default y
> -
>  config MATH_EMULATION
>   bool "Math emulation"
>   depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 24c045b24b95..64f1e0503e4f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -274,6 +274,15 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
>  config ARCH_ENABLE_THP_MIGRATION
>   bool
>  
> +config HUGETLB_PAGE_SIZE_VARIABLE
> + bool "Allows dynamic pageblock_order"
> + def_bool n
> + depends on HUGETLB_PAGE

Seems like this dependency on HUGETLB_PAGE is redundant, as it is
already being ensured on the platforms while selecting the config.

> + help
> +	  Allows the pageblock_order value to be dynamic instead of just
> +	  standard HUGETLB_PAGE_ORDER when there are multiple HugeTLB page
> +	  sizes available on a platform.
> +
>  config CONTIG_ALLOC
>   def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
>  
> 


Re: [PATCH V2 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

2021-03-03 Thread Anshuman Khandual



On 3/4/21 2:54 AM, Will Deacon wrote:
> On Wed, Mar 03, 2021 at 07:04:33PM +, Catalin Marinas wrote:
>> On Thu, Feb 11, 2021 at 01:35:56PM +0100, David Hildenbrand wrote:
>>> On 11.02.21 13:10, Anshuman Khandual wrote:
>>>> On 2/11/21 5:23 PM, Will Deacon wrote:
>>>>> ... and dropped. These patches appear to be responsible for a boot
>>>>> regression reported by CKI:
>>>>
>>>> Ahh, boot regression ? These patches only change the behaviour
>>>> for non boot memory only.
>>>>
>>>>> https://lore.kernel.org/r/cki.8d1cb60fec.k6njmef...@redhat.com
>>>>
>>>> Will look into the logs and see if there is something pointing to
>>>> the problem.
>>>
>>> It's strange. One thing I can imagine is a mis-detection of early sections.
>>> However, I don't see that happening:
>>>
>>> In sparse_init_nid(), we:
>>> 1. Initialize the memmap
>>> 2. Set SECTION_IS_EARLY | SECTION_HAS_MEM_MAP via
>>>sparse_init_one_section()
>>>
>>> Only hotplugged sections (DIMMs, dax/kmem) set SECTION_HAS_MEM_MAP without
>>> SECTION_IS_EARLY - which is correct, because these are not early.
>>>
>>> So once we know that we have valid_section() -- SECTION_HAS_MEM_MAP is set
>>> -- early_section() should be correct.
>>>
>>> Even if someone would be doing a pfn_valid() after
>>> memblocks_present()->memory_present() but before
>>> sparse_init_nid(), we should be fine (!valid_section() -> return 0).
>>
>> I couldn't figure out how this could fail with Anshuman's patches.
>> Will's suspicion is that some invalid/null pointer gets dereferenced
>> before being initialised but the only case I see is somewhere in
>> pfn_section_valid() (ms->usage) if valid_section() && !early_section().
>>
>> Assuming that we do get a valid_section(ms) && !early_section(ms), is
>> there a case where ms->usage is not initialised? I guess races with
>> section_deactivate() are not possible this early.
>>
>> Another situation could be that pfn_valid() returns true when no memory
>> is mapped for that pfn.
> 
> The case I wondered about was __pfn_to_section() with a bogus pfn, since
> with patch 2/2 we call that *before* checking that pfn_to_section_nr() is
> sane.

Right, that is problematic. __pfn_to_section() should not be called without
first validating pfn_to_section_nr(), as it could cause out-of-bounds access
on the mem_section buffer. Will fix that order, but as there is no definitive
test scenario for this reported regression, how should we ensure that it
fixes the problem?


Re: [PATCH v4 17/19] coresight: core: Add support for dedicated percpu sinks

2021-03-02 Thread Anshuman Khandual



On 3/1/21 7:24 PM, Suzuki K Poulose wrote:
> On 2/26/21 6:34 AM, kernel test robot wrote:
>> Hi Suzuki,
>>
>> Thank you for the patch! Yet something to improve:
>>
>> [auto build test ERROR on linus/master]
>> [also build test ERROR on next-20210226]
>> [cannot apply to kvmarm/next arm64/for-next/core tip/perf/core v5.11]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch]
>>
>> url:    
>> https://github.com/0day-ci/linux/commits/Suzuki-K-Poulose/arm64-coresight-Add-support-for-ETE-and-TRBE/20210226-035447
>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>> 6fbd6cf85a3be127454a1ad58525a3adcf8612ab
>> config: arm-randconfig-r024-20210225 (attached as .config)
>> compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
>> a921aaf789912d981cbb2036bdc91ad7289e1523)
>> reproduce (this is a W=1 build):
>>  wget 
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
>> ~/bin/make.cross
>>  chmod +x ~/bin/make.cross
>>  # install arm cross compiling tool for clang build
>>  # apt-get install binutils-arm-linux-gnueabi
>>  # 
>> https://github.com/0day-ci/linux/commit/c37564326cdf11e0839eae06c1bfead47d3e5775
>>  git remote add linux-review https://github.com/0day-ci/linux
>>  git fetch --no-tags linux-review 
>> Suzuki-K-Poulose/arm64-coresight-Add-support-for-ETE-and-TRBE/20210226-035447
>>  git checkout c37564326cdf11e0839eae06c1bfead47d3e5775
>>  # save the attached .config to linux build tree
>>  COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot 
> 
> Thanks for the report. The following fixup should clear this :
> 
> 
> ---8>---
> 
> 
> 
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index 8a3a3c199087..85008a65e21f 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -429,6 +429,33 @@ static inline void csdev_access_write64(struct csdev_access *csa, u64 val, u32 o
>  csa->write(val, offset, false, true);
>  }
> 
> +#else    /* !CONFIG_64BIT */
> +
> +static inline u64 csdev_access_relaxed_read64(struct csdev_access *csa,
> +  u32 offset)
> +{
> +    WARN_ON(1);
> +    return 0;
> +}
> +
> +static inline u64 csdev_access_read64(struct csdev_access *csa, u32 offset)
> +{
> +    WARN_ON(1);
> +    return 0;
> +}
> +
> +static inline void csdev_access_relaxed_write64(struct csdev_access *csa,
> +    u64 val, u32 offset)
> +{
> +    WARN_ON(1);
> +}
> +
> +static inline void csdev_access_write64(struct csdev_access *csa, u64 val, u32 offset)
> +{
> +    WARN_ON(1);
> +}
> +#endif    /* CONFIG_64BIT */
> +
>  static inline bool coresight_is_percpu_source(struct coresight_device *csdev)
>  {
>  return csdev && (csdev->type == CORESIGHT_DEV_TYPE_SOURCE) &&
> @@ -440,32 +467,6 @@ static inline bool coresight_is_percpu_sink(struct 
> coresight_device *csdev)
>  return csdev && (csdev->type == CORESIGHT_DEV_TYPE_SINK) &&
>     (csdev->subtype.sink_subtype == CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM);
>  }
> -#else    /* !CONFIG_64BIT */
> -
> -static inline u64 csdev_access_relaxed_read64(struct csdev_access *csa,
> -  u32 offset)
> -{
> -    WARN_ON(1);
> -    return 0;
> -}
> -
> -static inline u64 csdev_access_read64(struct csdev_access *csa, u32 offset)
> -{
> -    WARN_ON(1);
> -    return 0;
> -}
> -
> -static inline void csdev_access_relaxed_write64(struct csdev_access *csa,
> -    u64 val, u32 offset)
> -{
> -    WARN_ON(1);
> -}
> -
> -static inline void csdev_access_write64(struct csdev_access *csa, u64 val, u32 offset)
> -{
> -    WARN_ON(1);
> -}
> -#endif    /* CONFIG_64BIT */
> 
>  extern struct coresight_device *
>  coresight_register(struct coresight_desc *desc);

Agreed, these new helpers should be available in general and not restricted
to 64BIT.


Re: [PATCH V2] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-03-02 Thread Anshuman Khandual



On 3/2/21 11:13 AM, Christophe Leroy wrote:
> 
> 
> Le 02/03/2021 à 06:13, Anshuman Khandual a écrit :
>> HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
>> platform subscribing it. Instead just make it generic.
>>
>> Cc: Michael Ellerman 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Andrew Morton 
>> Cc: Christoph Hellwig 
>> Cc: linux-i...@vger.kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: linux...@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Suggested-by: Christoph Hellwig 
>> Signed-off-by: Anshuman Khandual 
>> ---
>> This change was originally suggested in an earlier discussion. This
>> applies on v5.12-rc1 and has been build tested on all applicable
>> platforms i.e ia64 and powerpc.
>>
>> https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/
>>
>> Changes in V2:
>>
>> - Added a description for HUGETLB_PAGE_SIZE_VARIABLE
> 
> You are doing more than adding a description: you are making it user 
> selectable. Is that what you want ?

No, this was unintended. Will drop that description.


[PATCH V2] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-03-02 Thread Anshuman Khandual
HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
platform subscribing it. Instead just make it generic.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Andrew Morton 
Cc: Christoph Hellwig 
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Christoph Hellwig 
Signed-off-by: Anshuman Khandual 
---
This change was originally suggested in an earlier discussion. This
applies on v5.12-rc1 and has been build tested on all applicable
platforms i.e ia64 and powerpc.

https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/

Changes in V2:

- Added a description for HUGETLB_PAGE_SIZE_VARIABLE
- Added HUGETLB_PAGE dependency while selecting HUGETLB_PAGE_SIZE_VARIABLE

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1614577853-7452-1-git-send-email-anshuman.khand...@arm.com/

 arch/ia64/Kconfig| 6 +-
 arch/powerpc/Kconfig | 6 +-
 mm/Kconfig   | 9 +
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..dccf5bfebf48 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -32,6 +32,7 @@ config IA64
select TTY
select HAVE_ARCH_TRACEHOOK
select HAVE_VIRT_CPU_ACCOUNTING
+   select HUGETLB_PAGE_SIZE_VARIABLE if HUGETLB_PAGE
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
@@ -82,11 +83,6 @@ config STACKTRACE_SUPPORT
 config GENERIC_LOCKBREAK
def_bool n
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE
-   default y
-
 config GENERIC_CALIBRATE_DELAY
bool
default y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3778ad17f56a..3fdec3e53256 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -232,6 +232,7 @@ config PPC
	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_PAGE_SIZE
select HAVE_REGS_AND_STACK_ACCESS_API
@@ -416,11 +417,6 @@ config HIGHMEM
 
 source "kernel/Kconfig.hz"
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE && PPC_BOOK3S_64
-   default y
-
 config MATH_EMULATION
bool "Math emulation"
depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
diff --git a/mm/Kconfig b/mm/Kconfig
index 24c045b24b95..64f1e0503e4f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -274,6 +274,15 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 config ARCH_ENABLE_THP_MIGRATION
bool
 
+config HUGETLB_PAGE_SIZE_VARIABLE
+   bool "Allows dynamic pageblock_order"
+   def_bool n
+   depends on HUGETLB_PAGE
+   help
+	  Allows the pageblock_order value to be dynamic instead of just
+	  standard HUGETLB_PAGE_ORDER when there are multiple HugeTLB page
+	  sizes available on a platform.
+
 config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-- 
2.20.1



[PATCH] arm64/mm: Drop THP conditionality from FORCE_MAX_ZONEORDER

2021-03-01 Thread Anshuman Khandual
Currently without THP being enabled, MAX_ORDER via FORCE_MAX_ZONEORDER gets
reduced to 11, which falls below HUGETLB_PAGE_ORDER for certain 16K and 64K
page size configurations. This is problematic and throws up the following
warning during boot, as pageblock_order (via HUGETLB_PAGE_ORDER) exceeds
MAX_ORDER.

WARNING: CPU: 7 PID: 127 at mm/vmstat.c:1092 __fragmentation_index+0x58/0x70
Modules linked in:
CPU: 7 PID: 127 Comm: kswapd0 Not tainted 5.12.0-rc1-5-g0221e3101a1 #237
Hardware name: linux,dummy-virt (DT)
pstate: 2045 (nzCv daif +PAN -UAO -TCO BTYPE=--)
pc : __fragmentation_index+0x58/0x70
lr : fragmentation_index+0x88/0xa8
sp : 800016ccfc00
x29: 800016ccfc00 x28:  
x27: 800011fd4000 x26: 0002 
x25: 800016ccfda0 x24: 0002 
x23: 0640 x22: 0005ffcb5b18 
x21: 0002 x20: 000d 
x19: 0005ffcb3980 x18: 0004 
x17: 0001 x16: 0019 
x15: 800011ca7fb8 x14: 02b3 
x13:  x12: 05e0 
x11: 0003 x10: 0080 
x9 : 800011c93948 x8 :  
x7 :  x6 : 7000 
x5 : 7944 x4 : 0032 
x3 : 001c x2 : 000b 
x1 : 800016ccfc10 x0 : 000d 
Call trace:
__fragmentation_index+0x58/0x70
compaction_suitable+0x58/0x78
wakeup_kcompactd+0x8c/0xd8
balance_pgdat+0x570/0x5d0
kswapd+0x1e0/0x388
kthread+0x154/0x158
ret_from_fork+0x10/0x30

This solves the problem by keeping FORCE_MAX_ZONEORDER unchanged with or
without THP on 16K and 64K page size configurations, making sure that the
HUGETLB_PAGE_ORDER (and pageblock_order) would never exceed MAX_ORDER.
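
For reference, a back-of-the-envelope check of the orders involved (using
the usual arm64 shifts; a sketch of the reasoning, not taken from the
patch):

/*
 * 64K pages: PAGE_SHIFT = 16, PMD_SHIFT = 29
 *   HUGETLB_PAGE_ORDER = PMD_SHIFT - PAGE_SHIFT = 13
 *   so MAX_ORDER must be >= 14 for pageblock_order (13) <= MAX_ORDER - 1
 *
 * 16K pages: PAGE_SHIFT = 14, PMD_SHIFT = 25
 *   HUGETLB_PAGE_ORDER = PMD_SHIFT - PAGE_SHIFT = 11
 *   so MAX_ORDER must be >= 12
 *
 * Hence the "14"/"12" defaults must hold with or without THP.
 */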

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
This applies on v5.12-rc1 and does not seem to have any obvious problem
on 16K and 64K page size configurations. This is a simpler alternative to
a previous series [1] which tried to solve the very same problem but in
a different way.

https://patchwork.kernel.org/project/linux-mm/list/?series=431973

 arch/arm64/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9cd33c7be429..d4690326274a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1156,8 +1156,8 @@ config XEN
 
 config FORCE_MAX_ZONEORDER
int
-   default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
-   default "12" if (ARM64_16K_PAGES && TRANSPARENT_HUGEPAGE)
+   default "14" if ARM64_64K_PAGES
+   default "12" if ARM64_16K_PAGES
default "11"
help
  The kernel memory allocator divides physically contiguous memory
-- 
2.20.1



Re: [PATCH] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-03-01 Thread Anshuman Khandual



On 3/1/21 1:23 PM, Christoph Hellwig wrote:
> On Mon, Mar 01, 2021 at 01:13:41PM +0530, Anshuman Khandual wrote:
>>> doesn't this need a 'if HUGETLB_PAGE'
>>
>> While making HUGETLB_PAGE_SIZE_VARIABLE a generic option, also made it
>> dependent on HUGETLB_PAGE. Should not that gate HUGETLB_PAGE_SIZE_VARIABLE
>> when HUGETLB_PAGE is not available irrespective of the select statement on
>> the platforms ?
> 
> depends doesn't properly work for variables that are selected.
> 

Alright, will move the HUGETLB_PAGE dependency to platforms while selecting
the variable HUGETLB_PAGE_SIZE_VARIABLE.


Re: [PATCH] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-02-28 Thread Anshuman Khandual



On 3/1/21 11:53 AM, Christoph Hellwig wrote:
> On Mon, Mar 01, 2021 at 11:20:53AM +0530, Anshuman Khandual wrote:
>> HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
>> platform subscribing it. Instead just make it generic.
>>
>> Cc: Michael Ellerman 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Andrew Morton 
>> Cc: Christoph Hellwig 
>> Cc: linux-i...@vger.kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: linux...@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Suggested-by: Christoph Hellwig 
>> Signed-off-by: Anshuman Khandual 
>> ---
>> This change was originally suggested in an earlier discussion. This
>> applies on v5.12-rc1 and has been build tested on all applicable
>> platforms i.e ia64 and powerpc.
>>
>> https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/
>>
>>  arch/ia64/Kconfig| 6 +-
>>  arch/powerpc/Kconfig | 6 +-
>>  mm/Kconfig   | 8 
>>  3 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
>> index 2ad7a8d29fcc..6b3e3f6c29ae 100644
>> --- a/arch/ia64/Kconfig
>> +++ b/arch/ia64/Kconfig
>> @@ -32,6 +32,7 @@ config IA64
>>  select TTY
>>  select HAVE_ARCH_TRACEHOOK
>>  select HAVE_VIRT_CPU_ACCOUNTING
>> +select HUGETLB_PAGE_SIZE_VARIABLE
> 
> doesn't this need a 'if HUGETLB_PAGE'

While making HUGETLB_PAGE_SIZE_VARIABLE a generic option, also made it
dependent on HUGETLB_PAGE. Shouldn't that gate HUGETLB_PAGE_SIZE_VARIABLE
when HUGETLB_PAGE is not available, irrespective of the select statement on
the platforms?

> 
> or did you verify that HUGETLB_PAGE_SIZE_VARIABLE checks are always
> nested inside of HUGETLB_PAGE ones?
>


[PATCH] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-02-28 Thread Anshuman Khandual
HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
platform subscribing it. Instead just make it generic.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Andrew Morton 
Cc: Christoph Hellwig 
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Christoph Hellwig 
Signed-off-by: Anshuman Khandual 
---
This change was originally suggested in an earlier discussion. This
applies on v5.12-rc1 and has been build tested on all applicable
platforms i.e ia64 and powerpc.

https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/

 arch/ia64/Kconfig| 6 +-
 arch/powerpc/Kconfig | 6 +-
 mm/Kconfig   | 8 
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..6b3e3f6c29ae 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -32,6 +32,7 @@ config IA64
select TTY
select HAVE_ARCH_TRACEHOOK
select HAVE_VIRT_CPU_ACCOUNTING
+   select HUGETLB_PAGE_SIZE_VARIABLE
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
@@ -82,11 +83,6 @@ config STACKTRACE_SUPPORT
 config GENERIC_LOCKBREAK
def_bool n
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE
-   default y
-
 config GENERIC_CALIBRATE_DELAY
bool
default y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3778ad17f56a..b8565bed284f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -232,6 +232,7 @@ config PPC
	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_PAGE_SIZE
select HAVE_REGS_AND_STACK_ACCESS_API
@@ -416,11 +417,6 @@ config HIGHMEM
 
 source "kernel/Kconfig.hz"
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE && PPC_BOOK3S_64
-   default y
-
 config MATH_EMULATION
bool "Math emulation"
depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
diff --git a/mm/Kconfig b/mm/Kconfig
index 24c045b24b95..e604a87862a4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -274,6 +274,14 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 config ARCH_ENABLE_THP_MIGRATION
bool
 
+config HUGETLB_PAGE_SIZE_VARIABLE
+   def_bool n
+   depends on HUGETLB_PAGE
+   help
+	  Enable this when there are multiple HugeTLB page sizes available
+	  on a platform, so that pageblock_order can be a dynamic value
+	  instead of the standard HUGETLB_PAGE_ORDER.
+
 config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-- 
2.20.1



[PATCH] arm64/mm: Drop redundant ARCH_WANT_HUGE_PMD_SHARE

2021-02-28 Thread Anshuman Khandual
There is already an ARCH_WANT_HUGE_PMD_SHARE which is being selected for
applicable configurations. Hence just drop the other redundant entry.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
This applies on v5.12-rc1

 arch/arm64/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d4fe5118e9c8..9cd33c7be429 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1056,8 +1056,6 @@ config HW_PERF_EVENTS
 config SYS_SUPPORTS_HUGETLBFS
def_bool y
 
-config ARCH_WANT_HUGE_PMD_SHARE
-
 config ARCH_HAS_CACHE_LINE_SIZE
def_bool y
 
-- 
2.20.1



[PATCH V2] mm/memtest: Add ARCH_USE_MEMTEST

2021-02-28 Thread Anshuman Khandual
early_memtest() does not get called from all architectures. Hence enabling
CONFIG_MEMTEST and providing a valid memtest=[1..N] kernel command line
option might not trigger the memory pattern tests as would be expected in
normal circumstances. This situation is misleading.

The change here prevents the above-mentioned problem by introducing a
new config option ARCH_USE_MEMTEST, which should be selected by platforms
that call early_memtest(), in order to enable the config CONFIG_MEMTEST.
Conversely, CONFIG_MEMTEST cannot be enabled on platforms where it would
never run anyway.

Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Thomas Bogendoerfer 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Max Filippov 
Signed-off-by: Anshuman Khandual 
---
This patch applies on v5.12-rc1 and has been tested on arm64 platform.
But it has been just build tested on all other platforms.

Changes in V2:

- Added ARCH_USE_MEMTEST in the sorted alphabetical order on platforms

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1612498242-31579-1-git-send-email-anshuman.khand...@arm.com/

 arch/arm/Kconfig | 1 +
 arch/arm64/Kconfig   | 1 +
 arch/mips/Kconfig| 1 +
 arch/powerpc/Kconfig | 1 +
 arch/x86/Kconfig | 1 +
 arch/xtensa/Kconfig  | 1 +
 lib/Kconfig.debug| 9 -
 7 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 853aab5ab327..9ab047d4cd0a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -33,6 +33,7 @@ config ARM
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
+   select ARCH_USE_MEMTEST
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_LD_ORPHAN_WARN
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1f212b47a48a..d4fe5118e9c8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -67,6 +67,7 @@ config ARM64
select ARCH_KEEP_MEMBLOCK
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_GNU_PROPERTY
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index d89efba3d8a4..93a4f502f962 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -14,6 +14,7 @@ config MIPS
select ARCH_SUPPORTS_UPROBES
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..3778ad17f56a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -149,6 +149,7 @@ config PPC
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC32 || PPC_BOOK3S_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS  if PPC_QUEUED_SPINLOCKS
select ARCH_USE_QUEUED_SPINLOCKSif PPC_QUEUED_SPINLOCKS
select ARCH_WANT_IPC_PARSE_VERSION
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..2cb76fd5258e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,6 +100,7 @@ config X86
select ARCH_SUPPORTS_LTO_CLANG  if X86_64
select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
select ARCH_USE_BUILTIN_BSWAP
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index a99dc39f6964..ca51896c53df 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -7,6 +7,7 @@ config XTENSA
select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU
select ARCH_HAS_DMA_SET_UNCACHED if MMU
+   select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_FRAME_POINTERS
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a2d04c00cda2..2c296535a4b3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2521,11 +2521,18 @@ config TEST_FPU
 
 endif # RUNTIME_TESTING_MENU
 
+config ARCH_USE_MEMTEST
+   bool
+   help
+ An architecture should select this when it uses early_memtest()
+ during boot process.
+
 config MEMTEST
bool "Memtest"
+	depends on ARCH_USE_MEMTEST

Re: [PATCH v3 1/1] arm64: mm: correct the inside linear map range during hotplug check

2021-02-23 Thread Anshuman Khandual



On 2/16/21 8:33 PM, Pavel Tatashin wrote:
> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
> linear map range is not checked correctly.
> 
> The start physical address that linear map covers can be actually at the
> end of the range because of randomization. Check that and if so reduce it
> to 0.
> 
> This can be verified on QEMU with setting kaslr-seed to ~0ul:
> 
> memstart_offset_seed = 0x
> START: __pa(_PAGE_OFFSET(vabits_actual)) = 9000c000
> END:   __pa(PAGE_END - 1) =  1000bfff

This would have tripped the check in mhp_get_pluggable_range()
with errors like the ones below, which is expected.

Hotplug memory [0x68000-0x68800] exceeds maximum addressable range [0x0-0x0]
Hotplug memory [0x6c000-0x6c800] exceeds maximum addressable range [0x0-0x0]
Hotplug memory [0x7-0x70800] exceeds maximum addressable range [0x0-0x0]
Hotplug memory [0x78000-0x78800] exceeds maximum addressable range [0x0-0x0]
Hotplug memory [0x7c000-0x7c800] exceeds maximum addressable range [0x0-0x0]

> 
> Signed-off-by: Pavel Tatashin 
> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear 
> mapping")
> Tested-by: Tyler Hicks 
> ---
>  arch/arm64/mm/mmu.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ef7698c4e2f0..0d9c115e427f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1447,6 +1447,22 @@ static void __remove_pgd_mapping(pgd_t *pgdir, 
> unsigned long start, u64 size)
>  struct range arch_get_mappable_range(void)
>  {
>   struct range mhp_range;
> + u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
> + u64 end_linear_pa = __pa(PAGE_END - 1);
> +
> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
> + /*
> +  * Check for a wrap, it is possible because of randomized linear
> +  * mapping the start physical address is actually bigger than
> +  * the end physical address. In this case set start to zero
> +  * because [0, end_linear_pa] range must still be able to cover
> +  * all addressable physical addresses.
> +  */
> + if (start_linear_pa > end_linear_pa)
> + start_linear_pa = 0;
> + }
> +
> + WARN_ON(start_linear_pa > end_linear_pa);
>  
>   /*
>* Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
> @@ -1454,8 +1470,9 @@ struct range arch_get_mappable_range(void)
>* range which can be mapped inside this linear mapping range, must
>* also be derived from its end points.
>*/
> - mhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
> - mhp_range.end =  __pa(PAGE_END - 1);
> + mhp_range.start = start_linear_pa;
> + mhp_range.end =  end_linear_pa;
> +
>   return mhp_range;
>  }

LGTM.

Reviewed-by: Anshuman Khandual 


Re: [PATCH] Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't apply to ARM64

2021-02-22 Thread Anshuman Khandual



On 2/23/21 6:02 AM, Barry Song wrote:
> BATCHED_UNMAP_TLB_FLUSH is used on x86 to do batched tlb shootdown by
> sending one IPI to TLB flush all entries after unmapping pages rather
> than sending an IPI to flush each individual entry.
> On arm64, tlb shootdown is done by hardware. Flush instructions are
> innershareable. The local flushes are limited to the boot (1 per CPU)
> and when a task is getting a new ASID.

Is there any previous discussion around this ?
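
For context, the inner-shareable flush being referred to looks roughly
like this (simplified from arch/arm64/include/asm/tlbflush.h):

	static inline void flush_tlb_page(struct vm_area_struct *vma,
					  unsigned long uaddr)
	{
		unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));

		dsb(ishst);
		__tlbi(vale1is, addr);		/* "is" = inner shareable broadcast */
		__tlbi_user(vale1is, addr);
		dsb(ish);
	}

So the invalidate is broadcast to all CPUs by hardware, without IPIs.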

> So marking this feature as "TODO" is not proper. ".." isn't good as
> well. So this patch adds a "N/A" for this kind of features which are
> not needed on some architectures.
> 
> Cc: Mel Gorman 
> Cc: Andy Lutomirski 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Signed-off-by: Barry Song 
> ---
>  Documentation/features/arch-support.txt| 1 +
>  Documentation/features/vm/TLB/arch-support.txt | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/features/arch-support.txt 
> b/Documentation/features/arch-support.txt
> index d22a1095e661..118ae031840b 100644
> --- a/Documentation/features/arch-support.txt
> +++ b/Documentation/features/arch-support.txt
> @@ -8,4 +8,5 @@ The meaning of entries in the tables is:
>  | ok |  # feature supported by the architecture
>  |TODO|  # feature not yet supported by the architecture
>  | .. |  # feature cannot be supported by the hardware
> +| N/A|  # feature doesn't apply to the architecture

"NA" might be better here, with s/doesn't apply/not applicable/ to match it.
Still wondering if NA is really needed when there is already ".." ?
Regardless, either way should be fine.

>  
> diff --git a/Documentation/features/vm/TLB/arch-support.txt 
> b/Documentation/features/vm/TLB/arch-support.txt
> index 30f75a79ce01..0d070f9f98d8 100644
> --- a/Documentation/features/vm/TLB/arch-support.txt
> +++ b/Documentation/features/vm/TLB/arch-support.txt
> @@ -9,7 +9,7 @@
>  |   alpha: | TODO |
>  | arc: | TODO |
>  | arm: | TODO |
> -|   arm64: | TODO |
> +|   arm64: | N/A  |
>  | c6x: |  ..  |
>  |csky: | TODO |
>  |   h8300: |  ..  |
> 


Re: [PATCH V3 00/14] arm64: coresight: Enable ETE and TRBE

2021-02-17 Thread Anshuman Khandual



On 2/2/21 12:14 AM, Mathieu Poirier wrote:
> On Wed, Jan 27, 2021 at 02:25:24PM +0530, Anshuman Khandual wrote:
>> This series enables future IP trace features Embedded Trace Extension (ETE)
>> and Trace Buffer Extension (TRBE). This series depends on the ETM system
>> register instruction support series [0] which is available here [1]. This
>> series which applies on [1] is available here [2] for quick access.
>>
>> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
>> extensions. ETE overlaps with the ETMv4 architecture, with additions to
>> support the newer architecture features and some restrictions on the
>> supported features w.r.t ETMv4. The ETE support is added by extending the
>> ETMv4 driver to recognise the ETE and handle the features as exposed by the
>> TRCIDRx registers. ETE only supports system instructions access from the
>> host CPU. The ETE could be integrated with a TRBE (see below), or with the
>> legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
>> description as the ETMs and requires a node per instance. 
>>
>> Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
>> accessible via the system registers and can be combined with the ETE to
>> provide a 1x1 configuration of source & sink. TRBE is being represented
>> here as a CoreSight sink. Primary reason is that the ETE source could work
>> with other traditional CoreSight sink devices. As TRBE captures the trace
>> data which is produced by ETE, it cannot work alone.
>>
>> TRBE representation here have some distinct deviations from a traditional
>> CoreSight sink device. Coresight path between ETE and TRBE are not built
>> during boot looking at respective DT or ACPI entries.
>>
>> Unlike traditional sinks, TRBE can generate interrupts to signal including
>> many other things, buffer got filled. The interrupt is a PPI and should be
>> communicated from the platform. DT or ACPI entry representing TRBE should
>> have the PPI number for a given platform. During perf session, the TRBE IRQ
>> handler should capture trace for perf auxiliary buffer before restarting it
>> back. System registers being used here to configure ETE and TRBE could be
>> referred in the link below.
>>
>> https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
> This set is giving me several checkpatch.pl warnings...  Those about complex
> macros and DT bindings are fine but everything else should have been addressed
> by now.  Since this is your first patchset I will carry on but I expect future
> submissions to be clean. 
> 

Hello Mathieu,

All the potential patches for the upcoming V4 series apply cleanly, but
below are some checkpatch.pl errors and warnings which could not be resolved.

- Anshuman

1. 0004-coresight-ete-Add-support-for-ETE-sysreg-access.patch
=

ERROR: Macros with complex values should be enclosed in parentheses
#88: FILE: drivers/hwtracing/coresight/coresight-etm4x.h:165:
+#define ETE_ONLY_SYSREG_LIST(op, val)  \
+   CASE_##op((val), TRCRSR)\
+   CASE_##op((val), TRCEXTINSELRn(1))  \
+   CASE_##op((val), TRCEXTINSELRn(2))  \
+   CASE_##op((val), TRCEXTINSELRn(3))

ERROR: Macros with complex values should be enclosed in parentheses
#97: FILE: drivers/hwtracing/coresight/coresight-etm4x.h:172:
+#define ETM4x_ONLY_SYSREG_LIST(op, val)\
CASE_##op((val), TRCPROCSELR)   \
+   CASE_##op((val), TRCVDCTLR) \
+   CASE_##op((val), TRCVDSACCTLR)  \
+   CASE_##op((val), TRCVDARCCTLR)  \
+   CASE_##op((val), TRCOSLAR)

ERROR: Macros with complex values should be enclosed in parentheses
#104: FILE: drivers/hwtracing/coresight/coresight-etm4x.h:179:
+#define ETM_COMMON_SYSREG_LIST(op, val)\
+   CASE_##op((val), TRCPRGCTLR)\
CASE_##op((val), TRCSTATR)  \
CASE_##op((val), TRCCONFIGR)\
CASE_##op((val), TRCAUXCTLR)\

ERROR: Macros with complex values should be enclosed in parentheses
#133: FILE: drivers/hwtracing/coresight/coresight-etm4x.h:382:
+#define ETM4x_READ_SYSREG_CASES(res)   \
+   ETM_COMMON_SYSREG_LIST(READ, (res)) \
+   ETM4x_ONLY_SYSREG_LIST(READ, (res))

ERROR: Macros with complex values should be enclosed in parentheses
#137: FILE: drivers/hwtracing/coresight/coresight-etm4x.h:386:
+#define ETM4x_WRITE_SYSREG_CASES(val)  \
+   ETM_COMMON_SYSREG_LIST(WRITE, (val))\
+   ETM4x_ONLY_SYSREG_LIST(WRITE, (val))

ERROR: Macros with complex values should be enclosed in parentheses
#147: FILE: drivers/hwtracing/coresight/coresight-etm4x.h:

Re: [PATCH V3 14/14] coresight: etm-perf: Add support for trace buffer format

2021-02-17 Thread Anshuman Khandual



On 1/27/21 6:30 PM, Al Grant wrote:
>>> +/* CoreSight PMU AUX buffer formats */
>>> +#define PERF_AUX_FLAG_CORESIGHT_FORMAT_CORESIGHT0x /*
>> Default for backward compatibility */
>>> +#define PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW0x0100 /*
>> Raw format of the source */
>>
>> Would CORESIGHT_FORMAT_ETR / CORESIGHT_FORMAT_TRBE be better
>> names?
> 
> Unformatted (raw) streams could be used any time you had a writer dedicated
> to a single trace source. So in a situation where you had one ETR per CPU,
> it would be appropriate to use an unformatted stream. A TRBE is always
> dedicated to a single CPU, but potentially you (i.e. when designing the 
> system)
> can do this with any type of trace sink. So the raw/formatted distinction is
> really about whether you are combining multiple streams in one buffer or not,
> rather than the type of block that is writing into the buffer.
> 
> Al
> 

Okay, will stick with the proposed format names here, i.e.:

PERF_AUX_FLAG_CORESIGHT_FORMAT_CORESIGHT
PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW
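
On the sink driver side these would presumably be reported through the
existing perf_aux_output_flag() helper; a minimal sketch:

	/* tag the just-filled AUX records as unformatted (raw) trace */
	static void trbe_report_unformatted(struct perf_output_handle *handle)
	{
		perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
	}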


Re: [PATCH V3 08/14] coresight: core: Add support for dedicated percpu sinks

2021-02-16 Thread Anshuman Khandual



On 2/5/21 12:04 AM, Mathieu Poirier wrote:
> On Thu, Jan 28, 2021 at 09:16:34AM +, Suzuki K Poulose wrote:
>> On 1/27/21 8:55 AM, Anshuman Khandual wrote:
>>> Add support for dedicated sinks that are bound to individual CPUs. (e.g,
>>> TRBE). To allow quicker access to the sink for a given CPU bound source,
>>> keep a percpu array of the sink devices. Also, add support for building
>>> a path to the CPU local sink from the ETM.
>>>
>>> This adds a new percpu sink type CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM.
>>> This new sink type is exclusively available and can only work with percpu
>>> source type device CORESIGHT_DEV_SUBTYPE_SOURCE_PERCPU_PROC.
>>>
>>> This defines a percpu structure that accommodates a single coresight_device
>>> which can be used to store an initialized instance from a sink driver. As
>>> these sinks are exclusively linked and dependent on corresponding percpu
>>> sources devices, they should also be the default sink device during a perf
>>> session.
>>>
>>> Outwards device connections are scanned while establishing paths between a
>>> source and a sink device. But such connections are not present for certain
>>> percpu source and sink devices which are exclusively linked and dependent.
>>> Build the path directly and skip connection scanning for such devices.
>>>
>>> Cc: Mathieu Poirier 
>>> Cc: Mike Leach 
>>> Cc: Suzuki K Poulose 
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> Changes in V3:
>>>
>>> - Updated coresight_find_default_sink()
>>>
>>>   drivers/hwtracing/coresight/coresight-core.c | 16 ++--
>>>   include/linux/coresight.h| 12 
>>>   2 files changed, 26 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
>>> b/drivers/hwtracing/coresight/coresight-core.c
>>> index 0062c89..4795e28 100644
>>> --- a/drivers/hwtracing/coresight/coresight-core.c
>>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>>> @@ -23,6 +23,7 @@
>>>   #include "coresight-priv.h"
>>>   static DEFINE_MUTEX(coresight_mutex);
>>> +DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
>>>   /**
>>>* struct coresight_node - elements of a path, from source to sink
>>> @@ -784,6 +785,13 @@ static int _coresight_build_path(struct 
>>> coresight_device *csdev,
>>> if (csdev == sink)
>>> goto out;
>>> +   if (coresight_is_percpu_source(csdev) && coresight_is_percpu_sink(sink) 
>>> &&
>>> +   sink == per_cpu(csdev_sink, source_ops(csdev)->cpu_id(csdev))) {
>>> +   _coresight_build_path(sink, sink, path);
> 
> The return value for _coresight_build_path() needs to be checked.  Otherwise a
> failure to allocate a node for the sink will go unoticed and make for a very
> hard problem to debug.

How about this instead ?

diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index 4795e28..e93e669 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -787,9 +787,10 @@ static int _coresight_build_path(struct coresight_device *csdev,
 
	if (coresight_is_percpu_source(csdev) && coresight_is_percpu_sink(sink) &&
	    sink == per_cpu(csdev_sink, source_ops(csdev)->cpu_id(csdev))) {
-   _coresight_build_path(sink, sink, path);
-   found = true;
-   goto out;
+   if (_coresight_build_path(sink, sink, path) == 0) {
+   found = true;
+   goto out;
+   }
}
 
/* Not a sink - recursively explore each port found on this element */
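
An alternative sketch that would also propagate the allocation failure,
instead of silently falling back to the port scan (illustrative only):

	if (coresight_is_percpu_source(csdev) && coresight_is_percpu_sink(sink) &&
	    sink == per_cpu(csdev_sink, source_ops(csdev)->cpu_id(csdev))) {
		int ret = _coresight_build_path(sink, sink, path);

		if (ret)
			return ret;	/* e.g. -ENOMEM from the node allocation */
		found = true;
		goto out;
	}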

> 
>>> +   found = true;
>>> +   goto out;
>>> +   }
>>> +
>>> /* Not a sink - recursively explore each port found on this element */
>>> for (i = 0; i < csdev->pdata->nr_outport; i++) {
>>> struct coresight_device *child_dev;
>>> @@ -999,8 +1007,12 @@ coresight_find_default_sink(struct coresight_device 
>>> *csdev)
>>> int depth = 0;
>>> /* look for a default sink if we have not found for this device */
>>> -   if (!csdev->def_sink)
>>> -   csdev->def_sink = coresight_find_sink(csdev, );
>>> +   if (!csdev->def_sink) {
>>> +   if (coresight_is_percpu_source(csdev))
>>> +  

Re: [PATCH V3 08/14] coresight: core: Add support for dedicated percpu sinks

2021-02-16 Thread Anshuman Khandual



On 1/28/21 2:46 PM, Suzuki K Poulose wrote:
> On 1/27/21 8:55 AM, Anshuman Khandual wrote:
>> Add support for dedicated sinks that are bound to individual CPUs. (e.g,
>> TRBE). To allow quicker access to the sink for a given CPU bound source,
>> keep a percpu array of the sink devices. Also, add support for building
>> a path to the CPU local sink from the ETM.
>>
>> This adds a new percpu sink type CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM.
>> This new sink type is exclusively available and can only work with percpu
>> source type device CORESIGHT_DEV_SUBTYPE_SOURCE_PERCPU_PROC.
>>
>> This defines a percpu structure that accommodates a single coresight_device
>> which can be used to store an initialized instance from a sink driver. As
>> these sinks are exclusively linked and dependent on corresponding percpu
>> sources devices, they should also be the default sink device during a perf
>> session.
>>
>> Outwards device connections are scanned while establishing paths between a
>> source and a sink device. But such connections are not present for certain
>> percpu source and sink devices which are exclusively linked and dependent.
>> Build the path directly and skip connection scanning for such devices.
>>
>> Cc: Mathieu Poirier 
>> Cc: Mike Leach 
>> Cc: Suzuki K Poulose 
>> Signed-off-by: Anshuman Khandual 
>> ---
>> Changes in V3:
>>
>> - Updated coresight_find_default_sink()
>>
>>   drivers/hwtracing/coresight/coresight-core.c | 16 ++--
>>   include/linux/coresight.h    | 12 
>>   2 files changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
>> b/drivers/hwtracing/coresight/coresight-core.c
>> index 0062c89..4795e28 100644
>> --- a/drivers/hwtracing/coresight/coresight-core.c
>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>> @@ -23,6 +23,7 @@
>>   #include "coresight-priv.h"
>>     static DEFINE_MUTEX(coresight_mutex);
>> +DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
>>     /**
>>    * struct coresight_node - elements of a path, from source to sink
>> @@ -784,6 +785,13 @@ static int _coresight_build_path(struct 
>> coresight_device *csdev,
>>   if (csdev == sink)
>>   goto out;
>>   +    if (coresight_is_percpu_source(csdev) && 
>> coresight_is_percpu_sink(sink) &&
>> +    sink == per_cpu(csdev_sink, source_ops(csdev)->cpu_id(csdev))) {
>> +    _coresight_build_path(sink, sink, path);
>> +    found = true;
>> +    goto out;
>> +    }
>> +
>>   /* Not a sink - recursively explore each port found on this element */
>>   for (i = 0; i < csdev->pdata->nr_outport; i++) {
>>   struct coresight_device *child_dev;
>> @@ -999,8 +1007,12 @@ coresight_find_default_sink(struct coresight_device 
>> *csdev)
>>   int depth = 0;
>>     /* look for a default sink if we have not found for this device */
>> -    if (!csdev->def_sink)
>> -    csdev->def_sink = coresight_find_sink(csdev, );
>> +    if (!csdev->def_sink) {
>> +    if (coresight_is_percpu_source(csdev))
>> +    csdev->def_sink = per_cpu(csdev_sink, 
>> source_ops(csdev)->cpu_id(csdev));
>> +    if (!csdev->def_sink)
>> +    csdev->def_sink = coresight_find_sink(csdev, );
>> +    }
>>   return csdev->def_sink;
>>   }
>>   diff --git a/include/linux/coresight.h b/include/linux/coresight.h
>> index 976ec26..bc3a5ca 100644
>> --- a/include/linux/coresight.h
>> +++ b/include/linux/coresight.h
>> @@ -50,6 +50,7 @@ enum coresight_dev_subtype_sink {
>>   CORESIGHT_DEV_SUBTYPE_SINK_PORT,
>>   CORESIGHT_DEV_SUBTYPE_SINK_BUFFER,
>>   CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM,
>> +    CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM,
>>   };
>>     enum coresight_dev_subtype_link {
>> @@ -428,6 +429,17 @@ static inline void csdev_access_write64(struct 
>> csdev_access *csa, u64 val, u32 o
>>   csa->write(val, offset, false, true);
>>   }
>>   +static inline bool coresight_is_percpu_source(struct coresight_device 
>> *csdev)
>> +{
>> +    return csdev && (csdev->type == CORESIGHT_DEV_TYPE_SOURCE) &&
>> +   csdev->subtype.source_subtype == 
>> CORESIGHT_DEV_SUBTYPE_SOURCE_PROC;
> 
> Please add () around the last line. Same below.

Okay, will do.

> 
>> +}
>> +
>> +static inline bool coresight_is_percpu_sink(struct coresight_device *csdev)
>> +{
>> +    return csdev && (csdev->type == CORESIGHT_DEV_TYPE_SINK) &&
>> +   csdev->subtype.sink_subtype == 
>> CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM;

Okay, will add here as well.

>> +}
>>   #else    /* !CONFIG_64BIT */
>>     static inline u64 csdev_access_relaxed_read64(struct csdev_access *csa,
>>
> 
> With the above :
> 
> Tested-by: Suzuki K Poulose 
> Reviewed-by: Suzuki K Poulose 


Re: [PATCH V3 11/14] coresight: sink: Add TRBE driver

2021-02-16 Thread Anshuman Khandual
Hello Mike,

On 2/16/21 2:30 PM, Mike Leach wrote:
> Hi Anshuman,
> 
> There have been plenty of detailed comments so I will restrict mine to
> a few general issues:-
> 
> 1) Currently there appears to be no sysfs support (I cannot see the
> MODE_SYSFS constants running alongside the MODE_PERF ones present in
> the other sink drivers). This is present on all other coresight
> devices, and must be provided for this device. It is useful for
> testing, and there are users out there who will have scripts to use
> it. It is not essential it makes it into this set, but should be a
> follow up set.

Sure, will try and add it in a follow up series.

> 
> 2) Using FILL mode for TRBE means that the trace will by definition be
> lossy. Fill mode will halt collection without cleanly stopping and
> flushing the source. This will result in the sink missing the last of
> the data from the source as it stops. Even if taking the exception
> moves into a prohibited region there is still the possibility the last
> trace operations will not be seen. Further it is possible that the
> last few bytes of trace will be an incomplete packet, and indeed the
> start of the next buffer could contain incomplete packets too.

Just wondering why TRBE and ETE would not sync with each other, so that
the ETE could resend the lost trace data when the TRBE runs out of buffer
and wraps around ? Is this ETE/TRBE behavior the same for all
implementations in FILL mode ?

> 
> This operation differs from the other sinks which will only halt after
> the sources have stopped and the path has been flushed. This ensures
> that the latest trace is complete. The weakness with the older sinks
> is the lack of interrupt meaning buffers were frequently wrapped so
> that only the latest trace is available.

Right.

> 
> By using TRBE WRAP mode, with a watermark as described in the TRBE
> spec, using the interrupts it is possible to approach lossless trace
> in a way that is not possible with earlier ETR/ETB. This is something
Using TRBTRG_EL1 as the above-mentioned watermark ?

> that has been requested by partners since trace became available in
> linux systems. (There is still a possibility of loss due to filling
> the buffer completely and overflowing the watermark, but that can be
> flagged).
> 
> While FILL mode trace is a good start, and suitable for some scenarios
> - WRAP mode needs implementing as well.

I would like to understand this mechanism more. Besides, how is the perf
interface supposed to choose between FILL and WRAP mode ? Via a new
event attribute ?
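
For reference, the mode itself is just a field in TRBLIMITR_EL1, so the
programming side is small either way. A sketch (the *_FM_* names here are
illustrative; encodings as per the TRBE spec):

	u64 limitr = read_sysreg_s(SYS_TRBLIMITR_EL1);

	limitr &= ~TRBLIMITR_FM_MASK;	/* FM, bits [1:0] */
	limitr |= TRBE_FM_WRAP;		/* 0b01: wrap at limit and keep tracing */
	write_sysreg_s(limitr, SYS_TRBLIMITR_EL1);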

> 
> 3) Padding: To be clear, it is not safe for the decoder to run off the
> end of one buffer, into the padding area and continue decoding, or
> continue through the padding into the next buffer. However I believe
> the buffer start / stop points are demarked by the aux_output_start /
> aux_output_end calls?

Yes.

> 
> With upcoming perf decode updates this should enable the decoder to
> correctly be started and stopped on the buffer boundaries. The padding
> is there primarily to ensure that the decoder does not synchronize
> with the data stream until a genuine sync point is found.

Right.

> 
> 4) TRBE needs to be a loadable module like the rest of coresight.

Even though the driver has all the module constructs, the Kconfig was
missing a tristate value, which is being fixed for the next version.

- Anshuman


Re: [PATCH 0/3] mm/page_alloc: Fix pageblock_order with HUGETLB_PAGE_SIZE_VARIABLE

2021-02-16 Thread Anshuman Khandual


On 2/12/21 3:09 PM, David Hildenbrand wrote:
> On 12.02.21 08:02, Anshuman Khandual wrote:
>>
>> On 2/11/21 2:07 PM, David Hildenbrand wrote:
>>> On 11.02.21 07:22, Anshuman Khandual wrote:
>>>> The following warning gets triggered while trying to boot a 64K page size
>>>> without THP config kernel on arm64 platform.
>>>>
>>>> WARNING: CPU: 5 PID: 124 at mm/vmstat.c:1080 
>>>> __fragmentation_index+0xa4/0xc0
>>>> Modules linked in:
>>>> CPU: 5 PID: 124 Comm: kswapd0 Not tainted 5.11.0-rc6-4-ga0ea7d62002 
>>>> #159
>>>> Hardware name: linux,dummy-virt (DT)
>>>> [    8.810673] pstate: 2045 (nzCv daif +PAN -UAO -TCO BTYPE=--)
>>>> [    8.811732] pc : __fragmentation_index+0xa4/0xc0
>>>> [    8.812555] lr : fragmentation_index+0xf8/0x138
>>>> [    8.813360] sp : 864079b0
>>>> [    8.813958] x29: 864079b0 x28: 0372
>>>> [    8.814901] x27: 7682 x26: 8000135b3948
>>>> [    8.815847] x25: 1fffe00010c80f48 x24: 
>>>> [    8.816805] x23:  x22: 000d
>>>> [    8.817764] x21: 0030 x20: 0005ffcb4d58
>>>> [    8.818712] x19: 000b x18: 
>>>> [    8.819656] x17:  x16: 
>>>> [    8.820613] x15:  x14: 8000114c6258
>>>> [    8.821560] x13: 6000bff969ba x12: 1fffe000bff969b9
>>>> [    8.822514] x11: 1fffe000bff969b9 x10: 6000bff969b9
>>>> [    8.823461] x9 : dfff8000 x8 : 0005ffcb4dcf
>>>> [    8.824415] x7 : 0001 x6 : 41b58ab3
>>>> [    8.825359] x5 : 600010c80f48 x4 : dfff8000
>>>> [    8.826313] x3 : 8000102be670 x2 : 0007
>>>> [    8.827259] x1 : 86407a60 x0 : 000d
>>>> [    8.828218] Call trace:
>>>> [    8.828667]  __fragmentation_index+0xa4/0xc0
>>>> [    8.829436]  fragmentation_index+0xf8/0x138
>>>> [    8.830194]  compaction_suitable+0x98/0xb8
>>>> [    8.830934]  wakeup_kcompactd+0xdc/0x128
>>>> [    8.831640]  balance_pgdat+0x71c/0x7a0
>>>> [    8.832327]  kswapd+0x31c/0x520
>>>> [    8.832902]  kthread+0x224/0x230
>>>> [    8.833491]  ret_from_fork+0x10/0x30
>>>> [    8.834150] ---[ end trace 472836f79c15516b ]---
>>>>
>>>> This warning comes from __fragmentation_index() when the requested order
>>>> is greater than MAX_ORDER.
>>>>
>>>> static int __fragmentation_index(unsigned int order,
>>>>   struct contig_page_info *info)
>>>> {
>>>>   unsigned long requested = 1UL << order;
>>>>
>>>>   if (WARN_ON_ONCE(order >= MAX_ORDER)) <= Triggered here
>>>>   return 0;
>>>>
>>>> Digging it further reveals that pageblock_order has been assigned a value
>>>> which is greater than MAX_ORDER failing the above check. But why this
>>>> happened ? Because HUGETLB_PAGE_ORDER for the given config on arm64 is
>>>> greater than MAX_ORDER.
>>>>
>>>> The solution involves enabling HUGETLB_PAGE_SIZE_VARIABLE which would make
>>>> pageblock_order a variable instead of constant HUGETLB_PAGE_ORDER. But that
>>>> change alone also did not really work as pageblock_order still got assigned
>>>> as HUGETLB_PAGE_ORDER in set_pageblock_order(). HUGETLB_PAGE_ORDER needs to
>>>> be less than MAX_ORDER for its appropriateness as pageblock_order otherwise
>>>> just fallback to MAX_ORDER - 1 as before. While here it also fixes a build
>>>> problem via type casting MAX_ORDER in rmem_cma_setup().
>>>
>>> I'm wondering, is there any real value in allowing FORCE_MAX_ZONEORDER to 
>>> be "11" with ARM64_64K_PAGES/ARM64_16K_PAGES?
>>
>> MAX_ORDER should be as high as would be required for the current config.
>> Unless THP is enabled, there is no need for it to be any higher than 11.
>> But I might be missing historical reasons around this as well. Probably
>> others from arm64 could help here.
> 
> Theoretically yes, practically no. If nobody cares about a configuration, no 
> need to make the code more complicated for that configuration.
> 
>>
>>>
>>> Meaning: are there any real use cases that actually build a kernel without 
>>> TRA

Re: [PATCH v2 1/1] arm64: mm: correct the inside linear map boundaries during hotplug check

2021-02-15 Thread Anshuman Khandual



On 2/16/21 1:21 AM, Pavel Tatashin wrote:
> On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel  wrote:
>>
>> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin  
>> wrote:
>>>
 Can't we simply use signed arithmetic here? This expression works fine
 if the quantities are all interpreted as s64 instead of u64
>>>
>>> I was thinking about that, but I do not like the idea of using sign
>>> arithmetics for physical addresses. Also, I am worried that someone in
>>> the future will unknowingly change it to unsigns or to phys_addr_t. It
>>> is safer to have start explicitly set to 0 in case of wrap.
>>
>> memstart_addr is already a s64 for this exact reason.
> 
> memstart_addr is basically an offset and it must be negative. For
> example, this would not work if it was not signed:
> #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> 
> However, on powerpc it is phys_addr_t type.
> 
>>
>> Btw, the KASLR check is incorrect: memstart_addr could also be
>> negative when running the 52-bit VA kernel on hardware that is only
>> 48-bit VA capable.
> 
> Good point!
> 
> if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
> memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
> 
> So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.
> 
> I am OK to change start_linear_pa, end_linear_pa to signed, but IMO
> what I have now is actually safer to make sure that does not break
> again in the future.
An explicit check for the flip over, and providing two different start
address points, would be required in order to use the new framework.


Re: [PATCH v2 1/1] arm64: mm: correct the inside linear map boundaries during hotplug check

2021-02-15 Thread Anshuman Khandual



On 2/16/21 12:57 AM, Ard Biesheuvel wrote:
> On Mon, 15 Feb 2021 at 20:22, Pavel Tatashin  
> wrote:
>>
>> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
>> linear map range is not checked correctly.
>>
>> The start physical address that linear map covers can be actually at the
>> end of the range because of randomization. Check that and if so reduce it
>> to 0.
>>
>> This can be verified on QEMU with setting kaslr-seed to ~0ul:
>>
>> memstart_offset_seed = 0x
>> START: __pa(_PAGE_OFFSET(vabits_actual)) = 9000c000
>> END:   __pa(PAGE_END - 1) =  1000bfff
>>
>> Signed-off-by: Pavel Tatashin 
>> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating 
>> linear mapping")
>> Tested-by: Tyler Hicks 
> 
>> ---
>>  arch/arm64/mm/mmu.c | 20 ++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index ae0c3d023824..cc16443ea67f 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, 
>> unsigned long start, u64 size)
>>
>>  static bool inside_linear_region(u64 start, u64 size)
>>  {
>> +   u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
>> +   u64 end_linear_pa = __pa(PAGE_END - 1);
>> +
>> +   if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
>> +   /*
>> +* Check for a wrap, it is possible because of randomized 
>> linear
>> +* mapping the start physical address is actually bigger than
>> +* the end physical address. In this case set start to zero
>> +* because [0, end_linear_pa] range must still be able to 
>> cover
>> +* all addressable physical addresses.
>> +*/
>> +   if (start_linear_pa > end_linear_pa)
>> +   start_linear_pa = 0;
>> +   }
>> +
>> +   WARN_ON(start_linear_pa > end_linear_pa);
>> +
>> /*
>>  * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>>  * accommodating both its ends but excluding PAGE_END. Max physical
>>  * range which can be mapped inside this linear mapping range, must
>>  * also be derived from its end points.
>>  */
>> -   return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
>> -  (start + size - 1) <= __pa(PAGE_END - 1);
> 
> Can't we simply use signed arithmetic here? This expression works fine
> if the quantities are all interpreted as s64 instead of u64

There is a new generic framework which expects the platform to provide two
distinct range points (low and high) for hotplug address comparison. Those
range points can be different depending on whether address randomization
is enabled and the flip occurs. But this comparison here in the platform
code is going away.

This patch needs to be rebased on the new framework, which is part of linux-next.

https://patchwork.kernel.org/project/linux-mm/list/?series=425051
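
For reference, the generic check in that series consumes the platform hook
roughly like this (paraphrased from the linked series, not verbatim):

	bool mhp_range_allowed(u64 start, u64 size, bool need_mapping)
	{
		struct range mhp_range = mhp_get_pluggable_range(need_mapping);
		u64 end = start + size;

		if (start < end && start >= mhp_range.start &&
		    (end - 1) <= mhp_range.end)
			return true;

		pr_warn("Hotplug memory [%#llx-%#llx] exceeds maximum addressable range [%#llx-%#llx]\n",
			start, end - 1, mhp_range.start, mhp_range.end);
		return false;
	}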


Re: [PATCH V3 11/14] coresight: sink: Add TRBE driver

2021-02-15 Thread Anshuman Khandual


On 2/13/21 1:56 AM, Mathieu Poirier wrote:
> On Wed, Jan 27, 2021 at 02:25:35PM +0530, Anshuman Khandual wrote:
>> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
>> accessible via the system registers. The TRBE supports different addressing
>> modes including CPU virtual address and buffer modes including the circular
>> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
>> an write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
>> access to the trace buffer could be prohibited by a higher exception level
>> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
>> private interrupt (PPI) on address translation errors and when the buffer
>> is full. Overall implementation here is inspired from the Arm SPE driver.
>>
>> Cc: Mathieu Poirier 
>> Cc: Mike Leach 
>> Cc: Suzuki K Poulose 
>> Signed-off-by: Anshuman Khandual 
>> ---
>> Changes in V3:
>>
>> - Added new DT bindings document TRBE.yaml
>> - Changed TRBLIMITR_TRIG_MODE_SHIFT from 2 to 3
>> - Dropped isb() from trbe_reset_local()
>> - Dropped gap between (void *) and buf->trbe_base
>> - Changed 'int' to 'unsigned int' in is_trbe_available()
>> - Dropped unused function set_trbe_running(), set_trbe_virtual_mode(),
>>   set_trbe_enabled() and set_trbe_limit_pointer()
>> - Changed get_trbe_flag_update(), is_trbe_programmable() and
>>   get_trbe_address_align() to accept TRBIDR value
>> - Changed is_trbe_running(), is_trbe_abort(), is_trbe_wrap(), is_trbe_trg(),
>>   is_trbe_irq(), get_trbe_bsc() and get_trbe_ec() to accept TRBSR value
>> - Dropped snapshot mode condition in arm_trbe_alloc_buffer()
>> - Exit arm_trbe_init() when arm64_kernel_unmapped_at_el0() is enabled
>> - Compute trbe_limit before trbe_write to get the updated handle
>> - Added trbe_stop_and_truncate_event()
>> - Dropped trbe_handle_fatal()
>>
>>  Documentation/trace/coresight/coresight-trbe.rst |   39 +
>>  arch/arm64/include/asm/sysreg.h  |1 +
>>  drivers/hwtracing/coresight/Kconfig  |   11 +
>>  drivers/hwtracing/coresight/Makefile |1 +
>>  drivers/hwtracing/coresight/coresight-trbe.c | 1023 
>> ++
>>  drivers/hwtracing/coresight/coresight-trbe.h |  160 
>>  6 files changed, 1235 insertions(+)
>>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
>>
>> diff --git a/Documentation/trace/coresight/coresight-trbe.rst 
>> b/Documentation/trace/coresight/coresight-trbe.rst
>> new file mode 100644
>> index 000..1cbb819
>> --- /dev/null
>> +++ b/Documentation/trace/coresight/coresight-trbe.rst
>> @@ -0,0 +1,39 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==
>> +Trace Buffer Extension (TRBE).
>> +==
>> +
>> +:Author:   Anshuman Khandual 
>> +:Date: November 2020
>> +
>> +Hardware Description
>> +
>> +
>> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
>> +memory, CPU traces generated from a corresponding percpu tracing unit. This
>> +gets plugged in as a coresight sink device because the corresponding trace
>> +genarators (ETE), are plugged in as source device.
>> +
>> +The TRBE is not compliant to CoreSight architecture specifications, but is
>> +driven via the CoreSight driver framework to support the ETE (which is
>> +CoreSight compliant) integration.
>> +
>> +Sysfs files and directories
>> +---
>> +
>> +The TRBE devices appear on the existing coresight bus alongside the other
>> +coresight devices::
>> +
>> +>$ ls /sys/bus/coresight/devices
>> +trbe0  trbe1  trbe2 trbe3
>> +
>> +The ``trbe`` named TRBEs are associated with a CPU.::
>> +
>> +>$ ls /sys/bus/coresight/devices/trbe0/
>> +align dbm
>> +
>> +*Key file items are:-*
>> +   * ``align``: TRBE write pointer alignment
>> +   * ``dbm``: TRBE updates memory with access and dirty flags
>> +
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index 85ae4db..9e2e9b7 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -97,6 +97,7 @@
>>  #define SET_PSTATE_UAO(x)   __e

Re: [PATCH V3 11/14] coresight: sink: Add TRBE driver

2021-02-15 Thread Anshuman Khandual



On 2/12/21 10:27 PM, Mathieu Poirier wrote:
> [...]
> 
>>>
>>>
 +  if (nr_pages < 2)
 +  return NULL;
 +
 +  buf = kzalloc_node(sizeof(*buf), GFP_KERNEL, trbe_alloc_node(event));
 +  if (IS_ERR(buf))
 +  return ERR_PTR(-ENOMEM);
 +
 +  pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
 +  if (IS_ERR(pglist)) {
 +  kfree(buf);
 +  return ERR_PTR(-ENOMEM);
 +  }
 +
 +  for (i = 0; i < nr_pages; i++)
 +  pglist[i] = virt_to_page(pages[i]);
 +
 +  buf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, 
 PAGE_KERNEL);
 +  if (IS_ERR((void *)buf->trbe_base)) {
>>>
>>> Why not simply make buf->trbe_base a void * instead of having to do all this
>>
>> There are many arithmetic and comparison operations involving trbe_base
>> element. Hence it might be better to keep it as unsigned long, also to
>> keeps it consistent with other pointers i.e trbe_write, trbe_limit.
> 
> That is a fair point.  Please add a comment to explain your design choice and
> make sure the sparse checker is happy with all of it.

Added a comment.
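
Something along these lines (sketch; field list abridged):

	/*
	 * Buffer addresses are tracked as unsigned long rather than void *,
	 * since most users are pointer arithmetic and comparisons against
	 * trbe_write/trbe_limit; they are cast back to (void *) only at the
	 * vmap()/vunmap() and memset() boundaries.
	 */
	struct trbe_buf {
		unsigned long trbe_base;
		unsigned long trbe_limit;
		unsigned long trbe_write;
		/* other members omitted */
	};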

> 
>>
>> Snippet from $cat drivers/hwtracing/coresight/coresight-trbe.c | grep 
>> "trbe_base"
>> There are just two places type casting trbe_base back to (void *).
>>
>>  memset((void *)buf->trbe_base + head, ETE_IGNORE_PACKET, len);
>>  return buf->trbe_base + offset;
>>  WARN_ON(buf->trbe_write < buf->trbe_base);
>>  set_trbe_base_pointer(buf->trbe_base);
>>  buf->trbe_base = (unsigned long)vmap(pglist, nr_pages, VM_MAP, 
>> PAGE_KERNEL);
>>  if (IS_ERR((void *)buf->trbe_base)) {
>>  return ERR_PTR(buf->trbe_base);
>>  buf->trbe_limit = buf->trbe_base + nr_pages * PAGE_SIZE;
>>  buf->trbe_write = buf->trbe_base;
>>  vunmap((void *)buf->trbe_base);
>>  base = get_trbe_base_pointer();
>>  buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>>  if (buf->trbe_limit == buf->trbe_base) {
>>  buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>>  if (buf->trbe_limit == buf->trbe_base) {
>>  offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
>>  buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>>  if (buf->trbe_limit == buf->trbe_base) {
>>  WARN_ON(buf->trbe_base != get_trbe_base_pointer());
>>  if (get_trbe_write_pointer() == get_trbe_base_pointer())
>>   
>>> casting?  And IS_ERR() doesn't work with vmap().
>>
>> Sure, will drop IS_ERR() here.
>>
> 
> [...]
> 
> 
>>>
 +
 +static ssize_t dbm_show(struct device *dev, struct device_attribute 
 *attr, char *buf)
 +{
 +  struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
 +
 +  return sprintf(buf, "%d\n", cpudata->trbe_dbm);
 +}
 +static DEVICE_ATTR_RO(dbm);
>>>
>>> What does "dbm" stand for?  Looking at the documentation for TRBIDR_EL1.F, I
>>> don't see what "dbm" relates to.
>>
>> I made it up to refer to TRBIDR_EL1.F as "Dirty (and Access Flag) Bit
>> Management".
>> Could change it to "afdbm" to be more specific, if that is preferred.
>>
> 
> I don't see "afdbm" being a better solution - why not simply "flag"?

Replaced all references to "dbm" with "flag".
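
So the attribute from the earlier hunk presumably becomes (sketch; the
cpudata field name is illustrative):

	static ssize_t flag_show(struct device *dev,
				 struct device_attribute *attr, char *buf)
	{
		struct trbe_cpudata *cpudata = dev_get_drvdata(dev);

		return sprintf(buf, "%d\n", cpudata->trbe_flag);
	}
	static DEVICE_ATTR_RO(flag);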


Re: [PATCH] arm64: mm: correct the start of physical address in linear map

2021-02-14 Thread Anshuman Khandual
Hello Pavel,

On 2/13/21 6:53 AM, Pavel Tatashin wrote:
> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
> linear map range is not checked correctly.
> 
> The start physical address that linear map covers can be actually at the
> end of the range because of randomization. Check that and if so reduce it
> to 0.

Looking at the code, this seems possible if memstart_addr, which is a signed
value, becomes large (after falling below 0) during arm64_memblock_init().

> 
> This can be verified on QEMU with setting kaslr-seed to ~0ul:
> 
> memstart_offset_seed = 0x
> START: __pa(_PAGE_OFFSET(vabits_actual)) = 9000c000
> END:   __pa(PAGE_END - 1) =  1000bfff
> 
> Signed-off-by: Pavel Tatashin 
> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear 
> mapping")
> ---
>  arch/arm64/mm/mmu.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ae0c3d023824..6057ecaea897 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1444,14 +1444,25 @@ static void __remove_pgd_mapping(pgd_t *pgdir, 
> unsigned long start, u64 size)
>  
>  static bool inside_linear_region(u64 start, u64 size)
>  {
> + u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
> + u64 end_linear_pa = __pa(PAGE_END - 1);
> +
> + /*
> +  * Check for a wrap, it is possible because of randomized linear mapping
> +  * the start physical address is actually bigger than the end physical
> +  * address. In this case set start to zero because [0, end_linear_pa]
> +  * range must still be able to cover all addressable physical addresses.
> +  */

If this is possible only with a randomized linear mapping, could you please
add an IS_ENABLED(CONFIG_RANDOMIZE_BASE) check around the switch over.
Wondering if WARN_ON(start_linear_pa > end_linear_pa) should be added
otherwise, i.e. when linear mapping randomization is not enabled.

> + if (start_linear_pa > end_linear_pa)
> + start_linear_pa = 0;

This looks okay but will double check and give it some more testing.

> +
>   /*
>* Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>* accommodating both its ends but excluding PAGE_END. Max physical
>* range which can be mapped inside this linear mapping range, must
>* also be derived from its end points.
>*/
> - return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
> -(start + size - 1) <= __pa(PAGE_END - 1);
> + return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
>  }
>  
>  int arch_add_memory(int nid, u64 start, u64 size,
> 

- Anshuman


Re: [PATCH 3/3] dma-contiguous: Type cast MAX_ORDER as unsigned int

2021-02-11 Thread Anshuman Khandual



On 2/11/21 1:34 PM, Christoph Hellwig wrote:
> On Thu, Feb 11, 2021 at 11:52:11AM +0530, Anshuman Khandual wrote:
>> Type cast MAX_ORDER as unsigned int to fix the following build warning.
>>
>> In file included from ./include/linux/kernel.h:14,
>>  from ./include/asm-generic/bug.h:20,
>>  from ./arch/arm64/include/asm/bug.h:26,
>>  from ./include/linux/bug.h:5,
>>  from ./include/linux/mmdebug.h:5,
>>  from ./arch/arm64/include/asm/memory.h:166,
>>  from ./arch/arm64/include/asm/page.h:42,
>>  from kernel/dma/contiguous.c:46:
>> kernel/dma/contiguous.c: In function ‘rmem_cma_setup’:
>> ./include/linux/minmax.h:18:28: warning: comparison of distinct pointer
>> types lacks a cast
>>   (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
>> ^~
>> ./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’
>>(__typecheck(x, y) && __no_side_effects(x, y))
>> ^~~
>> ./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’
>>   __builtin_choose_expr(__safe_cmp(x, y), \
>> ^~
>> ./include/linux/minmax.h:58:19: note: in expansion of macro
>> ‘__careful_cmp’
>>  #define max(x, y) __careful_cmp(x, y, >)
>>^
>> kernel/dma/contiguous.c:402:35: note: in expansion of macro ‘max’
>>   phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
>>
>> Cc: Christoph Hellwig 
>> Cc: Marek Szyprowski 
>> Cc: Robin Murphy 
>> Cc: io...@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  kernel/dma/contiguous.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
>> index 3d63d91cba5c..1c2782349d71 100644
>> --- a/kernel/dma/contiguous.c
>> +++ b/kernel/dma/contiguous.c
>> @@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
>>  
>>  static int __init rmem_cma_setup(struct reserved_mem *rmem)
>>  {
>> -phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
>> +phys_addr_t align = PAGE_SIZE << max((unsigned int)MAX_ORDER - 1, 
>> pageblock_order);
> 
> MAX_ORDER and pageblock_order should be the same type.  So either fix

Right.

> MAX_ORDER to be an unsigned constant, which would be fundamentally
> the right thing to do but might cause some fallout, or turn
> pageblock_order into an int, which is probably much easier as the stub
> define of it already has an integer type derived from MAX_ORDER as well.

Right, will change pageblock_order to 'int', which would be easier.
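
i.e. something along these lines in include/linux/pageblock-flags.h
(sketch of the direction discussed, not the final patch):

	#ifdef CONFIG_HUGETLB_PAGE
	#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
	/* signed, so max(MAX_ORDER - 1, pageblock_order) type-checks */
	extern int pageblock_order;
	#else
	#define pageblock_order		HUGETLB_PAGE_ORDER
	#endif
	#else
	#define pageblock_order		(MAX_ORDER - 1)
	#endif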


Re: [PATCH 1/3] mm/page_alloc: Fix pageblock_order when HUGETLB_PAGE_ORDER >= MAX_ORDER

2021-02-11 Thread Anshuman Khandual



On 2/11/21 1:30 PM, Christoph Hellwig wrote:
>> -if (HPAGE_SHIFT > PAGE_SHIFT)
>> +if ((HPAGE_SHIFT > PAGE_SHIFT) && (HUGETLB_PAGE_ORDER < MAX_ORDER))
> 
> No need for the braces.

Will drop them.


Re: [PATCH 2/3] arm64/hugetlb: Enable HUGETLB_PAGE_SIZE_VARIABLE

2021-02-11 Thread Anshuman Khandual



On 2/11/21 1:31 PM, Christoph Hellwig wrote:
> On Thu, Feb 11, 2021 at 11:52:10AM +0530, Anshuman Khandual wrote:
>> MAX_ORDER which invariably depends on FORCE_MAX_ZONEORDER can be a variable
>> for a given page size, depending on whether TRANSPARENT_HUGEPAGE is enabled
>> or not. In certain page size and THP combinations HUGETLB_PAGE_ORDER can be
>> greater than MAX_ORDER, making it unusable as pageblock_order.
>>
>> This enables HUGETLB_PAGE_SIZE_VARIABLE making pageblock_order a variable
>> rather than the compile time constant HUGETLB_PAGE_ORDER which could break
>> MAX_ORDER rule for certain configurations.
>>
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/arm64/Kconfig | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f39568b28ec1..8e3a5578f663 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1909,6 +1909,10 @@ config ARCH_ENABLE_THP_MIGRATION
>>  def_bool y
>>  depends on TRANSPARENT_HUGEPAGE
>>  
>> +config HUGETLB_PAGE_SIZE_VARIABLE
> 
> Please move the definition of HUGETLB_PAGE_SIZE_VARIABLE to
> mm/Kconfig and select it from the arch Kconfigs instead of duplicating
> the definition.

Sure, will do.


Re: [PATCH 0/3] mm/page_alloc: Fix pageblock_order with HUGETLB_PAGE_SIZE_VARIABLE

2021-02-11 Thread Anshuman Khandual


On 2/11/21 2:07 PM, David Hildenbrand wrote:
> On 11.02.21 07:22, Anshuman Khandual wrote:
>> The following warning gets triggered while trying to boot a 64K page size
>> without THP config kernel on arm64 platform.
>>
>> WARNING: CPU: 5 PID: 124 at mm/vmstat.c:1080 __fragmentation_index+0xa4/0xc0
>> Modules linked in:
>> CPU: 5 PID: 124 Comm: kswapd0 Not tainted 5.11.0-rc6-4-ga0ea7d62002 #159
>> Hardware name: linux,dummy-virt (DT)
>> [    8.810673] pstate: 2045 (nzCv daif +PAN -UAO -TCO BTYPE=--)
>> [    8.811732] pc : __fragmentation_index+0xa4/0xc0
>> [    8.812555] lr : fragmentation_index+0xf8/0x138
>> [    8.813360] sp : 864079b0
>> [    8.813958] x29: 864079b0 x28: 0372
>> [    8.814901] x27: 7682 x26: 8000135b3948
>> [    8.815847] x25: 1fffe00010c80f48 x24: 
>> [    8.816805] x23:  x22: 000d
>> [    8.817764] x21: 0030 x20: 0005ffcb4d58
>> [    8.818712] x19: 000b x18: 
>> [    8.819656] x17:  x16: 
>> [    8.820613] x15:  x14: 8000114c6258
>> [    8.821560] x13: 6000bff969ba x12: 1fffe000bff969b9
>> [    8.822514] x11: 1fffe000bff969b9 x10: 6000bff969b9
>> [    8.823461] x9 : dfff8000 x8 : 0005ffcb4dcf
>> [    8.824415] x7 : 0001 x6 : 41b58ab3
>> [    8.825359] x5 : 600010c80f48 x4 : dfff8000
>> [    8.826313] x3 : 8000102be670 x2 : 0007
>> [    8.827259] x1 : 86407a60 x0 : 000d
>> [    8.828218] Call trace:
>> [    8.828667]  __fragmentation_index+0xa4/0xc0
>> [    8.829436]  fragmentation_index+0xf8/0x138
>> [    8.830194]  compaction_suitable+0x98/0xb8
>> [    8.830934]  wakeup_kcompactd+0xdc/0x128
>> [    8.831640]  balance_pgdat+0x71c/0x7a0
>> [    8.832327]  kswapd+0x31c/0x520
>> [    8.832902]  kthread+0x224/0x230
>> [    8.833491]  ret_from_fork+0x10/0x30
>> [    8.834150] ---[ end trace 472836f79c15516b ]---
>>
>> This warning comes from __fragmentation_index() when the requested order
>> is greater than MAX_ORDER.
>>
>> static int __fragmentation_index(unsigned int order,
>>  struct contig_page_info *info)
>> {
>>  unsigned long requested = 1UL << order;
>>
>>  if (WARN_ON_ONCE(order >= MAX_ORDER)) <= Triggered here
>>  return 0;
>>
>> Digging it further reveals that pageblock_order has been assigned a value
>> which is greater than MAX_ORDER failing the above check. But why this
>> happened ? Because HUGETLB_PAGE_ORDER for the given config on arm64 is
>> greater than MAX_ORDER.
>>
>> The solution involves enabling HUGETLB_PAGE_SIZE_VARIABLE which would make
>> pageblock_order a variable instead of constant HUGETLB_PAGE_ORDER. But that
>> change alone also did not really work as pageblock_order still got assigned
>> as HUGETLB_PAGE_ORDER in set_pageblock_order(). HUGETLB_PAGE_ORDER needs to
>> be less than MAX_ORDER for its appropriateness as pageblock_order otherwise
>> just fallback to MAX_ORDER - 1 as before. While here it also fixes a build
>> problem via type casting MAX_ORDER in rmem_cma_setup().
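
[ A sketch of the clamp described above, for clarity; not the literal patch:

	static void __init set_pageblock_order(void)
	{
		unsigned int order;

		if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < MAX_ORDER)
			order = HUGETLB_PAGE_ORDER;
		else
			order = MAX_ORDER - 1;

		pageblock_order = order;
	}
]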
> 
> I'm wondering, is there any real value in allowing FORCE_MAX_ZONEORDER to be 
> "11" with ARM64_64K_PAGES/ARM64_16K_PAGES?

MAX_ORDER should be as high as would be required for the current config.
Unless THP is enabled, there is no need for it to be any higher than 11.
But I might be missing historical reasons around this as well. Probably
others from arm64 could help here.

> 
> Meaning: are there any real use cases that actually build a kernel without 
> TRANSPARENT_HUGEPAGE and with ARM64_64K_PAGES/ARM64_16K_PAGES?

THP is always optional. Besides, kernel builds without THP should always
be supported. Assuming that all builds will have THP enabled might not
be accurate.

> 
> As builds are essentially broken, I assume this is not that relevant? Or how 
> long has it been broken?

Git blame shows that it's been there for some time now. But how does
that make this irrelevant ? A problem should be fixed nonetheless.

> 
> It might be easier to just drop the "TRANSPARENT_HUGEPAGE" part from the 
> FORCE_MAX_ZONEORDER config.
> 

Not sure if it would be a good idea to unnecessarily have a larger MAX_ORDER
value for a given config. But I might be missing other contexts here.

