On 08.09.2012 00:52, Yinghai Lu wrote:
> On Fri, Sep 7, 2012 at 2:31 PM, <[email protected]> wrote:
>>
>> The patch titled
>> Subject: x86/mm: limit extra padding calculation to x86_32
>> has been added to the -mm tree. Its filename is
>> x86-mm-limit-2-4m-size-calculation-to-x86_32.patch
>>
>> Before you just go and hit "reply", please:
>> a) Consider who else should be cc'ed
>> b) Prefer to cc a suitable mailing list as well
>> c) Ideally: find the original patch on the mailing list and do a
>> reply-to-all to that, adding suitable additional cc's
>>
>> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>>
>> The -mm tree is included into linux-next and is updated
>> there every 3-4 working days
>>
>> ------------------------------------------------------
>> From: Stefan Bader <[email protected]>
>> Subject: x86/mm: limit extra padding calculation to x86_32
>>
>> Starting with kernel v3.5 kexec based crash dumping was observed to fail
>> (without any apparent message) on x86_64 machines. This was traced to a
>> lack of memory triggered by a substantial increase (several megabytes) in
>> the size of the initial page tables.
>>
>> After regression (on a VM with 2GB of memory):
>> kernel direct mapping tables up to 0x7fffcfff @ [mem 0x1fbfd000-0x1fffffff]
>> size = 4206591 bytes
>>
>> With this patch applied:
>> kernel direct mapping tables up to 0x7fffcfff @ [mem 0x1fffc000-0x1fffffff]
>> size = 16383 bytes
>>
>> A bisection led to commit 722bc6b ("x86/mm: Fix the size calculation of
>> mapping tables").
>>
>> This change modified the extra space calculation to take into account that
>> the first 2/4M range of memory would be mapped as 4K pages as suggested in
>> chapter 11.11.9 of the Intel software developer's manual.
>>
>> However this is currently only true for x86_32 (the reasons behind that
>> are unclear but apparently the whole page table setup needs to be
>> revisited as it turns out to be very easy to break and has flaws in its
>> current form).
>>
>> Until the logic can be revisited and combined, pair up the extra space
>> calculation with the logic which creates the extra mappings.
>>
>> Signed-off-by: Stefan Bader <[email protected]>
>> Cc: WANG Cong <[email protected]>
>> Cc: Yinghai Lu <[email protected]>
>> Cc: Tejun Heo <[email protected]>
>> Cc: <[email protected]> [v3.5+]
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Signed-off-by: Andrew Morton <[email protected]>
>> ---
>>
>> arch/x86/mm/init.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff -puN arch/x86/mm/init.c~x86-mm-limit-2-4m-size-calculation-to-x86_32 arch/x86/mm/init.c
>> --- a/arch/x86/mm/init.c~x86-mm-limit-2-4m-size-calculation-to-x86_32
>> +++ a/arch/x86/mm/init.c
>> @@ -60,10 +60,11 @@ static void __init find_early_table_spac
>>  		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
>>  #ifdef CONFIG_X86_32
>>  		extra += PMD_SIZE;
>> -#endif
>> +
>>  		/* The first 2/4M doesn't use large pages. */
>>  		if (mr->start < PMD_SIZE)
>>  			extra += mr->end - mr->start;
>> +#endif
>>
>>  		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>  	} else
>> _
>>
>> Patches currently in -mm which might be from [email protected] are
>>
>> x86-mm-limit-2-4m-size-calculation-to-x86_32.patch
>>
>
> three lines
> /* The first 2/4M doesn't use large pages. */
> if (mr->start < PMD_SIZE)
> extra += mr->end - mr->start;
>
> should be just dropped even for 32 bit.

These three lines are needed on 32 bit to stay consistent with the code that
creates the page ranges. From arch/x86/mm/init.c:

#ifdef CONFIG_X86_32
	/*
	 * Don't use a large page for the first 2/4MB of memory
	 * because there are often fixed size MTRRs in there
	 * and overlapping MTRRs into large pages can cause
	 * slowdowns.
	 */
	if (pos == 0)
		end_pfn = 1<<(PMD_SHIFT - PAGE_SHIFT);
	else
		end_pfn = ((pos + (PMD_SIZE - 1))>>PMD_SHIFT)
				 << (PMD_SHIFT - PAGE_SHIFT);
#else /* CONFIG_X86_64 */
	end_pfn = ((pos + (PMD_SIZE - 1)) >> PMD_SHIFT)
			<< (PMD_SHIFT - PAGE_SHIFT);
#endif
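
To make the 32-bit special case concrete, here is a minimal standalone sketch
(not kernel code) of the end_pfn computation quoted above. The function name
first_range_end_pfn and its is_x86_32 and pmd_shift parameters are made up for
illustration. PAGE_SHIFT is assumed to be 12, and pmd_shift would be 21 for 2M
large pages or 22 for 4M ones.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4k base pages */

/*
 * Hypothetical helper, not a kernel function: returns the end pfn of the
 * leading range that starts at byte address pos, following the #ifdef
 * logic quoted from arch/x86/mm/init.c.
 */
uint64_t first_range_end_pfn(uint64_t pos, int is_x86_32,
			     unsigned int pmd_shift)
{
	uint64_t pmd_size;

	assert(pmd_shift > PAGE_SHIFT && pmd_shift < 64);
	pmd_size = 1ULL << pmd_shift;

	/* 32 bit only: the first 2/4M is always covered by 4k pages. */
	if (is_x86_32 && pos == 0)
		return 1ULL << (pmd_shift - PAGE_SHIFT);

	/* Otherwise round pos up to the next large page boundary. */
	return ((pos + (pmd_size - 1)) >> pmd_shift)
			<< (pmd_shift - PAGE_SHIFT);
}
```

With pos == 0 and pmd_shift == 21 this returns 512 in the 32-bit case and 0 in
the 64-bit case, so in this sketch only 32 bit ends up with a leading range of
4k pages when the mapping starts at address 0.
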

The three lines in question were introduced by the following two patches, which
adjust the page table space allocation for the special first range that is
mapped with 4k pages:
* x86/mm: Fix the size calculation of mapping tables
* x86/mm: Only add extra pages count for the first memory range during
  pre-allocation early page table space

If only those lines were dropped, we would be back exactly where we were before
those patches (which was apparently wrong as well). This patch only tries to
make the calculation consistent with the mapping code until the whole page
table allocation scheme is reconsidered, which people agree needs doing.
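
As a rough illustration of why the unconditional addition hurts on 64 bit, the
sketch below models only the PSE branch of the extra space calculation from the
diff. It is not the kernel function: pse_pte_count, its is_x86_32 and patched
flags, and the example range are assumptions made for this mail, and PMD_SHIFT
is fixed at 21 (2M large pages).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumption: 4k base pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PMD_SHIFT  21			/* assumption: 2M large pages */
#define PMD_SIZE   (1ULL << PMD_SHIFT)

/* Simplified stand-in for the kernel's struct map_range. */
struct map_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical model of the "extra" calculation: returns the number of
 * PTEs that would be reserved for 4k mappings of the range mr, where
 * end is the end of the direct mapping. patched selects the behaviour
 * with the x86_32-only #ifdef from the patch.
 */
uint64_t pse_pte_count(const struct map_range *mr, uint64_t end,
		       int is_x86_32, int patched)
{
	uint64_t extra;

	assert(mr->start <= mr->end);

	/* Tail of the mapping that does not fill a whole large page. */
	extra = end - ((end >> PMD_SHIFT) << PMD_SHIFT);
	if (is_x86_32)
		extra += PMD_SIZE;

	/* The first 2/4M doesn't use large pages (32 bit only if patched). */
	if ((is_x86_32 || !patched) && mr->start < PMD_SIZE)
		extra += mr->end - mr->start;

	return (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

Assuming a first range of 0 to 0x7fe00000 and end = 0x7fffd000 on 64 bit, this
yields 524285 PTEs without the patch and 509 with it. At 8 bytes per PTE the
first figure comes to roughly 4MB, which is in line with the increase reported
in the changelog.
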
-Stefan
>
> Thanks
>
> Yinghai
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>