Re: [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()

2016-05-29 Thread Aneesh Kumar K.V
Anton Blanchard writes:

> From: Anton Blanchard
>
> In many cases we disable interrupts right before calling
> find_linux_pte_or_hugepte().
>
> find_linux_pte_or_hugepte() first checks interrupts are disabled
> before calling __find_linux_pte_or_hugepte():
>
>

Re: [PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64

2016-05-29 Thread Michael Ellerman
On Mon, 2016-05-30 at 09:08 +1000, Anton Blanchard via Linuxppc-dev wrote:
> > That is surprising, do we have any idea what specifically increases
> > the overhead so significantly ? Does gcc know about ldbrx/stdbrx ? I
> > notice in our io.h for example we still do manual ld/std + swap
> >

Re: [PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64

2016-05-29 Thread Anton Blanchard via Linuxppc-dev
Hi Ben,

> That is surprising, do we have any idea what specifically increases
> the overhead so significantly ? Does gcc know about ldbrx/stdbrx ? I
> notice in our io.h for example we still do manual ld/std + swap
> because old processors didn't know these, we should fix that for
> CONFIG_POWER8
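The "manual ld/std + swap" pattern in the quoted question can be sketched in portable C as below. This is only an illustration: the function names are hypothetical, `__builtin_bswap64` is a GCC/Clang builtin, and whether a given compiler turns the second form into a single byte-reversed load such as ldbrx is exactly the open question in the quoted text, not something this sketch demonstrates.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a big-endian 64-bit value byte by byte. */
static uint64_t load_be64_manual(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: plain native load followed by a swap on
 * little-endian hosts (the "ld + swap" shape). */
static uint64_t load_be64_bswap(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));          /* native-endian load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);          /* reverse bytes to get the BE value */
#endif
    return v;
}
```

Both helpers return the same value for the same input; the difference is only in what the compiler is given to work with.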

Re: [PATCH v3 00/16] genrtc removal

2016-05-29 Thread Alexandre Belloni
Hi,

On 03/05/2016 at 12:05:34 +0200, Arnd Bergmann wrote :
> On Tuesday 03 May 2016 09:24:18 Alexandre Belloni wrote:
> > Hi Arnd,
> >
> > I see you didn't copy Greg on that series (that may explain his
> > confusion on the previous patch), do you expect me to take it
> > through the RTC tree?

Re: [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext()

2016-05-29 Thread Anton Blanchard via Linuxppc-dev
Hi,

> On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
> > From: Anton Blanchard
> >
> > In setup_sigcontext(), we set current->thread.vrsave then use it
> > straight after. Since current is hidden from the compiler via inline
> > assembly, it cannot optimise this and

Re: [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext()

2016-05-29 Thread Michael Neuling
On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
> From: Anton Blanchard
>
> In setup_sigcontext(), we set current->thread.vrsave then use it
> straight after. Since current is hidden from the compiler via inline
> assembly, it cannot optimise this and we end up with a

Re: [PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64

2016-05-29 Thread Benjamin Herrenschmidt
On Sun, 2016-05-29 at 21:03 +1000, Anton Blanchard wrote:
> Hi,
>
> >
> > This enables us to share the same page table code for
> > both radix and hash. Radix use a hardware defined big endian
> > page table
> This is measurably worse (a little over 2% on POWER8) on a futex
> microbenchmark:

[PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()

2016-05-29 Thread Anton Blanchard
From: Anton Blanchard

In many cases we disable interrupts right before calling
find_linux_pte_or_hugepte().

find_linux_pte_or_hugepte() first checks interrupts are disabled
before calling __find_linux_pte_or_hugepte():

	if (!arch_irqs_disabled()) {

[PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext()

2016-05-29 Thread Anton Blanchard
From: Anton Blanchard

In setup_sigcontext(), we set current->thread.vrsave then use it
straight after. Since current is hidden from the compiler via inline
assembly, it cannot optimise this and we end up with a load hit store.

Fix this by using a temporary.

Signed-off-by:
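As a rough illustration of "using a temporary", the sketch below contrasts storing a value and reloading it through the same pointer with keeping it in a local. All names here (thread_sketch, sigframe_sketch, read_vrsave_sketch) are hypothetical stand-ins and not the kernel's definitions; the volatile qualifier is only an assumed way to model a pointer the compiler cannot see through.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct thread_sketch { uint32_t vrsave; };
struct sigframe_sketch { uint32_t vrsave_slot; };

/* Stand-in for reading the VRSAVE register; returns a fixed value here. */
static uint32_t read_vrsave_sketch(void)
{
    return 0xdeadbeefu;
}

/* Store, then reload through the same pointer: the load follows the
 * store to the same address straight away. */
static void save_vrsave_reload(volatile struct thread_sketch *t,
                               struct sigframe_sketch *f)
{
    t->vrsave = read_vrsave_sketch();
    f->vrsave_slot = t->vrsave;
}

/* Same result, but the value is kept in a local so no reload is needed. */
static void save_vrsave_temp(volatile struct thread_sketch *t,
                             struct sigframe_sketch *f)
{
    uint32_t v = read_vrsave_sketch();
    t->vrsave = v;
    f->vrsave_slot = v;
}
```

The two functions are observably equivalent; only the second avoids reading back memory that was just written.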

[PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec()

2016-05-29 Thread Anton Blanchard
From: Anton Blanchard

In both __giveup_fpu() and __giveup_altivec() we make two modifications
to tsk->thread.regs->msr. gcc decides to do a read/modify/write of each
change, so we end up with a load hit store:

	ld	r9,264(r10)
	rldicl	r9,r9,50,1
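One way to avoid two back-to-back read/modify/write sequences on the same field is to read it once into a local, apply both changes there, and write it back once. The sketch below assumes that shape; the structure, the bit masks and the also_b flag are hypothetical, and volatile is used only to force a separate memory access for each modification, mimicking what the quoted text says gcc does.

```c
#include <stdint.h>

/* Hypothetical bit masks; not the real MSR bit definitions. */
#define MSR_BIT_A_SKETCH (1ULL << 13)
#define MSR_BIT_B_SKETCH (1ULL << 23)

/* Hypothetical stand-in for the register save area. */
struct regs_sketch { uint64_t msr; };

/* Two separate read/modify/write sequences on the same field. */
static void clear_bits_twice(volatile struct regs_sketch *r, int also_b)
{
    r->msr &= ~MSR_BIT_A_SKETCH;
    if (also_b)
        r->msr &= ~MSR_BIT_B_SKETCH;   /* reloads what was just stored */
}

/* One load, both modifications on a local, one store. */
static void clear_bits_once(volatile struct regs_sketch *r, int also_b)
{
    uint64_t msr = r->msr;
    msr &= ~MSR_BIT_A_SKETCH;
    if (also_b)
        msr &= ~MSR_BIT_B_SKETCH;
    r->msr = msr;
}
```

Both functions leave the same value in the field; the second touches memory once in each direction.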

Re: [PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64

2016-05-29 Thread Anton Blanchard via Linuxppc-dev
Hi,

> This enables us to share the same page table code for
> both radix and hash. Radix use a hardware defined big endian
> page table

This is measurably worse (a little over 2% on POWER8) on a futex
microbenchmark:

#define _GNU_SOURCE
#include
#include
#include
#define ITERATIONS 1000