Re: [PATCH v2 6/6] x86: switch the 64bit uncached page clear to SSE/AVX v2

2012-08-09 Thread Jan Beulich
>>> On 09.08.12 at 17:03, "Kirill A. Shutemov" wrote:
> ENTRY(clear_page_nocache)
>  CFI_STARTPROC
> - xorl %eax,%eax
> - movl $4096/64,%ecx
> + push %rdi
> + call kernel_fpu_begin
> + pop %rdi

You use CFI annotations elsewhere, so why don't you use pushq
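The old code set up a loop count of 4096/64, i.e. one 64-byte cache line per iteration; the patch switches to vector registers, which is why it now calls kernel_fpu_begin. As an illustration only, here is a user-space C sketch of an uncached page clear using SSE2 non-temporal stores (_mm_stream_si128 followed by _mm_sfence). It is not the patch's assembly: the names clear_page_nocache_sse, page_is_zero and PAGE_BYTES are hypothetical, it assumes x86-64 (where SSE2 is always available) and a 16-byte-aligned page, and it omits kernel_fpu_begin()/kernel_fpu_end(), which are only needed inside the kernel.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096  /* hypothetical name for the 4K page size */

/* Hypothetical helper: zero one 4K page with non-temporal 16-byte stores
 * that bypass the cache. The page must be 16-byte aligned. */
static void clear_page_nocache_sse(void *page)
{
    assert(((uintptr_t)page & 15) == 0);

    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;

    /* One 64-byte cache line (four 16-byte stores) per iteration,
     * matching the 4096/64 loop count of the old code. */
    for (size_t i = 0; i < PAGE_BYTES / 64; i++, p += 4) {
        _mm_stream_si128(p + 0, zero);
        _mm_stream_si128(p + 1, zero);
        _mm_stream_si128(p + 2, zero);
        _mm_stream_si128(p + 3, zero);
    }

    /* Streaming stores are weakly ordered; fence them before the
     * zeroed page is handed to anyone else. */
    _mm_sfence();
}

/* Returns 1 if every byte of the page is zero, 0 otherwise. */
static int page_is_zero(const void *page)
{
    const unsigned char *b = page;
    for (size_t i = 0; i < PAGE_BYTES; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

Unrolling to a full cache line per iteration lets each line be written completely by streaming stores, so no partial line has to be read back from memory.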

[PATCH v2 6/6] x86: switch the 64bit uncached page clear to SSE/AVX v2

2012-08-09 Thread Kirill A. Shutemov
From: Andi Kleen

With multiple threads, vector stores are more efficient, so use them.

This will cause the page clear to run non-preemptible and add some overhead. However, on 32bit it was already non-preemptible (due to kmap_atomic), and there is a preemption opportunity every 4K unit.

On a NPB (N
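The preemption argument can be modelled in a few lines: each 4K unit is cleared inside a short non-preemptible section, and the caller gets a chance to reschedule between units, so the extra latency from the vector section is bounded by one 4K clear. This is a hypothetical sketch, assuming the caller clears large areas 4K at a time: fpu_section_begin/fpu_section_end stand in for kernel_fpu_begin()/kernel_fpu_end(), maybe_reschedule stands in for a kernel scheduling point such as cond_resched(), and memset stands in for the vector-store loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_BYTES 4096  /* one 4K unit */

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end():
 * preemption is treated as disabled while in_fpu_section is set. */
static int in_fpu_section;
static unsigned preempt_points;

static void fpu_section_begin(void) { in_fpu_section = 1; }
static void fpu_section_end(void)   { in_fpu_section = 0; }

/* Hypothetical scheduling point (like cond_resched()); it must never
 * be reached while the vector registers are in use. */
static void maybe_reschedule(void)
{
    assert(!in_fpu_section);
    preempt_points++;
}

/* Clear one 4K unit inside the non-preemptible section.
 * memset stands in for the SSE/AVX store loop. */
static void clear_subpage(unsigned char *p)
{
    fpu_section_begin();
    memset(p, 0, SUBPAGE_BYTES);
    fpu_section_end();
}

/* Clear a large area unit by unit, offering a preemption
 * opportunity after every 4K. */
static void clear_area(unsigned char *buf, size_t nr_subpages)
{
    for (size_t i = 0; i < nr_subpages; i++) {
        clear_subpage(buf + i * SUBPAGE_BYTES);
        maybe_reschedule();
    }
}
```

Keeping the begin/end pair per 4K unit, rather than around the whole area, is what keeps the non-preemptible window short regardless of the total size being cleared.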