On Fri, Dec 02, 2016 at 09:38:38AM -0800, Andy Lutomirski wrote:
> TBH, I didn't start down this path for performance. I did it because
> I wanted to kill off a CPUID that was breaking on old CPUs that don't
> have CPUID. So I propose MOV-to-CR2 followed by an unconditional
> jump. My goal here
On Fri, Dec 2, 2016 at 2:55 PM, Andy Lutomirski wrote:
>>
>> Honestly, I think Intel should clean up their documentation.
>
> I'm not sure I follow. If a user program gets migrated, it might end
> up doing cross-modification when it expects self-modification. If
> that trips the program up, is t
On Fri, Dec 2, 2016 at 1:10 PM, Linus Torvalds wrote:
> On Fri, Dec 2, 2016 at 12:41 PM, Andy Lutomirski wrote:
>>
>> Because, if so, we should maybe serialize whenever we migrate a
>> process to a different CPU.
>
> The Intel docs are bad on this issue.
>
> Technically what we do could fall unde
On Fri, Dec 2, 2016 at 12:41 PM, Andy Lutomirski wrote:
>
> Because, if so, we should maybe serialize whenever we migrate a
> process to a different CPU.
The Intel docs are bad on this issue.
Technically what we do could fall under the "cross-modifying code"
case, where one CPU does the write, a
On Fri, Dec 2, 2016 at 11:35 AM, Linus Torvalds wrote:
> On Fri, Dec 2, 2016 at 11:30 AM, Andy Lutomirski wrote:
>>
>> How's this?
>
> Looks ok. I do think that
>
>> I suppose it could be an unconditional IRET-to-self, but that's a good
>> deal slower and not a whole lot simpler. Although if we
On 12/02/2016 06:44 AM, Andrew Cooper wrote:
> On 02/12/16 00:35, Andy Lutomirski wrote:
>> On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
>> guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
>> serialize on Xen PV, but, in practice, any trap it generates will
>> ser
On Fri, Dec 2, 2016 at 11:30 AM, Andy Lutomirski wrote:
>
> How's this?
Looks ok. I do think that
> I suppose it could be an unconditional IRET-to-self, but that's a good
> deal slower and not a whole lot simpler. Although if we start doing
> it right, performance won't really matter here.
Con
On Dec 2, 2016 10:48 AM, "Boris Ostrovsky" wrote:
>
> On 12/02/2016 06:44 AM, Andrew Cooper wrote:
> > On 02/12/16 00:35, Andy Lutomirski wrote:
> >> On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
> >> guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
> >> serialize
On Fri, Dec 2, 2016 at 11:24 AM, Linus Torvalds wrote:
> On Fri, Dec 2, 2016 at 11:20 AM, Borislav Petkov wrote:
>>
>> Something like below?
>
> The optimize-nops thing needs it too, I think.
>
> Again, this will never matter in practice (even if somebody has an i486
> still, the prefetch window
On Fri, Dec 02, 2016 at 11:23:09AM -0800, Andy Lutomirski wrote:
> Not even firmware loading wants it. Firmware loading needs
Microcode...
> specifically cpuid(eax=1). It has nothing to do with serializing
... but yes, of course. NOT sync_core() but CPUID(1).
Thanks!
--
Regards/Gruss,
B
On Fri, Dec 02, 2016 at 11:24:09AM -0800, Linus Torvalds wrote:
> The optimize-nops thing needs it too, I think.
Ah, it is called only from apply_alternatives() but sure, it is safer
this way. Lemme do that and run it through the boxes to see whether
anything catches fire.
> Again, this will neve
On Fri, Dec 2, 2016 at 11:20 AM, Borislav Petkov wrote:
>
> Something like below?
The optimize-nops thing needs it too, I think.
Again, this will never matter in practice (even if somebody has an i486
still, the prefetch window size is like 16 bytes or something), but
from a documentation standp
On Fri, Dec 2, 2016 at 11:03 AM, Linus Torvalds wrote:
> On Fri, Dec 2, 2016 at 10:50 AM, Borislav Petkov wrote:
>>
>> Right, we can try to do something like invalidate_icache() or so in
>> there with the JMP so that the BSP refetches modified code and see where
>> it gets us.
>
> I'd really rath
On Fri, Dec 02, 2016 at 11:03:50AM -0800, Linus Torvalds wrote:
> I'd really rather just mark it noinline with a comment. That way the
> return from the function acts as the control flow change.
Something like below?
It boots in a guest but that doesn't mean anything.
> 'sync_core()' doesn't he
On Fri, Dec 2, 2016 at 10:50 AM, Borislav Petkov wrote:
>
> Right, we can try to do something like invalidate_icache() or so in
> there with the JMP so that the BSP refetches modified code and see where
> it gets us.
I'd really rather just mark it noinline with a comment. That way the
return fro
On Fri, Dec 02, 2016 at 10:27:29AM -0800, Linus Torvalds wrote:
> That is, of course, assuming that there is no really subtle reason
> why that stupid sync_core() is there.
Right, we can try to do something like invalidate_icache() or so in
there with the JMP so that the BSP refetches modified co
On 12/02/2016 06:44 AM, Andrew Cooper wrote:
> On 02/12/16 00:35, Andy Lutomirski wrote:
>> On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
>> guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
>> serialize on Xen PV, but, in practice, any trap it generates will
>> ser
On Fri, Dec 2, 2016 at 10:03 AM, Borislav Petkov wrote:
>
> SNB:
> * before:
> * after:
I suspect it's entirely invisible on raw hardware. But quite possibly
more noticeable in a VM that takes slow faults for every case.
But yes, even there it's probably not *that* noticeable.
I'd prefer to get
On Fri, Dec 02, 2016 at 09:38:38AM -0800, Andy Lutomirski wrote:
> apply_alternatives, unfortunately. It's performance-critical because
> it's intensely stupid and does sync_core() for every single patch.
> Fixing that would be nice, too.
So I did experiment at the time to batch that sync_core()
On Fri, Dec 2, 2016 at 9:38 AM, Andy Lutomirski wrote:
>
> apply_alternatives, unfortunately. It's performance-critical because
> it's intensely stupid and does sync_core() for every single patch.
> Fixing that would be nice, too.
So looking at text_poke_early(), that's very much a case that rea
On Fri, Dec 2, 2016 at 9:32 AM, Linus Torvalds wrote:
> On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski wrote:
>>
>> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
>> ~110ns. But Xen PV will trap CPUID if possible, so IRET-to-self
>> should end up being a nice speedup.
>
> So if
On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski wrote:
>
> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
> ~110ns. But Xen PV will trap CPUID if possible, so IRET-to-self
> should end up being a nice speedup.
So if we care deeply about the performance of this, we should really
a
On 02/12/16 17:23, Andy Lutomirski wrote:
> On Fri, Dec 2, 2016 at 9:16 AM, Andrew Cooper wrote:
>> On 02/12/16 17:07, Andy Lutomirski wrote:
>>> On Dec 2, 2016 3:44 AM, "Andrew Cooper" wrote:
>>>> On 02/12/16 00:35, Andy Lutomirski wrote:
>>>>> On Xen PV, CPUID is likely to trap, and Xen hype
On Fri, Dec 2, 2016 at 9:16 AM, Andrew Cooper wrote:
> On 02/12/16 17:07, Andy Lutomirski wrote:
>> On Dec 2, 2016 3:44 AM, "Andrew Cooper" wrote:
>>> On 02/12/16 00:35, Andy Lutomirski wrote:
>>>> On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
>>>> guaranteed to serialize. (Even
On 02/12/16 17:07, Andy Lutomirski wrote:
> On Dec 2, 2016 3:44 AM, "Andrew Cooper" wrote:
>> On 02/12/16 00:35, Andy Lutomirski wrote:
>>> On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
>>> guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
>>> serialize on Xen PV,
On Dec 2, 2016 3:44 AM, "Andrew Cooper" wrote:
>
> On 02/12/16 00:35, Andy Lutomirski wrote:
> > On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
> > guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
> > serialize on Xen PV, but, in practice, any trap it generates wil
On 02/12/16 00:35, Andy Lutomirski wrote:
> On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
> guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
> serialize on Xen PV, but, in practice, any trap it generates will
> serialize.)
Well, Xen will enable CPUID Faulting whe
On Xen PV, CPUID is likely to trap, and Xen hypercalls aren't
guaranteed to serialize. (Even CPUID isn't *really* guaranteed to
serialize on Xen PV, but, in practice, any trap it generates will
serialize.)
On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
~110ns. But Xen PV will tra