Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-19 Thread Andi Kleen
> In particular 1) means that any extra instructions executed/not executed > will cause a replay divergence (in practice rr uses retired conditional > branches rather than instructions, because the instruction counter is > not accurate, while the branch one is). This alone causes a problem > for th

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Keno Fischer
> So, to be useful, this interface needs to be called before an > application can run XGETBV or XSAVE for the first time and caches a > "bad" value. I think that means that it might not be feasible to use > outside of cases where you ptrace() something and inject things before > it has a chance to

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Keno Fischer
> So we're talking about a workaround for broken software. The question > is how wide spread is it? For rr to work, it tries to replicate the process state *exactly*. That means: 1. The same instructions executed in the same order 2. The exact same register state at those instructions 3. The same

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Dave Hansen
On 06/18/2018 07:42 AM, Keno Fischer wrote: >> But, in any case, so how is this supposed to work? >> >> // get features we are disabling into values matching the >> // hardware "init state". >> __asm__("XRSTOR %reg1,%reg2", ...); >> prctl(PRCTL_SET_XCR0, something);

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Dave Hansen
On 06/18/2018 10:22 AM, Keno Fischer wrote: >> No, I'm saying that depending on faults is not a viable solution. We >> are not guaranteed to get faults in all the cases you would need to fix up. >> >> XSAVE*/XRSTOR* are not even *called* in some of those cases. > Ah, my apologies, I was under the

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Keno Fischer
On Mon, Jun 18, 2018 at 12:16 PM, Dave Hansen wrote: > On 06/18/2018 08:13 AM, Keno Fischer wrote: 4) Catch the fault thrown by xsaves/xrestors in this situation, update XCR0, redo the xsaves/restores, put XCR0 back and continue execution after the faulting instruction. >>>

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Andi Kleen
> Unfortunately, that is insufficient. Almost difference in CPU behavior > between the replayer > and the replayee. In particular, the problems for rr here are > 1) xgetbv, which can introduce differing register contents (and if > code branches on this, > potentially differences in control flow

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Dave Hansen
On 06/18/2018 08:13 AM, Keno Fischer wrote: >>> 4) Catch the fault thrown by xsaves/xrestors in this situation, update >>> XCR0, redo the xsaves/restores, put XCR0 back and continue >>> execution after the faulting instruction. >> >> I'm worried about the kernel pieces that go digging in th

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Keno Fischer
>> 4) Catch the fault thrown by xsaves/xrestors in this situation, update >> XCR0, redo the xsaves/restores, put XCR0 back and continue >> execution after the faulting instruction. > > I'm worried about the kernel pieces that go digging in the XSAVE data > getting confused more than the har

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Dave Hansen
On 06/18/2018 07:42 AM, Keno Fischer wrote: >> There's no way this is even close to viable until it has been made to >> cope with holes. > > Ok, it seems to me the first decision is whether it should be allowed > to have the compacted format (with holes) in an in-memory xstate > image. Doing so se

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Keno Fischer
Hi Dave, thank you for your thorough comments, replies inline: On Mon, Jun 18, 2018 at 8:47 AM, Dave Hansen wrote: > On 06/16/2018 05:33 PM, Keno Fischer wrote: >> For my use case, it would be sufficient to simply disallow >> any value of XCR0 with "holes" in it, > But what if the hardware you a

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-18 Thread Dave Hansen
On 06/16/2018 05:33 PM, Keno Fischer wrote: > For my use case, it would be sufficient to simply disallow > any value of XCR0 with "holes" in it, But what if the hardware you are migrating to/from *has* holes? There's no way this is even close to viable until it has been made to cope with holes. F

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-17 Thread Keno Fischer
> Almost difference in CPU behavior > between the replayer and the replayee. Not sure what happened to this sentence. What I meant to say was: Almost any difference in CPU behavior between the replayer and the replayee will cause problems for the determinism of the trace. To elaborate on this, e

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-17 Thread Keno Fischer
> Patch seems pointless if you can already control CPUID, which rr > supports. Just disable AVX512 in CPUID. All code using AVX should check > cpuid (or will fail anyways). Unfortunately, that is insufficient. Almost difference in CPU behavior between the replayer and the replayee. In particular,

Re: [RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-17 Thread Andi Kleen
Patch seems pointless if you can already control CPUID, which rr supports. Just disable AVX512 in CPUID. All code using AVX should check cpuid (or will fail anyways). -Andi

[RFC PATCH] x86/arch_prctl: Add ARCH_SET_XCR0 to mask XCR0 per-thread

2018-06-16 Thread Keno Fischer
From: Keno Fischer The rr (http://rr-project.org/) debugger provides user space record-and-replay functionality by carefully controlling the process environment in order to ensure completely deterministic execution of recorded traces. The recently added ARCH_SET_CPUID arch_prctl allows rr to move