Re: [PATCH 5/7] x86: Introduce x86_merge_dr6()

2023-10-28 Thread Jinoh Kang
On 9/25/23 19:20, Jinoh Kang wrote:
> As an outsider's perspective, I think this kind of thing is where selftests
> really shine.  I got the impression that Xen will need to rely on numerous 
> other
> platform oddities, the documentation of which are often unavailable.
> 
> Of course, adding a whole new test infrastructure in code freeze is not 
> viable.
> Maybe I have missed something, but I only see three paths forward here:
> 
> 1. Reference the most relevant paragraph in SDM/APM, but don't quote it.
>Keep the current explanation, and state that the manual is vague anyway.
> 
> 2. Acknowledge that SDM/APM is incomplete, and completely abandon the manual
>as the *authoritative* source of information.  Perhaps embed a sample test
>program that demonstrates the behavior, if it isn't too long.
> 
> 3. Actually assert in runtime that DR6 behaves as expected.
> 

Just a heads-up; has there been any progress on this part?

Please let me know if you need anything.  I'm happy to help!

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 5/7] x86: Introduce x86_merge_dr6()

2023-09-25 Thread Jinoh Kang
On 9/25/23 16:30, Jan Beulich wrote:
> On 22.09.2023 18:11, Andrew Cooper wrote:
>> On 18/09/2023 12:37 pm, Jan Beulich wrote:
>>> On 15.09.2023 22:36, Andrew Cooper wrote:
>>>> The current logic used to update %dr6 when injecting #DB is buggy.
>>>>
>>>> The SDM/APM documention on %dr6 updates is far from ideal, but does at 
>>>> least
>>>> make clear that it's non-trivial.  The actual behaviour is to overwrite
>>>> B{0..3} and accumulate all other bits.
>>> As mentioned before, I'm okay to ack this patch provided it is explicitly 
>>> said
>>> where the information is coming from.
>>
>> The information *is* coming from the relevant paragraph of the relevant
>> chapters of the relevant manuals.
>>
>> I don't need to teach programmers how to suck eggs.  Nor am I going to
>> quote buggy manuals (corrections are pending for both) as a
>> justification for restating several paragraphs of information as a oneliner.
> 
> Earlier on you said this to my original request:
> 
> 'SDM Vol3 18.2.3 Debug Status Register (DR6) says
> 
>  "Certain debug exceptions may clear bits 0-3. The remaining contents of
>  the DR6 register are never cleared by the processor."'
> 
> "Certain" and "may" do not describe the behavior that your change implements.
> Hence imo there's still a need to clarify where the extra information is
> coming from. Pending corrections are of course appreciated; in case you have
> been told the new wording already, perhaps you could quote that?

As an outsider's perspective, I think this kind of thing is where selftests
really shine.  I got the impression that Xen will need to rely on numerous other
platform oddities, the documentation of which are often unavailable.

Of course, adding a whole new test infrastructure in code freeze is not viable.
Maybe I have missed something, but I only see three paths forward here:

1. Reference the most relevant paragraph in SDM/APM, but don't quote it.
   Keep the current explanation, and state that the manual is vague anyway.

2. Acknowledge that SDM/APM is incomplete, and completely abandon the manual
   as the *authoritative* source of information.  Perhaps embed a sample test
   program that demonstrates the behavior, if it isn't too long.

3. Actually assert in runtime that DR6 behaves as expected.

> 
> Jan

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 4/7] x86/pv: Drop priv_op_ctxt.bpmatch and use pending_dbg instead

2023-09-16 Thread Jinoh Kang
On 9/16/23 23:00, Andrew Cooper wrote:
> On 16/09/2023 8:36 am, Jinoh Kang wrote:

(snip)

>> These two hunks look like a behavioral change in singlestep mode.
>>
>> This is actually a fix, assuming the emulator previously did not handle
>> 'rep {in,out}s' in singlestep mode correctly, since it now checks for
>> PENDING_DBG.BS in addition to PENDING_DBG.B[0-4].
> 
> The emulator should handle this correctly already.  I had been meaning
> to test this, but hadn't so far - guess I should fix that.
> 
> x86_emulate.c line 511 in get_rep_prefix() sets max_reps to 1 if
> SingleStepping is active.

Thanks for informing.  Although that EFLAGS.TF check in the macro now
makes me--almost reflexively--imagine all sort of creative pathological
cases, like "mov ss, ax; rep ins"...

> 
> This in turn causes the emulator to use the io_{read,write}() hook
> rather than the rep hook.

Right.  (Frankly that part of code has too many branches to be readable.
Also the "presumably most efficient" part of the comment hints at perf
optimization sans any profiling attempts...)

> 
> This is important, because singlestepping becoming pending is normally
> evaluated at the end of the instruction.  i.e. in this example it
> wouldn't show up in pending_dbg (yet).
> 
> What definitely is broken here is the recognition of a data breakpoint
> on the memory operand of the INS/OUTS instruction, but it's broken
> everywhere else for PV guest emulation too, so needs to go on the TODO list.

(Another thing definitely broken here is the recognition of I/O breakpoints
 post the first iteration.  Maybe it would be beneficial to do differential
 testing between the {read,write}_io slowpath and rep_{in,out}s fastpath.)

> 
>> If this is the case, (at least) this part of the patch looks like a stable
>> candidate.  You might want to edit the commit message to reflect that.
> 
> We're going to try and get all the %dr6 handling issues sorted, then
> decide whether to backport the lot or not.  It will entirely depend on
> how invasive the fixes end up being, but I hope they'll be ok to backport.
> 
>> (Ideally all the HWBP handling should be part of the emulator logic, but
>>  I don't see an easy way to generalize the PV-specific logic.  It could
>>  be its own patch anyway.)
> 
> The emulator does have HWBP handling for HVM guests, because that's
> architectural behaviour to look in the TSS.

I was under such impression since I didn't immediately notice I/O
breakpoint handling in ins/outs path; maybe I haven't looked into it deeper...

> 
> PV guests are the odd-ones-out with non-standard behaviour.
> 
> ~Andrew

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 7/7] x86/pv: Rewrite %dr6 handling

2023-09-16 Thread Jinoh Kang
On 9/16/23 21:56, Andrew Cooper wrote:
>> We don't reset DR6 after reading it, and there is no guarantee that the PV 
>> guest
>> will reset it either, so it doesn't match PENDING_DBG exactly IIRC.
>>
>> On the other hand, nothing will probably care about its double-accumulating
>> quirk except perhaps monitor users.
>>
>> Do we want to document that, shadow DR6 for PV (which seems a little overkill
>> if we don't trap DR6 access from PV already), or just live with that?
> 
> Different DR6's.
> 
> This is Xen responding to a real #DB (most likely from a PV guest, but
> maybe from debugging activity in Xen itself), and in ...
> 
>>>  
>>>  /*
>>>   * At the time of writing (March 2018), on the subject of %dr6:
> 
> ... this piece of context missing from the patch, Xen always writes
> X86_DR6_DEFAULT back in order to clear the sticky bits.

Oh, that'll do.

How come have I not seen this?  Maybe I was in dire need of morning coffee.
I assumed absence just from a missing context...

> 
> This behaviour hasn't changed.  Xen always sees a "clean" DR6 on every
> new #DB.
> 
> For the PV guest, what matters is ...
> 

(snip)

>>> -/* Save debug status register where guest OS can peek at it */
>>> -v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
>>> -v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
>>> -
>>>  if ( guest_kernel_mode(v, regs) && v->domain->debugger_attached )
>>>  {
>>> +/* Save debug status register where gdbsx can peek at it */
>>> +v->arch.dr6 = x86_merge_dr6(v->domain->arch.cpu_policy,
>>> +v->arch.dr6, pending_dbg);
>>>  domain_pause_for_debugger();
>>>  return;
>>>  }
>>>  
>>> -pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
>>> +pv_inject_DB(pending_dbg);
> 
> ... this, which will merge all new pending bits into the guest's DR6.
> 
> If the guest chooses not to clean out DR6 each time, then it will see
> content accumulate.
> 
> To your earlier point of shadowing, we already have to do that.  DR6 is
> just one of many registers we need to context switch for the vCPU.
> 
> PV guests, being ring-deprivieged, trap into Xen for every DR access,
> and Xen handles every one of their #DB exceptions.  This is one reason
> why I split the work into several parts.  PV guests are easier to get
> sorted properly, and patch 1,2,5,6 are all common improvements relevant
> to HVM too.

That confirms my knowledge.  Honestly if I got such a foolish remark
I would question the reviewer's understanding of PV mode (not that you did
anything wrong).  Sorry for wasting your time.

> 
> ~Andrew

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 7/7] x86/pv: Rewrite %dr6 handling

2023-09-16 Thread Jinoh Kang
On 9/16/23 05:36, Andrew Cooper wrote:
> All #DB exceptions result in an update of %dr6, but this isn't handled
> properly by Xen for any guest type.
> 
> Remove all ad-hoc dr6 handling, leaving it to pv_inject_event() in most cases
> and using the new x86_merge_dr6() helper.
> 
> In do_debug(), swap the dr6 to pending_dbg in order to operate entirely with
> positive polarity.  Among other things, this helps spot RTM/BLD in the
> diagnostic message.
> 
> Drop the unconditional v->arch.dr6 adjustment.  pv_inject_event() performs the
> adjustment in the common case, but retain the prior behaviour if a debugger is
> attached.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Wei Liu 
> CC: Jinoh Kang 
> 
> v2:
>  * Split pieces out into an earlier patch.
>  * Extend do_debug() to use pending_dbg entirely, rather than have the same
>variable used with different polarity at different times.
> ---
>  xen/arch/x86/pv/emul-priv-op.c  |  5 +
>  xen/arch/x86/pv/emulate.c   |  9 +++--
>  xen/arch/x86/pv/ro-page-fault.c |  4 ++--
>  xen/arch/x86/pv/traps.c | 17 +
>  xen/arch/x86/traps.c| 23 +++
>  5 files changed, 34 insertions(+), 24 deletions(-)
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index 2f73ed0682e1..fe2011f28e34 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -1364,10 +1364,7 @@ int pv_emulate_privileged_op(struct cpu_user_regs 
> *regs)
>  ASSERT(!curr->arch.pv.trap_bounce.flags);
>  
>  if ( ctxt.ctxt.retire.pending_dbg )
> -{
> -curr->arch.dr6 |= ctxt.ctxt.retire.pending_dbg | 
> DR_STATUS_RESERVED_ONE;
> -pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> -}
> +pv_inject_DB(ctxt.ctxt.retire.pending_dbg);
>  
>  /* fall through */
>  case X86EMUL_RETRY:
> diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
> index e7a1c0a2cc4f..29425a0a00ec 100644
> --- a/xen/arch/x86/pv/emulate.c
> +++ b/xen/arch/x86/pv/emulate.c
> @@ -71,10 +71,15 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, 
> unsigned long rip)
>  {
>  regs->rip = rip;
>  regs->eflags &= ~X86_EFLAGS_RF;
> +
>  if ( regs->eflags & X86_EFLAGS_TF )
>  {
> -current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
> -pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> +/*
> + * TODO, this should generally use TF from the start of the
> + * instruction.  It's only a latent bug for now, as this path isn't
> + * used for any instruction which modifies eflags.
> + */
> +pv_inject_DB(X86_DR6_BS);
>  }
>  }
>  
> diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
> index cad28ef928ad..f6bb33556e72 100644
> --- a/xen/arch/x86/pv/ro-page-fault.c
> +++ b/xen/arch/x86/pv/ro-page-fault.c
> @@ -389,8 +389,8 @@ int pv_ro_page_fault(unsigned long addr, struct 
> cpu_user_regs *regs)
>  
>  /* Fallthrough */
>  case X86EMUL_OKAY:
> -if ( ctxt.retire.singlestep )
> -pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> +if ( ctxt.retire.pending_dbg )
> +pv_inject_DB(ctxt.retire.pending_dbg);
>  
>  /* Fallthrough */
>  case X86EMUL_RETRY:
> diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
> index 74f333da7e1c..553b04bca956 100644
> --- a/xen/arch/x86/pv/traps.c
> +++ b/xen/arch/x86/pv/traps.c
> @@ -13,6 +13,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -50,9 +51,9 @@ void pv_inject_event(const struct x86_event *event)
>  tb->cs= ti->cs;
>  tb->eip   = ti->address;
>  
> -if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
> - vector == X86_EXC_PF )
> +switch ( vector | -(event->type == X86_EVENTTYPE_SW_INTERRUPT) )
>  {
> +case X86_EXC_PF:
>  curr->arch.pv.ctrlreg[2] = event->cr2;
>  arch_set_cr2(curr, event->cr2);
>  
> @@ -62,9 +63,17 @@ void pv_inject_event(const struct x86_event *event)
>  error_code |= PFEC_user_mode;
>  
>  trace_pv_page_fault(event->cr2, error_code);
> -}
> -else
> +break;
> +
> +case X86_EXC_DB:
> +curr->arch.dr6 = x86_merge_dr6(curr->domain->arch.cpu_policy,
> +   curr->arch.dr6, event->pending_dbg);
> +/* Fallt

Re: [PATCH 6/7] x86: Extend x86_event with a pending_dbg field

2023-09-16 Thread Jinoh Kang
On 9/16/23 05:36, Andrew Cooper wrote:
> ... using the Intel VMCS PENDING_DBG semantics, and sharing storage with cr2.
> 
> This requires working around anonymous union bugs in obsolete versions of GCC,
> which in turn needs to drop unnecessary const qualifiers.

I'd write this as a inline comment as long as it doesn't make the code
too verbose.

Please disregard this if it doesn't match our coding style.

> 
> Also introduce a pv_inject_DB() wrapper use this field nicely.

Minor nit: Also introduce a pv_inject_DB() wrapper to* use this field nicely.

> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Wei Liu 
> CC: Jinoh Kang 
> 
> v2:
>  * Split out of prior patch.
> ---
>  xen/arch/x86/include/asm/domain.h  | 18 --
>  xen/arch/x86/include/asm/hvm/hvm.h |  3 ++-
>  xen/arch/x86/x86_emulate/x86_emulate.h |  5 -
>  3 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/include/asm/domain.h 
> b/xen/arch/x86/include/asm/domain.h
> index c2d9fc333be5..fd1f306222be 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -729,15 +729,29 @@ static inline void pv_inject_hw_exception(unsigned int 
> vector, int errcode)
>  pv_inject_event();
>  }
>  
> +static inline void pv_inject_DB(unsigned long pending_dbg)
> +{
> +struct x86_event event = {
> +.vector  = X86_EXC_DB,
> +.type= X86_EVENTTYPE_HW_EXCEPTION,
> +.error_code  = X86_EVENT_NO_EC,
> +};
> +
> +event.pending_dbg = pending_dbg;
> +
> +pv_inject_event();
> +}
> +
>  static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
>  {
> -const struct x86_event event = {
> +struct x86_event event = {
>  .vector = X86_EXC_PF,
>  .type = X86_EVENTTYPE_HW_EXCEPTION,
>  .error_code = errcode,
> -.cr2 = cr2,
>  };
>  
> +event.cr2 = cr2;
> +
>  pv_inject_event();
>  }
>  
> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
> b/xen/arch/x86/include/asm/hvm/hvm.h
> index 6d53713fc3a9..ea966f4429f9 100644
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -532,9 +532,10 @@ static inline void hvm_inject_page_fault(int errcode, 
> unsigned long cr2)
>  .vector = X86_EXC_PF,
>  .type = X86_EVENTTYPE_HW_EXCEPTION,
>  .error_code = errcode,
> -.cr2 = cr2,
>  };
>  
> +event.cr2 = cr2;
> +
>  hvm_inject_event();
>  }
>  
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h 
> b/xen/arch/x86/x86_emulate/x86_emulate.h
> index fbc023c37e34..e567a9b635d9 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -78,7 +78,10 @@ struct x86_event {
>  uint8_t   type; /* X86_EVENTTYPE_* */
>  uint8_t   insn_len; /* Instruction length */
>  int32_t   error_code;   /* X86_EVENT_NO_EC if n/a */
> -    unsigned long cr2;  /* Only for X86_EXC_PF h/w exception */
> +union {
> +unsigned long cr2; /* #PF */
> +unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) 
> */
> +};
>  };
>  
>  /*

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 4/7] x86/pv: Drop priv_op_ctxt.bpmatch and use pending_dbg instead

2023-09-16 Thread Jinoh Kang
On 9/16/23 05:36, Andrew Cooper wrote:
> @@ -658,7 +660,7 @@ static int cf_check rep_ins(
>  
>  ++*reps;
>  
> -if ( poc->bpmatch || hypercall_preempt_check() )
> +if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
>  break;
>  
>  /* x86_emulate() clips the repetition count to ensure we don't wrap. 
> */

(snip)

> @@ -726,7 +729,7 @@ static int cf_check rep_outs(
>  
>  ++*reps;
>  
> -if ( poc->bpmatch || hypercall_preempt_check() )
> +if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
>  break;
>  
>  /* x86_emulate() clips the repetition count to ensure we don't wrap. 
> */

These two hunks look like a behavioral change in singlestep mode.

This is actually a fix, assuming the emulator previously did not handle
'rep {in,out}s' in singlestep mode correctly, since it now checks for
PENDING_DBG.BS in addition to PENDING_DBG.B[0-4].

If this is the case, (at least) this part of the patch looks like a stable
candidate.  You might want to edit the commit message to reflect that.

(Ideally all the HWBP handling should be part of the emulator logic, but
 I don't see an easy way to generalize the PV-specific logic.  It could
 be its own patch anyway.)

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-15 Thread Jinoh Kang
On 9/15/23 21:20, Jinoh Kang wrote:
> On 9/13/23 08:21, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h 
>> b/xen/arch/x86/x86_emulate/x86_emulate.h
>> index 698750267a90..f0e74d23c378 100644
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>> @@ -588,15 +588,26 @@ struct x86_emulate_ctxt
>>  /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
>>  unsigned int opcode;
>>  
>> -/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). 
>> */
>> +/*
>> + * Retirement state, set by the emulator (valid only on 
>> X86EMUL_OKAY/DONE).
>> + *
>> + * TODO: all this state should be input/output from the VMCS 
>> PENDING_DBG,
>> + * INTERRUPTIBILITY and ACTIVITIY fields.
>> + */
>>  union {
>> -uint8_t raw;
>> +unsigned long raw;
> 
> Minor nit: this should be uint64_t for clarity.  Otherwise, it's not at all
> clear that the raw field covers the entire union, unless you remind myself
> that Xen does not support 32-bit host.

you remind yourself*.  What a weird typo to make :-(

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-15 Thread Jinoh Kang
On 9/13/23 08:21, Andrew Cooper wrote:
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h 
> b/xen/arch/x86/x86_emulate/x86_emulate.h
> index 698750267a90..f0e74d23c378 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -588,15 +588,26 @@ struct x86_emulate_ctxt
>  /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
>  unsigned int opcode;
>  
> -/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
> +/*
> + * Retirement state, set by the emulator (valid only on 
> X86EMUL_OKAY/DONE).
> + *
> + * TODO: all this state should be input/output from the VMCS PENDING_DBG,
> + * INTERRUPTIBILITY and ACTIVITIY fields.
> + */
>  union {
> -uint8_t raw;
> +unsigned long raw;

Minor nit: this should be uint64_t for clarity.  Otherwise, it's not at all
clear that the raw field covers the entire union, unless you remind myself
that Xen does not support 32-bit host.

>  struct {
> +/*
> + * Accumulated %dr6 trap bits, positive polarity.  Should only be
> + * interpreted in the case of X86EMUL_OKAY/DONE.
> + */
> +unsigned int pending_dbg;
> +
>  bool hlt:1;  /* Instruction HLTed. */
>  bool mov_ss:1;   /* Instruction sets MOV-SS irq shadow. */
>  bool sti:1;  /* Instruction sets STI irq shadow. */
>  bool unblock_nmi:1;  /* Instruction clears NMI blocking. */
> -bool singlestep:1;   /* Singlestepping was active. */
> +bool singlestep:1;   /* Singlestepping was active. (TODO, merge 
> into pending_dbg) */
>  };
>  } retire;
>  

-- 
Sincerely,
Jinoh Kang




Re: [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()

2023-08-30 Thread Jinoh Kang
On 8/30/23 22:41, Jan Beulich wrote:
> On 24.08.2023 17:25, Jinoh Kang wrote:
>> Prepare for an upcoming patch that overloads the 'cr2' field for #DB.
> 
> Seeing the subsequent change and the fact that earlier on Andrew didn't
> need such an adjustment, I'm afraid I can't see the need for this change,
> and the one sentence above also doesn't answer the "Why?", but only the
> "What?"

This is part of the hvm_monitor_interrupt() fix (patch 4), which would
otherwise get CR2 value (instead of PENDING_DBG) even for #DB.

I might have been overzealous, though, since there is no known (broken)
use for VM_EVENT_REASON_INTERRUPT in the first place.

> 
> Jan
> 
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -503,9 +503,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
>>  
>>  static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
>>  {
>> -info->cr2 = v->arch.hvm.guest_cr[2];
>> +if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
>> +return false;
>> +
>> +if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
>> + info->vector == X86_EXC_PF )
>> +info->cr2 = v->arch.hvm.guest_cr[2];
>>  
>> -return alternative_call(hvm_funcs.get_pending_event, v, info);
>> +return true;
>>  }
>>  
>>  void hvm_do_resume(struct vcpu *v)
> 

-- 
Sincerely,
Jinoh Kang




[PATCH v2 7/8] x86: Fix merging of new status bits into %dr6

2023-08-24 Thread Jinoh Kang
From: Andrew Cooper 

The current logic used to update %dr6 when injecting #DB is buggy.  The
architectural behaviour is to overwrite B{0..3} (rather than accumulate) and
accumulate all other bits.

Introduce a new merge_dr6() helper, which also takes care of handing RTM
correctly.

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
[ jinoh: Rebase onto staging, move constants to x86-defns.h ]
Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 

v1 -> v2: [S-o-b fixes.]
---
 xen/arch/x86/hvm/svm/svm.c   |  3 ++-
 xen/arch/x86/hvm/vmx/vmx.c   |  3 ++-
 xen/arch/x86/include/asm/debugreg.h  | 20 +++-
 xen/arch/x86/include/asm/x86-defns.h |  6 ++
 xen/arch/x86/pv/traps.c  |  3 ++-
 5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 7bb572e72b..c92b2d7f86 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1338,7 +1338,8 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
  * Item 2 is done by hardware when injecting a #DB exception.
  */
 __restore_debug_registers(vmcb, curr);
-vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
+vmcb_set_dr6(vmcb, merge_dr6(vmcb_get_dr6(vmcb), _event.pending_dbg,
+ curr->domain->arch.cpuid->feat.rtm));
 
 /* fall through */
 case X86_EXC_BP:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b35278992a..377f33d632 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2030,7 +2030,8 @@ static void cf_check vmx_inject_event(const struct 
x86_event *event)
  * All actions are left up to the hypervisor to perform.
  */
 __restore_debug_registers(curr);
-write_debugreg(6, read_debugreg(6) | event->pending_dbg);
+write_debugreg(6, merge_dr6(read_debugreg(6), event->pending_dbg,
+curr->domain->arch.cpuid->feat.rtm));
 
 if ( !nestedhvm_vcpu_in_guestmode(curr) ||
  !nvmx_intercepts_exception(curr, X86_EXC_DB, _event.error_code) )
diff --git a/xen/arch/x86/include/asm/debugreg.h 
b/xen/arch/x86/include/asm/debugreg.h
index f83b1b96ec..5fdd25d238 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -22,7 +22,6 @@
 #define DR_STEP (0x4000)/* single-step */
 #define DR_SWITCH   (0x8000)/* task switch */
 #define DR_NOT_RTM  (0x1)   /* clear: #BP inside RTM region */
-#define DR_STATUS_RESERVED_ONE  0x0ff0UL /* Reserved, read as one */
 
 /* Now define a bunch of things for manipulating the control register.
The top two bytes of the control register consist of 4 fields of 4
@@ -89,6 +88,25 @@ static inline unsigned long adjust_dr6_rsvd(unsigned long 
dr6, bool rtm)
 return dr6;
 }
 
+static inline unsigned long merge_dr6(unsigned long dr6, unsigned long new,
+  bool rtm)
+{
+/* Flip dr6 to have positive polarity. */
+dr6 ^= X86_DR6_DEFAULT;
+
+/* Sanity check that only known values are passed in. */
+ASSERT(!(dr6 & ~X86_DR6_KNOWN_MASK));
+ASSERT(!(new & ~X86_DR6_KNOWN_MASK));
+
+/* Breakpoints 0-3 overridden.  BD, BS, BT and RTM accumulate. */
+dr6 = (dr6 & ~X86_DR6_BP_MASK) | new;
+
+/* Flip dr6 back to having default polarity. */
+dr6 ^= X86_DR6_DEFAULT;
+
+return adjust_dr6_rsvd(dr6, rtm);
+}
+
 static inline unsigned long adjust_dr7_rsvd(unsigned long dr7, bool rtm)
 {
 dr7 |= X86_DR7_MBS;
diff --git a/xen/arch/x86/include/asm/x86-defns.h 
b/xen/arch/x86/include/asm/x86-defns.h
index b13ca680c2..6d76d5dcc5 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -118,6 +118,12 @@
 #define X86_DR6_MBS_BASE(0xfffe0ff0UL)  /* Reserved, read as one   */
 #define X86_DR6_MBS_NO_RTM  (X86_DR6_RTM)   /* - if RTM unavailable*/
 
+#define X86_DR6_BP_MASK \
+(X86_DR6_B0 | X86_DR6_B1 | X86_DR6_B2 | X86_DR6_B3)
+
+#define X86_DR6_KNOWN_MASK  \
+(X86_DR6_BP_MASK | X86_DR6_BD | X86_DR6_BS | X86_DR6_BT | X86_DR6_RTM)
+
 #define X86_DR6_DEFAULT 0x0ff0  /* Default %dr6 value. */
 
 /*
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 4f5641a47c..65b41e6115 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -66,7 +66,8 @@ void pv_inject_event(const struct x86_event *event)
 break;
 
 case X86_EXC_DB:
-curr->arch.dr6 |= event->pending_dbg;
+curr->arch.dr6 = merge_dr6(curr->arch.dr6, event->pending_dbg,
+   curr-&

[PATCH v2 8/8] x86/dbg: Cleanup of legacy dr6 constants

2023-08-24 Thread Jinoh Kang
From: Andrew Cooper 

Replace the few remaining uses with X86_DR6_* constants.

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 
Acked-by: Jan Beulich 
[ jinoh: Rebase onto staging ]
Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

v1 -> v2: [S-o-b fixes.]
---
 xen/arch/x86/hvm/vmx/vmx.c  |  2 +-
 xen/arch/x86/include/asm/debugreg.h | 20 
 xen/arch/x86/pv/emul-priv-op.c  |  2 +-
 xen/arch/x86/traps.c|  2 +-
 4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 377f33d632..814f48ce83 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4290,7 +4290,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 __vmread(GUEST_PENDING_DBG_EXCEPTIONS, _dbg);
 __vmwrite(GUEST_PENDING_DBG_EXCEPTIONS,
-  pending_dbg | DR_STEP);
+  pending_dbg | X86_DR6_BS);
 }
 }
 
diff --git a/xen/arch/x86/include/asm/debugreg.h 
b/xen/arch/x86/include/asm/debugreg.h
index 5fdd25d238..edff379d49 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -3,26 +3,6 @@
 
 #include 
 
-/* Indicate the register numbers for a number of the specific
-   debug registers.  Registers 0-3 contain the addresses we wish to trap on */
-
-#define DR_FIRSTADDR 0
-#define DR_LASTADDR  3
-#define DR_STATUS6
-#define DR_CONTROL   7
-
-/* Define a few things for the status register.  We can use this to determine
-   which debugging register was responsible for the trap.  The other bits
-   are either reserved or not of interest to us. */
-
-#define DR_TRAP0(0x1)   /* db0 */
-#define DR_TRAP1(0x2)   /* db1 */
-#define DR_TRAP2(0x4)   /* db2 */
-#define DR_TRAP3(0x8)   /* db3 */
-#define DR_STEP (0x4000)/* single-step */
-#define DR_SWITCH   (0x8000)/* task switch */
-#define DR_NOT_RTM  (0x1)   /* clear: #BP inside RTM region */
-
 /* Now define a bunch of things for manipulating the control register.
The top two bytes of the control register consist of 4 fields of 4
bits - each field corresponds to one of the four debug registers,
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 72d0514e74..78a1f4aff7 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1359,7 +1359,7 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
 {
 case X86EMUL_OKAY:
 if ( ctxt.ctxt.retire.singlestep )
-ctxt.bpmatch |= DR_STEP;
+ctxt.bpmatch |= X86_DR6_BS;
 
 if ( ctxt.bpmatch &&
  !(curr->arch.pv.trap_bounce.flags & TBF_EXCEPTION) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e2acfbcb9e..ae0a4a1c1e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1955,7 +1955,7 @@ void do_debug(struct cpu_user_regs *regs)
  * If however we do, safety measures need to be enacted.  Use a big
  * hammer and clear all debug settings.
  */
-if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
+if ( dr6 & X86_DR6_BP_MASK )
 {
 unsigned int bp, dr7 = read_debugreg(7);
 
-- 
2.41.0




[PATCH v2 6/8] x86/hvm: Add comments about #DB exception behavior to {svm,vmx}_inject_event()

2023-08-24 Thread Jinoh Kang
From: Andrew Cooper 

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 

Extracted comments only, and then s/from emulation/from monitor/;
originally "x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor
actions to {svm,vmx}_inject_event()"

Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 

v1 -> v2: new patch
---
 xen/arch/x86/hvm/svm/svm.c | 9 +
 xen/arch/x86/hvm/vmx/vmx.c | 7 +++
 2 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 6f3e6b3512..7bb572e72b 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1328,6 +1328,15 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
 switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
 case X86_EXC_DB:
+/*
+ * On AMD hardware, a #DB exception:
+ *  1) Merges new status bits into %dr6
+ *  2) Clears %dr7.gd and MSR_DEBUGCTL.{LBR,BTF}
+ *
+ * Item 1 is done by hardware before a #DB intercepted vmexit, but we
+ * may end up here from monitor so have to repeat it ourselves.
+ * Item 2 is done by hardware when injecting a #DB exception.
+ */
 __restore_debug_registers(vmcb, curr);
 vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4e20fca43e..b35278992a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2022,6 +2022,13 @@ static void cf_check vmx_inject_event(const struct 
x86_event *event)
 switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
 case X86_EXC_DB:
+/*
+ * On Intel hardware, a #DB exception:
+ *  1) Merges new status bits into %dr6
+ *  2) Clears %dr7.gd and MSR_DEBUGCTL.LBR
+ *
+ * All actions are left up to the hypervisor to perform.
+ */
 __restore_debug_registers(curr);
 write_debugreg(6, read_debugreg(6) | event->pending_dbg);
 
-- 
2.41.0




[PATCH v2 4/8] x86/emul: Populate pending_dbg field of x86_event from {svm,vmx}_get_pending_event()

2023-08-24 Thread Jinoh Kang
Ensure that we pass the correct pending_dbg value to
hvm_monitor_interrupt().

Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Tamas K Lengyel 
CC: Alexandru Isaila 
CC: Petre Pircalabu 

v1 -> v2: new patch
---
 xen/arch/x86/hvm/svm/svm.c | 8 
 xen/arch/x86/hvm/vmx/vmx.c | 8 
 2 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 3d0402cb10..038c8d6e7e 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2422,6 +2422,14 @@ static bool cf_check svm_get_pending_event(
 info->type = vmcb->event_inj.type;
 info->error_code = vmcb->event_inj.ec;
 
+if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ info->vector == X86_EXC_DB )
+{
+unsigned long dr6 = v->arch.hvm.flag_dr_dirty ?
+vmcb_get_dr6(vmcb) : v->arch.dr6;
+info->pending_dbg = dr6 ^ X86_DR6_DEFAULT;
+}
+
 return true;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9c92d2be92..9b59374258 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2469,6 +2469,14 @@ static bool cf_check vmx_get_pending_event(
 info->type = MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK);
 info->error_code = error_code;
 
+if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ info->vector == X86_EXC_DB )
+{
+unsigned long dr6 = v->arch.hvm.flag_dr_dirty ?
+read_debugreg(6) : v->arch.dr6;
+info->pending_dbg = dr6 ^ X86_DR6_DEFAULT;
+}
+
 return true;
 }
 
-- 
2.41.0




[PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event

2023-08-24 Thread Jinoh Kang
From: Andrew Cooper 

All #DB exceptions result in an update of %dr6, but this isn't captured in
Xen's handling.

PV guests generally work by modifying %dr6 before raising #DB, whereas HVM
guests do nothing and have a single-step special case in the lowest levels of
{vmx,svm}_inject_event().  All of this is buggy, but in particular, task
switches with the trace flag never end up signalling BT in %dr6.

To begin resolving this issue, add a new pending_dbg field to x86_event
(unioned with cr2 to avoid taking any extra space), and introduce
{pv,hvm}_inject_debug_exn() helpers to replace the current callers using
{pv,hvm}_inject_hw_exception().

A key property is that pending_dbg is taken with positive polarity to deal
with RTM sensibly.  Most callers pass in a constant, but callers passing in a
hardware %dr6 value need to xor the value with X86_DR6_DEFAULT to flip the
polarity of RTM and reserved fields.

For PV guests, move the ad-hoc updating of %dr6 into pv_inject_event().  This
in principle breaks the handing of RTM in do_debug(), but PV guests can't
actually enable MSR_DEBUGCTL.RTM yet, so this doesn't matter in practice.

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
[ jinoh: Rebase onto staging, forward declare struct vcpu ]
Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Tim Deegan 
CC: Tamas K Lengyel 
CC: Alexandru Isaila 
CC: Petre Pircalabu 

v1 -> v2: [S-o-b fixes. More details below.]

- Update DR6 for gdbsx when trapped in PV guest kernel mode
---
 xen/arch/x86/hvm/emulate.c |  3 ++-
 xen/arch/x86/hvm/hvm.c |  2 +-
 xen/arch/x86/hvm/svm/svm.c |  9 ++---
 xen/arch/x86/hvm/vmx/vmx.c | 13 -
 xen/arch/x86/include/asm/debugreg.h|  3 +++
 xen/arch/x86/include/asm/domain.h  | 12 
 xen/arch/x86/include/asm/hvm/hvm.h | 15 ++-
 xen/arch/x86/mm/shadow/multi.c |  5 +++--
 xen/arch/x86/pv/emul-priv-op.c | 11 +--
 xen/arch/x86/pv/emulate.c  |  6 ++
 xen/arch/x86/pv/ro-page-fault.c|  3 ++-
 xen/arch/x86/pv/traps.c| 16 
 xen/arch/x86/traps.c   | 10 +-
 xen/arch/x86/x86_emulate/x86_emulate.h |  5 -
 14 files changed, 79 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9b6e4c8bc6..129403ad90 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2673,7 +2674,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
*hvmemul_ctxt,
 }
 
 if ( hvmemul_ctxt->ctxt.retire.singlestep )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BS);
 
 new_intr_shadow = hvmemul_ctxt->intr_shadow;
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c726947ccb..f795ef9bc7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3234,7 +3234,7 @@ void hvm_task_switch(
 }
 
 if ( (tss.trace & 1) && !exn_raised )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BT);
 
  out:
 hvm_unmap_entry(optss_desc);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index beb076ea8d..3d0402cb10 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -96,7 +96,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, unsigned 
int inst_len)
 curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0;
 
 if ( regs->eflags & X86_EFLAGS_TF )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BS);
 }
 
 static void cf_check svm_cpu_down(void)
@@ -2755,7 +2755,10 @@ void svm_vmexit_handler(void)
 goto unexpected_exit_type;
 if ( !rc )
 hvm_inject_exception(X86_EXC_DB,
- trap_type, insn_len, X86_EVENT_NO_EC);
+ trap_type, insn_len, X86_EVENT_NO_EC,
+ exit_reason == VMEXIT_ICEBP ? 0 :
+ /* #DB - Hardware already updated dr6. */
+ vmcb_get_dr6(vmcb) ^ X86_DR6_DEFAULT);
 }
 else
 domain_pause_for_debugger();
@@ -2785,7 +2788,7 @@ void svm_vmexit_handler(void)
if ( !rc )
hvm_inject_exception(X86_EXC_BP,
 X86_EVENTTYPE_SW_EXCEPTION,
-insn_len, X86_EVENT_NO_EC);
+insn_len, X86_EVENT_NO_EC, 0 /* N/A */);
 }
 break;
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/

[PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()

2023-08-24 Thread Jinoh Kang
Prepare for an upcoming patch that overloads the 'cr2' field for #DB.

Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Tamas K Lengyel 
CC: Alexandru Isaila 
CC: Petre Pircalabu 
---
 xen/arch/x86/hvm/hvm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 66ead0b878..c726947ccb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -503,9 +503,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
 
 static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
 {
-info->cr2 = v->arch.hvm.guest_cr[2];
+if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
+return false;
+
+if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ info->vector == X86_EXC_PF )
+info->cr2 = v->arch.hvm.guest_cr[2];
 
-return alternative_call(hvm_funcs.get_pending_event, v, info);
+return true;
 }
 
 void hvm_do_resume(struct vcpu *v)
-- 
2.41.0




[PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits

2023-08-24 Thread Jinoh Kang
From: Andrew Cooper 

The reserved bit calculations for %dr6 and %dr7 depend on whether the VM has
the Restricted Transactional Memory feature available.

Introduce adjust_dr{6,7}_rsvd() and replace the opencoded logic and constants
(except for DR_STATUS_RESERVED_ONE which is (mis)used elsewhere and will be
removed after future bugfixes).  The use of these helpers in set_debugreg()
covers toolstack values for PV guests, but HVM guests need similar treatment.

The use of the guest's cpu_policy is less than optimal in the create/restore
paths.  However in such cases, the policy will be the guest maximum policy,
which will be more permissive with respect to the RTM feature.

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 
[ jinoh: Rebase onto staging, along with some fixes ]
Signed-off-by: Jinoh Kang 
---
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

v1 -> v2: [S-o-b fixes. More details below.]

- Fix must-be-zero constant in adjust_dr7_rsvd: 0x23ff -> 0x2fff
  - Bit 10 was not set, causing DR7 reserved-1 bit 10 to be unset
  - Bit 11 was not set, causing DR7 RTM-enable bit 11 to be ignored

- Define X86_DR{6,7}_* constants in x86-defns.h instead of open-coding
  naked numbers (thanks Jan)

- [Commit body]: s/Transnational/Transactional/g (thanks Jan)

- [Commit body]: s/guests cpuid policy/guest's cpu_policy/g (by rebase)
---
 xen/arch/x86/domain.c|  7 +++--
 xen/arch/x86/hvm/hvm.c   |  6 ++--
 xen/arch/x86/include/asm/debugreg.h  | 20 --
 xen/arch/x86/include/asm/x86-defns.h | 41 
 xen/arch/x86/pv/misc-hypercalls.c| 16 +++
 5 files changed, 70 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe86a7f853..a39710b5af 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1053,6 +1053,7 @@ int arch_set_info_guest(
 struct vcpu *v, vcpu_guest_context_u c)
 {
 struct domain *d = v->domain;
+const struct cpu_policy *cp = d->arch.cpuid;
 unsigned int i;
 unsigned long flags;
 bool compat;
@@ -1165,10 +1166,10 @@ int arch_set_info_guest(
 
 if ( is_hvm_domain(d) )
 {
-for ( i = 0; i < ARRAY_SIZE(v->arch.dr); ++i )
+for ( i = 0; i < ARRAY_SIZE(v->arch.dr) - 2; ++i )
 v->arch.dr[i] = c(debugreg[i]);
-v->arch.dr6 = c(debugreg[6]);
-v->arch.dr7 = c(debugreg[7]);
+v->arch.dr6 = adjust_dr6_rsvd(c(debugreg[6]), cp->feat.rtm);
+v->arch.dr7 = adjust_dr7_rsvd(c(debugreg[7]), cp->feat.rtm);
 
 if ( v->vcpu_id == 0 )
 d->vm_assist = c.nat->vm_assist;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3a99c0ff20..66ead0b878 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -985,6 +986,7 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain 
*d)
 
 static int cf_check hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t 
*h)
 {
+const struct cpu_policy *cp = d->arch.cpuid;
 unsigned int vcpuid = hvm_load_instance(h);
 struct vcpu *v;
 struct hvm_hw_cpu ctxt;
@@ -1166,8 +1168,8 @@ static int cf_check hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 v->arch.dr[1] = ctxt.dr1;
 v->arch.dr[2] = ctxt.dr2;
 v->arch.dr[3] = ctxt.dr3;
-v->arch.dr6   = ctxt.dr6;
-v->arch.dr7   = ctxt.dr7;
+v->arch.dr6   = adjust_dr6_rsvd(ctxt.dr6, cp->feat.rtm);
+v->arch.dr7   = adjust_dr7_rsvd(ctxt.dr7, cp->feat.rtm);
 
 hvmemul_cancel(v);
 
diff --git a/xen/arch/x86/include/asm/debugreg.h 
b/xen/arch/x86/include/asm/debugreg.h
index 86aa6d7143..74344555d2 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -1,6 +1,7 @@
 #ifndef _X86_DEBUGREG_H
 #define _X86_DEBUGREG_H
 
+#include 
 
 /* Indicate the register numbers for a number of the specific
debug registers.  Registers 0-3 contain the addresses we wish to trap on */
@@ -21,7 +22,6 @@
 #define DR_STEP (0x4000)/* single-step */
 #define DR_SWITCH   (0x8000)/* task switch */
 #define DR_NOT_RTM  (0x1)   /* clear: #BP inside RTM region */
-#define DR_STATUS_RESERVED_ZERO (~0xefffUL) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0x0ff0UL /* Reserved, read as one */
 
 /* Now define a bunch of things for manipulating the control register.
@@ -61,8 +61,6 @@
We can slow the instruction pipeline for instructions coming via the
gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
-#define DR_CONTROL_RESERVED_ZERO (~0x27ffUL) /* Reserved, read as zero */
-#define DR_CONTROL_RESERVED_ONE  (0x0400UL) /* Reserved, read as one */
 #define DR_LOCAL_EXACT_ENABLE(0x0100UL) /* Local exa

[PATCH v2 0/8] Fixes to debugging facilities

2023-08-24 Thread Jinoh Kang
nservative approach
- Add *_get_pending_event fixes (for hvm_monitor_interrupt)
- Fix must-be-zero constant in adjust_dr7_rsvd: 0x23ff -> 0x2fff
- Define X86_DR{6,7}_* constants in x86-defns.h instead of open-coding
  naked numbers (thanks Jan)
- Update DR6 for gdbsx when trapped in PV guest kernel mode
- Commit message fixes

Andrew Cooper (5):
  x86: Fix calculation of %dr6/7 reserved bits
  x86/emul: Add pending_dbg field to x86_event
  x86/hvm: Add comments about #DB exception behavior to
{svm,vmx}_inject_event()
  x86: Fix merging of new status bits into %dr6
  x86/dbg: Cleanup of legacy dr6 constants

Jinoh Kang (3):
  x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()
  x86/emul: Populate pending_dbg field of x86_event from
{svm,vmx}_get_pending_event()
  x86: Don't assume #DB is always caused by singlestep if EFLAGS.TF is
set

 xen/arch/x86/domain.c  |  7 +--
 xen/arch/x86/hvm/emulate.c |  3 +-
 xen/arch/x86/hvm/hvm.c | 17 +--
 xen/arch/x86/hvm/svm/svm.c | 35 ++
 xen/arch/x86/hvm/vmx/vmx.c | 45 +++---
 xen/arch/x86/include/asm/debugreg.h| 63 --
 xen/arch/x86/include/asm/domain.h  | 12 +
 xen/arch/x86/include/asm/hvm/hvm.h | 15 +-
 xen/arch/x86/include/asm/x86-defns.h   | 47 +++
 xen/arch/x86/mm/shadow/multi.c |  5 +-
 xen/arch/x86/pv/emul-priv-op.c | 13 +++---
 xen/arch/x86/pv/emulate.c  |  6 +--
 xen/arch/x86/pv/misc-hypercalls.c  | 16 ++-
 xen/arch/x86/pv/ro-page-fault.c|  3 +-
 xen/arch/x86/pv/traps.c| 17 +--
 xen/arch/x86/traps.c   | 12 ++---
 xen/arch/x86/x86_emulate/x86_emulate.h |  5 +-
 17 files changed, 225 insertions(+), 96 deletions(-)

-- 
2.41.0




Re: [PATCH 1/5] x86: Fix calculation of %dr6/7 reserved bits

2023-08-22 Thread Jinoh Kang
On 8/22/23 23:54, Jan Beulich wrote:
> But that's not the only change that was requested back then. There was
> one aspect Andrew didn't like, so leaving that part as is would be
> fine. But for the items he didn't further respond to, I'd expect a
> re-submission to take care of them.

Somehow I assumed that Andrew's branch had already contained the relevant
updates, which was clearly wrong and was a sloppy thinking on my part.
Also, rushing to submission didn't _really_ help.

Again, apologies for wasting your time.

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 1/5] x86: Fix calculation of %dr6/7 reserved bits

2023-08-22 Thread Jinoh Kang
On 8/22/23 23:54, Jan Beulich wrote:
> On 21.08.2023 18:12, Jinoh Kang wrote:
>> On 8/22/23 00:56, Jinoh Kang wrote:
>>> From: Andrew Cooper 
>>>
>>> The reserved bit calculations for %dr6 and %dr7 depend on whether the VM has
>>> the Restricted Transnational Memory feature available.
>>
>> s/Transnational/Transactional/.
>>
>> It was in the original review, but I missed the change.  Apologies.
> 
> But that's not the only change that was requested back then. There was
> one aspect Andrew didn't like, so leaving that part as is would be
> fine. But for the items he didn't further respond to, I'd expect a
> re-submission to take care of them. Plus, if you didn't address
> anything, you would want to (a) mention that

Right.  I'll try to address the rest and provide rationale for those that
aren't feasible.

> and (b) take Roger's R-b
> that was provided at the time (unless of course re-basing was resulting
> in massive changes).

There were many changes, some arguably cosmetic and some not-so-syntactic
ones.  Quite a long time has been passed, so I was unsure if the R-b's
held up to the present moment.

> 
> As I expect the same to be the case for the other patches, I'm not sure
> it's worth looking at them (again).

ACK.  Please consider this patchset withdrawn.

> 
> Jan

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 0/5] Fixes to debugging facilities

2023-08-22 Thread Jinoh Kang
On 8/22/23 15:16, Jan Beulich wrote:
> One important formal question: Where did Andrew's S-o-b go on all of the
> patches?

Thanks for catching it.  Looks like I confused the submission process with that
of another (non-LF) project.  I'll re-read the docs to see if I missed something
else w.r.t. tagging rules.

> 
> Jan

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 3/5] x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor actions to {svm,vmx}_inject_event()

2023-08-22 Thread Jinoh Kang
On 8/22/23 01:34, Tamas K Lengyel wrote:
> On Mon, Aug 21, 2023 at 5:57 PM Jinoh Kang  wrote:
>> This is RFC because it probably breaks introspection, as injection replies
>> from the introspection engine will (probably, but I haven't confirmed) 
>> trigger
>> new monitor events.
>>
>> First of all, monitoring emulated debug events is a change in behaviour,
>> although IMO it is more of a bugfix than a new feature.  Also, similar 
>> changes
>> will happen to other monitored events as we try to unify the
>> intercept/emulation paths.
>>
>> As for the recursive triggering of monitor events, I was considering 
>> extending
>> the monitor infrastructure to have a "in monitor reply" boolean which causes
>> hvm_monitor_debug() to ignore the recursive request.
>>
>> Does this plan sound ok, or have I missed something more subtle with monitor
>> handling?
> 
> Sorry but from a monitor perspective this is a no-go. AFAIU injection
> will only happen once the vCPU gets rescheduled after the exit, which
> may take a long time, introducing potentially significant delay.

AFAIU hvm_monitor_debug() call still happens *before* VMEntry when the softirq
processing and actual injection takes place.  vmx_inject_event() is responsible
for queueing an event to the inject slot, not processing what is already
injected.

> The monitor agent itself may need to inject events or alter what gets
> injected, so now it may need to undo the injection of the event from
> where it was called from - if that's even possible.

The monitor still gets the chance to drop the injection; 'rc > 0' from
hvm_monitor_debug() still results in the injection request simply being
ignored.

> I also find the
> change in behavior whereby we would be getting events for emulated
> events on the monitor ring equiavelly problematic, making it very hard
> to reason about the state of the VM on the monitor side. If that could
> be made optional that a monitor user can subscribe to that would be
> fine, but doing so unconditionally is also no-go.

We could use an additional field to 'struct x86_event' to indicate the
source of the injected exception--be it VMExit, emulator, or
monitor-triggered events.  I'd like to hear about Andrew's opinion on
this first, though.

> 
> Tamas

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 1/5] x86: Fix calculation of %dr6/7 reserved bits

2023-08-21 Thread Jinoh Kang
On 8/22/23 00:56, Jinoh Kang wrote:
> From: Andrew Cooper 
> 
> The reserved bit calculations for %dr6 and %dr7 depend on whether the VM has
> the Restricted Transnational Memory feature available.

s/Transnational/Transactional/.

It was in the original review, but I missed the change.  Apologies.

-- 
Sincerely,
Jinoh Kang




[PATCH 2/5] x86/emul: Add pending_dbg field to x86_event

2023-08-21 Thread Jinoh Kang
From: Andrew Cooper 

All #DB exceptions result in an update of %dr6, but this isn't captured in
Xen's handling.

PV guests generally work by modifying %dr6 before raising #DB, whereas HVM
guests do nothing and have a single-step special case in the lowest levels of
{vmx,svm}_inject_event().  All of this is buggy, but in particular, task
switches with the trace flag never end up signalling BT in %dr6.

To begin resolving this issue, add a new pending_dbg field to x86_event
(unioned with cr2 to avoid taking any extra space), and introduce
{pv,hvm}_inject_debug_exn() helpers to replace the current callers using
{pv,hvm}_inject_hw_exception().

A key property is that pending_dbg is taken with positive polarity to deal
with RTM sensibly.  Most callers pass in a constant, but callers passing in a
hardware %dr6 value need to xor the value with X86_DR6_DEFAULT to flip the
polarity of RTM and reserved fields.

For PV guests, move the ad-hoc updating of %dr6 into pv_inject_event().  This
in principle breaks the handing of RTM in do_debug(), but PV guests can't
actually enable MSR_DEBUGCTL.RTM yet, so this doesn't matter in practice.

Signed-off-by: Jinoh Kang 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Tim Deegan 
---
 xen/arch/x86/hvm/emulate.c |  3 ++-
 xen/arch/x86/hvm/hvm.c |  2 +-
 xen/arch/x86/hvm/svm/svm.c |  9 ++---
 xen/arch/x86/hvm/vmx/vmx.c | 13 -
 xen/arch/x86/include/asm/debugreg.h|  3 +++
 xen/arch/x86/include/asm/domain.h  | 12 
 xen/arch/x86/include/asm/hvm/hvm.h | 15 ++-
 xen/arch/x86/mm/shadow/multi.c |  5 +++--
 xen/arch/x86/pv/emul-priv-op.c | 11 +--
 xen/arch/x86/pv/emulate.c  |  6 ++
 xen/arch/x86/pv/ro-page-fault.c|  3 ++-
 xen/arch/x86/pv/traps.c| 16 
 xen/arch/x86/traps.c   |  6 +-
 xen/arch/x86/x86_emulate/x86_emulate.h |  5 -
 14 files changed, 75 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9b6e4c8bc6..129403ad90 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2673,7 +2674,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
*hvmemul_ctxt,
 }
 
 if ( hvmemul_ctxt->ctxt.retire.singlestep )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BS);
 
 new_intr_shadow = hvmemul_ctxt->intr_shadow;
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 66ead0b878..c3d46841c2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3229,7 +3229,7 @@ void hvm_task_switch(
 }
 
 if ( (tss.trace & 1) && !exn_raised )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BT);
 
  out:
 hvm_unmap_entry(optss_desc);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 01dd592d9b..ac3123c56f 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -96,7 +96,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, unsigned 
int inst_len)
 curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0;
 
 if ( regs->eflags & X86_EFLAGS_TF )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BS);
 }
 
 static void cf_check svm_cpu_down(void)
@@ -2755,7 +2755,10 @@ void svm_vmexit_handler(void)
 goto unexpected_exit_type;
 if ( !rc )
 hvm_inject_exception(X86_EXC_DB,
- trap_type, insn_len, X86_EVENT_NO_EC);
+ trap_type, insn_len, X86_EVENT_NO_EC,
+ exit_reason == VMEXIT_ICEBP ? 0 :
+ /* #DB - Hardware already updated dr6. */
+ vmcb_get_dr6(vmcb) ^ X86_DR6_DEFAULT);
 }
 else
 domain_pause_for_debugger();
@@ -2785,7 +2788,7 @@ void svm_vmexit_handler(void)
if ( !rc )
hvm_inject_exception(X86_EXC_BP,
 X86_EVENTTYPE_SW_EXCEPTION,
-insn_len, X86_EVENT_NO_EC);
+insn_len, X86_EVENT_NO_EC, 0 /* N/A */);
 }
 break;
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7ec44018d4..716115332b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3068,7 +3068,7 @@ void update_guest_eip(void)
 }
 
 if ( regs->eflags & X86_EFLAGS_TF )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exn(X86_DR6_BS);
 }
 
 static void

[PATCH 4/5] x86: Fix merging of new status bits into %dr6

2023-08-21 Thread Jinoh Kang
From: Andrew Cooper 

The current logic used to update %dr6 when injecting #DB is buggy.  The
architectural behaviour is to overwrite B{0..3} (rather than accumulate) and
accumulate all other bits.

Introduce a new merge_dr6() helper, which also takes care of handing RTM
correctly.

Signed-off-by: Jinoh Kang 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 
---
 xen/arch/x86/hvm/svm/svm.c   |  3 ++-
 xen/arch/x86/hvm/vmx/vmx.c   |  3 ++-
 xen/arch/x86/include/asm/debugreg.h  | 30 +++-
 xen/arch/x86/include/asm/x86-defns.h | 10 --
 xen/arch/x86/pv/traps.c  |  3 ++-
 5 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index bd3adde0ec..7a5652f3df 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1338,7 +1338,8 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
  * Item 2 is done by hardware when injecting a #DB exception.
  */
 __restore_debug_registers(vmcb, curr);
-vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
+vmcb_set_dr6(vmcb, merge_dr6(vmcb_get_dr6(vmcb), _event.pending_dbg,
+ curr->domain->arch.cpuid->feat.rtm));
 
 /* fall through */
 case X86_EXC_BP:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 65166348f1..1fd902ed74 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2031,7 +2031,8 @@ static void cf_check vmx_inject_event(const struct 
x86_event *event)
  * All actions are left up to the hypervisor to perform.
  */
 __restore_debug_registers(curr);
-write_debugreg(6, read_debugreg(6) | event->pending_dbg);
+write_debugreg(6, merge_dr6(read_debugreg(6), event->pending_dbg,
+curr->domain->arch.cpuid->feat.rtm));
 
 if ( !nestedhvm_vcpu_in_guestmode(curr) ||
  !nvmx_intercepts_exception(curr, X86_EXC_DB, _event.error_code) )
diff --git a/xen/arch/x86/include/asm/debugreg.h 
b/xen/arch/x86/include/asm/debugreg.h
index a1df74c02e..fd32846397 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -23,6 +23,14 @@
 #define X86_DR6_BT  (1u << 15)  /* Task switch */
 #define X86_DR6_RTM (1u << 16)  /* #DB/#BP in RTM region   */
 
+#define X86_DR6_BP_MASK \
+(X86_DR6_B0 | X86_DR6_B1 | X86_DR6_B2 | X86_DR6_B3)
+
+#define X86_DR6_KNOWN_MASK  \
+(X86_DR6_BP_MASK | X86_DR6_BD | X86_DR6_BS | X86_DR6_BT | X86_DR6_RTM)
+
+#define X86_DR6_DEFAULT 0x0ff0  /* Default %dr6 value. */
+
 #define DR_TRAP0(0x1)   /* db0 */
 #define DR_TRAP1(0x2)   /* db1 */
 #define DR_TRAP2(0x4)   /* db2 */
@@ -30,7 +38,6 @@
 #define DR_STEP (0x4000)/* single-step */
 #define DR_SWITCH   (0x8000)/* task switch */
 #define DR_NOT_RTM  (0x1)   /* clear: #BP inside RTM region */
-#define DR_STATUS_RESERVED_ONE  0x0ff0UL /* Reserved, read as one */
 
 /* Now define a bunch of things for manipulating the control register.
The top two bytes of the control register consist of 4 fields of 4
@@ -74,6 +81,8 @@
 #define DR_RTM_ENABLE(0x0800UL) /* RTM debugging enable */
 #define DR_GENERAL_DETECT(0x2000UL) /* General detect enable */
 
+#define X86_DR7_DEFAULT 0x0400  /* Default %dr7 value. */
+
 #define write_debugreg(reg, val) do {   \
 unsigned long __val = val;  \
 asm volatile ( "mov %0,%%db" #reg : : "r" (__val) );\
@@ -102,6 +111,25 @@ static inline unsigned long adjust_dr6_rsvd(unsigned long 
dr6, bool rtm)
 return dr6;
 }
 
+static inline unsigned long merge_dr6(unsigned long dr6, unsigned long new,
+  bool rtm)
+{
+/* Flip dr6 to have positive polarity. */
+dr6 ^= X86_DR6_DEFAULT;
+
+/* Sanity check that only known values are passed in. */
+ASSERT(!(dr6 & ~X86_DR6_KNOWN_MASK));
+ASSERT(!(new & ~X86_DR6_KNOWN_MASK));
+
+/* Breakpoints 0-3 overridden.  BD, BS, BT and RTM accumulate. */
+dr6 = (dr6 & ~X86_DR6_BP_MASK) | new;
+
+/* Flip dr6 back to having default polarity. */
+dr6 ^= X86_DR6_DEFAULT;
+
+return adjust_dr6_rsvd(dr6, rtm);
+}
+
 static inline unsigned long adjust_dr7_rsvd(unsigned long dr7, bool rtm)
 {
 /*
diff --git a/xen/arch/x86/include/asm/x86-defns.h 
b/xen/arch/x86/include/asm/x86-defns.h
index e350227e57..7fbc74850a 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -100,16 +100,6 @@
 #define X

[PATCH 5/5] x86/dbg: Cleanup of legacy dr6 constants

2023-08-21 Thread Jinoh Kang
From: Andrew Cooper 

Replace the few remaining uses with X86_DR6_* constants.

Signed-off-by: Jinoh Kang 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/hvm/vmx/vmx.c  |  2 +-
 xen/arch/x86/include/asm/debugreg.h | 17 -
 xen/arch/x86/pv/emul-priv-op.c  |  2 +-
 xen/arch/x86/traps.c|  2 +-
 4 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1fd902ed74..43b57a444e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4306,7 +4306,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 __vmread(GUEST_PENDING_DBG_EXCEPTIONS, _dbg);
 __vmwrite(GUEST_PENDING_DBG_EXCEPTIONS,
-  pending_dbg | DR_STEP);
+  pending_dbg | X86_DR6_BS);
 }
 }
 
diff --git a/xen/arch/x86/include/asm/debugreg.h 
b/xen/arch/x86/include/asm/debugreg.h
index fd32846397..1061fdaaae 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -1,15 +1,6 @@
 #ifndef _X86_DEBUGREG_H
 #define _X86_DEBUGREG_H
 
-
-/* Indicate the register numbers for a number of the specific
-   debug registers.  Registers 0-3 contain the addresses we wish to trap on */
-
-#define DR_FIRSTADDR 0
-#define DR_LASTADDR  3
-#define DR_STATUS6
-#define DR_CONTROL   7
-
 /*
  * DR6 status bits.
  *   N.B. For backwards compatibility, X86_DR6_RTM has inverted polarity.
@@ -31,14 +22,6 @@
 
 #define X86_DR6_DEFAULT 0x0ff0  /* Default %dr6 value. */
 
-#define DR_TRAP0(0x1)   /* db0 */
-#define DR_TRAP1(0x2)   /* db1 */
-#define DR_TRAP2(0x4)   /* db2 */
-#define DR_TRAP3(0x8)   /* db3 */
-#define DR_STEP (0x4000)/* single-step */
-#define DR_SWITCH   (0x8000)/* task switch */
-#define DR_NOT_RTM  (0x1)   /* clear: #BP inside RTM region */
-
 /* Now define a bunch of things for manipulating the control register.
The top two bytes of the control register consist of 4 fields of 4
bits - each field corresponds to one of the four debug registers,
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 72d0514e74..78a1f4aff7 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1359,7 +1359,7 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
 {
 case X86EMUL_OKAY:
 if ( ctxt.ctxt.retire.singlestep )
-ctxt.bpmatch |= DR_STEP;
+ctxt.bpmatch |= X86_DR6_BS;
 
 if ( ctxt.bpmatch &&
  !(curr->arch.pv.trap_bounce.flags & TBF_EXCEPTION) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 640b60e0e5..829741ff78 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1955,7 +1955,7 @@ void do_debug(struct cpu_user_regs *regs)
  * If however we do, safety measures need to be enacted.  Use a big
  * hammer and clear all debug settings.
  */
-if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
+if ( dr6 & X86_DR6_BP_MASK )
 {
 unsigned int bp, dr7 = read_debugreg(7);
 
-- 
2.41.0




[PATCH 3/5] x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor actions to {svm,vmx}_inject_event()

2023-08-21 Thread Jinoh Kang
From: Andrew Cooper 

Currently, there is a lot of functionality in the #DB intercepts, and some
repeated functionality in the *_inject_event() logic.

The gdbsx code is implemented at both levels (albeit differently for #BP,
which is presumably due to the fact that the old emulator behaviour used to be
to move %rip forwards for traps), while the monitor behaviour only exists at
the intercept level.

Updating of %dr6 is implemented (buggily) at both levels, but having it at
both levels is problematic to implement correctly.

Rearrange the logic to have nothing interesting at the intercept level, and
everything implemented at the injection level.  Amongst other things, this
means that the monitor subsystem will pick up debug actions from emulated
events.

Signed-off-by: Jinoh Kang 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Razvan Cojocaru 
CC: Tamas K Lengyel 

This is RFC because it probably breaks introspection, as injection replies
from the introspection engine will (probably, but I haven't confirmed) trigger
new monitor events.

First of all, monitoring emulated debug events is a change in behaviour,
although IMO it is more of a bugfix than a new feature.  Also, similar changes
will happen to other monitored events as we try to unify the
intercept/emulation paths.

As for the recursive triggering of monitor events, I was considering extending
the monitor infrastructure to have a "in monitor reply" boolean which causes
hvm_monitor_debug() to ignore the recursive request.

Does this plan sound ok, or have I missed something more subtle with monitor
handling?
---
 xen/arch/x86/hvm/svm/svm.c | 126 -
 xen/arch/x86/hvm/vmx/vmx.c | 102 +-
 2 files changed, 110 insertions(+), 118 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index ac3123c56f..bd3adde0ec 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1328,19 +1328,50 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
 switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
 case X86_EXC_DB:
-if ( regs->eflags & X86_EFLAGS_TF )
-{
-__restore_debug_registers(vmcb, curr);
-vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | DR_STEP);
-}
+/*
+ * On AMD hardware, a #DB exception:
+ *  1) Merges new status bits into %dr6
+ *  2) Clears %dr7.gd and MSR_DEBUGCTL.{LBR,BTF}
+ *
+ * Item 1 is done by hardware before a #DB intercepted vmexit, but we
+ * may end up here from emulation so have to repeat it ourselves.
+ * Item 2 is done by hardware when injecting a #DB exception.
+ */
+__restore_debug_registers(vmcb, curr);
+vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
+
 /* fall through */
 case X86_EXC_BP:
 if ( curr->domain->debugger_attached )
 {
 /* Debug/Int3: Trap to debugger. */
+if ( _event.vector == X86_EXC_BP )
+{
+/* N.B. Can't use __update_guest_eip() for risk of recusion. */
+regs->rip += _event.insn_len;
+regs->eflags &= ~X86_EFLAGS_RF;
+curr->arch.gdbsx_vcpu_event = X86_EXC_BP;
+}
+
 domain_pause_for_debugger();
 return;
 }
+else
+{
+int rc = hvm_monitor_debug(regs->rip,
+   _event.vector == X86_EXC_DB
+   ? HVM_MONITOR_DEBUG_EXCEPTION
+   : HVM_MONITOR_SOFTWARE_BREAKPOINT,
+   _event.type, _event.insn_len,
+   _event.pending_dbg);
+if ( rc < 0 )
+{
+gprintk(XENLOG_ERR, "Monitor debug error %d\n", rc);
+return svm_crash_or_fault(curr);
+}
+if ( rc )
+return; /* VCPU paused.  Wait for monitor. */
+}
 break;
 
 case X86_EXC_PF:
@@ -2730,67 +2761,46 @@ void svm_vmexit_handler(void)
 
 case VMEXIT_ICEBP:
 case VMEXIT_EXCEPTION_DB:
-if ( !v->domain->debugger_attached )
+case VMEXIT_EXCEPTION_BP:
+{
+unsigned int vec, type, len, extra;
+
+switch ( exit_reason )
 {
-unsigned int trap_type;
+case VMEXIT_ICEBP:
+vec   = X86_EXC_DB;
+type  = X86_EVENTTYPE_PRI_SW_EXCEPTION;
+len   = svm_get_insn_len(v, INSTR_ICEBP);
+extra = 0;
+break;
 
-if ( likely(exit_reason != VMEXIT_ICEBP) )
-{
-trap_type = X86_EVENTTYPE_HW_EXCEPTION;
-insn_len = 0;
-}
-

[PATCH 1/5] x86: Fix calculation of %dr6/7 reserved bits

2023-08-21 Thread Jinoh Kang
From: Andrew Cooper 

The reserved bit calculations for %dr6 and %dr7 depend on whether the VM has
the Restricted Transnational Memory feature available.

Introduce adjust_dr{6,7}_rsvd() and replace the opencoded logic and constants
(except for DR_STATUS_RESERVED_ONE which is (mis)used elsewhere and will be
removed after future bugfixes).  The use of these helpers in set_debugreg()
covers toolstack values for PV guests, but HVM guests need similar treatment.

The use of the guests cpuid policy is less than optimal in the create/restore
paths.  However in such cases, the policy will be the guest maximum policy,
which will be more permissive with respect to the RTM feature.

Signed-off-by: Jinoh Kang 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/domain.c   |  7 +++--
 xen/arch/x86/hvm/hvm.c  |  6 ++--
 xen/arch/x86/include/asm/debugreg.h | 44 +
 xen/arch/x86/pv/misc-hypercalls.c   | 16 +++
 4 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe86a7f853..a39710b5af 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1053,6 +1053,7 @@ int arch_set_info_guest(
 struct vcpu *v, vcpu_guest_context_u c)
 {
 struct domain *d = v->domain;
+const struct cpu_policy *cp = d->arch.cpuid;
 unsigned int i;
 unsigned long flags;
 bool compat;
@@ -1165,10 +1166,10 @@ int arch_set_info_guest(
 
 if ( is_hvm_domain(d) )
 {
-for ( i = 0; i < ARRAY_SIZE(v->arch.dr); ++i )
+for ( i = 0; i < ARRAY_SIZE(v->arch.dr) - 2; ++i )
 v->arch.dr[i] = c(debugreg[i]);
-v->arch.dr6 = c(debugreg[6]);
-v->arch.dr7 = c(debugreg[7]);
+v->arch.dr6 = adjust_dr6_rsvd(c(debugreg[6]), cp->feat.rtm);
+v->arch.dr7 = adjust_dr7_rsvd(c(debugreg[7]), cp->feat.rtm);
 
 if ( v->vcpu_id == 0 )
 d->vm_assist = c.nat->vm_assist;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3a99c0ff20..66ead0b878 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -985,6 +986,7 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain 
*d)
 
 static int cf_check hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t 
*h)
 {
+const struct cpu_policy *cp = d->arch.cpuid;
 unsigned int vcpuid = hvm_load_instance(h);
 struct vcpu *v;
 struct hvm_hw_cpu ctxt;
@@ -1166,8 +1168,8 @@ static int cf_check hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 v->arch.dr[1] = ctxt.dr1;
 v->arch.dr[2] = ctxt.dr2;
 v->arch.dr[3] = ctxt.dr3;
-v->arch.dr6   = ctxt.dr6;
-v->arch.dr7   = ctxt.dr7;
+v->arch.dr6   = adjust_dr6_rsvd(ctxt.dr6, cp->feat.rtm);
+v->arch.dr7   = adjust_dr7_rsvd(ctxt.dr7, cp->feat.rtm);
 
 hvmemul_cancel(v);
 
diff --git a/xen/arch/x86/include/asm/debugreg.h 
b/xen/arch/x86/include/asm/debugreg.h
index 86aa6d7143..8be60910b4 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -10,9 +10,18 @@
 #define DR_STATUS6
 #define DR_CONTROL   7
 
-/* Define a few things for the status register.  We can use this to determine
-   which debugging register was responsible for the trap.  The other bits
-   are either reserved or not of interest to us. */
+/*
+ * DR6 status bits.
+ *   N.B. For backwards compatibility, X86_DR6_RTM has inverted polarity.
+ */
+#define X86_DR6_B0  (1u <<  0)  /* Breakpoint 0 triggered  */
+#define X86_DR6_B1  (1u <<  1)  /* Breakpoint 1 triggered  */
+#define X86_DR6_B2  (1u <<  2)  /* Breakpoint 2 triggered  */
+#define X86_DR6_B3  (1u <<  3)  /* Breakpoint 3 triggered  */
+#define X86_DR6_BD  (1u << 13)  /* Debug register accessed */
+#define X86_DR6_BS  (1u << 14)  /* Single step */
+#define X86_DR6_BT  (1u << 15)  /* Task switch */
+#define X86_DR6_RTM (1u << 16)  /* #DB/#BP in RTM region   */
 
 #define DR_TRAP0(0x1)   /* db0 */
 #define DR_TRAP1(0x2)   /* db1 */
@@ -21,7 +30,6 @@
 #define DR_STEP (0x4000)/* single-step */
 #define DR_SWITCH   (0x8000)/* task switch */
 #define DR_NOT_RTM  (0x1)   /* clear: #BP inside RTM region */
-#define DR_STATUS_RESERVED_ZERO (~0xefffUL) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0x0ff0UL /* Reserved, read as one */
 
 /* Now define a bunch of things for manipulating the control register.
@@ -61,8 +69,6 @@
We can slow the instruction pipeline for instructions coming via the
gdt or the ldt if we want to.  I am not sure why this is an advantage */
 

[PATCH 0/5] Fixes to debugging facilities

2023-08-21 Thread Jinoh Kang
This is a rebased version of Andrew Cooper's debugging facilities patch:
https://lore.kernel.org/xen-devel/1528120755-17455-1-git-send-email-andrew.coop...@citrix.com/

> So this started as a small fix for the vmentry failure (penultimate patch),
> and has snowballed...
>
> I'm fairly confident that everything involving DEBUGCTL.BTF is broken, and
> there are definitely bugs with configuring DEBUGCTL.RTM (which really isn't
> helped by the fact that the GCC TSX intrinsics render the resulting code
> un-debuggable.)  I'll defer fixing these swamps for now.
>
> The first 4 patches probably want backporting to the stable trees, so I've
> taken care to move them ahead of patch 6 for backport reasons.  While all
> fixes would ideally be backported, I can't find a way of fixing %dr6 merging
> (as it needs to be done precicely once) without a behavioural change in the
> monitor subsystem.
>
> Patch 8 probably breaks introspection, so can't be taken at this point.  See
> that patch for discussion of the problem and my best guess at a solution.

6 out of 11 patches from the 2018 patch series above, including the
vmentry failure fix, have already been committed.  This covers the
remaining 5 patches.

One particular bug that the patch series fixes involves simultaneous
hardware breakpoint exception and single-stepping exception occurring at
the same PC (IP).  Xen blindly sets singlestep (DR6.BS := 1) in this
case, which interferes with userland debugging and allows (otherwise
restricted) usermode programs to detect Xen HVM (or PVH).  The following
Linux x86-64 program demonstrates the bug:

---8<---

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define ABORT_ON_ERR(x) if ((x) == -1) abort();

int main(void)
{
unsigned long cur_rip, cur_eflags, cur_dr6;
int wstatus, exit_code;
pid_t pid;

ABORT_ON_ERR(pid = fork());
if (pid == 0) {
ABORT_ON_ERR(ptrace(PTRACE_TRACEME, 0, NULL, NULL));
ABORT_ON_ERR(raise(SIGSTOP));
_exit(0);
}

/* Wait for first ptrace event */
if (waitpid(pid, , 0) != pid) abort();
if (!WIFSTOPPED(wstatus)) abort();

/* Obtain current RIP value and perform sanity check */
cur_rip = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
regs.rip), _rip);
cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
u_debugreg[6]), _dr6);
assert(cur_dr6 == 0x0ff0UL);

/* Set up debug registers and set EFLAGS.TF */
cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
regs.eflags), _eflags);
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
regs.eflags), (void *)(cur_eflags | 0x100UL)));
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
u_debugreg[0]), (void *)cur_rip));
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
u_debugreg[7]), (void *)1L));

/* Continue execution to trigger hardware breakpoint */
ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
if (waitpid(pid, , 0) != pid) abort();
if (!(WIFSTOPPED(wstatus) && WSTOPSIG(wstatus) == SIGTRAP)) abort();

/* Detect if Xen has tampered with DR6 */
cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
u_debugreg[6]), _dr6);
fprintf(stderr, "DR6 = 0x%08lx\n", cur_dr6);
if (cur_dr6 == 0x0ff1UL)
{
fputs("Running on bare-metal, Xen PV, or non-Xen VMM\n", stdout);
exit_code = EXIT_FAILURE;
}
else
{
fputs("Running on Xen HVM\n", stdout);
exit_code = EXIT_SUCCESS;
}

/* Tear down debug registers and unset EFLAGS.TF */
cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
regs.eflags), _eflags);
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
regs.eflags), (void *)(cur_eflags & ~0x100UL)));
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
u_debugreg[7]), (void *)0L));

/* Continue execution to let child process exit */
ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
if (waitpid(pid, , 0) != pid) abort();
if (!(WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0)) abort();

return exit_code;
}

---8<---

Andrew Cooper (5):
  x86: Fix calculation of %dr6/7 reserved bits
  x86/emul: Add pending_dbg field to x86_event
  x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor actions
to {svm,vmx}_inject_event()
  x86: Fix merging of new status bits into %dr6
  x86/dbg: Cleanup of legacy dr6 constants

 xen/arch/x86/domain.c  |   7 +-
 xen/arch/x86/hvm/emulate.c |   3 +-
 xen/arch/x86/hvm/hvm.c |   8 +-
 xen/arch/x86/hvm/svm/svm.c | 126 ++---
 xen/arch/x86/hvm/vmx/vmx.c | 

Re: [PATCH 0/6] x86/debug: fix guest dr6 value for single stepping and HW breakpoints

2023-08-19 Thread Jinoh Kang
On 8/19/23 05:22, Andrew Cooper wrote:
> I suspect what we'll need to do is combine parts of your series and
> parts of mine.  I never got around to fixing the introspection bugs in
> mine (that's a far larger task to do nicely), and yours is more targetted.

Unless I've done something wrong, your debug-fixes-v1 when rebased to master
seems to fix my problem, so I guess your patchset is the superset.

-- 
Sincerely,
Jinoh Kang




Re: [PATCH 3/6] x86: don't assume #DB is always caused by singlestep if EFLAGS.TF is set

2023-08-18 Thread Jinoh Kang
On 8/19/23 00:47, Jinoh Kang wrote:
> Today, when a HVM (or PVH) guest triggers a hardware breakpoint while
> EFLAGS.TF is set, Xen incorrectly assumes that this is a single stepping
> exception and sets DR_STEP in dr6 in addition to DR_TRAP.
> 
> This causes problems with Linux HW breakpoint handler, which ignores
> DR_TRAP bits when DR_STEP is set.  This prevents user-mode debuggers
> from recognizing hardware breakpoints if EFLAGS.TF is set.
> 
> Fix this by not setting DR_STEP in {vmx,svm}_inject_event, unless the
> emulator explicitly signals the single-stepping mode via the newly added
> "singlestep" boolean field of struct x86_event.

s/the newly added "singlestep" boolean field/the 'extra' field/.  oops.

> 
> Fixes: 8b831f4189 ("x86: single step after instruction emulation")
> Signed-off-by: Jinoh Kang 
> ---
>  xen/arch/x86/hvm/emulate.c |  3 ++-
>  xen/arch/x86/hvm/svm/svm.c |  6 +++---
>  xen/arch/x86/hvm/vmx/vmx.c |  6 +++---
>  xen/arch/x86/include/asm/hvm/hvm.h | 12 
>  xen/arch/x86/mm/shadow/multi.c |  5 +++--
>  xen/arch/x86/x86_emulate/x86_emulate.h |  4 +++-
>  6 files changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 9b6e4c8bc61b..5ad372466e1d 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct hvmemul_cache
>  {
> @@ -2673,7 +2674,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
> *hvmemul_ctxt,
>  }
>  
>  if ( hvmemul_ctxt->ctxt.retire.singlestep )
> -hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> +hvm_inject_debug_exception(DR_STEP);
>  
>  new_intr_shadow = hvmemul_ctxt->intr_shadow;
>  
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index d5e8cb0722ca..f25d6a53f092 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -96,7 +96,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, 
> unsigned int inst_len)
>  curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0;
>  
>  if ( regs->eflags & X86_EFLAGS_TF )
> -hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> +hvm_inject_debug_exception(DR_STEP);
>  }
>  
>  static void cf_check svm_cpu_down(void)
> @@ -1328,10 +1328,10 @@ static void cf_check svm_inject_event(const struct 
> x86_event *event)
>  switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
>  {
>  case X86_EXC_DB:
> -if ( regs->eflags & X86_EFLAGS_TF )
> +if ( event->extra )
>  {
>  __restore_debug_registers(vmcb, curr);
> -vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | DR_STEP);
> +vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | event->extra);
>  }
>  /* fall through */
>  case X86_EXC_BP:
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 8823ca13e55d..1795b9479cf9 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2022,10 +2022,10 @@ static void cf_check vmx_inject_event(const struct 
> x86_event *event)
>  switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
>  {
>  case X86_EXC_DB:
> -if ( guest_cpu_user_regs()->eflags & X86_EFLAGS_TF )
> +if ( event->extra )
>  {
>  __restore_debug_registers(curr);
> -write_debugreg(6, read_debugreg(6) | DR_STEP);
> +write_debugreg(6, read_debugreg(6) | event->extra);
>  }
>  if ( !nestedhvm_vcpu_in_guestmode(curr) ||
>   !nvmx_intercepts_exception(curr, X86_EXC_DB, _event.error_code) 
> )
> @@ -3068,7 +3068,7 @@ void update_guest_eip(void)
>  }
>  
>  if ( regs->eflags & X86_EFLAGS_TF )
> -hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> +hvm_inject_debug_exception(DR_STEP);
>  }
>  
>  static void cf_check vmx_fpu_dirty_intercept(void)
> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
> b/xen/arch/x86/include/asm/hvm/hvm.h
> index f3f6310ab684..6a0b9e3ff01e 100644
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -538,6 +538,18 @@ static inline void hvm_inject_page_fault(int errcode, 
> unsigned long cr2)
>  hvm_inject_event();
>  }
>  
> +static inline void hvm_inject_debug_exception(unsigned long pending_dbg)
> +{
> +struct x86_event event = {
> +.vector = X86_EXC_DB,
> +.t

[PATCH] x86/svm: invert valid condition in svm_get_pending_event()

2023-08-18 Thread Jinoh Kang
Fixes: 9864841914c2 ("x86/vm_event: add support for VM_EVENT_REASON_INTERRUPT")
Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/hvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 01dd592d9b83..beb076ea8d62 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2415,7 +2415,7 @@ static bool cf_check svm_get_pending_event(
 {
 const struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
 
-if ( vmcb->event_inj.v )
+if ( !vmcb->event_inj.v )
 return false;
 
 info->vector = vmcb->event_inj.vector;
-- 
2.41.0




[PATCH 6/6] x86/debug: actually plumb pending_dbg through the monitor and devicemodel interfaces

2023-08-18 Thread Jinoh Kang
Commit 21867648033d ("x86/debug: Plumb pending_dbg through the monitor
and devicemodel interfaces") introduced pending_dbg, but did not
actually populate or use the field.

Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/hvm/svm/svm.c | 34 +++---
 xen/arch/x86/hvm/vmx/vmx.c | 32 
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index f25d6a53f092..139be9902dae 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2422,6 +2422,14 @@ static bool cf_check svm_get_pending_event(
 info->type = vmcb->event_inj.type;
 info->error_code = vmcb->event_inj.ec;
 
+if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ info->vector == X86_EXC_DB )
+{
+unsigned long dr6 = v->arch.hvm.flag_dr_dirty ?
+vmcb_get_dr6(vmcb) : v->arch.dr6;
+info->extra = dr6 & ~DR_STATUS_RESERVED_ONE;
+}
+
 return true;
 }
 
@@ -2733,16 +2741,28 @@ void svm_vmexit_handler(void)
 if ( !v->domain->debugger_attached )
 {
 unsigned int trap_type;
+unsigned long exit_pending_dbg;
 
 if ( likely(exit_reason != VMEXIT_ICEBP) )
 {
 trap_type = X86_EVENTTYPE_HW_EXCEPTION;
 insn_len = 0;
+
+__restore_debug_registers(vmcb, v);
+
+/*
+ * NOTE: This is slightly wrong; old bits in dr6 are not
+ * automatically cleared by CPU on #DB, so it's not exactly
+ * equivalent to PENDING_DBG_EXCEPTIONS in semantics.
+ */
+exit_pending_dbg = vmcb_get_dr6(vmcb) & 
~DR_STATUS_RESERVED_ONE;
+vmcb_set_dr6(vmcb, DR_STATUS_RESERVED_ONE);
 }
 else
 {
 trap_type = X86_EVENTTYPE_PRI_SW_EXCEPTION;
 insn_len = svm_get_insn_len(v, INSTR_ICEBP);
+exit_pending_dbg = 0;
 
 if ( !insn_len )
 break;
@@ -2750,12 +2770,20 @@ void svm_vmexit_handler(void)
 
 rc = hvm_monitor_debug(regs->rip,
HVM_MONITOR_DEBUG_EXCEPTION,
-   trap_type, insn_len, 0);
+   trap_type, insn_len, exit_pending_dbg);
 if ( rc < 0 )
 goto unexpected_exit_type;
 if ( !rc )
-hvm_inject_exception(X86_EXC_DB,
- trap_type, insn_len, X86_EVENT_NO_EC);
+{
+if (trap_type == X86_EVENTTYPE_HW_EXCEPTION)
+{
+/* Updates DR6 where debugger can peek. */
+hvm_inject_debug_exception(exit_pending_dbg);
+}
+else
+hvm_inject_exception(X86_EXC_DB,
+ trap_type, insn_len, X86_EVENT_NO_EC);
+}
 }
 else
 domain_pause_for_debugger();
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1795b9479cf9..63411b62cb94 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2469,6 +2469,14 @@ static bool cf_check vmx_get_pending_event(
 info->type = MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK);
 info->error_code = error_code;
 
+if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ info->vector == X86_EXC_DB )
+{
+unsigned long dr6 = v->arch.hvm.flag_dr_dirty ?
+read_debugreg(6) : v->arch.dr6;
+info->extra = dr6 & ~DR_STATUS_RESERVED_ONE;
+}
+
 return true;
 }
 
@@ -4240,13 +4248,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 {
 case X86_EXC_DB:
 /*
- * Updates DR6 where debugger can peek (See 3B 23.2.1,
- * Table 23-1, "Exit Qualification for Debug Exceptions").
+ * See 3B 23.2.1, Table 23-1, "Exit Qualification for Debug
+ * Exceptions".
  */
 __vmread(EXIT_QUALIFICATION, _qualification);
 HVMTRACE_1D(TRAP_DEBUG, exit_qualification);
-__restore_debug_registers(v);
-write_debugreg(6, exit_qualification | DR_STATUS_RESERVED_ONE);
 
 /*
  * Work around SingleStep + STI/MovSS VMEntry failures.
@@ -4285,22 +4291,32 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 if ( !v->domain->debugger_attached )
 {
-unsigned long insn_len = 0;
+unsigned long exit_pending_dbg = 0, insn_len = 0;
 int rc;
 unsigned long trap_type = MASK_EXTR(intr_info,
   

[PATCH 5/6] x86/pv: factor out single-step debug trap injection

2023-08-18 Thread Jinoh Kang
Add pv_inject_debug_exception() helper and use it wherever
applicable.

This helper corresponds to hvm_inject_debug_exception() in HVM.

Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/include/asm/domain.h | 12 
 xen/arch/x86/pv/emulate.c |  5 +
 xen/arch/x86/pv/ro-page-fault.c   |  5 +
 xen/arch/x86/pv/traps.c   | 10 ++
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/include/asm/domain.h 
b/xen/arch/x86/include/asm/domain.h
index 0e445cff5c08..cfeb63da6cd6 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -741,6 +741,18 @@ static inline void pv_inject_page_fault(int errcode, 
unsigned long cr2)
 pv_inject_event();
 }
 
+static inline void pv_inject_debug_exception(unsigned long pending_dbg)
+{
+const struct x86_event event = {
+.vector = X86_EXC_DB,
+.type = X86_EVENTTYPE_HW_EXCEPTION,
+.error_code = X86_EVENT_NO_EC,
+.extra = pending_dbg,
+};
+
+pv_inject_event();
+}
+
 static inline void pv_inject_sw_interrupt(unsigned int vector)
 {
 const struct x86_event event = {
diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index e7a1c0a2cc4f..865b05337192 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -72,10 +72,7 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, 
unsigned long rip)
 regs->rip = rip;
 regs->eflags &= ~X86_EFLAGS_RF;
 if ( regs->eflags & X86_EFLAGS_TF )
-{
-current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
-pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
-}
+pv_inject_debug_exception(DR_STEP);
 }
 
 uint64_t pv_get_reg(struct vcpu *v, unsigned int reg)
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index 238bfbeb4ac4..9c6042cab3b2 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -391,10 +391,7 @@ int pv_ro_page_fault(unsigned long addr, struct 
cpu_user_regs *regs)
 /* Fallthrough */
 case X86EMUL_OKAY:
 if ( ctxt.retire.singlestep )
-{
-current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
-pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
-}
+pv_inject_debug_exception(DR_STEP);
 
 /* Fallthrough */
 case X86EMUL_RETRY:
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index e5c9734b8204..4cf31558ac2f 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 void pv_inject_event(const struct x86_event *event)
@@ -64,7 +65,16 @@ void pv_inject_event(const struct x86_event *event)
 trace_pv_page_fault(event->extra, error_code);
 }
 else
+{
+if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ vector == X86_EXC_DB )
+{
+if ( event->extra )
+curr->arch.dr6 |= event->extra | DR_STATUS_RESERVED_ONE;
+}
+
 trace_pv_trap(vector, regs->rip, use_error_code, error_code);
+}
 
 if ( use_error_code )
 {
-- 
2.41.0




[PATCH 3/6] x86: don't assume #DB is always caused by singlestep if EFLAGS.TF is set

2023-08-18 Thread Jinoh Kang
Today, when a HVM (or PVH) guest triggers a hardware breakpoint while
EFLAGS.TF is set, Xen incorrectly assumes that this is a single stepping
exception and sets DR_STEP in dr6 in addition to DR_TRAP.

This causes problems with Linux HW breakpoint handler, which ignores
DR_TRAP bits when DR_STEP is set.  This prevents user-mode debuggers
from recognizing hardware breakpoints if EFLAGS.TF is set.

Fix this by not setting DR_STEP in {vmx,svm}_inject_event, unless the
emulator explicitly signals the single-stepping mode via the newly added
"singlestep" boolean field of struct x86_event.

Fixes: 8b831f4189 ("x86: single step after instruction emulation")
Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/hvm/emulate.c |  3 ++-
 xen/arch/x86/hvm/svm/svm.c |  6 +++---
 xen/arch/x86/hvm/vmx/vmx.c |  6 +++---
 xen/arch/x86/include/asm/hvm/hvm.h | 12 
 xen/arch/x86/mm/shadow/multi.c |  5 +++--
 xen/arch/x86/x86_emulate/x86_emulate.h |  4 +++-
 6 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9b6e4c8bc61b..5ad372466e1d 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct hvmemul_cache
 {
@@ -2673,7 +2674,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
*hvmemul_ctxt,
 }
 
 if ( hvmemul_ctxt->ctxt.retire.singlestep )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exception(DR_STEP);
 
 new_intr_shadow = hvmemul_ctxt->intr_shadow;
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index d5e8cb0722ca..f25d6a53f092 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -96,7 +96,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, unsigned 
int inst_len)
 curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0;
 
 if ( regs->eflags & X86_EFLAGS_TF )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exception(DR_STEP);
 }
 
 static void cf_check svm_cpu_down(void)
@@ -1328,10 +1328,10 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
 switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
 case X86_EXC_DB:
-if ( regs->eflags & X86_EFLAGS_TF )
+if ( event->extra )
 {
 __restore_debug_registers(vmcb, curr);
-vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | DR_STEP);
+vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | event->extra);
 }
 /* fall through */
 case X86_EXC_BP:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8823ca13e55d..1795b9479cf9 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2022,10 +2022,10 @@ static void cf_check vmx_inject_event(const struct 
x86_event *event)
 switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
 case X86_EXC_DB:
-if ( guest_cpu_user_regs()->eflags & X86_EFLAGS_TF )
+if ( event->extra )
 {
 __restore_debug_registers(curr);
-write_debugreg(6, read_debugreg(6) | DR_STEP);
+write_debugreg(6, read_debugreg(6) | event->extra);
 }
 if ( !nestedhvm_vcpu_in_guestmode(curr) ||
  !nvmx_intercepts_exception(curr, X86_EXC_DB, _event.error_code) )
@@ -3068,7 +3068,7 @@ void update_guest_eip(void)
 }
 
 if ( regs->eflags & X86_EFLAGS_TF )
-hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+hvm_inject_debug_exception(DR_STEP);
 }
 
 static void cf_check vmx_fpu_dirty_intercept(void)
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
b/xen/arch/x86/include/asm/hvm/hvm.h
index f3f6310ab684..6a0b9e3ff01e 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -538,6 +538,18 @@ static inline void hvm_inject_page_fault(int errcode, 
unsigned long cr2)
 hvm_inject_event();
 }
 
+static inline void hvm_inject_debug_exception(unsigned long pending_dbg)
+{
+struct x86_event event = {
+.vector = X86_EXC_DB,
+.type = X86_EVENTTYPE_HW_EXCEPTION,
+.error_code = X86_EVENT_NO_EC,
+.extra = pending_dbg,
+};
+
+hvm_inject_event();
+}
+
 static inline bool hvm_event_pending(const struct vcpu *v)
 {
 return alternative_call(hvm_funcs.event_pending, v);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index cf74fdf5dda6..365af5169750 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "private.h"
 #include "types.h"
@@ -2788,7 +2789,7 @@ static int cf_check sh_page_fault(
 #endif
 
 if ( emu

[PATCH 4/6] x86/pv: set DR_STEP if single-stepping after ro page fault emulation

2023-08-18 Thread Jinoh Kang
Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/pv/ro-page-fault.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index cad28ef928ad..238bfbeb4ac4 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 
 #include "emulate.h"
 #include "mm.h"
@@ -390,7 +391,10 @@ int pv_ro_page_fault(unsigned long addr, struct 
cpu_user_regs *regs)
 /* Fallthrough */
 case X86EMUL_OKAY:
 if ( ctxt.retire.singlestep )
+{
+current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
 pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+}
 
 /* Fallthrough */
 case X86EMUL_RETRY:
-- 
2.41.0




[PATCH 2/6] x86emul: rename field 'cr2' of struct x86_event to 'extra'

2023-08-18 Thread Jinoh Kang
XEN_DMOP_inject_event() copies the 'cr2' argument to struct x86_event.
'cr2' is overladed to mean pending_dbg for a debug trap, but consumers
of struct x86_event always interpret it as CR2.

Clarify the role of the 'cr2' field by renaming it to 'extra', in
preparation for an upcoming patch that uses it to actually populate dr6.

Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/hvm/dm.c  | 2 +-
 xen/arch/x86/hvm/hvm.c | 4 ++--
 xen/arch/x86/hvm/svm/nestedsvm.c   | 2 +-
 xen/arch/x86/hvm/svm/svm.c | 8 
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 xen/arch/x86/include/asm/domain.h  | 2 +-
 xen/arch/x86/include/asm/hvm/hvm.h | 2 +-
 xen/arch/x86/pv/traps.c| 8 
 xen/arch/x86/x86_emulate/x86_emulate.h | 4 ++--
 9 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 462691f91d3c..48a0c09f7af3 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -314,7 +314,7 @@ static int inject_event(struct domain *d,
 v->arch.hvm.inject_event.type = data->type;
 v->arch.hvm.inject_event.insn_len = data->insn_len;
 v->arch.hvm.inject_event.error_code = data->error_code;
-v->arch.hvm.inject_event.cr2 = data->cr2;
+v->arch.hvm.inject_event.extra = data->cr2;
 smp_wmb();
 v->arch.hvm.inject_event.vector = data->vector;
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 48a77524f198..1abdec35257b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -507,7 +507,7 @@ static bool hvm_get_pending_event(struct vcpu *v, struct 
x86_event *info)
 
 if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
  info->vector == X86_EXC_PF )
-info->cr2 = v->arch.hvm.guest_cr[2];
+info->extra = v->arch.hvm.guest_cr[2];
 
 return true;
 }
@@ -548,7 +548,7 @@ void hvm_do_resume(struct vcpu *v)
 if ( hvm_get_pending_event(v, ) )
 {
 hvm_monitor_interrupt(info.vector, info.type, info.error_code,
-  info.cr2);
+  info.extra);
 v->arch.monitor.next_interrupt_enabled = false;
 }
 }
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index a09b6abaaeaf..9bd2a304ac01 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -842,7 +842,7 @@ int cf_check nsvm_vcpu_vmexit_event(
 ASSERT(vcpu_nestedhvm(v).nv_vvmcx != NULL);
 
 nestedsvm_vmexit_defer(v, VMEXIT_EXCEPTION_DE + event->vector,
-   event->error_code, event->cr2);
+   event->error_code, event->extra);
 return NESTEDHVM_VMEXIT_DONE;
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 01dd592d9b83..d5e8cb0722ca 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1252,7 +1252,7 @@ static void svm_emul_swint_injection(struct x86_event 
*event)
 {
 fault = X86_EXC_PF;
 ec = pfinfo.ec;
-event->cr2 = pfinfo.linear;
+event->extra = pfinfo.linear;
 }
 
 goto raise_exception;
@@ -1345,8 +1345,8 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
 
 case X86_EXC_PF:
 ASSERT(_event.type == X86_EVENTTYPE_HW_EXCEPTION);
-curr->arch.hvm.guest_cr[2] = _event.cr2;
-vmcb_set_cr2(vmcb, _event.cr2);
+curr->arch.hvm.guest_cr[2] = _event.extra;
+vmcb_set_cr2(vmcb, _event.extra);
 break;
 }
 
@@ -1430,7 +1430,7 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
 if ( _event.vector == X86_EXC_PF &&
  _event.type == X86_EVENTTYPE_HW_EXCEPTION )
 HVMTRACE_LONG_2D(PF_INJECT, _event.error_code,
- TRC_PAR_LONG(_event.cr2));
+ TRC_PAR_LONG(_event.extra));
 else
 HVMTRACE_2D(INJ_EXC, _event.vector, _event.error_code);
 }
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7ec44018d4ed..8823ca13e55d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2051,7 +2051,7 @@ static void cf_check vmx_inject_event(const struct 
x86_event *event)
 
 case X86_EXC_PF:
 ASSERT(_event.type == X86_EVENTTYPE_HW_EXCEPTION);
-curr->arch.hvm.guest_cr[2] = _event.cr2;
+curr->arch.hvm.guest_cr[2] = _event.extra;
 break;
 }
 
diff --git a/xen/arch/x86/include/asm/domain.h 
b/xen/arch/x86/include/asm/domain.h
index c2d9fc333be5..0e445cff5c08 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -735,7 +735,7 @@ static inline void pv_inject_page_fault(int errcode, 
unsigned long cr2)
 .vector = X86_EXC_PF,
 .typ

[PATCH 1/6] x86/hvm: only populate info->cr2 for #PF in hvm_get_pending_event()

2023-08-18 Thread Jinoh Kang
Prepare for an upcoming patch that overloads the 'cr2' field for #DB.

Signed-off-by: Jinoh Kang 
---
 xen/arch/x86/hvm/hvm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3a99c0ff20be..48a77524f198 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -502,9 +502,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
 
 static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
 {
-info->cr2 = v->arch.hvm.guest_cr[2];
+if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
+return false;
+
+if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+ info->vector == X86_EXC_PF )
+info->cr2 = v->arch.hvm.guest_cr[2];
 
-return alternative_call(hvm_funcs.get_pending_event, v, info);
+return true;
 }
 
 void hvm_do_resume(struct vcpu *v)
-- 
2.41.0




[PATCH 0/6] x86/debug: fix guest dr6 value for single stepping and HW breakpoints

2023-08-18 Thread Jinoh Kang
Xen has a bug where hardware breakpoint exceptions (DR_TRAP) are
erroneously recognized as single-stepping exceptions (DR_STEP).  This
interferes with userland debugging and allows (otherwise restricted)
usermode programs to detect Xen HVM (or PVH).  This patch series aim to
fix this.

The last patch is optional and finishes incomplete plumbing work
initiated by commit 21867648033d (x86/debug: Plumb pending_dbg through
the monitor and devicemodel interfaces, 2018-05-31).

The following Linux x86-64 program demonstrates the bug:

--- (snip) ---

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define ABORT_ON_ERR(x) if ((x) == -1) abort();

int main(void)
{
unsigned long cur_rip, cur_eflags, cur_dr6;
int wstatus, exit_code;
pid_t pid;

ABORT_ON_ERR(pid = fork());
if (pid == 0) {
ABORT_ON_ERR(ptrace(PTRACE_TRACEME, 0, NULL, NULL));
ABORT_ON_ERR(raise(SIGSTOP));
_exit(0);
}

/* Wait for first ptrace event */
if (waitpid(pid, , 0) != pid) abort();
if (!WIFSTOPPED(wstatus)) abort();

/* Obtain current RIP value and perform sanity check */
cur_rip = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
regs.rip), _rip);
cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
u_debugreg[6]), _dr6);
assert(cur_dr6 == 0x0ff0UL);

/* Set up debug registers and set EFLAGS.TF */
cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
regs.eflags), _eflags);
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
regs.eflags), (void *)(cur_eflags | 0x100UL)));
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
u_debugreg[0]), (void *)cur_rip));
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
u_debugreg[7]), (void *)1L));

/* Continue execution to trigger hardware breakpoint */
ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
if (waitpid(pid, , 0) != pid) abort();
if (!(WIFSTOPPED(wstatus) && WSTOPSIG(wstatus) == SIGTRAP)) abort();

/* Detect if Xen has tampered with DR6 */
cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
u_debugreg[6]), _dr6);
fprintf(stderr, "DR6 = 0x%08lx\n", cur_dr6);
if (cur_dr6 == 0x0ff1UL)
{
fputs("Running on bare-metal, Xen PV, or non-Xen VMM\n", stdout);
exit_code = EXIT_FAILURE;
}
else
{
fputs("Running on Xen HVM\n", stdout);
exit_code = EXIT_SUCCESS;
}

/* Tear down debug registers and unset EFLAGS.TF */
cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, 
regs.eflags), _eflags);
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
regs.eflags), (void *)(cur_eflags & ~0x100UL)));
ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, 
u_debugreg[7]), (void *)0L));

/* Continue execution to let child process exit */
ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
if (waitpid(pid, , 0) != pid) abort();
if (!(WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0)) abort();

    return exit_code;
}

--- (snip) ---

Jinoh Kang (6):
  x86/hvm: only populate info->cr2 for #PF in hvm_get_pending_event()
  x86emul: rename field 'cr2' of struct x86_event to 'extra'
  x86: don't assume #DB is always caused by singlestep if EFLAGS.TF is
set
  x86/pv: set DR_STEP if single-stepping after ro page fault emulation
  x86/pv: factor out single-step debug trap injection
  x86/debug: actually plumb pending_dbg through the monitor and
devicemodel interfaces

 xen/arch/x86/hvm/dm.c  |  2 +-
 xen/arch/x86/hvm/emulate.c |  3 +-
 xen/arch/x86/hvm/hvm.c | 11 --
 xen/arch/x86/hvm/svm/nestedsvm.c   |  2 +-
 xen/arch/x86/hvm/svm/svm.c | 48 --
 xen/arch/x86/hvm/vmx/vmx.c | 40 ++---
 xen/arch/x86/include/asm/domain.h  | 14 +++-
 xen/arch/x86/include/asm/hvm/hvm.h | 14 +++-
 xen/arch/x86/mm/shadow/multi.c |  5 +--
 xen/arch/x86/pv/emulate.c  |  5 +--
 xen/arch/x86/pv/ro-page-fault.c|  3 +-
 xen/arch/x86/pv/traps.c| 18 +++---
 xen/arch/x86/x86_emulate/x86_emulate.h |  6 ++--
 13 files changed, 128 insertions(+), 43 deletions(-)

-- 
2.41.0