Ping²: [PATCH] x86emul: de-duplicate scatters to the same linear address

2021-04-01 Thread Jan Beulich
On 17.02.2021 09:32, Jan Beulich wrote:
> On 05.02.2021 12:28, Jan Beulich wrote:
>> On 05.02.2021 11:41, Andrew Cooper wrote:
>>> On 10/11/2020 13:26, Jan Beulich wrote:
>>>> The SDM specifically allows for earlier writes to fully overlapping
>>>> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
>>>> would crash it if varying data was written to the same address. Detect
>>>> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
>>>> be quite a bit more difficult.
>>>
>>> Are you saying that there is currently a bug if a guest does encode such
>>> an instruction, and we emulate it?
>>
>> That is my take on it, yes.
>>
>>>> Note that due to cache slot use being linear address based, there's no
>>>> similar issue with multiple writes to the same physical address (mapped
>>>> through different linear addresses).
>>>>
>>>> Since this requires an adjustment to the EVEX Disp8 scaling test,
>>>> correct a comment there at the same time.
>>>>
>>>> Signed-off-by: Jan Beulich 
>>>> ---
>>>> TBD: The SDM isn't entirely unambiguous about the faulting behavior in
>>>>  this case: If a fault would need delivering on the earlier slot
>>>>  despite the write getting squashed, we'd have to call ops->write()
>>>>  with size set to zero for the earlier write(s). However,
>>>>  hvm/emulate.c's handling of zero-byte accesses extends only to the
>>>>  virtual-to-linear address conversions (and raising of involved
>>>>  faults), so in order to also observe #PF changes to that logic
>>>>  would then also be needed. Can we live with a possible misbehavior
>>>>  here?
>>>
>>> Do you have a chapter/section reference?
>>
>> The instruction pages. They say in particular
>>
>> "If two or more destination indices completely overlap, the “earlier”
>>  write(s) may be skipped."
>>
>> and
>>
>> "Faults are delivered in a right-to-left manner. That is, if a fault
>>  is triggered by an element and delivered ..."
>>
>> To me this may or may not mean the skipping of indices includes the
>> skipping of faults (which a later element then would raise anyway).
> 
> Does the above address your concerns / questions? If not, what else
> do I need to provide?

I have to admit that I find it quite disappointing that this bug fix
has missed 4.15. Even more than elsewhere, it doesn't feel right to do
so, but again I'm intending to commit this - if need be without any
acks - once the tree is fully open again. As a bug fix it'll want
backporting as well.

Jan
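[For readers following the thread: the de-duplication the patch adds to
x86_emulate() can be modeled outside the emulator as a standalone helper -
an enabled element's write is carried out only if no later enabled element
resolves to the same linear address. The sketch below is illustrative only;
the function name and flat-array interface are assumptions, not the
emulator's actual API, and address truncation is reduced to plain unsigned
arithmetic.]

```c
#include <stdint.h>

/*
 * Illustrative model of the scatter write de-duplication: element i's
 * write is carried out only if no later element j > i with its mask bit
 * set resolves to the same linear address. Returns a bitmap of the
 * elements whose writes survive.
 */
static uint32_t surviving_writes(const long *idx, unsigned int n,
                                 uint32_t op_mask, unsigned long base,
                                 unsigned int scale)
{
    uint32_t keep = 0;

    for ( unsigned int i = 0; i < n; ++i )
    {
        unsigned long offs = base + idx[i] * (1UL << scale);
        unsigned int j;

        if ( !(op_mask & (1u << i)) )
            continue;

        /* Look for a later enabled element hitting the same address. */
        for ( j = i + 1; j < n; ++j )
            if ( (op_mask & (1u << j)) &&
                 base + idx[j] * (1UL << scale) == offs )
                break;

        if ( j >= n )            /* No later overlap: this write survives. */
            keep |= 1u << i;
    }

    return keep;
}
```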



Ping: [PATCH] x86emul: de-duplicate scatters to the same linear address

2021-02-17 Thread Jan Beulich
On 05.02.2021 12:28, Jan Beulich wrote:
> On 05.02.2021 11:41, Andrew Cooper wrote:
>> On 10/11/2020 13:26, Jan Beulich wrote:
>>> The SDM specifically allows for earlier writes to fully overlapping
>>> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
>>> would crash it if varying data was written to the same address. Detect
>>> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
>>> be quite a bit more difficult.
>>
>> Are you saying that there is currently a bug if a guest does encode such
>> an instruction, and we emulate it?
> 
> That is my take on it, yes.
> 
>>> Note that due to cache slot use being linear address based, there's no
>>> similar issue with multiple writes to the same physical address (mapped
>>> through different linear addresses).
>>>
>>> Since this requires an adjustment to the EVEX Disp8 scaling test,
>>> correct a comment there at the same time.
>>>
>>> Signed-off-by: Jan Beulich 
>>> ---
>>> TBD: The SDM isn't entirely unambiguous about the faulting behavior in
>>>  this case: If a fault would need delivering on the earlier slot
>>>  despite the write getting squashed, we'd have to call ops->write()
>>>  with size set to zero for the earlier write(s). However,
>>>  hvm/emulate.c's handling of zero-byte accesses extends only to the
>>>  virtual-to-linear address conversions (and raising of involved
>>>  faults), so in order to also observe #PF changes to that logic
>>>  would then also be needed. Can we live with a possible misbehavior
>>>  here?
>>
>> Do you have a chapter/section reference?
> 
> The instruction pages. They say in particular
> 
> "If two or more destination indices completely overlap, the “earlier”
>  write(s) may be skipped."
> 
> and
> 
> "Faults are delivered in a right-to-left manner. That is, if a fault
>  is triggered by an element and delivered ..."
> 
> To me this may or may not mean the skipping of indices includes the
> skipping of faults (which a later element then would raise anyway).

Does the above address your concerns / questions? If not, what else
do I need to provide?

Jan



Re: [PATCH] x86emul: de-duplicate scatters to the same linear address

2021-02-05 Thread Jan Beulich
On 05.02.2021 11:41, Andrew Cooper wrote:
> On 10/11/2020 13:26, Jan Beulich wrote:
>> The SDM specifically allows for earlier writes to fully overlapping
>> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
>> would crash it if varying data was written to the same address. Detect
>> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
>> be quite a bit more difficult.
> 
> Are you saying that there is currently a bug if a guest does encode such
> an instruction, and we emulate it?

That is my take on it, yes.

>> Note that due to cache slot use being linear address based, there's no
>> similar issue with multiple writes to the same physical address (mapped
>> through different linear addresses).
>>
>> Since this requires an adjustment to the EVEX Disp8 scaling test,
>> correct a comment there at the same time.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> TBD: The SDM isn't entirely unambiguous about the faulting behavior in
>>  this case: If a fault would need delivering on the earlier slot
>>  despite the write getting squashed, we'd have to call ops->write()
>>  with size set to zero for the earlier write(s). However,
>>  hvm/emulate.c's handling of zero-byte accesses extends only to the
>>  virtual-to-linear address conversions (and raising of involved
>>  faults), so in order to also observe #PF changes to that logic
>>  would then also be needed. Can we live with a possible misbehavior
>>  here?
> 
> Do you have a chapter/section reference?

The instruction pages. They say in particular

"If two or more destination indices completely overlap, the “earlier”
 write(s) may be skipped."

and

"Faults are delivered in a right-to-left manner. That is, if a fault
 is triggered by an element and delivered ..."

To me this may or may not mean the skipping of indices includes the
skipping of faults (which a later element then would raise anyway).

Jan
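[To make the two SDM quotes concrete: whether or not the "earlier" writes
are skipped, the architecturally visible end state is that the
highest-numbered enabled element wins for any given destination. A minimal
model of that follows; the function name and fixed element count are
assumptions for illustration, not anything the SDM or the emulator defines.]

```c
#include <stdint.h>

/*
 * Model of a 4-element scatter's final memory state: elements are
 * applied left to right, so a later element silently overwrites any
 * earlier one targeting the same slot - which is why the SDM can permit
 * implementations to skip the earlier, fully overlapped writes entirely.
 */
static void scatter_model(uint32_t *mem, const unsigned int *idx,
                          const uint32_t *val, uint8_t mask)
{
    for ( unsigned int i = 0; i < 4; ++i )
        if ( mask & (1u << i) )
            mem[idx[i]] = val[i];
}
```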



Re: [PATCH] x86emul: de-duplicate scatters to the same linear address

2021-02-05 Thread Andrew Cooper
On 10/11/2020 13:26, Jan Beulich wrote:
> The SDM specifically allows for earlier writes to fully overlapping
> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
> would crash it if varying data was written to the same address. Detect
> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
> be quite a bit more difficult.

Are you saying that there is currently a bug if a guest does encode such
an instruction, and we emulate it?

>
> Note that due to cache slot use being linear address based, there's no
> similar issue with multiple writes to the same physical address (mapped
> through different linear addresses).
>
> Since this requires an adjustment to the EVEX Disp8 scaling test,
> correct a comment there at the same time.
>
> Signed-off-by: Jan Beulich 
> ---
> TBD: The SDM isn't entirely unambiguous about the faulting behavior in
>  this case: If a fault would need delivering on the earlier slot
>  despite the write getting squashed, we'd have to call ops->write()
>  with size set to zero for the earlier write(s). However,
>  hvm/emulate.c's handling of zero-byte accesses extends only to the
>  virtual-to-linear address conversions (and raising of involved
>  faults), so in order to also observe #PF changes to that logic
>  would then also be needed. Can we live with a possible misbehavior
>  here?

Do you have a chapter/section reference?

~Andrew



Ping: [PATCH] x86emul: de-duplicate scatters to the same linear address

2021-02-05 Thread Jan Beulich
On 10.11.2020 14:26, Jan Beulich wrote:
> The SDM specifically allows for earlier writes to fully overlapping
> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
> would crash it if varying data was written to the same address. Detect
> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
> be quite a bit more difficult.
> 
> Note that due to cache slot use being linear address based, there's no
> similar issue with multiple writes to the same physical address (mapped
> through different linear addresses).
> 
> Since this requires an adjustment to the EVEX Disp8 scaling test,
> correct a comment there at the same time.
> 
> Signed-off-by: Jan Beulich 
> ---
> TBD: The SDM isn't entirely unambiguous about the faulting behavior in
>  this case: If a fault would need delivering on the earlier slot
>  despite the write getting squashed, we'd have to call ops->write()
>  with size set to zero for the earlier write(s). However,
>  hvm/emulate.c's handling of zero-byte accesses extends only to the
>  virtual-to-linear address conversions (and raising of involved
>  faults), so in order to also observe #PF changes to that logic
>  would then also be needed. Can we live with a possible misbehavior
>  here?
> 
> --- a/tools/tests/x86_emulator/evex-disp8.c
> +++ b/tools/tests/x86_emulator/evex-disp8.c
> @@ -647,8 +647,8 @@ static const uint16_t imm0f[16] = {
>  static struct x86_emulate_ops emulops;
>  
>  /*
> - * Access tracking (by granular) is used on the first 64 bytes of address
> - * space. Instructions get encode with a raw Disp8 value of 1, which then
> + * Access tracking (byte granular) is used on the first 64 bytes of address
> + * space. Instructions get encoded with a raw Disp8 value of 1, which then
>   * gets scaled accordingly. Hence accesses below the address 
>   * as well as at or above 2 *  are indications of bugs. To
>   * aid diagnosis / debugging, track all accesses below 3 * .
> @@ -804,6 +804,31 @@ static void test_one(const struct test *
>  
>  asm volatile ( "kxnorw %k1, %k1, %k1" );
>  asm volatile ( "vxorps %xmm4, %xmm4, %xmm4" );
> +if ( sg && (test->opc | 3) == 0xa3 )
> +{
> +/*
> + * Non-prefetch scatters need handling specially, due to the
> + * overlapped write elimination done by the emulator. The order of
> + * indexes chosen is somewhat arbitrary, within the constraints
> + * imposed by the various different uses.
> + */
> +static const struct {
> +int32_t _[16];
> +} off32 = {{ 1, 0, 2, 3, 7, 6, 5, 4, 8, 10, 12, 14, 9, 11, 13, 15 }};
> +
> +if ( test->opc & 1 )
> +{
> +asm volatile ( "vpmovsxdq %0, %%zmm4" :: "m" (off32) );
> +vsz >>= !evex.w;
> +}
> +else
> +asm volatile ( "vmovdqu32 %0, %%zmm4" :: "m" (off32) );
> +
> +/* Scale by element size. */
> +instr[6] |= (evex.w | 2) << 6;
> +
> +sg = false;
> +}
>  
>  ctxt->regs->eip = (unsigned long)&instr[0];
>  ctxt->regs->edx = 0;
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -10032,25 +10032,47 @@ x86_emulate(
>  
>  for ( i = 0; op_mask; ++i )
>  {
> -long idx = b & 1 ? index.qw[i] : index.dw[i];
> +long idx = (b & 1 ? index.qw[i]
> +  : index.dw[i]) * (1 << state->sib_scale);
> +unsigned long offs = truncate_ea(ea.mem.off + idx);
> +unsigned int j;
>  
>  if ( !(op_mask & (1 << i)) )
>  continue;
>  
> -rc = ops->write(ea.mem.seg,
> -truncate_ea(ea.mem.off +
> -idx * (1 << state->sib_scale)),
> -(void *)mmvalp + i * op_bytes, op_bytes, ctxt);
> -if ( rc != X86EMUL_OKAY )
> +/*
> + * hvmemul_linear_mmio_access() will find a cache slot based on
> + * linear address. hvmemul_phys_mmio_access() will crash the
> + * domain if observing varying data getting written to the same
> + * address within a cache slot. Utilize that squashing earlier
> + * writes to fully overlapping addresses is permitted by the spec.
> + */
> +for ( j = i + 1; j < n; ++j )
>  {
> -/* See comment in gather emulation. */
> -if ( rc != X86EMUL_EXCEPTION && done )
> -rc = X86EMUL_RETRY;
> -break;
> +long idx2 = (b & 1 ? index.qw[j]
> +   : index.dw[j]) * (1 << state->sib_scale);
> +
> +if ( (op_mask & (1 << j)) &&
> + truncate_ea(ea.mem.off + idx2) == offs )
> +break;
> +}
> +
> +if ( j >= n )
> +{
> +  

[PATCH] x86emul: de-duplicate scatters to the same linear address

2020-11-10 Thread Jan Beulich
The SDM specifically allows for earlier writes to fully overlapping
ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
would crash it if varying data was written to the same address. Detect
overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
be quite a bit more difficult.

Note that due to cache slot use being linear address based, there's no
similar issue with multiple writes to the same physical address (mapped
through different linear addresses).

Since this requires an adjustment to the EVEX Disp8 scaling test,
correct a comment there at the same time.

Signed-off-by: Jan Beulich 
---
TBD: The SDM isn't entirely unambiguous about the faulting behavior in
 this case: If a fault would need delivering on the earlier slot
 despite the write getting squashed, we'd have to call ops->write()
 with size set to zero for the earlier write(s). However,
 hvm/emulate.c's handling of zero-byte accesses extends only to the
 virtual-to-linear address conversions (and raising of involved
 faults), so in order to also observe #PF changes to that logic
 would then also be needed. Can we live with a possible misbehavior
 here?

--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -647,8 +647,8 @@ static const uint16_t imm0f[16] = {
 static struct x86_emulate_ops emulops;
 
 /*
- * Access tracking (by granular) is used on the first 64 bytes of address
- * space. Instructions get encode with a raw Disp8 value of 1, which then
+ * Access tracking (byte granular) is used on the first 64 bytes of address
+ * space. Instructions get encoded with a raw Disp8 value of 1, which then
  * gets scaled accordingly. Hence accesses below the address 
  * as well as at or above 2 *  are indications of bugs. To
  * aid diagnosis / debugging, track all accesses below 3 * .
@@ -804,6 +804,31 @@ static void test_one(const struct test *
 
 asm volatile ( "kxnorw %k1, %k1, %k1" );
 asm volatile ( "vxorps %xmm4, %xmm4, %xmm4" );
+if ( sg && (test->opc | 3) == 0xa3 )
+{
+/*
+ * Non-prefetch scatters need handling specially, due to the
+ * overlapped write elimination done by the emulator. The order of
+ * indexes chosen is somewhat arbitrary, within the constraints
+ * imposed by the various different uses.
+ */
+static const struct {
+int32_t _[16];
+} off32 = {{ 1, 0, 2, 3, 7, 6, 5, 4, 8, 10, 12, 14, 9, 11, 13, 15 }};
+
+if ( test->opc & 1 )
+{
+asm volatile ( "vpmovsxdq %0, %%zmm4" :: "m" (off32) );
+vsz >>= !evex.w;
+}
+else
+asm volatile ( "vmovdqu32 %0, %%zmm4" :: "m" (off32) );
+
+/* Scale by element size. */
+instr[6] |= (evex.w | 2) << 6;
+
+sg = false;
+}
 
     ctxt->regs->eip = (unsigned long)&instr[0];
 ctxt->regs->edx = 0;
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -10032,25 +10032,47 @@ x86_emulate(
 
 for ( i = 0; op_mask; ++i )
 {
-long idx = b & 1 ? index.qw[i] : index.dw[i];
+long idx = (b & 1 ? index.qw[i]
+  : index.dw[i]) * (1 << state->sib_scale);
+unsigned long offs = truncate_ea(ea.mem.off + idx);
+unsigned int j;
 
 if ( !(op_mask & (1 << i)) )
 continue;
 
-rc = ops->write(ea.mem.seg,
-truncate_ea(ea.mem.off +
-idx * (1 << state->sib_scale)),
-(void *)mmvalp + i * op_bytes, op_bytes, ctxt);
-if ( rc != X86EMUL_OKAY )
+/*
+ * hvmemul_linear_mmio_access() will find a cache slot based on
+ * linear address. hvmemul_phys_mmio_access() will crash the
+ * domain if observing varying data getting written to the same
+ * address within a cache slot. Utilize that squashing earlier
+ * writes to fully overlapping addresses is permitted by the spec.
+ */
+for ( j = i + 1; j < n; ++j )
 {
-/* See comment in gather emulation. */
-if ( rc != X86EMUL_EXCEPTION && done )
-rc = X86EMUL_RETRY;
-break;
+long idx2 = (b & 1 ? index.qw[j]
+   : index.dw[j]) * (1 << state->sib_scale);
+
+if ( (op_mask & (1 << j)) &&
+ truncate_ea(ea.mem.off + idx2) == offs )
+break;
+}
+
+if ( j >= n )
+{
+rc = ops->write(ea.mem.seg, offs,
+(void *)mmvalp + i * op_bytes, op_bytes, ctxt);
+if ( rc != X86EMUL_OKAY )
+{
+/* See comment in gather emulation. */
+if