Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2025-04-10 Thread Myrsky Lintu
Thank you. I will try to bring this up with QEMU developers then.

On 2025-04-10 05:12:01, Yan Zhao wrote:
> Hi,
> 
> AFAIK, the commit c9c1e20b4c7d ("KVM: x86: Introduce Intel specific quirk
> KVM_X86_QUIRK_IGNORE_GUEST_PAT") which re-allows honoring guest PAT on Intel's
> platforms has been in kvm/queue now.
> 
> However, as the quirk is enabled by default, userspace (like QEMU) needs to turn
> it off by code like "kvm_vm_enable_cap(kvm_state, KVM_CAP_DISABLE_QUIRKS2, 0,
> KVM_X86_QUIRK_IGNORE_GUEST_PAT)" to honor guest PAT, according to the doc:
> 
> KVM_X86_QUIRK_IGNORE_GUEST_PAT   ...
>   Userspace can disable the quirk to honor
>   guest PAT if it knows that there is no such
>   guest software, for example if it does not
>   expose a bochs graphics device (which is
>   known to have had a buggy driver).
> 
> Thanks
> Yan
> 
> On Thu, Apr 10, 2025 at 01:13:18AM +, Myrsky Lintu wrote:
>> Hello,
>>
>> I am completely new to and uninformed about kernel development. I was
>> pointed here from Mesa documentation for Venus (Vulkan encapsulation for
>> KVM/QEMU): https://docs.mesa3d.org/drivers/venus.html
>>
>> Based on my limited understanding of what has happened here, this patch
>> series was partially reverted due to an issue with the Bochs DRM driver.
>> A fix for that issue has been merged months ago according to the link
>> provided in an earlier message. Since then work on this detail of KVM
>> seems to have stalled.
>>
>> Is it reasonable to ask here for this patch series to be evaluated and
>> incorporated again?
>>
>> My layperson's attempt at applying the series against 6.14.1 source code
>> failed. In addition to the parts that appear to have already been
>> incorporated there are some parts of the patch series that are rejected.
>> I lack the knowledge to correct that.
>>
>> Distro kernels currently ship without it which limits the usability of
>> Venus on AMD and NVIDIA GPUs paired with Intel CPUs. Convincing
>> individual distro maintainers of the necessity of this patch series
>> without the specialized knowledge required for understanding what it
>> does and performing that evaluation is quite hard. If upstream (kernel)
>> would apply it now the distros would ship a kernel including the
>> required changes to users, including me, without that multiplicated effort.
>>
>> Thank you for your time. If this request is out of place here please
>> forgive me for engaging this mailing list without a proper understanding
>> of the list's scope.
>>
>> On 2024-10-07 14:04:24, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 07.10.24 15:38, Vitaly Kuznetsov wrote:
>>>> "Linux regression tracking (Thorsten Leemhuis)" writes:
>>>>
>>>>> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
>>>>>> Sean Christopherson writes:
>>>>>>
>>>>>>> Unconditionally honor guest PAT on CPUs that support self-snoop, as
>>>>>>> Intel has confirmed that CPUs that support self-snoop always snoop caches
>>>>>>> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
>>>>>>> even in the presence of aliased memtypes, thus there is no need to trust
>>>>>>> the guest behaves and only honor PAT as a last resort, as KVM does today.
>>>>>>>
>>>>>>> Honoring guest PAT is desirable for use cases where the guest has access
>>>>>>> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
>>>>>>> (mediated, for all intents and purposes) GPU is exposed to the guest, along
>>>>>>> with buffers that are consumed directly by the physical GPU, i.e. which
>>>>>>> can't be proxied by the host to ensure writes from the guest are performed
>>>>>>> with the correct memory type for the GPU.
>>>>>>
>>>>>> Necroposting!
>>>>>>
>>>>>> Turns out that this change broke "bochs-display" driver in QEMU even
>>>>>> when the guest is modern (don't ask me 'who the hell uses bochs for
>>>>>> modern guests', it was basically a configuration error :-). E.g:
>>>>>> [...]
>>>>>
>>>>> This regression made it to the list of tracked regressions. It seems
>>>>> this thread stalled a while ago. Was this ever fixed? Does not look like
>>>>> it, but I might have missed something. Or is this a regression I should
>>>>> just ignore for one reason or another?
>>>>>
>>>>
>>>> The regression was addressed by reverting 377b2f359d1f in 6.11
>>>>
>>>> commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
>>>> Author: Paolo Bonzini
>>>> Date:   Sun Sep 15 02:49:33 2024 -0400
>>>>
>>>>   Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
>>>
>>> Thx. Sorry, missed that, thx for pointing me towards it. I had looked
>>> for things like that, but seems I messed up my lore query. Apologies for
>>> the noise!
>>>
>>>> Also, there's a (pending) DRM patch fixing it from the guest's side:
>>>> https://gitlab.freedeskto

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2025-04-09 Thread Yan Zhao
Hi,

AFAIK, commit c9c1e20b4c7d ("KVM: x86: Introduce Intel specific quirk
KVM_X86_QUIRK_IGNORE_GUEST_PAT"), which re-allows honoring guest PAT on Intel
platforms, is now in kvm/queue.

However, as the quirk is enabled by default, userspace (like QEMU) needs to turn
it off with code like "kvm_vm_enable_cap(kvm_state, KVM_CAP_DISABLE_QUIRKS2, 0,
KVM_X86_QUIRK_IGNORE_GUEST_PAT)" to honor guest PAT, according to the doc:

KVM_X86_QUIRK_IGNORE_GUEST_PAT   ...
 Userspace can disable the quirk to honor
 guest PAT if it knows that there is no such
 guest software, for example if it does not
 expose a bochs graphics device (which is
 known to have had a buggy driver).

Thanks
Yan

On Thu, Apr 10, 2025 at 01:13:18AM +, Myrsky Lintu wrote:
> Hello,
> 
> I am completely new to and uninformed about kernel development. I was 
> pointed here from Mesa documentation for Venus (Vulkan encapsulation for 
> KVM/QEMU): https://docs.mesa3d.org/drivers/venus.html
> 
> Based on my limited understanding of what has happened here, this patch 
> series was partially reverted due to an issue with the Bochs DRM driver. 
> A fix for that issue has been merged months ago according to the link 
> provided in an earlier message. Since then work on this detail of KVM 
> seems to have stalled.
> 
> Is it reasonable to ask here for this patch series to be evaluated and 
> incorporated again?
> 
> My layperson's attempt at applying the series against 6.14.1 source code 
> failed. In addition to the parts that appear to have already been 
> incorporated there are some parts of the patch series that are rejected. 
> I lack the knowledge to correct that.
> 
> Distro kernels currently ship without it which limits the usability of 
> Venus on AMD and NVIDIA GPUs paired with Intel CPUs. Convincing 
> individual distro maintainers of the necessity of this patch series 
> without the specialized knowledge required for understanding what it 
> does and performing that evaluation is quite hard. If upstream (kernel) 
> would apply it now the distros would ship a kernel including the 
> required changes to users, including me, without that multiplicated effort.
> 
> Thank you for your time. If this request is out of place here please 
> forgive me for engaging this mailing list without a proper understanding 
> of the list's scope.
> 
> On 2024-10-07 14:04:24, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 07.10.24 15:38, Vitaly Kuznetsov wrote:
> >> "Linux regression tracking (Thorsten Leemhuis)"
> >>  writes:
> >>
> >>> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
> >>>> Sean Christopherson writes:
> >>>>
> >>>>> Unconditionally honor guest PAT on CPUs that support self-snoop, as
> >>>>> Intel has confirmed that CPUs that support self-snoop always snoop caches
> >>>>> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
> >>>>> even in the presence of aliased memtypes, thus there is no need to trust
> >>>>> the guest behaves and only honor PAT as a last resort, as KVM does today.
> >>>>>
> >>>>> Honoring guest PAT is desirable for use cases where the guest has access
> >>>>> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
> >>>>> (mediated, for all intents and purposes) GPU is exposed to the guest, along
> >>>>> with buffers that are consumed directly by the physical GPU, i.e. which
> >>>>> can't be proxied by the host to ensure writes from the guest are performed
> >>>>> with the correct memory type for the GPU.
> >>>>
> >>>> Necroposting!
> >>>>
> >>>> Turns out that this change broke "bochs-display" driver in QEMU even
> >>>> when the guest is modern (don't ask me 'who the hell uses bochs for
> >>>> modern guests', it was basically a configuration error :-). E.g:
> >>>> [...]
> >>>
> >>> This regression made it to the list of tracked regressions. It seems
> >>> this thread stalled a while ago. Was this ever fixed? Does not look like
> >>> it, but I might have missed something. Or is this a regression I should
> >>> just ignore for one reason or another?
> >>>
> >>
> >> The regression was addressed by reverting 377b2f359d1f in 6.11
> >>
> >> commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
> >> Author: Paolo Bonzini
> >> Date:   Sun Sep 15 02:49:33 2024 -0400
> >>
> >>  Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
> > 
> > Thx. Sorry, missed that, thx for pointing me towards it. I had looked
> > for things like that, but seems I messed up my lore query. Apologies for
> > the noise!
> > 
> >> Also, there's a (pending) DRM patch fixing it from the guest's side:
> >> https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/9388ccf69925223223c87355a417ba39b13a5e8e
> > 
> > Great!
> > 
> > Ciao, Thorsten
> > 
> > P.S.:
> > 
> > #regzbot fix: 9d70f3fec14421

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2025-04-09 Thread Myrsky Lintu
Hello,

I am completely new to and uninformed about kernel development. I was 
pointed here from Mesa documentation for Venus (Vulkan encapsulation for 
KVM/QEMU): https://docs.mesa3d.org/drivers/venus.html

Based on my limited understanding of what has happened here, this patch 
series was partially reverted due to an issue with the Bochs DRM driver. 
A fix for that issue has been merged months ago according to the link 
provided in an earlier message. Since then work on this detail of KVM 
seems to have stalled.

Is it reasonable to ask here for this patch series to be evaluated and 
incorporated again?

My layperson's attempt at applying the series against 6.14.1 source code 
failed. In addition to the parts that appear to have already been 
incorporated there are some parts of the patch series that are rejected. 
I lack the knowledge to correct that.

Distro kernels currently ship without it which limits the usability of 
Venus on AMD and NVIDIA GPUs paired with Intel CPUs. Convincing 
individual distro maintainers of the necessity of this patch series 
without the specialized knowledge required for understanding what it 
does and performing that evaluation is quite hard. If upstream (kernel) 
would apply it now the distros would ship a kernel including the 
required changes to users, including me, without that multiplicated effort.

Thank you for your time. If this request is out of place here please 
forgive me for engaging this mailing list without a proper understanding 
of the list's scope.

On 2024-10-07 14:04:24, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 07.10.24 15:38, Vitaly Kuznetsov wrote:
>> "Linux regression tracking (Thorsten Leemhuis)"
>>  writes:
>>
>>> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
>>>> Sean Christopherson writes:
>>>>
>>>>> Unconditionally honor guest PAT on CPUs that support self-snoop, as
>>>>> Intel has confirmed that CPUs that support self-snoop always snoop caches
>>>>> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
>>>>> even in the presence of aliased memtypes, thus there is no need to trust
>>>>> the guest behaves and only honor PAT as a last resort, as KVM does today.
>>>>>
>>>>> Honoring guest PAT is desirable for use cases where the guest has access
>>>>> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
>>>>> (mediated, for all intents and purposes) GPU is exposed to the guest, along
>>>>> with buffers that are consumed directly by the physical GPU, i.e. which
>>>>> can't be proxied by the host to ensure writes from the guest are performed
>>>>> with the correct memory type for the GPU.
>>>>
>>>> Necroposting!
>>>>
>>>> Turns out that this change broke "bochs-display" driver in QEMU even
>>>> when the guest is modern (don't ask me 'who the hell uses bochs for
>>>> modern guests', it was basically a configuration error :-). E.g:
>>>> [...]
>>>
>>> This regression made it to the list of tracked regressions. It seems
>>> this thread stalled a while ago. Was this ever fixed? Does not look like
>>> it, but I might have missed something. Or is this a regression I should
>>> just ignore for one reason or another?
>>>
>>
>> The regression was addressed by reverting 377b2f359d1f in 6.11
>>
>> commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
>> Author: Paolo Bonzini
>> Date:   Sun Sep 15 02:49:33 2024 -0400
>>
>>  Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
> 
> Thx. Sorry, missed that, thx for pointing me towards it. I had looked
> for things like that, but seems I messed up my lore query. Apologies for
> the noise!
> 
>> Also, there's a (pending) DRM patch fixing it from the guest's side:
>> https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/9388ccf69925223223c87355a417ba39b13a5e8e
> 
> Great!
> 
> Ciao, Thorsten
> 
> P.S.:
> 
> #regzbot fix: 9d70f3fec14421e793ffbc0ec2f739b24e534900
> 
> 
> 





Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-10-07 Thread Linux regression tracking (Thorsten Leemhuis)
On 07.10.24 15:38, Vitaly Kuznetsov wrote:
> "Linux regression tracking (Thorsten Leemhuis)"
>  writes:
> 
>> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
>>> Sean Christopherson  writes:
>>>
>>>> Unconditionally honor guest PAT on CPUs that support self-snoop, as
>>>> Intel has confirmed that CPUs that support self-snoop always snoop caches
>>>> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
>>>> even in the presence of aliased memtypes, thus there is no need to trust
>>>> the guest behaves and only honor PAT as a last resort, as KVM does today.
>>>>
>>>> Honoring guest PAT is desirable for use cases where the guest has access
>>>> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
>>>> (mediated, for all intents and purposes) GPU is exposed to the guest, along
>>>> with buffers that are consumed directly by the physical GPU, i.e. which
>>>> can't be proxied by the host to ensure writes from the guest are performed
>>>> with the correct memory type for the GPU.
>>>
>>> Necroposting!
>>>
>>> Turns out that this change broke "bochs-display" driver in QEMU even
>>> when the guest is modern (don't ask me 'who the hell uses bochs for
>>> modern guests', it was basically a configuration error :-). E.g:
>>> [...]
>>
>> This regression made it to the list of tracked regressions. It seems
>> this thread stalled a while ago. Was this ever fixed? Does not look like
>> it, but I might have missed something. Or is this a regression I should
>> just ignore for one reason or another?
>>
> 
> The regression was addressed by reverting 377b2f359d1f in 6.11
> 
> commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
> Author: Paolo Bonzini 
> Date:   Sun Sep 15 02:49:33 2024 -0400
> 
> Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"

Thx. Sorry, missed that, thx for pointing me towards it. I had looked
for things like that, but seems I messed up my lore query. Apologies for
the noise!

> Also, there's a (pending) DRM patch fixing it from the guest's side:
> https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/9388ccf69925223223c87355a417ba39b13a5e8e

Great!

Ciao, Thorsten

P.S.:

#regzbot fix: 9d70f3fec14421e793ffbc0ec2f739b24e534900





Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-10-07 Thread Vitaly Kuznetsov
"Linux regression tracking (Thorsten Leemhuis)"
 writes:

> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
>> Sean Christopherson  writes:
>> 
>>> Unconditionally honor guest PAT on CPUs that support self-snoop, as
>>> Intel has confirmed that CPUs that support self-snoop always snoop caches
>>> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
>>> even in the presence of aliased memtypes, thus there is no need to trust
>>> the guest behaves and only honor PAT as a last resort, as KVM does today.
>>>
>>> Honoring guest PAT is desirable for use cases where the guest has access
>>> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
>>> (mediated, for all intents and purposes) GPU is exposed to the guest, along
>>> with buffers that are consumed directly by the physical GPU, i.e. which
>>> can't be proxied by the host to ensure writes from the guest are performed
>>> with the correct memory type for the GPU.
>> 
>> Necroposting!
>> 
>> Turns out that this change broke "bochs-display" driver in QEMU even
>> when the guest is modern (don't ask me 'who the hell uses bochs for
>> modern guests', it was basically a configuration error :-). E.g:
>> [...]
>
> This regression made it to the list of tracked regressions. It seems
> this thread stalled a while ago. Was this ever fixed? Does not look like
> it, but I might have missed something. Or is this a regression I should
> just ignore for one reason or another?
>

The regression was addressed by reverting 377b2f359d1f in 6.11:

commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
Author: Paolo Bonzini 
Date:   Sun Sep 15 02:49:33 2024 -0400

Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"

Also, there's a (pending) DRM patch fixing it from the guest's side:
https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/9388ccf69925223223c87355a417ba39b13a5e8e

-- 
Vitaly




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-10-07 Thread Linux regression tracking (Thorsten Leemhuis)
On 30.08.24 11:35, Vitaly Kuznetsov wrote:
> Sean Christopherson  writes:
> 
>> Unconditionally honor guest PAT on CPUs that support self-snoop, as
>> Intel has confirmed that CPUs that support self-snoop always snoop caches
>> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
>> even in the presence of aliased memtypes, thus there is no need to trust
>> the guest behaves and only honor PAT as a last resort, as KVM does today.
>>
>> Honoring guest PAT is desirable for use cases where the guest has access
>> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
>> (mediated, for all intents and purposes) GPU is exposed to the guest, along
>> with buffers that are consumed directly by the physical GPU, i.e. which
>> can't be proxied by the host to ensure writes from the guest are performed
>> with the correct memory type for the GPU.
> 
> Necroposting!
> 
> Turns out that this change broke "bochs-display" driver in QEMU even
> when the guest is modern (don't ask me 'who the hell uses bochs for
> modern guests', it was basically a configuration error :-). E.g:
> [...]

This regression made it to the list of tracked regressions. It seems
this thread stalled a while ago. Was this ever fixed? Does not look like
it, but I might have missed something. Or is this a regression I should
just ignore for one reason or another?


Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-09 Thread Yan Zhao
On Mon, Sep 09, 2024 at 03:24:40PM +0200, Paolo Bonzini wrote:
> While this is a fix for future kernels, it doesn't change the result for VMs
> already in existence.
Though this is true, I am concerned that there may be other guest drivers
with improper PAT configurations that were previously masked by KVM's force-WB
setting. Now that we respect the guest's PAT settings, these misconfigurations
could lead to degraded performance, potentially perceived as errors, as was
observed in the previous VMX unit test and the current Bochs scenario.
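
To illustrate the masking effect, here is a deliberately simplified model of
how the effective memory type falls out of the EPT memory type, the IPAT bit
and the guest PAT type. It only covers the two cases discussed in this thread
(IPAT set, or EPT memory type WB with IPAT clear), returns -1 for everything
else, and the enum and function names are made up for the example. The
combining rule for the WB case is my reading of the Intel SDM, so treat it as
a sketch and not as a reference.

```c
#include <stdbool.h>

/* x86 memory type encodings; UC- exists only as a PAT type. */
enum memtype {
	MT_UC = 0,
	MT_WC = 1,
	MT_WT = 4,
	MT_WP = 5,
	MT_WB = 6,
	MT_UC_MINUS = 7,
};

/*
 * Simplified model of the effective memory type of a guest access.
 * Returns the effective type, or -1 for combinations not modeled here.
 */
static int effective_memtype(enum memtype ept_mt, bool ipat,
			     enum memtype guest_pat)
{
	if (ipat)
		return ept_mt;	/* guest PAT is ignored entirely */

	if (ept_mt == MT_WB)	/* WB defers to the guest's PAT choice */
		return guest_pat == MT_UC_MINUS ? MT_UC : guest_pat;

	return -1;		/* other EPT types: see the SDM tables */
}
```

With the old force-WB behavior (EPT WB plus IPAT), a guest mapping that asks
for UC- still ends up WB, so a driver that picked the wrong type sees no
slowdown. Once IPAT is cleared, the same mapping becomes effectively
uncacheable.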

> I don't think there's an alternative to putting this behind a quirk.




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-09 Thread Sean Christopherson
On Mon, Sep 09, 2024, Paolo Bonzini wrote:
> On 9/9/24 07:30, Yan Zhao wrote:
> > On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
> > > On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> > > > On Wed, Sep 04, 2024, Yan Zhao wrote:
> > > > > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > > > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > > > > Sean Christopherson  writes:
> > > > > > > 
> > > > > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > > > > > > FWIW, I use QEMU-9.0 from the same C10S 
> > > > > > > > > (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > > > > > > but I don't think it matters in this case. My CPU is 
> > > > > > > > > "Intel(R) Xeon(R)
> > > > > > > > > Silver 4410Y".
> > > > > > > > 
> > > > > > > > Has this been reproduced on any other hardware besides SPR?  
> > > > > > > > I.e. did we stumble
> > > > > > > > on another hardware issue?
> > > > > > > 
> > > > > > > Very possible, as according to Yan Zhao this doesn't reproduce on 
> > > > > > > at
> > > > > > > least "Coffee Lake-S". Let me try to grab some random hardware 
> > > > > > > around
> > > > > > > and I'll be back with my observations.
> > > > > > 
> > > > > > Update some new findings from my side:
> > > > > > 
> > > > > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys
> > > > > > range from 0xfd000000 to 0xfe000000.
> > > > > > 
> > > > > > On "Sapphire Rapids XCC":
> > > > > > 
> > > > > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can 
> > > > > > launch
> > > > > > correctly.
> > > > > > i.e.
> > > > > > if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > > > > > return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | 
> > > > > > VMX_EPT_IPAT_BIT;
> > > > > > }
> > > > > > return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > > > > 
> > > > > > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails 
> > > > > > to show / gdm
> > > > > > restarts endlessly. (though on Coffee Lake-S, installer/gdm can 
> > > > > > launch
> > > > > > correctly in this case).
> > > > > > 
> > > > > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is 
> > > > > > called to set
> > > > > > this fb_map range as WC, with
> > > > > > iosys_map_set_vaddr_iomem(&iter_io->dmap, 
> > > > > > ioremap_wc(mem->bus.offset, mem->size));
> > > > > > 
> > > > > > However, during 
> > > > > > bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns for
> > > > > > this fb_map has been reserved as uc- by ioremap().
> > > > > > Then, the ioremap_wc() during starting GDM will only map guest 
> > > > > > PAT with UC-.
> > > > > > 
> > > > > > So, with KVM setting WB (no IPAT) to this fb_map range, the 
> > > > > > effective
> > > > > > memory type is UC- and installer/gdm restarts endlessly.
> > > > > > 
> > > > > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest 
> > > > > > bochs driver
> > > > > > to call ioremap_wc() instead in bochs_hw_init(), gdm can launch 
> > > > > > correctly.
> > > > > > (didn't verify the installer's case as I can't update the 
> > > > > > driver in that case).
> > > > > > 
> > > > > > The reason is that the ioremap_wc() called during starting GDM 
> > > > > > will no longer
> > > > > > meet conflict and can map guest PAT as WC.
> > > > 
> > > > Huh.  The upside of this is that it sounds like there's nothing broken 
> > > > with WC
> > > > or self-snoop.
> > > Considering a different perspective, the fb_map range is used as frame 
> > > buffer
> > > (vram), with the guest writing to this range and the host reading from it.
> > > If the issue were related to self-snooping, we would expect the VNC 
> > > window to
> > > display distorted data. However, the observed behavior is that the GDM 
> > > window
> > > shows up correctly for a sec and restarts over and over.
> > > 
> > > So, do you think we can simply fix this issue by calling ioremap_wc() for 
> > > the
> > > frame buffer/vram range in bochs driver, as is commonly done in other gpu
> > > drivers?
> > > 
> > > --- a/drivers/gpu/drm/tiny/bochs.c
> > > +++ b/drivers/gpu/drm/tiny/bochs.c
> > > @@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
> > >  if (pci_request_region(pdev, 0, "bochs-drm") != 0)
> > >  DRM_WARN("Cannot request framebuffer, boot fb still 
> > > active?\n");
> > > 
> > > -   bochs->fb_map = ioremap(addr, size);
> > > +   bochs->fb_map = ioremap_wc(addr, size);
> > >  if (bochs->fb_map == NULL) {
> > >  DRM_ERROR("Cannot map framebuffer\n");
> > >  return -ENOMEM;
> 
> While this is a fix for future kernels, it doesn't change the result for VMs
> already in existence.

I would prefer to bottom out on exactly whether or not the SPR/CLX behavior is
working as intended.  Maybe the ~8x slowdown is just a side effect of any Intel

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-09 Thread Paolo Bonzini

On 9/9/24 07:30, Yan Zhao wrote:
> On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
> > On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> > > On Wed, Sep 04, 2024, Yan Zhao wrote:
> > > > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > > > Sean Christopherson writes:
> > > > > >
> > > > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > > > > > FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > > > > > but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> > > > > > > > Silver 4410Y".
> > > > > > >
> > > > > > > Has this been reproduced on any other hardware besides SPR?  I.e. did we stumble
> > > > > > > on another hardware issue?
> > > > > >
> > > > > > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > > > > > least "Coffee Lake-S". Let me try to grab some random hardware around
> > > > > > and I'll be back with my observations.
> > > > >
> > > > > Update some new findings from my side:
> > > > >
> > > > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
> > > > > from 0xfd000000 to 0xfe000000.
> > > > >
> > > > > On "Sapphire Rapids XCC":
> > > > >
> > > > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
> > > > >    correctly.
> > > > >    i.e.
> > > > >    if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > > > >            return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> > > > >    }
> > > > >    return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > > >
> > > > > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to show / gdm
> > > > >    restarts endlessly. (though on Coffee Lake-S, installer/gdm can launch
> > > > >    correctly in this case).
> > > > >
> > > > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
> > > > >    this fb_map range as WC, with
> > > > >    iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, mem->size));
> > > > >
> > > > >    However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns for
> > > > >    this fb_map has been reserved as uc- by ioremap().
> > > > >    Then, the ioremap_wc() during starting GDM will only map guest PAT with UC-.
> > > > >
> > > > >    So, with KVM setting WB (no IPAT) to this fb_map range, the effective
> > > > >    memory type is UC- and installer/gdm restarts endlessly.
> > > > >
> > > > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs driver
> > > > >    to call ioremap_wc() instead in bochs_hw_init(), gdm can launch correctly.
> > > > >    (didn't verify the installer's case as I can't update the driver in that
> > > > >    case).
> > > > >
> > > > >    The reason is that the ioremap_wc() called during starting GDM will no longer
> > > > >    meet conflict and can map guest PAT as WC.
> > >
> > > Huh.  The upside of this is that it sounds like there's nothing broken with WC
> > > or self-snoop.
> >
> > Considering a different perspective, the fb_map range is used as frame buffer
> > (vram), with the guest writing to this range and the host reading from it.
> > If the issue were related to self-snooping, we would expect the VNC window to
> > display distorted data. However, the observed behavior is that the GDM window
> > shows up correctly for a sec and restarts over and over.
> >
> > So, do you think we can simply fix this issue by calling ioremap_wc() for the
> > frame buffer/vram range in bochs driver, as is commonly done in other gpu
> > drivers?
> >
> > --- a/drivers/gpu/drm/tiny/bochs.c
> > +++ b/drivers/gpu/drm/tiny/bochs.c
> > @@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
> >         if (pci_request_region(pdev, 0, "bochs-drm") != 0)
> >                 DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
> >
> > -       bochs->fb_map = ioremap(addr, size);
> > +       bochs->fb_map = ioremap_wc(addr, size);
> >         if (bochs->fb_map == NULL) {
> >                 DRM_ERROR("Cannot map framebuffer\n");
> >                 return -ENOMEM;

While this is a fix for future kernels, it doesn't change the result for 
VMs already in existence.


I don't think there's an alternative to putting this behind a quirk.

Paolo




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-08 Thread Yan Zhao
On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
> On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> > On Wed, Sep 04, 2024, Yan Zhao wrote:
> > > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > > Sean Christopherson  writes:
> > > > > 
> > > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > > >> FWIW, I use QEMU-9.0 from the same C10S 
> > > > > >> (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > > >> but I don't think it matters in this case. My CPU is "Intel(R) 
> > > > > >> Xeon(R)
> > > > > >> Silver 4410Y".
> > > > > >
> > > > > > Has this been reproduced on any other hardware besides SPR?  I.e. 
> > > > > > did we stumble
> > > > > > on another hardware issue?
> > > > > 
> > > > > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > > > > least "Coffee Lake-S". Let me try to grab some random hardware around
> > > > > and I'll be back with my observations.
> > > > 
> > > > Update some new findings from my side:
> > > > 
> > > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys
> > > > range from 0xfd000000 to 0xfe000000.
> > > > 
> > > > On "Sapphire Rapids XCC":
> > > > 
> > > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can 
> > > > launch
> > > >correctly. 
> > > >i.e.
> > > >if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > > > return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | 
> > > > VMX_EPT_IPAT_BIT;
> > > >}
> > > >return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > > 
> > > > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to 
> > > > show / gdm
> > > >restarts endlessly. (though on Coffee Lake-S, installer/gdm can 
> > > > launch
> > > >correctly in this case).
> > > > 
> > > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called 
> > > > to set
> > > >this fb_map range as WC, with
> > > >iosys_map_set_vaddr_iomem(&iter_io->dmap, 
> > > > ioremap_wc(mem->bus.offset, mem->size));
> > > > 
> > > >However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), 
> > > > pfns for
> > > >this fb_map has been reserved as uc- by ioremap().
> > > >Then, the ioremap_wc() during starting GDM will only map guest PAT 
> > > > with UC-.
> > > > 
> > > >So, with KVM setting WB (no IPAT) to this fb_map range, the effective
> > > >memory type is UC- and installer/gdm restarts endlessly.
> > > > 
> > > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest 
> > > > bochs driver
> > > >to call ioremap_wc() instead in bochs_hw_init(), gdm can launch 
> > > > correctly.
> > > >(didn't verify the installer's case as I can't update the driver in 
> > > > that case).
> > > > 
> > > >The reason is that the ioremap_wc() called during starting GDM will 
> > > > no longer
> > > >meet conflict and can map guest PAT as WC.
> > 
> > Huh.  The upside of this is that it sounds like there's nothing broken with 
> > WC
> > or self-snoop.
> Considering a different perspective, the fb_map range is used as frame buffer
> (vram), with the guest writing to this range and the host reading from it.
> If the issue were related to self-snooping, we would expect the VNC window to
> display distorted data. However, the observed behavior is that the GDM window
> shows up correctly for a sec and restarts over and over.
> 
> So, do you think we can simply fix this issue by calling ioremap_wc() for the
> frame buffer/vram range in bochs driver, as is commonly done in other gpu
> drivers?
> 
> --- a/drivers/gpu/drm/tiny/bochs.c
> +++ b/drivers/gpu/drm/tiny/bochs.c
> @@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
> if (pci_request_region(pdev, 0, "bochs-drm") != 0)
> DRM_WARN("Cannot request framebuffer, boot fb still 
> active?\n");
> 
> -   bochs->fb_map = ioremap(addr, size);
> +   bochs->fb_map = ioremap_wc(addr, size);
> if (bochs->fb_map == NULL) {
> DRM_ERROR("Cannot map framebuffer\n");
> return -ENOMEM;
> 
> 
> > 
> > > > WIP to find out why effective UC in fb_map range will make gdm to 
> > > > restart
> > > > endlessly.
> > > Not sure whether it's simply because UC is too slow.
> > > 
> > > T=Test execution time of a selftest in which guest writes to a GPA for
> > >   0x100UL times
> > > 
> > >   | Sapphire Rapids XCC  | Coffee Lake-S
> > > --|--|-
> > > KVM UC+IPAT   |T=0m4.530s|  T=0m0.622s
> > 
> > Woah.  Have you tried testing MOVDIR64 and/or WT?  E.g. to see if the 
> > problem is
> > with UC specifically, or if it occurs with any accesses that immediately 
> > write
> > through to main memory.
> > 
> > > --|--|-
> > > KVM WC+IPAT   |T=0m0.149s|  T=0m0.176s
> > > --|---

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-05 Thread Yan Zhao
On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> On Wed, Sep 04, 2024, Yan Zhao wrote:
> > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > Sean Christopherson  writes:
> > > > 
> > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > >> FWIW, I use QEMU-9.0 from the same C10S 
> > > > >> (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > >> but I don't think it matters in this case. My CPU is "Intel(R) 
> > > > >> Xeon(R)
> > > > >> Silver 4410Y".
> > > > >
> > > > > Has this been reproduced on any other hardware besides SPR?  I.e. did 
> > > > > we stumble
> > > > > on another hardware issue?
> > > > 
> > > > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > > > least "Coffee Lake-S". Let me try to grab some random hardware around
> > > > and I'll be back with my observations.
> > > 
> > > Update some new findings from my side:
> > > 
> > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
> > > from 0xfd00 to 0xfe00.
> > > 
> > > On "Sapphire Rapids XCC":
> > > 
> > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
> > >correctly. 
> > >i.e.
> > >if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > >   return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | 
> > > VMX_EPT_IPAT_BIT;
> > >}
> > >return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > 
> > > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to 
> > > show / gdm
> > >restarts endlessly. (though on Coffee Lake-S, installer/gdm can launch
> > >correctly in this case).
> > > 
> > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to 
> > > set
> > >this fb_map range as WC, with
> > >iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, 
> > > mem->size));
> > > 
> > >However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), 
> > > pfns for
> > >this fb_map has been reserved as uc- by ioremap().
> > >Then, the ioremap_wc() during starting GDM will only map guest PAT 
> > > with UC-.
> > > 
> > >So, with KVM setting WB (no IPAT) to this fb_map range, the effective
> > >memory type is UC- and installer/gdm restarts endlessly.
> > > 
> > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs 
> > > driver
> > >to call ioremap_wc() instead in bochs_hw_init(), gdm can launch 
> > > correctly.
> > >(didn't verify the installer's case as I can't update the driver in 
> > > that case).
> > > 
> > >The reason is that the ioremap_wc() called during starting GDM will no 
> > > longer
> > >meet conflict and can map guest PAT as WC.
> 
> Huh.  The upside of this is that it sounds like there's nothing broken with WC
> or self-snoop.
Considering a different perspective, the fb_map range is used as frame buffer
(vram), with the guest writing to this range and the host reading from it.
If the issue were related to self-snooping, we would expect the VNC window to
display distorted data. However, the observed behavior is that the GDM window
shows up correctly for a second and then restarts over and over.

So, do you think we can simply fix this issue by calling ioremap_wc() for the
frame buffer/vram range in bochs driver, as is commonly done in other gpu
drivers?

--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,7 @@ static int bochs_hw_init(struct drm_device *dev)
        if (pci_request_region(pdev, 0, "bochs-drm") != 0)
                DRM_WARN("Cannot request framebuffer, boot fb still active?\n");

-       bochs->fb_map = ioremap(addr, size);
+       bochs->fb_map = ioremap_wc(addr, size);
        if (bochs->fb_map == NULL) {
                DRM_ERROR("Cannot map framebuffer\n");
                return -ENOMEM;


> 
> > > WIP to find out why effective UC in fb_map range will make gdm to restart
> > > endlessly.
> > Not sure whether it's simply because UC is too slow.
> > 
> > T=Test execution time of a selftest in which guest writes to a GPA for
> >   0x100UL times
> > 
> >   | Sapphire Rapids XCC  | Coffee Lake-S
> > --|--|-
> > KVM UC+IPAT   |T=0m4.530s|  T=0m0.622s
> 
> Woah.  Have you tried testing MOVDIR64 and/or WT?  E.g. to see if the problem 
> is
> with UC specifically, or if it occurs with any accesses that immediately write
> through to main memory.
> 
> > --|--|-
> > KVM WC+IPAT   |T=0m0.149s|  T=0m0.176s
> > --|--|-
> > KVM WB+IPAT   |T=0m0.148s|  T=0m0.148s
> > --

I re-ran all the tests and collected averaged data (10 runs each) as
below (the previous data was from a single run):


T=Test execution time of a selftest in which guest w

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-04 Thread Sean Christopherson
On Wed, Sep 04, 2024, Yan Zhao wrote:
> On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > Sean Christopherson  writes:
> > > 
> > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > >> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> > > >> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> > > >> Silver 4410Y".
> > > >
> > > > Has this been reproduced on any other hardware besides SPR?  I.e. did 
> > > > we stumble
> > > > on another hardware issue?
> > > 
> > > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > > least "Coffee Lake-S". Let me try to grab some random hardware around
> > > and I'll be back with my observations.
> > 
> > Update some new findings from my side:
> > 
> > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
> > from 0xfd00 to 0xfe00.
> > 
> > On "Sapphire Rapids XCC":
> > 
> > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
> >correctly. 
> >i.e.
> >if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> >}
> >return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > 
> > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to show 
> > / gdm
> >restarts endlessly. (though on Coffee Lake-S, installer/gdm can launch
> >correctly in this case).
> > 
> > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
> >this fb_map range as WC, with
> >iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, 
> > mem->size));
> > 
> >However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns 
> > for
> >this fb_map has been reserved as uc- by ioremap().
> >Then, the ioremap_wc() during starting GDM will only map guest PAT with 
> > UC-.
> > 
> >So, with KVM setting WB (no IPAT) to this fb_map range, the effective
> >memory type is UC- and installer/gdm restarts endlessly.
> > 
> > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs 
> > driver
> >to call ioremap_wc() instead in bochs_hw_init(), gdm can launch 
> > correctly.
> >(didn't verify the installer's case as I can't update the driver in that 
> > case).
> > 
> >The reason is that the ioremap_wc() called during starting GDM will no 
> > longer
> >meet conflict and can map guest PAT as WC.

Huh.  The upside of this is that it sounds like there's nothing broken with WC
or self-snoop.

> > WIP to find out why effective UC in fb_map range will make gdm to restart
> > endlessly.
> Not sure whether it's simply because UC is too slow.
> 
> T=Test execution time of a selftest in which guest writes to a GPA for
>   0x100UL times
> 
>   | Sapphire Rapids XCC  | Coffee Lake-S
> --|--|-
> KVM UC+IPAT   |T=0m4.530s|  T=0m0.622s

Woah.  Have you tried testing MOVDIR64 and/or WT?  E.g. to see if the problem is
with UC specifically, or if it occurs with any accesses that immediately write
through to main memory.

> --|--|-
> KVM WC+IPAT   |T=0m0.149s|  T=0m0.176s
> --|--|-
> KVM WB+IPAT   |T=0m0.148s|  T=0m0.148s
> --



Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-04 Thread Yan Zhao
On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > Sean Christopherson  writes:
> > 
> > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > >> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> > >> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> > >> Silver 4410Y".
> > >
> > > Has this been reproduced on any other hardware besides SPR?  I.e. did we 
> > > stumble
> > > on another hardware issue?
> > 
> > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > least "Coffee Lake-S". Let me try to grab some random hardware around
> > and I'll be back with my observations.
> 
> Update some new findings from my side:
> 
> BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
> from 0xfd00 to 0xfe00.
> 
> On "Sapphire Rapids XCC":
> 
> 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
>correctly. 
>i.e.
>if (gfn >= 0xfd000 && gfn < 0xfe000) {
>   return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>}
>return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> 
> 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to show / 
> gdm
>restarts endlessly. (though on Coffee Lake-S, installer/gdm can launch
>correctly in this case).
> 
> 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
>this fb_map range as WC, with
>iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, 
> mem->size));
> 
>However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns 
> for
>this fb_map has been reserved as uc- by ioremap().
>Then, the ioremap_wc() during starting GDM will only map guest PAT with 
> UC-.
> 
>So, with KVM setting WB (no IPAT) to this fb_map range, the effective
>memory type is UC- and installer/gdm restarts endlessly.
> 
> 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs 
> driver
>to call ioremap_wc() instead in bochs_hw_init(), gdm can launch correctly.
>(didn't verify the installer's case as I can't update the driver in that 
> case).
> 
>The reason is that the ioremap_wc() called during starting GDM will no 
> longer
>meet conflict and can map guest PAT as WC.
> 
> 
> WIP to find out why effective UC in fb_map range will make gdm to restart
> endlessly.
Not sure whether it's simply because UC is too slow.

T=Test execution time of a selftest in which guest writes to a GPA for
  0x100UL times

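              | Sapphire Rapids XCC  | Coffee Lake-S
--------------|----------------------|----------------
KVM UC+IPAT   |      T=0m4.530s      |  T=0m0.622s
--------------|----------------------|----------------
KVM WC+IPAT   |      T=0m0.149s      |  T=0m0.176s
--------------|----------------------|----------------
KVM WB+IPAT   |      T=0m0.148s      |  T=0m0.148s
------------------------------------------------------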
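
For reference, the shape of such a measurement can be sketched as below.
This is only a userspace analogue, not the actual selftest: the helper name
time_repeated_writes() is made up for illustration, and in userspace the
target memory is ordinary WB memory, so it will not show the UC/WC/WB
differences from the table. It only illustrates "write to one location N
times and time it".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not from the KVM selftests): store to one location
 * `iterations` times and return the elapsed wall-clock time in seconds.
 */
double time_repeated_writes(volatile uint64_t *target, uint64_t iterations)
{
	struct timespec start, end;

	assert(target != NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (uint64_t i = 0; i < iterations; i++)
		*target = i;	/* volatile: every store is actually issued */
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

In the real test the writes go to a GPA whose EPT memory type KVM has
forced, which is what makes the UC row so much slower.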



Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-04 Thread Vitaly Kuznetsov
Vitaly Kuznetsov  writes:

> Sean Christopherson  writes:
>
>> On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
>>> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
>>> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
>>> Silver 4410Y".
>>
>> Has this been reproduced on any other hardware besides SPR?  I.e. did we 
>> stumble
>> on another hardware issue?
>
> Very possible, as according to Yan Zhao this doesn't reproduce on at
> least "Coffee Lake-S". Let me try to grab some random hardware around
> and I'll be back with my observations.

We have some interesting results :-)

In addition to Sapphire Rapids, the issue also reproduces on a Cascade
Lake CPU (Intel(R) Xeon(R) Silver 4214 CPU) but does NOT reproduce on
Skylake (Intel(R) Xeon(R) Gold 5118 CPU). I don't have a lot of desktop
CPUs around, so can't say much.

AMD also doesn't seem to be affected, at least AMD EPYC 7413 works fine.

-- 
Vitaly




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-03 Thread Yan Zhao
On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson  writes:
> 
> > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> >> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> >> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> >> Silver 4410Y".
> >
> > Has this been reproduced on any other hardware besides SPR?  I.e. did we 
> > stumble
> > on another hardware issue?
> 
> Very possible, as according to Yan Zhao this doesn't reproduce on at
> least "Coffee Lake-S". Let me try to grab some random hardware around
> and I'll be back with my observations.

An update with some new findings from my side:

BAR 0 of bochs VGA (fb_map) is used for the frame buffer, covering the phys
range from 0xfd000000 to 0xfe000000.

On "Sapphire Rapids XCC":

1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
   correctly. 
   i.e.
   if (gfn >= 0xfd000 && gfn < 0xfe000) {
           return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
   }
   return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

2. If KVM forces this fb_map range to be UC+IPAT, the installer fails to show /
   gdm restarts endlessly (though on Coffee Lake-S, installer/gdm can launch
   correctly in this case).

3. On starting GDM, ttm_kmap_iter_linear_io_init() in the guest is called to
   set this fb_map range as WC, with
   iosys_map_set_vaddr_iomem(&iter_io->dmap,
                             ioremap_wc(mem->bus.offset, mem->size));

   However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), the pfns
   for this fb_map have already been reserved as UC- by ioremap().
   So the ioremap_wc() issued while starting GDM can only map guest PAT as UC-.

   So, with KVM setting WB (no IPAT) for this fb_map range, the effective
   memory type is UC- and installer/gdm restarts endlessly.

4. If KVM sets WB (no IPAT) for this fb_map range, and the guest bochs driver
   is changed to call ioremap_wc() instead in bochs_hw_init(), gdm can launch
   correctly (I didn't verify the installer's case as I can't update the
   driver in that case).

   The reason is that the ioremap_wc() called while starting GDM no longer
   hits a conflict and can map guest PAT as WC.


WIP to find out why effective UC in the fb_map range makes gdm restart
endlessly.
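
To make the interaction in points 1-4 easier to follow, here is a small
standalone model of it. It is a sketch under simplifying assumptions, not
kernel or KVM code: the names (enum memtype, reserve_memtype(),
effective_memtype()) are invented for illustration, the reservation model
only captures "the first mapping of a range fixes its type", and the
EPT/PAT combination only covers the two cases discussed here (IPAT set, or
EPT type WB without IPAT).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of memory types; names are not the kernel's. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/* One tracked physical range in the guest, e.g. the bochs fb_map BAR. */
struct range_reservation {
	bool reserved;
	enum memtype type;
};

/*
 * Simplified model of the guest-side reservation: the first mapping of a
 * range records its type, and later requests for the same range get the
 * recorded type back instead of the one they asked for.
 */
enum memtype reserve_memtype(struct range_reservation *r,
			     enum memtype requested)
{
	if (!r->reserved) {
		r->reserved = true;
		r->type = requested;
	}
	return r->type;
}

/*
 * Simplified model of combining the EPT memory type with guest PAT:
 * - IPAT set: guest PAT is ignored, the EPT type is used as-is.
 * - IPAT clear and EPT type WB: the guest PAT type decides (UC- then
 *   behaves as uncacheable).
 * Other combinations are deliberately not modelled.
 */
enum memtype effective_memtype(enum memtype ept_type, bool ipat,
			       enum memtype guest_pat)
{
	if (ipat)
		return ept_type;

	assert(ept_type == MT_WB);
	return guest_pat;
}
```

With this model, ioremap() followed by ioremap_wc() on the same range yields
UC- as the guest PAT type, and WB (no IPAT) in EPT then gives effective UC-
(point 3). If the first mapping is already WC, the later ioremap_wc() gets
WC and the effective type is WC (point 4).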



Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-03 Thread Vitaly Kuznetsov
Sean Christopherson  writes:

> On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
>> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
>> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
>> Silver 4410Y".
>
> Has this been reproduced on any other hardware besides SPR?  I.e. did we 
> stumble
> on another hardware issue?

Very possible, as according to Yan Zhao this doesn't reproduce on at
least "Coffee Lake-S". Let me try to grab some random hardware around
and I'll be back with my observations.

-- 
Vitaly




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-03 Thread Sean Christopherson
On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> Silver 4410Y".

Has this been reproduced on any other hardware besides SPR?  I.e. did we stumble
on another hardware issue?



Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-02 Thread Vitaly Kuznetsov
Yan Zhao  writes:

> On Fri, Aug 30, 2024 at 03:47:11PM +0200, Vitaly Kuznetsov wrote:
>> Gerd Hoffmann  writes:
>> 
>> >> Necroposting!
>> >> 
>> >> Turns out that this change broke "bochs-display" driver in QEMU even
>> >> when the guest is modern (don't ask me 'who the hell uses bochs for
>> >> modern guests', it was basically a configuration error :-). E.g:
>> >
>> > qemu stdvga (the default display device) is affected too.
>> >
>> 
>> So far, I was only able to verify that the issue has nothing to do with
>> OVMF and multi-vcpu, it reproduces very well with
>> 
>> $ qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split -name guest=c10s
>> -cpu host -smp 1 -m 16384 -drive 
>> file=/var/lib/libvirt/images/c10s-bios.qcow2,if=none,id=drive-ide0-0-0
>> -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
>> -vnc :0 -device VGA -monitor stdio --no-reboot
>> 
>> Comparing traces of working and broken cases, I couldn't find anything
>> suspicious but I may have missed something of course. For now, it seems
>> like a userspace misbehavior resulting in a segfault.
> Could you please share steps launch the broken guest desktop?
> (better also with guest kernel version, name of desktop processes,
>  name of X server)

I think the easiest would be to download the latest CentOS Stream 10
ISO, e.g.:

https://composes.stream.centos.org/stream-10/development/CentOS-Stream-10-20240902.d.0/compose/BaseOS/x86_64/iso/CentOS-Stream-10-20240902.d.0-x86_64-dvd1.iso
(the link is probably not eternal but should work for a couple of weeks;
check https://composes.stream.centos.org/stream-10/development/ if it
doesn't work anymore).

Then, just run it:
$ /usr/libexec/qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split \
    -name guest=c10s -cpu host -smp 1 -m 16384 \
    -cdrom CentOS-Stream-10-20240902.d.0-x86_64-dvd1.iso \
    -vnc :0 -device VGA -monitor stdio --no-reboot

and connect to VNC console. To speed things up, pick 'Install Centos
Stream 10' in the boot menu to avoid ISO integrity check.

With the "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
commit included, you will see the following on the VNC console: the
installer tries to start Wayland, crashes, and drops back to the text
console. If you revert the commit and start over, Wayland will start
normally and you will see the installer.

If the installer environment is inconvenient for debugging, then you can
install in text mode (or with the commit reverted :-) and then the same
problem will be observed when gdm starts.

FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
Silver 4410Y".

-- 
Vitaly




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-02 Thread Gerd Hoffmann
> > > Yes? :-) As Gerd described, video memory is "mapped into userspace so
> > > the wayland / X11 display server can software-render into the buffer"
> > > and it seems that wayland gets something unexpected in this memory and
> > > crashes. 
> > 
> > Also, I don't know if it helps or not, but out of two hunks in
> > 377b2f359d1f, it is the vmx_get_mt_mask() one which brings the
> > issue. I.e. the following is enough to fix things:
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index f18c2d8c7476..733a0c45d1a6 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7659,13 +7659,11 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
> > gfn, bool is_mmio)
> >  
> > /*
> >  * Force WB and ignore guest PAT if the VM does NOT have a 
> > non-coherent
> > -* device attached and the CPU doesn't support self-snoop.  Letting 
> > the
> > -* guest control memory types on Intel CPUs without self-snoop may
> > -* result in unexpected behavior, and so KVM's (historical) ABI is 
> > to
> > -* trust the guest to behave only as a last resort.
> > +* device attached.  Letting the guest control memory types on Intel
> > +* CPUs may result in unexpected behavior, and so KVM's ABI is to 
> > trust
> > +* the guest to behave only as a last resort.
> >  */
> > -   if (!static_cpu_has(X86_FEATURE_SELFSNOOP) &&
> > -   !kvm_arch_has_noncoherent_dma(vcpu->kvm))
> > +   if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> > return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | 
> > VMX_EPT_IPAT_BIT;
> >  
> > return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
> 
> Hmm, that suggests the guest kernel maps the buffer as WC.  And looking at the
> bochs driver, IIUC, the kernel mappings via ioremap() are UC-, not WC.  So it
> could be that userspace doesn't play nice with WC, but could it also be that 
> the
> QEMU backend doesn't play nice with WC (on Intel)?
> 
> Given that this is a purely synthetic device, is there any reason to use UC 
> or WC?

Well, sharing code with other (real hardware) drivers is pretty much the
only reason.  DRM has a set of helper functions to manage vram in pci
memory bars (see drm_gem_vram_helper.c, drm_gem_ttm_helper.c).

> I.e. can the bochs driver configure its VRAM buffers to be WB?  It doesn't 
> look
> super easy (the DRM/TTM code has so. many. layers), but it appears doable.  
> Since
> the device only exists in VMs, it's possible the bochs driver has never run on
> Intel CPUs with WC memtype.

Thomas Zimmermann  (Cc'ed) has a drm patch series
in flight which switches the bochs driver to a shadow buffer model, i.e.
all the buffers visible to fbcon and userspace live in main memory.
Display updates are handled via in-kernel memcpy from shadow to vram.
The pci memory bar becomes a bochs driver implementation detail not
visible outside the driver.  This should give the bochs driver the
freedom to map vram with whatever attributes work best with kvm, without
needing drm changes outside the driver.
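
Roughly, the shadow buffer model looks like the sketch below. This is only
an illustration with made-up names (struct shadow_fb, shadow_fb_flush()),
not the code from the series mentioned above. It uses plain memcpy() on
ordinary memory so it can run in userspace; a real driver would copy into
the ioremap'ed bar with the kernel's I/O copy helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shadow-buffer state, for illustration only. */
struct shadow_fb {
	uint8_t *shadow;	/* main memory: what fbcon/userspace render into */
	uint8_t *vram;		/* stands in for the mapping of the pci memory bar */
	size_t pitch;		/* bytes per scanline */
	size_t height;		/* number of scanlines */
};

/*
 * Copy the damaged scanlines [y0, y1) from the shadow buffer to vram.
 * Only the driver touches vram, so how vram is mapped (UC, WC, WB) is
 * invisible to userspace.
 */
void shadow_fb_flush(struct shadow_fb *fb, size_t y0, size_t y1)
{
	assert(y0 <= y1 && y1 <= fb->height);

	memcpy(fb->vram + y0 * fb->pitch,
	       fb->shadow + y0 * fb->pitch,
	       (y1 - y0) * fb->pitch);
}
```

Userspace only ever sees the shadow buffer, which is normal cacheable
memory, so the memory type of the bar mapping no longer leaks into the
display server.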

Of course all this does not help much with current distro kernels broken
by this patch ...

take care,
  Gerd




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-01 Thread Yan Zhao
On Fri, Aug 30, 2024 at 03:47:11PM +0200, Vitaly Kuznetsov wrote:
> Gerd Hoffmann  writes:
> 
> >> Necroposting!
> >> 
> >> Turns out that this change broke "bochs-display" driver in QEMU even
> >> when the guest is modern (don't ask me 'who the hell uses bochs for
> >> modern guests', it was basically a configuration error :-). E.g:
> >
> > qemu stdvga (the default display device) is affected too.
> >
> 
> So far, I was only able to verify that the issue has nothing to do with
> OVMF and multi-vcpu, it reproduces very well with
> 
> $ qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split -name guest=c10s
> -cpu host -smp 1 -m 16384 -drive 
> file=/var/lib/libvirt/images/c10s-bios.qcow2,if=none,id=drive-ide0-0-0
> -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
> -vnc :0 -device VGA -monitor stdio --no-reboot
> 
> Comparing traces of working and broken cases, I couldn't find anything
> suspicious but I may have missed something of course. For now, it seems
> like a userspace misbehavior resulting in a segfault.
Could you please share the steps to launch the broken guest desktop?
(ideally also with the guest kernel version, the names of the desktop
 processes, and the name of the X server)

So far, I haven't been able to reproduce the error with "-device bochs-display"
or "-device VGA" locally on a "Coffee Lake-S" test machine.

The QEMU command line is as below:
qemu-system-x86_64 -m 4096 -smp 1 -M q35 -name guest-01
-hda ubuntu22-1.qcow2 -bios /usr/bin/bios.bin -enable-kvm -k en-us
-serial stdio -device bochs-display -machine kernel_irqchip=on
-cpu host -usb -usbdevice tablet

The guest can see a VGA device
00:02.0 Display controller: Device 1234: (rev 02)
with driver
# readlink /sys/bus/pci/devices/\:00\:02.0/driver
../../../bus/pci/drivers/bochs-drm

I have tried hardcoding several fields as below:

(1)  hardcoded the fb_map to wc in the guest driver

--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
        if (pci_request_region(pdev, 0, "bochs-drm") != 0)
                DRM_WARN("Cannot request framebuffer, boot fb still active?\n");

-       bochs->fb_map = ioremap(addr, size);
+       bochs->fb_map = ioremap_wc(addr, size);
+       printk("bochs wc fb_map=%lx, addr=%lx, size=%lx\n", (unsigned long)bochs->fb_map, (unsigned long)addr, (unsigned long)size);
        if (bochs->fb_map == NULL) {
                DRM_ERROR("Cannot map framebuffer\n");
                return -ENOMEM;

With dmesg as below:

[7.565840] ioremap wc phys_addr fd00 size 100 to wc
[7.565856] bochs wc fb_map=c9000400, addr=fd00, size=100
[7.565859] [drm] Found bochs VGA, ID 0xb0c5.
[7.565861] [drm] Framebuffer size 16384 kB @ 0xfd00, mmio @ 0xfebd9000.
[7.591995] [drm] Found EDID data blob.
[7.603956] [drm] Initialized bochs-drm 1.0.0 20130925 for :00:02.0 on 
minor 1
[7.614263] bochs-drm :00:02.0: [drm] fb1: bochs-drmdrmfb frame buffer 
device

(2) hardcoded the memory type to WC in KVM intel driver.
+       if (gfn >= 0xfd000 && gfn < 0xfe000)
+               return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;


(3) hardcoded mmap flags to WC for some bo objects for Xorg.