Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer

2022-09-07 Thread marma...@invisiblethingslab.com
On Wed, Sep 07, 2022 at 12:21:12PM +, Dario Faggioli wrote:
> On Tue, 2022-09-06 at 14:35 +0200, Marek Marczykowski-Górecki wrote:
> > On Tue, Sep 06, 2022 at 01:46:55PM +0200, Juergen Gross wrote:
> > > 
> > > Could you test the attached patch, please?
> > 
> > I did a test with only dom0 running, and it works now. It isn't a
> > comprehensive test, but just dom0 was enough to crash it before, and it
> > stays working now.
> >
> That's very cool to hear! Thanks for testing and reporting back.
> 
> Just to be sure, did you check both Credit1 and Credit2 and do they
> both work, with Juergen's patch?

The test above was with credit1. I did check credit2 later, and it
still crashes, unfortunately (Juergen already knows from our IRC chat).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: SecureBoot and PCI passthrough with kernel lockdown in place (on Xen)

2022-02-14 Thread marma...@invisiblethingslab.com
On Mon, Feb 14, 2022 at 03:25:31PM +, Andrew Cooper wrote:
> On 14/02/2022 15:02, Dario Faggioli wrote:
> > Hello,
> >
> > We have run into an issue when trying to use PCI passthrough for a Xen
> > VM running on a host where the dom0 kernel is 5.14.21 (but we think it
> > could be any kernel > 5.4) and SecureBoot is enabled.
> 
> Back up a bit...
> 
> Xen doesn't support SecureBoot and there's a massive pile of work to
> make it function, let alone work in a way that MSFT aren't liable to
> revoke your cert on 0 notice.
> 
> >
> > The error we get, when (for instance) trying to attach a device to an
> > (HVM) VM, on such system is:
> >
> > # xl pci-attach 2-fv-sles15sp4beta2 :58:03.0
> > libxl: error: libxl_qmp.c:1838:qmp_ev_parse_error_messages: Domain 12:Failed to initialize 12/15, type = 0x1, rc: -1
> > libxl: error: libxl_pci.c:1777:device_pci_add_done: Domain 12:libxl__device_pci_add failed for PCI device 0:58:3.0 (rc -28)
> > libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add device
> >
> > QEMU is telling us the following:
> >
> > [00:04.0] xen_pt_msix_init: Error: Can't open /dev/mem: Operation not permitted
> > [00:04.0] xen_pt_msix_size_init: Error: Internal error: Invalid xen_pt_msix_init.
> >
> > And the kernel reports this:
> >
> > Jan 27 16:20:53 narvi-sr860v2-bps-sles15sp4b2 kernel: Lockdown: qemu-system-i38: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
> >
> > So, it's related to lockdown. Which, AFAIUI, is consistent with the
> > fact that the problem only shows up when SecureBoot is enabled, as
> > that implies lockdown. It's also consistent with the fact that we
> > don't seem to have any problems doing the same with a 5.3.x dom0
> > kernel... as there's no lockdown there!
> >
> > Some digging revealed that QEMU tries to open /dev/mem in
> > xen_pt_msix_init():
> >
> > fd = open("/dev/mem", O_RDWR);
> > ...
> > msix->phys_iomem_base =
> > mmap(NULL,
> >  total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
> >  PROT_READ,
> >  MAP_SHARED | MAP_LOCKED,
> >  fd,
> >  msix->table_base + table_off - msix->table_offset_adjust);
> > close(fd);
> 
> Yes.  Use of /dev/mem is not permitted in lockdown mode.  This wants
> reworking into something which is lockdown compatible.

FWIW, Qubes has PCI passthrough working with qemu in a stubdomain, which
works without access to /dev/mem in dom0. We do this by disabling
MSI-X, which includes patching out the above piece of code...

https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/master/qemu/patches/0005-Disable-MSI-X-caps.patch
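
For context, the rough shape of that workaround is to hide the MSI-X
capability from the guest entirely, so that xen_pt_msix_init() is never
reached. Below is a minimal sketch from memory of QEMU's
hw/xen/xen_pt_config_init.c register-group table; treat the field names and
values as illustrative only - the linked patch is the authoritative version:

    /* Illustrative sketch, NOT the actual Qubes patch (see the URL above).
     * In xen_pt's table of emulated PCI register groups, the MSI-X entry
     * is switched from "emulated" to "hardwired" (reads back as zero), so
     * the capability is invisible to the guest and the /dev/mem-backed
     * MSI-X table mapping is never attempted. */
    {
        .grp_id    = PCI_CAP_ID_MSIX,            /* MSI-X capability */
        .grp_type  = XEN_PT_GRP_TYPE_HARDWIRED,  /* was: XEN_PT_GRP_TYPE_EMU */
        .grp_size  = 0xC,                        /* MSI-X capability length */
        .size_init = xen_pt_reg_grp_size_init,
    },

The obvious trade-off is that the guest cannot use MSI-X on the
passed-through device.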

> The real elephant in the room is that privcmd is not remotely safe to
> use in a SecureBoot environment, because it lets any root userspace
> trivially escalate privilege into the dom0 kernel, bypassing the
> specific protection that SecureBoot is trying to achieve.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: Recent upgrade of 4.13 -> 4.14 issue

2020-10-30 Thread marma...@invisiblethingslab.com
On Sat, Oct 31, 2020 at 04:27:58AM +0100, Dario Faggioli wrote:
> On Sat, 2020-10-31 at 03:54 +0100, marma...@invisiblethingslab.com
> wrote:
> > On Sat, Oct 31, 2020 at 02:34:32AM +, Dario Faggioli wrote:
> > (XEN) *** Dumping CPU7 host state: ***
> > (XEN) Xen call trace:
> > (XEN)    [] R _spin_lock+0x35/0x40
> > (XEN)    [] S on_selected_cpus+0x1d/0xc0
> > (XEN)    [] S vmx_do_resume+0xba/0x1b0
> > (XEN)    [] S context_switch+0x110/0xa60
> > (XEN)    [] S core.c#schedule+0x1aa/0x250
> > (XEN)    [] S softirq.c#__do_softirq+0x5a/0xa0
> > (XEN)    [] S vmx_asm_do_vmentry+0x2b/0x30
> > 
> > And so on, for (almost?) all CPUs.
>
> Right. So, it seems like a live (I would say) lock. It might happen on
> some resource which is shared among domains. And introduced (the
> livelock, not the resource or the sharing) in 4.14.
> 
> Just giving a quick look, I see that vmx_do_resume() calls
> vmx_clear_vmcs() which calls on_selected_cpus() which takes the
> call_lock spinlock.
> 
> And none of these seems to have received much attention recently.
> 
> But this is just a really basic analysis!

I've looked at on_selected_cpus() and my understanding is this:
1. take the call_lock spinlock
2. set the function, its args, and the target CPU mask in the global "call_data" variable
3. ask those CPUs to execute the function (smp_send_call_function_mask() call)
4. wait for all requested CPUs to execute the function, still holding
the spinlock
5. only then - release the spinlock

So, if any CPU does not execute the requested function for any reason, it
will keep the call_lock locked forever.
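
In pseudo-C, my understanding of that flow (paraphrased from memory of
xen/common/smp.c, so a sketch rather than the exact code) is:

    /* Sketch only, paraphrased from memory of xen/common/smp.c. */
    void on_selected_cpus(const cpumask_t *selected, void (*func)(void *),
                          void *info, int wait)
    {
        spin_lock(&call_lock);                            /* step 1 */

        cpumask_copy(&call_data.selected, selected);      /* step 2 */
        call_data.func = func;
        call_data.info = info;
        call_data.wait = wait;

        smp_send_call_function_mask(&call_data.selected); /* step 3: send IPIs */

        /* step 4: each target CPU clears its own bit once it has run the
         * function; if one of them never does, this loop spins forever,
         * still holding call_lock */
        while ( !cpumask_empty(&call_data.selected) )
            cpu_relax();

        spin_unlock(&call_lock);                          /* step 5 */
    }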

I don't see any CPU waiting on step 4, but also I don't see call traces
from CPU3 and CPU8 in the log - that's because they are in guest (dom0
here) context, right? I do see "guest state" dumps from them.
The only three CPUs that have logged Xen call traces and are not waiting on
that spinlock are:

CPU0:
(XEN) Xen call trace:
(XEN)[] R vcpu_unblock+0x9/0x50
(XEN)[] S vcpu_kick+0x11/0x60
(XEN)[] S tasklet.c#do_tasklet_work+0x68/0xc0
(XEN)[] S tasklet.c#tasklet_softirq_action+0x39/0x60
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S vmx_asm_do_vmentry+0x2b/0x30

CPU4:
(XEN) Xen call trace:
(XEN)[] R set_timer+0x133/0x220
(XEN)[] S credit.c#csched_tick+0/0x3a0
(XEN)[] S timer.c#timer_softirq_action+0x9f/0x300
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S x86_64/entry.S#process_softirqs+0x6/0x20

CPU14:
(XEN) Xen call trace:
(XEN)[] R do_softirq+0/0x10
(XEN)[] S x86_64/entry.S#process_softirqs+0x6/0x20

I'm not sure if any of those is related to that spinlock, the
on_selected_cpus() call, or anything like that...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Re: Recent upgrade of 4.13 -> 4.14 issue

2020-10-30 Thread marma...@invisiblethingslab.com
On Sat, Oct 31, 2020 at 02:34:32AM +, Dario Faggioli wrote:
> On Tue, 2020-10-27 at 17:06 +0100, Frédéric Pierret wrote:
> > 
> > Ok, the server got frozen just a few minutes after my mail, and I now
> > got:
> > 'r': https://gist.github.com/fepitre/78541f555902275d906d627de2420571
> >
> From the scheduler point of view, things seem fine:
> 
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=770188952085
> (XEN) Online Cpus: 0-15
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-15
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Active queues: 2
> (XEN) default-weight = 256
> (XEN) Runqueue 0:
> (XEN) ncpus  = 8
> (XEN) cpus   = 0-7
> (XEN) max_weight = 256
> (XEN) pick_bias  = 1
> (XEN) instload   = 7
> (XEN) aveload= 2021119 (~770%)
> (XEN) idlers: ,
> (XEN) tickled: ,
> (XEN) fully idle cores: ,
> (XEN) Runqueue 1:
> (XEN) ncpus  = 8
> (XEN) cpus   = 8-15
> (XEN) max_weight = 256
> (XEN) pick_bias  = 9
> (XEN) instload   = 8
> (XEN) aveload= 2097259 (~800%)
> (XEN) idlers: ,
> (XEN) tickled: ,0200
> (XEN) fully idle cores: ,
> 
> The system is pretty busy, but not in overload.
> 
> Below we see that CPU 3 is running the idle vCPU, but it's marked as
> neither idle nor tickled.
> 
> It may be running a tasklet (the one that dumps the debug key output, I
> guess).
> 
> Credits are fine, I don't see any strange values that may indicate
> anomalies or something.
> 
> All the CPUs are executing a vCPU, and there should be nothing that
> prevents them from making progress.
> 
> There is one vCPU which apparently wants to run at 100% in pretty much
> all guests, and more than one in dom0.
> 
> And I think I saw some spin_lock() in the call stacks, in the partial
> report of '*' debug-key?

Yes, I see:

(XEN) *** Dumping CPU1 host state: ***
(...)
(XEN) Xen call trace:
(XEN)[] R _spin_lock+0x35/0x40
(XEN)[] S on_selected_cpus+0x1d/0xc0
(XEN)[] S vmx_do_resume+0xba/0x1b0
(XEN)[] S context_switch+0x110/0xa60
(XEN)[] S core.c#schedule+0x1aa/0x250
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S x86_64/entry.S#process_softirqs+0x6/0x20
(...)
(XEN) *** Dumping CPU2 host state: ***
(XEN) Xen call trace:
(XEN)[] R _spin_lock+0x32/0x40
(XEN)[] S on_selected_cpus+0x1d/0xc0
(XEN)[] S vmx_do_resume+0xba/0x1b0
(XEN)[] S context_switch+0x110/0xa60
(XEN)[] S core.c#schedule+0x1aa/0x250
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S vmx_asm_do_vmentry+0x2b/0x30

(XEN) *** Dumping CPU5 host state: ***
(XEN) Xen call trace:
(XEN)[] R _spin_lock+0x32/0x40
(XEN)[] S on_selected_cpus+0x1d/0xc0
(XEN)[] S vmx_do_resume+0xba/0x1b0
(XEN)[] S context_switch+0x110/0xa60
(XEN)[] S core.c#schedule+0x1aa/0x250
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S vmx_asm_do_vmentry+0x2b/0x30

(XEN) *** Dumping CPU6 host state: ***
(XEN) Xen call trace:
(XEN)[] R _spin_lock+0x35/0x40
(XEN)[] S on_selected_cpus+0x1d/0xc0
(XEN)[] S vmx_do_resume+0xba/0x1b0
(XEN)[] S context_switch+0x110/0xa60
(XEN)[] S core.c#schedule+0x1aa/0x250
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S x86_64/entry.S#process_softirqs+0x6/0x20

(XEN) *** Dumping CPU7 host state: ***
(XEN) Xen call trace:
(XEN)[] R _spin_lock+0x35/0x40
(XEN)[] S on_selected_cpus+0x1d/0xc0
(XEN)[] S vmx_do_resume+0xba/0x1b0
(XEN)[] S context_switch+0x110/0xa60
(XEN)[] S core.c#schedule+0x1aa/0x250
(XEN)[] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)[] S vmx_asm_do_vmentry+0x2b/0x30

And so on, for (almost?) all CPUs.

Note the '*' output is (I think) from a different instance of the
freeze, so it cannot be correlated with the other outputs...

> Maybe they're stuck in the kernel, not in Xen? Thoughts?

Given the above spinlocks, I don't think so. But even if they are
stuck in the kernel, it clearly started happening after the 4.13 -> 4.14 upgrade...

> (XEN) Domain info:
> (XEN) Domain: 0 w 256 c 0 v 16
> (XEN)   1: [0.0] flags=0 cpu=5 credit=10553147 [w=256] load=17122
> (~6%)
> (XEN)   2: [0.1] flags=0 cpu=4 credit=10570606 [w=256] load=13569
> (~5%)
> (XEN)   3: [0.2] flags=0 cpu=7 credit=10605188 [w=256] load=13465
> (~5%)
> (XEN)   4: [0.3] flags=2 cpu=11 credit=9998469 [w=256] load=262144
> (~100%)
> (XEN)   5: [0.4] flags=0 cpu=0 credit=10533686 [w=256] load=13619
> (~5%)
> (XEN)   6: [0.5] flags=a cpu=9 credit=1101 [w=256] load=0 (~0%)
> (XEN)   7: [0.6] flags=2 cpu=2 credit=10621802 [w=256] load=13526
> (~5%)
> (XEN)   8: [0.7] flags=2 cpu=1 

Re: [Xen-devel] [PATCH v4 3/3] xen/efi: use directmap to access runtime services table

2019-10-24 Thread marma...@invisiblethingslab.com
On Thu, Oct 24, 2019 at 01:11:22PM +, Xia, Hongyan wrote:
> It is certainly nice to have fewer users of the direct map. My non-EFI
> builds already work without the direct map now, but once I start testing
> EFI, it is nice to have one less thing to worry about.

Note this is just yet another piece of EFI info that's included there. Others
are: efi_ct, efi_memmap, efi_fw_vendor. So, if you'd like to get rid of the
directmap, you'll need to handle the others too in some way. Doing that
for 3 or 4 tables shouldn't make a significant difference.

> How important and performance-critical is this? If we really want to
> avoid switching the page table, we could reserve a virtual range and
> map it to runtime services in Xen.

Honestly, I don't think that's very critical. The biggest improvement is
for XEN_FW_EFI_RT_VERSION, where you avoid switching page tables at all.
In the other cases, you avoid that only for UEFIs that are too old. Anyway,
I think none of it is on a hot path.
This is an optimization suggested by Jan, which is nice to have, but
definitely isn't the only possible option.
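
To make that a bit more concrete, here is a purely illustrative sketch of the
directmap-based lookup being discussed; the variable names (rt_services_pa
and friends) are made up here, not taken from the patch:

    /* Purely illustrative; rt_services_pa is a made-up name for the
     * physical address of the EFI runtime services table. */
    const EFI_RUNTIME_SERVICES *rt =
        (const EFI_RUNTIME_SERVICES *)maddr_to_virt(rt_services_pa);

    /* A query like XEN_FW_EFI_RT_VERSION only needs to read the table
     * header, so going through the directmap avoids any page-table
     * switch on this path. */
    uint32_t rt_version = rt->Hdr.Revision;

The alternative mentioned above (reserving a virtual range and mapping the
table there) would remove the directmap dependency for the same lookup, at
the cost of managing that mapping explicitly.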

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: [Xen-devel] Criteria / validation proposal: drop Xen

2019-07-11 Thread marma...@invisiblethingslab.com
On Thu, Jul 11, 2019 at 10:58:03AM -0700, Adam Williamson wrote:
> On Thu, 2019-07-11 at 09:57 -0500, Doug Goldstein wrote:
> > On 7/8/19 11:11 AM, Adam Williamson wrote:
> > > On Tue, 2019-05-21 at 11:14 -0700, Adam Williamson wrote:
> > > > > > > > "The release must boot successfully as Xen DomU with releases 
> > > > > > > > providing
> > > > > > > > a functional, supported Xen Dom0 and widely used cloud providers
> > > > > > > > utilizing Xen."
> > > > > > > > 
> > > > > > > > and change the 'milestone' for the test case -
> > > > > > > > https://fedoraproject.org/wiki/QA:Testcase_Boot_Methods_Xen_Para_Virt
> > > > > > > >  -
> > > > > > > > from Final to Optional.
> > > > > > > > 
> > > > > > > > Thoughts? Comments? Thanks!
> > > > > > > I would prefer for it to remain as it is.
> > > > > > This is only practical if it's going to be tested, and tested 
> > > > > > regularly
> > > > > > - not *only* on the final release candidate, right before we sign 
> > > > > > off
> > > > > > on the release. It needs to be tested regularly throughout the 
> > > > > > release
> > > > > > cycle, on the composes that are "nominated for testing".
> > > > > Would the proposal above work for you? I think it satisfies what you 
> > > > > are
> > > > > looking for. We would also have someone who monitors these test 
> > > > > results
> > > > > pro-actively.
> > > > In theory, yeah, but given the history here I'm somewhat sceptical. I'd
> > > > also say we still haven't really got a convincing case for why we
> > > > should continue to block the release (at least in theory) on Fedora
> > > > working in Xen when we don't block on any other virt stack apart from
> > > > our 'official' one, and we don't block on all sorts of other stuff we'd
> > > > "like to have working" either. Regardless of the testing issues, I'd
> > > > like to see that too if we're going to keep blocking on Xen...
> > > So, this died here. As things stand: I proposed removing the Xen
> > > criterion, Lars opposed, we discussed the testing situation a bit, and
> > > I said overall I'm still inclined to remove the criterion because
> > > there's no clear justification for it for Fedora any more. Xen working
> > > (or rather, Fedora working on Xen) is just not a key requirement for
> > > Fedora at present, AFAICS.
> > > 
> > > It's worth noting that at least part of the justification for the
> > > criterion in the first place was that Amazon was using Xen for EC2, but
> > > that is no longer the case, most if not all EC2 instance types no
> > > longer use Xen. Another consideration is that there was a time when KVM
> > > was still pretty new stuff and VirtualBox was not as popular as it is
> > > now, and Xen was still widely used for general hobbyist virtualization
> > > purposes; I don't believe that's really the case any more.
> > 
> > So I'll just point out this is false. Amazon very much uses Xen still 
> > and is investing in Xen still. In fact I'm writing this email from the 
> > XenSummit where Amazon is currently discussing their future development 
> > efforts for the Xen Project.
> 
> Sorry about that, it was just based on my best efforts at trying to
> figure it out; Amazon instance types don't all explicitly state exactly
> how they work.
> 
> Which EC2 instance types still use Xen?

I don't know which new instance types use Xen, but there are definitely
existing previous-generation instance types that are still running and not
going away anytime soon. From what I understand, they still make up the
great majority of EC2.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
