On 2024/6/19 17:49, Jan Beulich wrote:
> On 19.06.2024 10:51, Chen, Jiqian wrote:
>> On 2024/6/19 16:06, Jan Beulich wrote:
>>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>>> able to be mapped into a hvm domU.
>>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>>
>>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>>> register gsi when dom0 is PVH.
>>>>>>>>
>>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>>> purpose.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiqian Chen <jiqian.c...@amd.com>
>>>>>>>> Signed-off-by: Huang Rui <ray.hu...@amd.com>
>>>>>>>> Signed-off-by: Jiqian Chen <jiqian.c...@amd.com>
>>>>>>>> ---
>>>>>>>> The code link that will call this hypercall on linux kernel side is as 
>>>>>>>> follows:
>>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>>>>>>
>>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, 
>>>>>>> unaddressed
>>>>>>> one here:
>>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>>  operation, I think it'll also want/need explaining why what is 
>>>>>>> sufficient for
>>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>>> I have modified the commit message to describe why GSIs are not 
>>>>>> registered can cause passthrough not work, according to this v9 comment.
>>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin 
>>>>>> list, and the handler of irq_desc is not set, then when passthrough a 
>>>>>> device, setting ioapic affinity and vector will fail."
>>>>>> What description do you want me to add?
>>>>>
>>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>>> the description one way or another): How come Dom0 alone gets away fine
>>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>>> perhaps that it just so happened that for Dom0 things have been working
>>>>> on systems where it was tested, but the call should in principle have been
>>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>>> difference for both this patch's description and us accepting it.
>>>> Oh, I think I know what's your concern now. Thanks.
>>>> First question, why gsi of device can work on PVH dom0:
>>>> Because when probe a driver to a normal device, it will call linux kernel 
>>>> side:pci_device_probe-> request_threaded_irq-> irq_startup-> 
>>>> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> 
>>>> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> 
>>>> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>>> Because when assign a device to be passthrough, it uses pciback to probe 
>>>> the device, and it calls pcistub_probe, but in all callstack of 
>>>> pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the 
>>>> function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when 
>>>> the gsi is unmasked, so that the gsi can't work for passthrough device.
>>>
>>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>>> intercepts.
>> Because isr_on is not set, when xen_pcibk_control_isr is called, it will 
>> return due to " !dev_data->isr_on". So that fake IRQ handler aren't 
>> installed.
> 
> I'm afraid I don't follow you here. Quoting from the function:
> 
>       enable =  dev_data->enable_intx;
> 
>       /* Asked to disable, but ISR isn't runnig */
>       if (!enable && !dev_data->isr_on)
>               return;
> 
> I.e. we bail if the request was to _disable_ and there is no ISR.
I mean, after debugging the pcistub_probe callstack:
pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> 
xen_pcibk_control_isr(dev, 1 /*reset device*/)
and in xen_pcibk_control_isr code:
        if (reset) {
                dev_data->enable_intx = 0;
                dev_data->ack_intr = 0;
        }
        enable =  dev_data->enable_intx;

        /* Asked to disable, but ISR isn't runnig */
        if (!enable && !dev_data->isr_on)
                return;
"reset" is 1, so "dev_date->enable_intx" is set 0, then "enable" is 0, and then 
arrive the second "if" check, "enable" is 0 and "dev_date->isr_on" is also 0, 
so it return here.
It can't reach following code to install irq handle.

> 
> I can't exclude though that command_write()'s logic to set ->enable_intx
> is insufficient. But in the common case one would surely expect at least
> one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
> IOW at some point I'd expect xen_pcibk_control_isr() to be called with
> the second argument 0 and with ->enable_intx set.
I only see xen_pcibk_control_isr(dev, 0) is called in xen_pcibk_do_one_op, but 
xen_pcibk_do_one_op is not called during assigning a device to be passthrough.

> 
>> And it seems isr_on is set through driver sysfs " irq_handler_state" for a 
>> level device that is to be shared with guest and the IRQ is shared with the 
>> initial domain.
> 
> The sysfs interface is, according to my reading of the description
> of the commit introducing it, merely for debugging / recovery purposes.
> (It also looks to me as if this was partly broken: If one would use it,
> thus clearing ->isr_on, a subsequent disable request would take exactly
> that early bailing path quoted above, with nothing removing the IRQ
> handler.)
> 
> That description also talks about both an edge vs level distinction in
> behavior and one for shared vs non-shared, but neither in that commit
> nor in present code I can spot any respective checks. Otherwise I could
> understand that there are cases where the necessary information isn't
> propagated to Xen.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Reply via email to