On Fri, Oct 15, 2021 at 2:48 PM Parav Pandit <pa...@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasow...@redhat.com>
> > Sent: Friday, October 15, 2021 12:12 PM
> >
> > On Fri, Oct 15, 2021 at 1:20 PM Parav Pandit <pa...@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasow...@redhat.com>
> > > > Sent: Friday, October 15, 2021 10:46 AM
> > > >
> > > >
> > > > On 2021/10/15 at 12:36 PM, Parav Pandit wrote:
> > > > >
> > > > >> From: Michael S. Tsirkin <m...@redhat.com>
> > > > >> Sent: Friday, October 15, 2021 3:59 AM
> > > > >>
> > > > >> On Thu, Oct 14, 2021 at 05:35:37PM +0000, Parav Pandit wrote:
> > > > >>> Hi Michael, Cornelia,
> > > > >>>
> > > > >>>> From: Parav Pandit
> > > > >>>> Sent: Tuesday, October 12, 2021 2:42 PM
> > > > >>>>
> > > > >>>>> From: Michael S. Tsirkin <m...@redhat.com>
> > > > >>>>> Sent: Tuesday, October 12, 2021 2:32 PM
> > > > >>>>>
> > > > >>>>> On Tue, Oct 12, 2021 at 08:51:34AM +0000, Parav Pandit wrote:
> > > > >>>>>>
> > > > >>>>>>> From: Michael S. Tsirkin <m...@redhat.com>
> > > > >>>>>>> Sent: Monday, October 11, 2021 9:30 PM
> > > > >>>>>>>
> > > > >>>>>>> On Mon, Oct 11, 2021 at 03:44:14PM +0000, Parav Pandit wrote:
> > > > > >>>>>>>>>> This is unlikely to work until the reset is completed, because
> > > > > >>>>>>>>>> a real device implementing this would prefer to do this in fw
> > > > > >>>>>>>>>> for 1000 virtio devices sitting on the physical card.
> > > > > >>>>>>>>>> And it is very much driven by such implementation at
> > > > > >>>>>>>>>> device devel.
> > > > > >>>>>>>>>> So it cannot update the counter value if reset is not
> > > > > >>>>>>>>>> completed for the device.
> > > > > >>>>>>>>>> I think read only device reset timeout is most elegant
> > > > > >>>>>>>>>> option during device initialization phase that eliminates
> > > > > >>>>>>>>>> infinite loop of today.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Why can't a driver just go ahead and do a timeout regardless?
> > > > > >>>>>>>> o.k. lets consider this thought exercise. What is the
> > > > > >>>>>>>> timeout value that driver will choose if device doesn't
> > > > > >>>>>>>> specify one?
> > > > > >>>>>>>> I explained in previous thread and you acked that actual fw
> > > > > >>>>>>>> based device may take longer to initialize than pure sw
> > > > > >>>>>>>> implementation backend.
> > > > > >>>>>>>> In second example a pre-boot device can take even longer
> > > > > >>>>>>>> initialization time.
> > > > > >>>>>>>> Sriov VF device may initialize lot faster.
> > > > > >>>>>>>> Instead of driver having such transport, and device
> > > > > >>>>>>>> specific checks, (or some very short or very long timeout),
> > > > > >>>>>>>> we propose, that let device mention such timeout value.
> > > > >>>>>>>
> > > > >>>>>>> Parav I think you are conflating reset with initialization time.
> > > > >>>>>>> initialization is just for host boot which takes seconds
> > > > >>>>>>> anyway
> > > > > >>>>>>> - but no, minutes is not reasonable there, either.
> > > > >>>>>>> reset affects guest boot. This needs to complete in 
> > > > >>>>>>> milliseconds.
> > > > >>>>>>>
> > > > > >>>>>> I cannot promise, but with newer generation devices usually
> > > > > >>>>>> functionality improves.
> > > > > >>>>>> Enforcing in milliseconds doesn't look practical for type of
> > > > > >>>>>> devices.
> > > > > >>>>>> Some of the block devices may need to establish TCP
> > > > > >>>>>> connections in the backend.
> > > > > >>>>>> It is more useful to wait for few more seconds to initialize
> > > > > >>>>>> device after power on the system, instead of giving up booting
> > > > > >>>>>> the server completely.
> > > > > >>>>>> For example, a nvme block device starts with a minimum
> > > > > >>>>>> timeout of 500msec.
> > > > > >>>>>> Yes, I agree to your point that a device given to a guest VM
> > > > > >>>>>> will likely have very short reset time that should complete in
> > > > > >>>>>> milliseconds.
> > > > >>>>>>> This conflation is IMHO one of the problems with this proposal.
> > > > > >>>>>> Device initialization consists of device reset from the spec
> > > > > >>>>>> section 3.1.1.
> > > > > >>>>> It does. But maybe we need to create a way for driver to
> > > > > >>>>> distinguish between the two. When under reset, use a driver
> > > > > >>>>> supplied timeout.
> > > > > >>>> This makes sense, because as we discussed when device undergoes a
> > > > > >>>> reset with active DMA, after timeout expires, driver still cannot
> > > > > >>>> cleanup.
> > > > > >>>> So this can be short driver decided value as longer timeout is not
> > > > > >>>> useful.
> > > > >>>>
> > > > >>>>> When powering up, use a longer device supplied one.
> > > > >>>> In v0, v1 I initially considered only the powering up case of
> > > > >>>> the device initialization. There was text around that.
> > > > >>>> And v2 I removed the initialization text, and I totally missed
> > > > >>>> the above case with active DMA.
> > > > >>>> This should work.
> > > > >>>> We should word this part of the spec accordingly.
> > > > >>> Below changes are good for v3?
> > > > >>> 1. driver should use device reset time during initialization
> > > > >>> stage
> > > > >> How does driver identify this though?
> > > > > > Existence of device_reset_timeout field in struct
> > > > > > virtio_pci_common_cfg indicates that this field exists.
> > > > > > If device supports it, it will place non zero value and driver
> > > > > > knows that this field should be used.
> > > > >
> > > > > >>> 2. remove feature bit as feature bits are only readable after
> > > > > >>> reset is completed
> > > > > >>> 3. device reset timeout field of zero indicates that device doesn't
> > > > > >>> support it.
> > > > >>
> > > > >> I'm not sure about 3. I think each transport will need its own way 
> > > > >> to do it.
> > > > >>
> > > > > For pci a value of zero indicates it isn't supported.
> > > > > For mmio DeviceResetTimeout at offset 0x04c indicates same.
> > > > > Currently only these 2 transports have the use.
> > > > >
> > > > >> So I propose: maybe a capability like this, with a timeout field?
> > > > > > Do you mean a new capability like say VIRTIO_PCI_DEVICE_TIMEOUT
> > > > > > like VIRTIO_PCI_CAP_COMMON_CFG?
> > > > > > This will contain one or more timeout? For example with this
> > > > > > proposal it contains only device reset timeout.
> > > > > > Later same capability will be further extended to contain command
> > > > > > timeout too? Yes?
> > > > >
> > > > >> And within VMs, we can just do without, since it got out of reset
> > > > >> once it will surely get out of reset again...
> > > > > > Yes, VM might not need it. It is really the HV's choice to
> > > > > > implement and not part of the virtio spec.
> > > >
> > > >
> > > > Well, this will break the migration between HW virtio and SW virtio.
> > > >
> > > How does it break? Can you please explain?
> > > SW virtio will emulate what HW virtio does. This field exposes only a
> > > read-only field to the driver so that it does not wait infinitely.
> >
> > As discussed previously, can we use transport level reset in this case?
> >
> Do you have a pointer to it? I do not understand transport level reset.

https://lore.kernel.org/virtualization/7ebb9ba0-69a0-2279-9b9e-60c50db06...@redhat.com/

> If you are asking whether PCI reset can be used to reset the device, yes.

Yes, I meant PCI reset.

> But what we are talking about here is: when the virtio layer issues the
> device reset, how long should it wait for that reset to complete?

The question is: what do we expect the driver to do if the device
reset times out?
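
To make the question concrete, here is a minimal sketch of the wait loop
being discussed. It is not taken from any posted patch. The field name
device_reset_timeout comes from this thread, but the millisecond unit, the
simulated device, the 100 ms fallback and all function names are
assumptions for illustration only. A value of zero is treated as "field
not supported", as proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-chosen fallback when the device reports 0. */
#define DRIVER_DEFAULT_RESET_TIMEOUT_MS 100u

/* Simulated device, standing in for real config space accesses. */
struct sim_device {
    uint32_t device_reset_timeout_ms; /* proposed read-only field; 0 = unsupported */
    uint32_t reset_done_at_ms;        /* simulated time at which reset completes */
    uint32_t now_ms;                  /* simulated clock */
};

/* Device status reads back 0 only once the reset has completed. */
static uint8_t sim_read_status(const struct sim_device *d)
{
    return d->now_ms >= d->reset_done_at_ms ? 0 : 0xff;
}

/* Use the device supplied value if present, else the driver default. */
static uint32_t pick_reset_timeout_ms(const struct sim_device *d)
{
    return d->device_reset_timeout_ms ? d->device_reset_timeout_ms
                                      : DRIVER_DEFAULT_RESET_TIMEOUT_MS;
}

/* Returns true if the reset completed in time, false on timeout. */
static bool wait_for_reset(struct sim_device *d)
{
    uint32_t timeout = pick_reset_timeout_ms(d);
    uint32_t start = d->now_ms;

    while (sim_read_status(d) != 0) {
        if (d->now_ms - start >= timeout)
            return false;   /* what the driver does next is the open question */
        d->now_ms += 1;     /* stands in for sleeping 1 ms */
    }
    return true;
}

/* Usage: a slow device succeeds only when it advertises a long timeout. */
void reset_timeout_selfcheck(void)
{
    struct sim_device no_field = { .device_reset_timeout_ms = 0,
                                   .reset_done_at_ms = 3000, .now_ms = 0 };
    struct sim_device with_field = { .device_reset_timeout_ms = 5000,
                                     .reset_done_at_ms = 3000, .now_ms = 0 };

    assert(!wait_for_reset(&no_field));
    assert(wait_for_reset(&with_field));
}
```

The false return path is exactly where the behaviour is undefined today:
the sketch only reports the timeout and does not attempt any cleanup.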

> A typical example is a virtio driver loading on a physical virtio device,
> agnostic of the transport, and waiting for the device to come out of reset.

I'm not sure how feasible a device/transport-agnostic timeout is.
E.g., how can we know that the timeout satisfies the requirements of
all the transports?

>
> > > If src is hw and dst is sw, sw will likely have similar capabilities as
> > > hw, not just this particular one but many other. Isn't it?
> >
> > The problem is when the src is a software backend without this capability.
>
> It isn’t any different than any other RW field of the device.
> For example sw did virtio pci device emulation with 30 msix vectors and hw 
> has only 2.

In the case of MSI-X, the migration can't be done directly from src to
dest. The only choice is some kind of software mediation.

In the case of the timeout, mediation won't even help, since we don't
present the timeout to the guest, even if the hypervisor can see the
timeout interface.

Thanks

