Hi all,

It's been a while since the last discussion here. I have been working on
implementing the standby feature in QEMU. I tried multiple approaches and
in the end decided to implement it on top of the hotplug/unplug
infrastructure, for several reasons which I'll go over when I send the
patches. For now you can find the implementation here:
https://github.com/sameehj/qemu/tree/failover_hidden_opts (the full command
line I used can be found at the end of this email).

I have tested my implementation in QEMU with a Fedora 29 guest: the
failover interface shows up correctly and I can assign an IP address to it.
The feature is acked by the guest and the primary device is plugged in with
no issues.
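
In the guest the pairing can be sanity-checked with something like the
following (interface names as in the ifconfig output further down; the
sysfs lower_* links are how I'd expect the net_failover core to expose the
slaves, so treat this as a rough sketch):

  # the failover master (ens4), its standby slave (ens4nsby) and the VF (ens6)
  # should all show the virtio-net MAC 8a:f7:20:29:3b:cb
  ip -br link | grep -i 8a:f7:20:29:3b:cb
  # the master should list both slaves as lower devices
  ls -d /sys/class/net/ens4/lower_*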

My setup consists of two hosts (host A and host B) with X710 10G cards
connected back to back. On host A I have configured a bridge with both the
PF interface and virtio-net's backend (standby) interface attached to it. I
ran the guest with the patched QEMU on host A and could ping the bridge
successfully, and host A and host B can ping each other. However, I can't
ping host B from the VM or vice versa, and this only happens when the
feature is enabled, for a reason I have yet to figure out.
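
For reference, the bridge setup on host A is essentially the following (a
rough sketch rather than my actual scripts; br0 and the X710 PF name are
placeholders, cc1_72 is the tap backing the standby virtio-net device from
the command line below, and 192.168.1.117 is the address the iperf test
below targets):

  # host A: bridge with the X710 PF and the standby virtio-net tap attached
  ip link add name br0 type bridge
  ip link set dev <x710-pf> master br0   # PF, connected back to back to host B
  ip link set dev cc1_72 master br0      # tap created by QEMU for the standby netdev
  ip link set dev <x710-pf> up
  ip link set dev cc1_72 up
  ip link set dev br0 up
  ip addr add 192.168.1.117/24 dev br0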

I haven't tested migration yet, but I'm on my way to do so.
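
The plan is the usual flow, roughly (a sketch; <dest-host> is a
placeholder):

  # destination: same command line as below, plus: -incoming tcp:0:4444
  # source HMP monitor:
  (qemu) migrate -d tcp:<dest-host>:4444
  (qemu) info migrate     # poll until the status is 'completed'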

Since I couldn't ping host B from the VM, I ran an iperf test between the
VM and host A with the feature enabled, and during the test I unplugged the
SR-IOV device. The device was unplugged successfully and no drops were
observed, as you can see in the results below:
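
The hot-unplug itself was just a device_del of the VF from the HMP monitor
(-monitor stdio in the command line below), along the lines of:

  (qemu) device_del cc1_71   # unplug the primary VF; virtio-net keeps the datapath
  (qemu) info pci            # confirm the vfio-pci device is gone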

[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid
0x0<global>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 12258  bytes 870822 (850.4 KiB)
        RX errors 11  dropped 0  overruns 0  frame 11
        TX packets 294  bytes 32432 (31.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41052  bytes 2775833 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 47468  bytes 15629 (15.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 214  bytes 14966 (14.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 163  bytes 26498 (25.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41052  bytes 2775833 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 47468  bytes 2889827541 (2.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 176  bytes 19712 (19.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 176  bytes 19712 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@dhcp156-44 ~]# iperf -c 192.168.1.117 -t 100 -i 1
------------------------------------------------------------
Client connecting to 192.168.1.117, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.17 port 40368 connected with 192.168.1.117 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  3.47 GBytes  29.8 Gbits/sec
[  3]  1.0- 2.0 sec  4.35 GBytes  37.4 Gbits/sec
[  3]  2.0- 3.0 sec  4.10 GBytes  35.2 Gbits/sec
[  3]  3.0- 4.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3]  4.0- 5.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3]  5.0- 6.0 sec  4.07 GBytes  34.9 Gbits/sec
[  3]  6.0- 7.0 sec  4.53 GBytes  38.9 Gbits/sec
[  3]  7.0- 8.0 sec  4.38 GBytes  37.6 Gbits/sec
[  3]  8.0- 9.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3]  9.0-10.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 10.0-11.0 sec  4.56 GBytes  39.2 Gbits/sec
[  3] 11.0-12.0 sec  4.70 GBytes  40.4 Gbits/sec
[  3] 12.0-13.0 sec  4.65 GBytes  39.9 Gbits/sec
[  3] 13.0-14.0 sec  4.51 GBytes  38.7 Gbits/sec
[  3] 14.0-15.0 sec  4.48 GBytes  38.5 Gbits/sec
[  3] 15.0-16.0 sec  4.67 GBytes  40.2 Gbits/sec
[  3] 16.0-17.0 sec  4.37 GBytes  37.5 Gbits/sec
[  3] 17.0-18.0 sec  4.68 GBytes  40.2 Gbits/sec
[  3] 18.0-19.0 sec  4.99 GBytes  42.9 Gbits/sec
[  3] 19.0-20.0 sec  5.00 GBytes  42.9 Gbits/sec
[  3] 20.0-21.0 sec  4.90 GBytes  42.1 Gbits/sec
[  3] 21.0-22.0 sec  4.72 GBytes  40.5 Gbits/sec
[  3] 22.0-23.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 23.0-24.0 sec  4.72 GBytes  40.6 Gbits/sec
[  3] 24.0-25.0 sec  4.42 GBytes  38.0 Gbits/sec
[  3] 25.0-26.0 sec  4.44 GBytes  38.2 Gbits/sec
[  3] 26.0-27.0 sec  4.18 GBytes  35.9 Gbits/sec
[  3] 27.0-28.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3] 28.0-29.0 sec  4.27 GBytes  36.7 Gbits/sec
[  3] 29.0-30.0 sec  4.16 GBytes  35.7 Gbits/sec
[  3] 30.0-31.0 sec  4.14 GBytes  35.6 Gbits/sec
[  3] 31.0-32.0 sec  4.13 GBytes  35.4 Gbits/sec
[  3] 32.0-33.0 sec  4.16 GBytes  35.7 Gbits/sec
[  3] 33.0-34.0 sec  4.33 GBytes  37.2 Gbits/sec
[  3] 34.0-35.0 sec  4.31 GBytes  37.0 Gbits/sec
[  3] 35.0-36.0 sec  4.26 GBytes  36.6 Gbits/sec
[  3] 36.0-37.0 sec  4.36 GBytes  37.5 Gbits/sec
[  3] 37.0-38.0 sec  4.11 GBytes  35.3 Gbits/sec
[  3] 38.0-39.0 sec  4.00 GBytes  34.4 Gbits/sec
[  3] 39.0-40.0 sec  4.53 GBytes  38.9 Gbits/sec
[  3] 40.0-41.0 sec  4.06 GBytes  34.9 Gbits/sec
[  3] 41.0-42.0 sec  4.17 GBytes  35.8 Gbits/sec
[  3] 42.0-43.0 sec  4.14 GBytes  35.6 Gbits/sec
[  3] 43.0-44.0 sec  4.07 GBytes  34.9 Gbits/sec
^C[  3]  0.0-44.5 sec   195 GBytes  37.5 Gbits/sec
[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid
0x0<global>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 12547  bytes 889713 (868.8 KiB)
        RX errors 11  dropped 0  overruns 0  frame 11
        TX packets 373  bytes 45723 (44.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 2862498  bytes 192898865 (183.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3414905  bytes 209192841687 (194.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 2862498  bytes 192898865 (183.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3414905  bytes 212082653599 (197.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 176  bytes 19712 (19.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 176  bytes 19712 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

__________________________________________________________________________________________________________________

The command line I used:

/root/qemu/x86_64-softmmu/qemu-system-x86_64 \
-netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
-device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
-netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
-device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
-device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
-enable-kvm \
-name netkvm \
-m 3000M \
-drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
-smp 4 \
-vga qxl \
-spice port=6110,disable-ticketing \
-device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
-chardev spicevmc,name=vdagent,id=vdagent \
-device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
-device virtio-serial \
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
-monitor stdio
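
(The failover-relevant bits above are primary=cc1_71 on the virtio-net
device and standby=cc1_72 on the vfio-pci device, i.e. the two devices are
paired by device id; the rest of the options are just my usual test setup.)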


On Fri, Oct 19, 2018 at 6:45 AM Michael S. Tsirkin <m...@redhat.com> wrote:

> On Wed, Oct 10, 2018 at 06:26:50PM -0700, Siwei Liu wrote:
> > On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <m...@redhat.com>
> wrote:
> > >
> > > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <m...@redhat.com>
> wrote:
> > > > >
> > > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > > > The VF's MAC can be updated by PF/host on the fly at any time.
> One can
> > > > > > start with a random MAC but use group ID to pair device instead.
> And
> > > > > > only update MAC address to the real one when moving MAC filter
> around
> > > > > > after PV says OK to switch datapath.
> > > > > >
> > > > > > Do you see any problem with this design?
> > > > >
> > > > > Isn't this what I proposed:
> > > > >         Maybe we can
> > > > >         start VF with a temporary MAC, then change it to a final
> one when guest
> > > > >         tries to use it. It will work but we run into fact that
> MACs are
> > > > >         currently programmed by mgmnt - in many setups qemu does
> not have the
> > > > >         rights to do it.
> > > > >
> > > > > ?
> > > > >
> > > > > If yes I don't see a problem with the interface design, even though
> > > > > implementation wise it's more work as it will have to include
> management
> > > > > changes.
> > > >
> > > > I thought we discussed this design a while back:
> > > > https://www.spinics.net/lists/netdev/msg512232.html
> > > >
> > > > ... plug in a VF with a random MAC filter programmed in prior, and
> > > > initially use that random MAC within guest. This would require:
> > > > a) not relying on permanent MAC address to do pairing during the
> > > > initial discovery, e.g. use the failover group ID as in this
> > > > discussion
> > > > b) host to toggle the MAC address filter: which includes taking down
> > > > the tap device to return the MAC back to PF, followed by assigning
> > > > that MAC to VF using "ip link ... set vf ..."
> > > > c) notify guest to reload/reset VF driver for the change of hardware
> MAC address
> > > > d) until VF reloads the driver it won't be able to use the datapath,
> > > > so very short period of network outage is (still) expected
> > > >
> > > > though I still don't think this design can eliminate downtime.
> > >
> > >
> > > No, my idea is somewhat different. As you say there is a problem
> > > of delay at point (c).
> > That's true, I never said the downtime can be avoided, because of this
> > delay on the guest side. But with this the downtime gets to the bare
> > minimum, and in most situations packets won't be lost on reception as
> > long as the PF sets up the filter in a timely manner.
>
> It's not really the bare minimum IMHO. E.g. fixing the PF to
> defer filter update will give you less downtime.
>
> > > Further, the need to poke at PF filters
> > > with set vf does not match the current security model where
> > > any security related configuration such as MAC filtering is done
> upfront.
> >
> > The security model belongs to the VM policy, not the VF, right? I think
> > the same MAC address will always be used on the VM as it starts with
> > virtio. Why is it a security issue that the VF starts with an unused MAC
> > before it's able to be used in the guest?
>
> Basically if the guest is able to trigger MAC changes,
> it might be able to exploit some bug to escalate that to
> full network access. Completely blocking configuration
> changes after setup feels safer.
>
> Case in point, with QEMU a typical SELinux policy will block
> attempts to change MACs; that task will have to be
> delegated to a suitably privileged tool.
>
>
> >
> > >
> > >
> > > So I have two suggestions:
> > >
> > > 1. Teach the PF driver not to program the filter until the VF driver
> > >    actually goes up.
> > >
> > >    How do we know it went up? For example, it is highly likely
> > >    that driver will send some kind of command on init.
> > >    E.g. linux seems to always try to set the mac address during init.
> > >    We can have any kind of command received by the PF enable
> > >    the filter, until reset.
> >
> > I'm not sure it's a valid assumption for any guest, say Windows. The
> > VF can start with the MAC address advertised from the PF in the first
> > reset, and the MAC filter generally will be activated at that point.
> > Some other PF/VF variants enable the filter after that until the VF is
> > brought up in the guest, while some others enable the filter even before
> > the VF gets assigned to the guest. Trying to assume the behaviour of a
> > specific guest or a specific NIC device is a slippery slope.
>
>
> Is all this just theoretical or do you observe any problems in practice?
>
> > The only
> > thing that's reliable is the semantics of ndo_vf_xxx interface for the
> > PF.
>
> ndo_vf_xxx is an internal Linux interface. That's not guaranteed to be
> stable at all. I think you mean the netlink interface that triggers
> that. That should be stable, but if what you say above is true, it isn't
> fully defined.
>
> > You seem to assume too much about the specific PF behaviour,
> > which is not defined in the interface itself.
>
> So IMHO it's something that we should fix in Linux,
> making all devices behave consistently.
>
> > >
> > >    In absence of an appropriate command, QEMU can detect bus master
> > >    enable and do that.
> > >
> > > 2. Create a variant of trusted VF where it starts out without a valid
> > >    MAC; the guest can set a softmac but can only set it to the specific
> > >    value that matches virtio.
> > >    Alternatively - if it's preferred for some reason - allow
> > >    guest to program just two MACs, the original one and the virtio one.
> > >    Any other value is denied.
> >
> > I am getting confused, I don't know why that's even needed. The
> > management tool can set any predefined MAC that is deemed safe for the
> > VF to start with. Why does it need to be that complicated? What is the
> > purpose of another model for trusted VF and softmac? It's the PF that
> > changes the MAC, not the VF.
>
> This will give us a simple solution without guest driver changes for
> when the VF is trusted. In particular it will work e.g. for PFs as well.
>
> > >
> > >
> > >
> > > > However,
> > > > it looks like as of today the MAC matching still hasn't addressed
> > > > the datapath switching and error handling in a clean way. As said,
> > > > for SR-IOV live migration on an iSCSI root disk there will be a lot
> > > > of moving parts along the way; reliable network connectivity and
> > > > dedicated handshakes are critical to this kind of setup.
> > > >
> > > > -Siwei
> > >
> > > I think MAC matching removes downtime when device is removed but not
> > > when it's re-added, yes. It has the advantage of an already present
> > > linux driver support, but if you are prepared to work on
> > > adding e.g. bridge based matching, that will go away.
> >
> > The removal order and consequence will be the same between MAC
> > matching and group ID based matching. It's just the initial discovery
> > that's slightly different. Why do you think the downtime will be
> > different for the removal scenario? And why do you think it's needed
> > to alter the current PF driver behavior to support bridge based
> > matching? Sorry I'm really confused about your suggestion. Those PF
> > driver model changes are actually not needed. The fact is that the
> > bridge based matching is supposed to work quite well for any PF driver
> > implementation no matter when the MAC address filters get added or
> > enabled.
> >
> > Thanks,
> > -Siwei
>
> It seems that it requires a bunch of changes for all VF drivers
> though.
>
> >
> > >
> > >
> > > > >
> > > > > --
> > > > > MST
>

-- 
Respectfully,
Sameeh Jubran
Linkedin <https://il.linkedin.com/pub/sameeh-jubran/87/747/a8a>
Software Engineer @ Daynix <http://www.daynix.com>.
