On Mon, 25 Feb 2019 16:58:07 -0800 si-wei liu <si-wei....@oracle.com> wrote:
> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
> >>
> >> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> >>>
> >>> On 2/21/2019 7:33 PM, si-wei liu wrote:
> >>>>
> >>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> >>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> >>>>>> Sorry for replying to this ancient thread. There was some remaining
> >>>>>> issue that I don't think the initial net_failover patch got addressed
> >>>>>> cleanly, see:
> >>>>>>
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> >>>>>>
> >>>>>> The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> >>>>>> not specifically written for such kernel automatic enslavement.
> >>>>>> Specifically, if it is a bond or team, the slave would typically get
> >>>>>> renamed *before* virtual device gets created, that's what udev can
> >>>>>> control (without getting netdev opened early by the other part of
> >>>>>> kernel) and other userspace components for e.g. initramfs,
> >>>>>> init-scripts can coordinate well in between. The in-kernel
> >>>>>> auto-enslavement of net_failover breaks this userspace convention,
> >>>>>> which doesn't provide a solution if user care about consistent naming
> >>>>>> on the slave netdevs specifically.
> >>>>>>
> >>>>>> Previously this issue had been specifically called out when IFF_HIDDEN
> >>>>>> and the 1-netdev was proposed, but no one gives out a solution to this
> >>>>>> problem ever since. Please share your mind how to proceed and solve
> >>>>>> this userspace issue if netdev does not welcome a 1-netdev model.
> >>>>> Above says:
> >>>>>
> >>>>>         there's no motivation in the systemd/udevd community at
> >>>>>         this point to refactor the rename logic and make it work well with
> >>>>>         3-netdev.
> >>>>>
> >>>>> What would the fix be? Skip slave devices?
> >>>>>
> >>>> There's nothing user can get if just skipping slave devices - the
> >>>> name is still unchanged and unpredictable e.g. eth0, or eth1 the
> >>>> next reboot, while the rest may conform to the naming scheme (ens3
> >>>> and such). There's no way one can fix this in userspace alone - when
> >>>> the failover is created the enslaved netdev was opened by the kernel
> >>>> earlier than the userspace is made aware of, and there's no
> >>>> negotiation protocol for kernel to know when userspace has done
> >>>> initial renaming of the interface. I would expect netdev list should
> >>>> at least provide the direction in general for how this can be
> >>>> solved...
> >
> > I was just wondering what did you mean when you said
> > "refactor the rename logic and make it work well with 3-netdev" -
> > was there a proposal udev rejected?
> No. I never believed this particular issue can be fixed in userspace
> alone. Previously someone had said it could be, but I never see any work
> or relevant discussion ever happened in various userspace communities
> (for e.g. dracut, initramfs-tools, systemd, udev, and NetworkManager).
> IMHO the root of the issue derives from the kernel, it makes more sense
> to start from netdev, work out and decide on a solution: see what can be
> done in the kernel in order to fix it, then after that engage userspace
> community for the feasibility...
>
> > Anyway, can we write a time diagram for what happens in which order that
> > leads to failure? That would help look for triggers that we can tie
> > into, or add new ones.
> >
>
> See attached diagram.
>
> >
> >
> >
> >
> >>> Is there an issue if slave device names are not predictable? The
> >>> user/admin scripts are expected
> >>> to only work with the master failover device.
> >> Where does this expectation come from?
> >>
> >> Admin users may have ethtool or tc configurations that need to deal with
> >> predictable interface name.
> >> Third-party app which was built upon specifying
> >> certain interface name can't be modified to chase dynamic names.
> >>
> >> Specifically, we have pre-canned image that uses ethtool to fine tune VF
> >> offload settings post boot for specific workload. Those images won't work
> >> well if the name is constantly changing just after couple rounds of live
> >> migration.
> > It should be possible to specify the ethtool configuration on the
> > master and have it automatically propagated to the slave.
> >
> > BTW this is something we should look at IMHO.
> I was elaborating a few examples that the expectation and assumption
> that user/admin scripts only deal with master failover device is
> incorrect. It had never been taken good care of, although I did try to
> emphasize it from the very beginning.
>
> Basically what you said about propagating the ethtool configuration down
> to the slave is the key pursuance of 1-netdev model. However, what I am
> seeking now is any alternative that can also fix the specific udev
> rename problem, before concluding that 1-netdev is the only solution.
> Generally a 1-netdev scheme would take time to implement, while I'm
> trying to find a way out to fix this particular naming problem under
> 3-netdev.
>
> >
> >>> Moreover, you were suggesting hiding the lower slave devices anyway.
> >>> There was some discussion
> >>> about moving them to a hidden network namespace so that they are not
> >>> visible from the default namespace.
> >>> I looked into this sometime back, but did not find the right kernel api
> >>> to create a network namespace within
> >>> kernel. If so, we could use this mechanism to simulate a 1-netdev model.
> >> Yes, that's one possible implementation (IMHO the key is to make 1-netdev
> >> model as much transparent to a real NIC as possible, while a hidden netns
> >> is just the vehicle).
> >> However, I recall there was resistance around this
> >> discussion that even the concept of hiding itself is a taboo for Linux
> >> netdev. I would like to summon potential alternatives before concluding
> >> 1-netdev is the only solution too soon.
> >>
> >> Thanks,
> >> -Siwei
> > Your scripts would not work at all then, right?
> At this point we don't claim images with such usage as SR-IOV live
> migrate-able. We would not flag it as live migrate-able until this ethtool
> config issue is fully addressed and a transparent live migration
> solution emerges in upstream eventually.

The hyper-v netvsc with 1-dev model uses a timeout to allow udev to do its
rename. I proposed a patch to key state change off of the udev rename, but
that patch was rejected.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
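
The ordering problem discussed in this thread can be illustrated with a small toy model. This is only a sketch, not kernel or udev code: `ToyNetdev` and `boot` are hypothetical names, and the model assumes (as the thread implies for kernels of this period) that a rename request on an interface that is already up is refused with EBUSY. It shows why the result depends on whether the kernel opens the slave before or after udev attempts the rename.

```python
import errno


class ToyNetdev:
    """Hypothetical stand-in for a slave netdev; not a real kernel API."""

    def __init__(self, name):
        self.name = name
        self.up = False

    def open(self):
        # Models the kernel bringing the slave up during auto-enslavement.
        self.up = True

    def rename(self, new_name):
        # Assumption: a rename is refused with EBUSY while the device is up.
        if self.up:
            return -errno.EBUSY
        self.name = new_name
        return 0


def boot(events):
    """Replay events in order; return (final name, result of the rename)."""
    dev = ToyNetdev("eth0")
    rename_result = None
    for event in events:
        if event == "kernel_open":
            dev.open()
        elif event == "udev_rename":
            rename_result = dev.rename("ens4")
    return dev.name, rename_result


# net_failover-style auto-enslavement: the kernel opens the slave first.
auto_enslave = boot(["kernel_open", "udev_rename"])

# bond/team-style ordering, or an open deferred until the rename has happened.
rename_first = boot(["udev_rename", "kernel_open"])
```

In this model a timeout, as described for netvsc, amounts to delaying `kernel_open` in the hope that `udev_rename` arrives first, while keying the state change off the rename would make that ordering explicit.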