Jan Kiszka <jan.kis...@siemens.com> writes:
> On 10.08.21 20:21, Philippe Gerum via Xenomai wrote:
>>
>> I won't join the Xenomai meeting this week, so this is the latest news
>> from Dovetail and Xenomai 4:
>>
>> Dovetail runs on top of v5.14-rc5 (arm, arm64 and x86_64), the code is
>> visible from the v5.14-dovetail-rebase branch at [1]. As usual, I'm
>> testing Dovetail with the EVL core (Xenomai 4). The current code is
>> available at [2] branch v5.14-evl-rebase.
>>
>> In addition, several important updates went to the stable Dovetail
>> (v5.10.y) tree (i.e. RCU NMI in the pipeline entry). There is no kernel
>> interface change which might affect Xenomai3/Cobalt 3.2 though.
>>
>> With respect to Xenomai 4, progress was made with the network
>> (mini-)stack based on the EVL core. The most important aspect is that
>> EVL is now able to leverage the common socket interface, for adding new
>> network protocols or extending existing ones. This is still WIP, but we
>> are getting closer to something usable, and EVL gained a socket
>> interface in the process for dealing with real-time protocols.
>>
>> In a nutshell, the basic idea is to create an out-of-band data path
>> traversing the regular network stack which EVL and the applications can
>> connect to. This means that a netdev can accept in-band and out-of-band
>> traffic, ethtool is still available to configure the ethernet devices
>> shared with EVL etc. (as a bonus, there is no need for any proxy in
>> order to share a single NIC between the out-of-band and in-band network
>> stacks). There is work ahead, and this is fun stuff.
>>
>> [1] g...@source.denx.de:Xenomai/linux-dovetail.git
>> [2] g...@source.denx.de:Xenomai/xenomai4/linux-evl.git
>>
>
> Surely interesting work. Three even more interesting aspects still needs
> to be seen, though:
>
> - How will driver conversions look like in practice (lock and interrupt
>   conversions, prioritization of data paths over control paths, turning
>   off throughput favoring features)?
>

There is no one-size-fits-all approach to this, but the idea remains the
same for any EVL-related changes in drivers:

- define clear-cut operating modes for the driver: in-band should not
  overlap with out-of-band during time-critical operations. E.g. no
  significant reconfiguration while out-of-band packets are in flight;
  that would have to wait until the oob activity pauses, and contention
  on the converse path is deemed an application bug. However, mixing
  in-band and out-of-band traffic on the same device should be possible
  without proxying, with the software always giving precedence to the
  latter when it comes to feeding the driver.

- ensure that all code paths are categorized between in-band only,
  out-of-band only, and shared between stages. From that point, use some
  EVL mechanisms if/when applicable like "staxes" in order to enforce
  basic sanity between the first two. Dovetail also has "hybrid locks",
  which can be traversed from any stage, still abiding by the semantics
  of the current stage (in that sense, this is distinct from "hard"
  locks which enforce the semantics of the out-of-band stage). Of
  course, that means that the length of the covered sections should be
  compatible with real-time requirements.

These details are only part of the solution obviously, there will be
more issues to deal with. However, there is at least one hurdle less:
the mini-stack does not define its own (rt)skb type, but rather happily
conveys all the traffic via the common sk_buff. This tends to limit the
amount of code which needs to be adapted in a NIC driver.

I'm thinking about enabling some form of out-of-band support in the
NAPI, but this idea is still brewing, nothing concrete yet.

> - How to provide zero copy (not available with RTnet either, yes, but
>   needed for lowest-latency traffic in the future)?
>
> - How to make buffer allocation similarly deterministic as with rtskbs
>   (e.g. an evl_net_dev_alloc_skb that needs no timeout but uses a
>   per-socket pool again)?

A "generic" per-socket pool would assume too much about the identity of
the DMA mapping for any given socket buffer among multiple devices
(which is the limitation rtskb_map() lives with). Since the regular way
is to have a per-device mapping strategy, the pre-mapped buffers we need
should be obtained from the device driver, not from the generic net
core.

In order to achieve some form of starvation prevention, I would rather
go for limiting the amount of buffer memory consumed by a socket at any
point in time, similarly to the sk_{r|w}mem_alloc counters of the
regular net core. Added to that, a socket would be allowed to reserve a
number of socket buffers from a given device pool, which corresponds to
the arbitrary amount of memory specified for SO_SNDBUF. The mini-stack
would then contribute the corresponding number of freshly allocated
buffers to the proper per-device pool. Conversely, such ownership would
influence the way out-of-band socket buffers are released after use. If
some application needs such a reserve guarantee from multiple devices,
then it would have to use multiple sockets, which seems an acceptable
requirement.

IOW, if all sockets contribute the amount of guaranteed socket buffers
to the buffer pool of the device they are bound to, and these sockets
are not allowed to over-consume their guaranteed amount, then we should
be ok.

A way to achieve zero-copy would involve extending this per-socket,
per-device reserve to the RX side, and making the resulting rings
exportable to userland with proper synchronization. In this case, the
reserved socket buffers forming the TX/RX rings could refer to different
segments of a single piece of kernel memory which the application would
map.

Granted, this all looks nice and simple in theory, and the devil is
obviously in the details in practice, but that should be doable.
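To make the accounting idea above a bit more concrete, here is a minimal userland sketch in C. It only models the counters: there is no DMA mapping, no locking and no real sk_buff involved. All names (dev_pool, oob_sock, sock_bind_pool, sock_get_buf, sock_put_buf) are hypothetical and are not part of EVL or the kernel; deriving the reserve as SO_SNDBUF bytes divided by a fixed buffer size is also an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device pool: plain counters, no real DMA mapping. */
struct dev_pool {
	size_t total;	/* buffers contributed by all bound sockets */
	size_t free;	/* buffers currently available */
};

/* Hypothetical out-of-band socket bound to exactly one device pool. */
struct oob_sock {
	struct dev_pool *pool;
	size_t reserve;	/* guaranteed buffers (assumed: SO_SNDBUF / buf_size) */
	size_t in_use;	/* buffers currently held by this socket */
};

/* Bind a socket to a pool, contributing its reserve to that pool. */
static void sock_bind_pool(struct oob_sock *sk, struct dev_pool *pool,
			   size_t sndbuf_bytes, size_t buf_size)
{
	assert(buf_size > 0);
	sk->pool = pool;
	sk->reserve = sndbuf_bytes / buf_size;
	sk->in_use = 0;
	pool->total += sk->reserve;
	pool->free += sk->reserve;
}

/* Take one buffer without waiting; refuse any over-consumption. */
static bool sock_get_buf(struct oob_sock *sk)
{
	if (sk->in_use >= sk->reserve)
		return false;
	/* Holds as long as every socket stays within its contribution. */
	assert(sk->pool->free > 0);
	sk->pool->free--;
	sk->in_use++;
	return true;
}

/* Release one buffer back to the pool the socket is bound to. */
static void sock_put_buf(struct oob_sock *sk)
{
	if (sk->in_use == 0)
		return;
	sk->in_use--;
	sk->pool->free++;
}
```

With two sockets bound to the same pool, one reserving two buffers and the other one, the pool holds three buffers in total. A third request from the first socket is refused immediately instead of blocking, and the second socket can still obtain its guaranteed buffer. This is the property the text argues for: no timeout is needed because a socket can never eat into another socket's reserve.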

This said, I have a modest roadmap for the mini-stack for the time
being, which is supporting AF_PACKET and AF_INET/IPPROTO_UDP in
out-of-band mode end-to-end, from the application to the wire through
the NIC driver, using common kernel/user memory transfers. No bells and
whistles, just the basic reliable stuff my application use case
requires.

--
Philippe.