This is another bit of the puzzle for supporting multiple rx rings and receive side scaling (RSS) on nics. It borrows heavily from DragonflyBSD, but I've made some tweaks on the way.
For background on the dfly side, I recommend having a look at https://leaf.dragonflybsd.org/~sephe/AsiaBSDCon%20-%20Dfly.pdf.

From my point of view, the interesting thing is that they came up with a way to use Toeplitz hashing so that the kernel AND network interfaces hash packets in both directions onto the same bucket. The other interesting thing is that they optimised the hash calculation by building a cache of all the intermediate results possible for each input byte. Their hash calculation is then simply xoring these intermediate results together.

I've made some tweaks compared to dfly for how the caching is calculated and used, so it's not exactly a 1:1 port of the dfly code. If anyone is interested in the tweaks, let me know.

So this diff adds an API for the kernel to use for calculating a hash for IP addresses and ports, and adds a function for network drivers to call that gives them a key to use with RSS. If all drivers use the same key, then the same flows should be steered to the same place when they enter the network stack, regardless of which hardware they came in on.

I've tested it with vmx(4) and some quick and dirty hacks to the network stack (and with some magical observability), and can see things like tcpbench push packets onto the same numbered ifq/txring that the "nic" picks for the rxring, and therefore the same ifiq into the stack. We're going to try it on some more drivers soon.

The way this is set up now, if a nic driver wants to do RSS, you add stoeplitz as a dependency in the kernel config file, which causes this code to be included in the build.

There's some discussion to be had about the best way to integrate this on the IP stack side, but that is about where this API is called from, not the internals of it per se.

Thoughts? ok?
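To make the caching idea above a bit more concrete, here is a minimal userland sketch of it. It is not the code from the diff: the sketch_* names are made up for illustration, and building each cache entry from 16-bit rotations of the key seed is my reading of what the cache init loop in the diff ends up storing. The only thing it is meant to show is that a per-byte lookup table plus xor gives the same hash for both directions of a flow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; only the key seed value is from the diff. */
#define SKETCH_KEYSEED	0x6d5a

struct sketch_cache {
	uint16_t bytes[256];
};

static uint16_t
sketch_rotl16(uint16_t v, unsigned int n)
{
	n &= 15;
	return ((uint16_t)((v << n) | (v >> ((16 - n) & 15))));
}

static uint16_t
sketch_swap16(uint16_t v)
{
	return ((uint16_t)((v << 8) | (v >> 8)));
}

/* Precompute the xor of the rotated keys for every possible input byte. */
static void
sketch_cache_init(struct sketch_cache *c, uint16_t key)
{
	unsigned int val, b;

	for (val = 0; val < 256; val++) {
		uint16_t res = 0;

		for (b = 0; b < 8; b++) {
			/* bit b is counted from the most significant bit */
			if (val & (0x80u >> b))
				res ^= sketch_rotl16(key, b);
		}
		c->bytes[val] = res;
	}
}

/* Hash addresses and ports with table lookups and xor only. */
static uint16_t
sketch_hash_ip4port(const struct sketch_cache *c,
    uint32_t sa, uint32_t da, uint16_t sp, uint16_t dp)
{
	uint16_t lo, hi;

	lo = c->bytes[sa & 0xff] ^ c->bytes[(sa >> 16) & 0xff];
	lo ^= c->bytes[da & 0xff] ^ c->bytes[(da >> 16) & 0xff];
	lo ^= c->bytes[sp & 0xff] ^ c->bytes[dp & 0xff];

	hi = c->bytes[(sa >> 8) & 0xff] ^ c->bytes[(sa >> 24) & 0xff];
	hi ^= c->bytes[(da >> 8) & 0xff] ^ c->bytes[(da >> 24) & 0xff];
	hi ^= c->bytes[(sp >> 8) & 0xff] ^ c->bytes[(dp >> 8) & 0xff];

	return ((uint16_t)(sketch_swap16(lo) ^ hi));
}

/* Both directions of the same flow must land on the same hash. */
static void
sketch_symmetry_check(void)
{
	struct sketch_cache c;
	uint32_t a = 0xc0000201, b = 0xc6336407;	/* example addresses */
	uint16_t p = 1234, q = 80;

	sketch_cache_init(&c, SKETCH_KEYSEED);
	assert(sketch_hash_ip4port(&c, a, b, p, q) ==
	    sketch_hash_ip4port(&c, b, a, q, p));
}
```

Calling sketch_symmetry_check() from a test program should pass silently. The symmetry falls out of xor being commutative: the source and destination bytes that share a lane (lo or hi) are just xored together, so swapping them changes nothing.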
Index: share/man/man9/stoeplitz_to_key.9 =================================================================== RCS file: share/man/man9/stoeplitz_to_key.9 diff -N share/man/man9/stoeplitz_to_key.9 --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ share/man/man9/stoeplitz_to_key.9 29 May 2020 04:01:26 -0000 @@ -0,0 +1,126 @@ +.\" $OpenBSD$ +.\" +.\" Copyright (c) 2020 David Gwynne <[email protected]> +.\" +.\" Permission to use, copy, modify, and distribute this software for any +.\" purpose with or without fee is hereby granted, provided that the above +.\" copyright notice and this permission notice appear in all copies. +.\" +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
+.\" +.Dd $Mdocdate: May 29 2020 $ +.Dt STOEPLITZ_TO_KEY 9 +.Os +.Sh NAME +.Nm stoeplitz_to_key , +.Nm stoeplitz_hash_ip4 , +.Nm stoeplitz_hash_ip4port , +.Nm stoeplitz_hash_ip6 , +.Nm stoeplitz_hash_ip6port +.Nd Symmetric Toeplitz Hash API +.Sh SYNOPSIS +.In net/toeplitz.h +.Ft void +.Fn stoeplitz_to_key "uint8_t *key" "size_t keylen" +.Ft uint16_t +.Fo stoeplitz_hash_ip4 +.Fa "uint32_t srcaddr" +.Fa "uint32_t dstaddr" +.Fc +.Ft uint16_t +.Fo stoeplitz_hash_ip4port +.Fa "uint32_t srcaddr" +.Fa "uint32_t dstaddr" +.Fa "uint16_t srcport" +.Fa "uint16_t dstport" +.Fc +.Ft uint16_t +.Fo stoeplitz_hash_ip6 +.Fa "const struct in6_addr *srcaddr" +.Fa "const struct in6_addr *dstaddr" +.Fc +.Ft uint16_t +.Fo stoeplitz_hash_ip6port +.Fa "const struct in6_addr *srcaddr" +.Fa "const struct in6_addr *dstaddr" +.Fa "uint16_t srcport" +.Fa "uint16_t dstport" +.Fc +.Sh DESCRIPTION +The Toeplitz hash algorithm is commonly used by network interface +controllers to generate a short hash based on the value of fields +in network packet headers. +.\" mention RSS? +The resulting hash value can be used as a flow identifier, which +in turn can be used to consistently select a context for processing +packets using those fields. +Traditionally, the Toeplitz hash produces different results depending +on the order of inputs, i.e., adding port 80 then 1234 as inputs would +produce a different result to hashing port 1234 then 80. +.Pp +The symmetric Toeplitz API uses a key selected to generate the same +hash result regardless of the order the inputs were added. +The API also supports producing Toeplitz hash keys for use by +network interface controllers that provide the same symmetric +property. +.Pp +The +.Fn stoeplitz_to_key +function generates a Toeplitz key for use by a network interface +controller based on the system's symmetric Toeplitz key. +A Toeplitz key of +.Fa keylen +bytes will be written to the buffer referenced by the +.Fa key +argument. 
+.Fa keylen +must be a multiple of 2 bytes. +.Pp +.Fn stoeplitz_hash_ip4 +calculates a hash value for a pair of IPv4 addresses. +.Pp +.Fn stoeplitz_hash_ip4port +calculates a hash value for a pair of IPv4 addresses and ports as +used by protocols like TCP or UDP. +.Pp +.Fn stoeplitz_hash_ip6 +calculates a hash value for a pair of IPv6 addresses. +.Pp +.Fn stoeplitz_hash_ip6port +calculates a hash value for a pair of IPv6 addresses and ports as +used by protocols like TCP or UDP. +.Sh CONTEXT +.Fn stoeplitz_to_key , +.Fn stoeplitz_hash_ip4 , +.Fn stoeplitz_hash_ip4port , +.Fn stoeplitz_hash_ip6 , +and +.Fn stoeplitz_hash_ip6port +can be called during autoconf, from process context, or from an +interrupt context. +.Sh RETURN VALUES +.Fn stoeplitz_hash_ip4 , +.Fn stoeplitz_hash_ip4port , +.Fn stoeplitz_hash_ip6 , +and +.Fn stoeplitz_hash_ip6port +return a 16 bit hash value in host byte order. +.Sh SEE ALSO +.Xr mbuf 9 , +.Xr spl 9 +.Sh HISTORY +The symmetric Toeplitz API is based on the ideas and implementation in +.Dx +by +.An Yanmin Qiao Aq Mt [email protected] +and +.An Simon Schubert Aq Mt [email protected] . +.Pp +The API appeared in +.Ox 6.8 . 
Index: share/man/man9/Makefile =================================================================== RCS file: /cvs/src/share/man/man9/Makefile,v retrieving revision 1.299 diff -u -p -r1.299 Makefile --- share/man/man9/Makefile 6 Dec 2019 10:42:33 -0000 1.299 +++ share/man/man9/Makefile 29 May 2020 04:01:26 -0000 @@ -36,7 +36,8 @@ MAN= aml_evalnode.9 atomic_add_int.9 ato sensor_attach.9 sigio_init.9 \ SMR_LIST_INIT.9 SMR_PTR_GET.9 smr_call.9 \ spl.9 srp_enter.9 srpl_rc_init.9 startuphook_establish.9 \ - socreate.9 sosplice.9 strcmp.9 style.9 syscall.9 sysctl_int.9 \ + socreate.9 sosplice.9 stoeplitz_to_key.9 strcmp.9 style.9 \ + syscall.9 sysctl_int.9 \ task_add.9 tc_init.9 tfind.9 thread_fork.9 \ time_second.9 timeout.9 tsleep.9 tvtohz.9 \ uiomove.9 \ Index: sys/net/toeplitz.c =================================================================== RCS file: sys/net/toeplitz.c diff -N sys/net/toeplitz.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ sys/net/toeplitz.c 29 May 2020 04:01:26 -0000 @@ -0,0 +1,231 @@ +/* $OpenBSD$ */ + +/* + * Copyright (c) 2009 The DragonFly Project. All rights reserved. + * + * This code is derived from software contributed to The DragonFly Project + * by Sepherosa Ziehau <[email protected]> + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * 3. Neither the name of The DragonFly Project nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific, prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE + * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING, + * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED + * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* + * Copyright (c) 2019 David Gwynne <[email protected]> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
+ */ + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/kernel.h> +#include <sys/sysctl.h> + +#include <netinet/in.h> + +#include <net/toeplitz.h> + +/* + * symmetric toeplitz + */ + +static stoeplitz_key stoeplitz_keyseed = STOEPLITZ_KEYSEED; +static struct stoeplitz_cache stoeplitz_syskey_cache; +const struct stoeplitz_cache *const + stoeplitz_cache = &stoeplitz_syskey_cache; + +void +stoeplitz_init(void) +{ + stoeplitz_cache_init(&stoeplitz_syskey_cache, stoeplitz_keyseed); +} + +#define NBSK (NBBY * sizeof(stoeplitz_key)) + +void +stoeplitz_cache_init(struct stoeplitz_cache *scache, stoeplitz_key skey) +{ + uint32_t key[NBBY]; + unsigned int j, b, shift, val; + + bzero(key, sizeof(key)); + + /* + * Calculate 32bit keys for one byte; one key for each bit. + */ + for (b = 0; b < NBBY; ++b) { + for (j = 0; j < 32; ++j) { + unsigned int bit; + + bit = b + j; + + shift = NBSK - (bit % NBSK) - 1; + if (skey & (1 << shift)) + key[b] |= 1 << (31 - j); + } + } + + /* + * Cache the results of all possible bit combination of + * one byte. 
+ */ + for (val = 0; val < 256; ++val) { + uint32_t res = 0; + + for (b = 0; b < NBBY; ++b) { + shift = NBBY - b - 1; + if (val & (1 << shift)) + res ^= key[b]; + } + scache->bytes[val] = res; + } +} + +uint16_t +stoeplitz_hash_ip4(const struct stoeplitz_cache *scache, + in_addr_t faddr, in_addr_t laddr) +{ + uint16_t lo, hi; + + lo = stoeplitz_cache_entry(scache, faddr >> 0); + lo ^= stoeplitz_cache_entry(scache, faddr >> 16); + lo ^= stoeplitz_cache_entry(scache, laddr >> 0); + lo ^= stoeplitz_cache_entry(scache, laddr >> 16); + + hi = stoeplitz_cache_entry(scache, faddr >> 8); + hi ^= stoeplitz_cache_entry(scache, faddr >> 24); + hi ^= stoeplitz_cache_entry(scache, laddr >> 8); + hi ^= stoeplitz_cache_entry(scache, laddr >> 24); + + return (swap16(lo) ^ hi); +} + +uint16_t +stoeplitz_hash_ip4port(const struct stoeplitz_cache *scache, + in_addr_t faddr, in_addr_t laddr, in_port_t fport, in_port_t lport) +{ + uint16_t hi, lo; + + lo = stoeplitz_cache_entry(scache, faddr >> 0); + lo ^= stoeplitz_cache_entry(scache, faddr >> 16); + lo ^= stoeplitz_cache_entry(scache, laddr >> 0); + lo ^= stoeplitz_cache_entry(scache, laddr >> 16); + lo ^= stoeplitz_cache_entry(scache, fport >> 0); + lo ^= stoeplitz_cache_entry(scache, lport >> 0); + + hi = stoeplitz_cache_entry(scache, faddr >> 8); + hi ^= stoeplitz_cache_entry(scache, faddr >> 24); + hi ^= stoeplitz_cache_entry(scache, laddr >> 8); + hi ^= stoeplitz_cache_entry(scache, laddr >> 24); + hi ^= stoeplitz_cache_entry(scache, fport >> 8); + hi ^= stoeplitz_cache_entry(scache, lport >> 8); + + return (swap16(lo) ^ hi); +} + +#ifdef INET6 +uint16_t +stoeplitz_hash_ip6(const struct stoeplitz_cache *scache, + const struct in6_addr *faddr6, const struct in6_addr *laddr6) +{ + uint16_t hi = 0, lo = 0; + size_t i; + + for (i = 0; i < nitems(faddr6->s6_addr32); i++) { + uint32_t faddr = faddr6->s6_addr32[i]; + uint32_t laddr = laddr6->s6_addr32[i]; + + lo ^= stoeplitz_cache_entry(scache, faddr >> 0); + lo ^= 
stoeplitz_cache_entry(scache, faddr >> 16); + lo ^= stoeplitz_cache_entry(scache, laddr >> 0); + lo ^= stoeplitz_cache_entry(scache, laddr >> 16); + + hi ^= stoeplitz_cache_entry(scache, faddr >> 8); + hi ^= stoeplitz_cache_entry(scache, faddr >> 24); + hi ^= stoeplitz_cache_entry(scache, laddr >> 8); + hi ^= stoeplitz_cache_entry(scache, laddr >> 24); + } + + return (swap16(lo) ^ hi); +} + +uint16_t +stoeplitz_hash_ip6port(const struct stoeplitz_cache *scache, + const struct in6_addr *faddr6, const struct in6_addr * laddr6, + in_port_t fport, in_port_t lport) +{ + uint16_t hi = 0, lo = 0; + size_t i; + + for (i = 0; i < nitems(faddr6->s6_addr32); i++) { + uint32_t faddr = faddr6->s6_addr32[i]; + uint32_t laddr = laddr6->s6_addr32[i]; + + lo ^= stoeplitz_cache_entry(scache, faddr >> 0); + lo ^= stoeplitz_cache_entry(scache, faddr >> 16); + lo ^= stoeplitz_cache_entry(scache, laddr >> 0); + lo ^= stoeplitz_cache_entry(scache, laddr >> 16); + + hi ^= stoeplitz_cache_entry(scache, faddr >> 8); + hi ^= stoeplitz_cache_entry(scache, faddr >> 24); + hi ^= stoeplitz_cache_entry(scache, laddr >> 8); + hi ^= stoeplitz_cache_entry(scache, laddr >> 24); + } + + lo ^= stoeplitz_cache_entry(scache, fport >> 0); + lo ^= stoeplitz_cache_entry(scache, lport >> 0); + + hi ^= stoeplitz_cache_entry(scache, fport >> 8); + hi ^= stoeplitz_cache_entry(scache, lport >> 8); + + return (swap16(lo) ^ hi); +} +#endif /* INET6 */ + +void +stoeplitz_to_key(uint8_t *k, size_t klen) +{ + uint16_t skey = htons(stoeplitz_keyseed); + size_t i; + + KASSERT((klen % 2) == 0); + + for (i = 0; i < klen; i += sizeof(skey)) { + k[i + 0] = skey >> 8; + k[i + 1] = skey; + } +} Index: sys/net/toeplitz.h =================================================================== RCS file: sys/net/toeplitz.h diff -N sys/net/toeplitz.h --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ sys/net/toeplitz.h 29 May 2020 04:01:26 -0000 @@ -0,0 +1,113 @@ +/* $OpenBSD$ */ + +/* + * Copyright (c) 2019 David Gwynne <[email protected]> 
+ * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +#ifndef _SYS_NET_TOEPLITZ_H_ +#define _SYS_NET_TOEPLITZ_H_ + +#include <sys/endian.h> + +/* + * symmetric toeplitz + */ + +typedef uint16_t stoeplitz_key; + +struct stoeplitz_cache { + uint16_t bytes[256]; +}; + +static __unused inline uint16_t +stoeplitz_cache_entry(const struct stoeplitz_cache *scache, uint8_t byte) +{ + return (scache->bytes[byte]); +} + +void stoeplitz_cache_init(struct stoeplitz_cache *, stoeplitz_key); + +uint16_t stoeplitz_hash_ip4(const struct stoeplitz_cache *, + uint32_t, uint32_t); +uint16_t stoeplitz_hash_ip4port(const struct stoeplitz_cache *, + uint32_t, uint32_t, uint16_t, uint16_t); + +#ifdef INET6 +struct in6_addr; +uint16_t stoeplitz_hash_ip6(const struct stoeplitz_cache *, + const struct in6_addr *, const struct in6_addr *); +uint16_t stoeplitz_hash_ip6port(const struct stoeplitz_cache *, + const struct in6_addr *, const struct in6_addr *, + uint16_t, uint16_t); +#endif + +/* hash a uint16_t in network byte order */ +static __unused inline uint16_t +stoeplitz_hash_n16(const struct stoeplitz_cache *scache, uint16_t n16) +{ + uint16_t hi, lo; + + hi = stoeplitz_cache_entry(scache, n16 >> 8); + lo = stoeplitz_cache_entry(scache, n16); + + return (hi ^ swap16(lo)); +} + +/* hash a uint16_t in host 
byte order */ +static __unused inline uint16_t +stoeplitz_hash_h16(const struct stoeplitz_cache *scache, uint16_t h16) +{ + uint16_t lo, hi; + + lo = stoeplitz_cache_entry(scache, h16); + hi = stoeplitz_cache_entry(scache, h16 >> 8); + +#if _BYTE_ORDER == _BIG_ENDIAN + return (hi ^ swap16(lo)); +#else + return (swap16(hi) ^ lo); +#endif +} + +/* + * system provided symmetric toeplitz + */ + +#define STOEPLITZ_KEYSEED 0x6d5a + +void stoeplitz_init(void); + +void stoeplitz_to_key(uint8_t *, size_t) + __bounded((__buffer__, 1, 2)); + +extern const struct stoeplitz_cache *const stoeplitz_cache; + +#define stoeplitz_n16(_n16) \ + stoeplitz_hash_n16(stoeplitz_cache, (_n16)) +#define stoeplitz_h16(_h16) \ + stoeplitz_hash_h16(stoeplitz_cache, (_h16)) +#define stoeplitz_port(_p) stoeplitz_n16((_p)) +#define stoeplitz_ip4(_sa4, _da4) \ + stoeplitz_hash_ip4(stoeplitz_cache, (_sa4), (_da4)) +#define stoeplitz_ip4port(_sa4, _da4, _sp, _dp) \ + stoeplitz_hash_ip4port(stoeplitz_cache, (_sa4), (_da4), (_sp), (_dp)) +#ifdef INET6 +#define stoeplitz_ip6(_sa6, _da6) \ + stoeplitz_hash_ip6(stoeplitz_cache, (_sa6), (_da6)) +#define stoeplitz_ip6port(_sa6, _da6, _sp, _dp) \ + stoeplitz_hash_ip6port(stoeplitz_cache, (_sa6), (_da6), (_sp), (_dp)) +#endif + +#endif /* _SYS_NET_TOEPLITZ_H_ */ Index: sys/conf/files =================================================================== RCS file: /cvs/src/sys/conf/files,v retrieving revision 1.686 diff -u -p -r1.686 files --- sys/conf/files 15 Apr 2020 09:26:49 -0000 1.686 +++ sys/conf/files 29 May 2020 04:01:26 -0000 @@ -62,6 +62,7 @@ define ether define mpls define sppp define wlan +define stoeplitz # "Chipset" attributes. These are the machine-independent portions # of device drivers. 
@@ -826,6 +830,7 @@ file net/if_pair.c pair file net/if_pppx.c pppx needs-count file net/if_vxlan.c vxlan needs-count file net/bfd.c bfd +file net/toeplitz.c stoeplitz needs-flag file net80211/ieee80211.c wlan file net80211/ieee80211_amrr.c wlan file net80211/ieee80211_crypto.c wlan Index: sys/kern/init_main.c =================================================================== RCS file: /cvs/src/sys/kern/init_main.c,v retrieving revision 1.298 diff -u -p -r1.298 init_main.c --- sys/kern/init_main.c 25 May 2020 15:24:30 -0000 1.298 +++ sys/kern/init_main.c 29 May 2020 04:01:26 -0000 @@ -104,6 +105,11 @@ extern void kubsan_init(void); extern void nfs_init(void); #endif +#include "stoeplitz.h" +#if NSTOEPLITZ > 0 +extern void stoeplitz_init(void); +#endif + #include "mpath.h" #include "vscsi.h" #include "softraid.h" @@ -241,6 +247,10 @@ main(void *framep) * allocate mbufs or mbuf clusters during autoconfiguration. */ mbinit(); + +#if NSTOEPLITZ > 0 + stoeplitz_init(); +#endif /* Initialize sockets. */ soinit();
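For anyone wondering why handing a driver a key made of a repeated 16 bit seed gives the symmetric property on the hardware side, here is a rough userland sketch. It is an illustration under assumptions, not part of the diff: the sketch_* names are hypothetical, the 40 byte key length is just a size commonly seen on RSS capable hardware, sketch_to_key() writes the seed big endian rather than reproducing stoeplitz_to_key() byte for byte, and sketch_toeplitz32() is a plain bit-at-a-time Toeplitz hash of the kind a nic is expected to implement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_KEYSEED		0x6d5a	/* same value as STOEPLITZ_KEYSEED */
#define SKETCH_RSS_KEYLEN	40	/* assumption: a typical nic key size */

/* Fill a nic style key by repeating the 16 bit seed; klen must be even. */
static void
sketch_to_key(uint8_t *k, size_t klen, uint16_t seed)
{
	size_t i;

	for (i = 0; i + 1 < klen; i += 2) {
		k[i + 0] = seed >> 8;
		k[i + 1] = seed & 0xff;
	}
}

/*
 * Reference Toeplitz hash, one input bit at a time.
 * Requires keylen >= datalen + 4.
 */
static uint32_t
sketch_toeplitz32(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t window, result = 0;
	size_t i, nextbit = 32;
	unsigned int b;

	assert(keylen >= datalen + 4);

	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
	    (uint32_t)key[2] << 8 | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 0; b < 8; b++) {
			if (data[i] & (0x80u >> b))
				result ^= window;

			/* slide the 32 bit window one bit along the key */
			window <<= 1;
			if (key[nextbit / 8] & (0x80u >> (nextbit % 8)))
				window |= 1;
			nextbit++;
		}
	}

	return (result);
}

static void
sketch_nic_check(void)
{
	uint8_t key[SKETCH_RSS_KEYLEN];
	/* 192.0.2.1:1234 -> 198.51.100.7:80 in wire order, and the reverse */
	const uint8_t fwd[12] = { 192, 0, 2, 1, 198, 51, 100, 7,
	    0x04, 0xd2, 0x00, 0x50 };
	const uint8_t rev[12] = { 198, 51, 100, 7, 192, 0, 2, 1,
	    0x00, 0x50, 0x04, 0xd2 };
	uint32_t f, r;

	sketch_to_key(key, sizeof(key), SKETCH_KEYSEED);
	f = sketch_toeplitz32(key, sizeof(key), fwd, sizeof(fwd));
	r = sketch_toeplitz32(key, sizeof(key), rev, sizeof(rev));

	assert(f == r);				/* same in both directions */
	assert((f >> 16) == (f & 0xffff));	/* only 16 bits of entropy */
}
```

Because the key repeats every 16 bits, the window seen by a source address byte is identical to the one seen by the matching destination address byte 32 bits later (and the same goes for the ports 16 bits apart), so swapping source and destination cannot change the result. The second assert shows the cost: the 32 bit hash a nic reports has two identical 16 bit halves, which is why the kernel side of the API only ever deals in uint16_t.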
