Hey David, On Wed, May 27, 2020 at 2:26 AM David Gwynne <da...@gwynne.id.au> wrote: > > On Tue, May 26, 2020 at 05:42:13PM -0600, Jason A. Donenfeld wrote: > > On Tue, May 26, 2020 at 4:52 PM Jason A. Donenfeld <ja...@zx2c4.com> wrote: > > > With regards to your crash, though, that's a bit more puzzling, and > > > I'd be interested to learn more details. Because these structs are > > > already naturally aligned, the __packed attribute, even with the odd > > > nesting Matt had prior, should have produced all entirely aligned > > > accesses. That makes me think your kaboom was coming from someplace > > > else. One possibility is that you were running the git tree on the two > > > days that I was playing with uint128_t, only to find out that some of > > > openbsd's patches to clang miscalculate stack sizes when they're in > > > use, so that work was shelved for another day and the commits removed; > > > perhaps you were just unlucky? Or you hit some other bug that's > > > lurking. Either way, output from ddb's `bt` would at least be useful. > > When you say clang patches miscalculate the stack size with uint128_t, > do you know which one(s) specifically? Was it -msave-args?
Yep, exactly. It looked to me like that plugin was considering each argument of a function to be register-sized, which obviously isn't the case for uint128_t. I had planned to come back to that after the wireguard stuff is finished. > Anyway, tl;dr, my guess is you're reading a 64bit value out of a packet, > but I'm pretty sure the stack probably only provides 32bit alignment. Yep, that seems to be exactly the case. Earlier today I looked at every struct passed to mtod for the entire kernel, and found two other drivers that do this with 64bit types, but maybe they're in contexts where that's okay somehow. Now that I've got my sparc64 rig cooking away at these kernels, I can confirm that this is indeed the case, and it's trivially fixed by just memcpy'ing to/from a properly variable on the stack. In my experience, this pattern, uint64_t i; memcpy(&i, somecharbuffer, sizeof(i)); return i; optimizes to just the simple boring cast on architectures with fast unaligned access (e.g. amd64), while doing the smart thing for architectures that trap or have a performance penalty. IOW, the compiler generates optimal code over that pattern, so that's what I've done to work around the issue here. > - The network stack accesses bits of packets directly. I'm pretty > sure so far this is limited to 32bit word accesses for things like > addresses in IPv4 headers, GRE protocol fields, etc. I'm not sure > there are (currently) any 64bit accesses. My review earlier today indeed showed all max 32bit, except for two drivers: - sys/net/switchofp.c and sys/net/ofp.h - sys/dev/usb/if_athn_usb.c and sys/dev/usb/if_athn_usb.h Either those are incorrect, are limited to archs that don't trap, or are used in contexts where other factors make them safe. But beyond those two, I couldn't find any others. > Sorry for the rambling. Hopefully you know a bit more about mbuf > and network stack alignment now though. Not a problem at all, super appreciated. Looking forward to your other comments on the driver too. Jason