I've been working on this a bit from a completely independent perspective: bootstrapping embedded systems which have a persistent keypair, but no persistent storage for stuff like `AllowedIPs` assignments. In my use case, the by-convention assignment of an IPv6 link-local address to each WireGuard peer allows a gossip-style protocol to update a newly-joined node with a (signed) set of configuration parameters, including the `AllowedIPs` entries that enable more comprehensive communication.
The fact that the assignment of cryptographically-bound IPv6 LLAs has now occurred independently to multiple parties is not lost on me; it's usually a sign of good design! I also agree that this type of thing makes a lot more sense in the context of `wg-quick` than in the kernel module or the `wg` tool itself. However, care should be taken to make sure that all potential implementations can adopt it without extra overhead. For this reason, I'm biased towards simplicity of the specification, not necessarily simplicity of implementation as part of `wg-quick`.

I would caution that the decision of how to generate and assign addresses from public keys should be treated as a layer-3 problem. Every IPv6 interface is *required* by RFC 4291 to have a link-local address. Even if you can get away without one in practice, this makes it clear that the proper conceptual home of LLA assignment is the realm of bits and bytes rather than strings and pipes, even if its appropriate place in the architecture of the *reference implementation* is an optional shell script.

One more point of clarification: ULAs are in the `fc00::/7` space, while LLAs are in `fe80::/10`. LLAs are what we want, because they are explicitly interface-scoped, which means they can always be counted on to be bound to the peer, no matter what the specific network configuration of the node might be. Sending a packet to `fe80::dead:beef%wg0` will always refer to a specific peer on the `wg0` interface, and guarantees that the contents of that packet will be transmitted securely. Sending to `fc00::dead:beef`, by contrast, *might* go out over the `wg0` interface, but to be sure you'd have to know that you didn't have a route to that address via any other interface. That might be true on some (or even most) nodes, but it's not something that can be assumed. This makes cryptographically-derived ULAs much less useful than cryptographically-*bound* LLAs.

# OK, so how do we do it?
The general idea of using a hash to generate an IPv6 LLA is fairly straightforward (and obvious, given that several people have come up with it independently), but there are still some points that need to be standardized. I think I have an exhaustive list of the points of divergence that must be addressed; I'll discuss each of them along with my perspective.

## What netmask should be used?

**fe80::/10.** The IPv6 RFCs split an address into a subnet prefix and an interface identifier, which would seem to suggest that something like `fe80::/64` should be used instead; however, by their very nature, link-local addresses are not part of a subnet. In addition, it is desirable that each address be bound as strongly as possible to the key it is derived from. A /10 prefix leaves 118 bits for the hash, and 118-bit security is a lot closer to 128-bit security than 64-bit security is.

## Should the subnet identifier be concatenated with the results of the hash, or should leading bits of the hash be dropped?

**(SUBNET & MASK) | (HASH & ~MASK)**

Binary math is good, cheap, and obvious, whereas concatenation is only straightforward if the netmask is a whole number of bytes. Otherwise you have to bit-shift everything and it just gets messy. Besides, it's a net*mask*; it seems like you should use it to *mask* things.

## Should the hash be taken over the key itself, or over the Base64 encoding of the key?

**The key itself.** While the tools are fairly consistent in using the Base64 encoding in user-facing scenarios, it's important to consider that nothing fundamental about the WireGuard protocol itself requires Base64 anywhere. I'd argue that it would be inappropriate to introduce a dependency on it at such a low level, especially since you can just run `base64 -d` inside `wg-quick`.

## What algorithm should the hash be done with?

**Blake2s with 32 bytes of output.**
This is simply the `HASH()` function from the WireGuard protocol specification, and I think that using the same hash function as the Noise construction makes a lot of sense. Even though output length is a tunable parameter of Blake2s and an LLA will never use more than 16 bytes, I feel that being consistent and obvious is important. (Also note that Blake2 tunes output length by truncation internally; the only difference between taking a 16- or 32-byte digest is flipping a couple of bits during the setup phase. The performance characteristics are exactly the same.)

That said, most of the attempts at implementing an IPv6 LLA assignment scheme I've seen simply depend on `sha256sum` and call it a day, because there's no widespread CLI tool that does Blake2s for you. There *are* a couple of different tools named `b2sum`: the one made by the Blake2 authors is fine, but the identically-named GNU coreutils utility, which most people will get if they install their distro's `b2sum` package, only does Blake2b (and takes a different set of flags to boot).

Still, as I mentioned above, we should be looking at this from a protocol point of view, and requiring a whole extra crypto primitive just for calculating an LLA seems wasteful. Implementing WireGuard already requires that the Blake2s hash be available; the fact that it's not easily accessible to `wg-quick` is simply an unfortunate quirk of the reference implementation. Think about a constrained environment like a microcontroller: SHA256 isn't a simple algorithm, and adding it would probably cause a 50% increase in code size. Luckily, Blake2s is a simple and elegant algorithm, and in an effort to get some working code out there I've [implemented][1] it in ~100 lines of Bash. (It's gotta be Bash because it needs array support, but that's what `wg-quick` uses anyway.)
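A minimal Python sketch of the scheme as described above might look like the following. It uses `hashlib.blake2s` from the Python standard library, so no extra Blake2s tooling is needed. The function names are hypothetical, and the Base64 wrapper only mimics the input/output of the proposed (not existing) `wg lla` command. Two details are assumptions not stated in this thread: that the *first* 16 bytes of the 32-byte digest are used, and that they are read in network (big-endian) byte order. Comparing the output for an all-zero key with the value quoted for it is a quick way to check those assumptions against the Bash implementation.

```python
import base64
import hashlib
import ipaddress

# Link-local prefix, per the netmask answer above.
LL_NET = ipaddress.IPv6Network("fe80::/10")


# Hypothetical helper name, for illustration only.
def lla_from_pubkey(pubkey: bytes,
                    net: ipaddress.IPv6Network = LL_NET) -> ipaddress.IPv6Address:
    """Derive an address in `net` from a raw 32-byte WireGuard public key."""
    if len(pubkey) != 32:
        raise ValueError("expected a raw 32-byte WireGuard public key")
    # HASH() from the WireGuard spec: Blake2s with a 32-byte digest,
    # taken over the raw key bytes, not their Base64 text.
    digest = hashlib.blake2s(pubkey, digest_size=32).digest()
    # ASSUMPTION: the first 16 digest bytes, read big-endian, fill the address.
    h = int.from_bytes(digest[:16], "big")
    mask = int(net.netmask)
    subnet = int(net.network_address)
    # (SUBNET & MASK) | (HASH & ~MASK). h is non-negative and below 2**128,
    # so Python's unbounded ~mask only clears the top prefix bits of h.
    return ipaddress.IPv6Address((subnet & mask) | (h & ~mask))


# Hypothetical wrapper mimicking the proposed `wg lla` (Base64 in, LLA out).
def lla_from_pubkey_b64(pubkey_b64: str) -> ipaddress.IPv6Address:
    """Decode a Base64 key (as printed by `wg pubkey`) and derive its LLA."""
    return lla_from_pubkey(base64.b64decode(pubkey_b64, validate=True))


if __name__ == "__main__":
    zero_key_b64 = base64.b64encode(bytes(32)).decode()
    # .exploded prints all eight zero-padded groups.
    print(lla_from_pubkey_b64(zero_key_b64).exploded)
```

Calling `str()` on the result gives the compressed form with leading zeros dropped, while `.exploded` gives the zero-padded form used in the `wg lla` example string. Note also that requesting `digest_size=32` matters even though only 16 bytes are kept. BLAKE2 mixes the digest length into its initial state, so `hashlib.blake2s(key, digest_size=16)` yields a different value than the first 16 bytes of the 32-byte digest.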
The Bash version is slow compared to a typical implementation, but it's not like we're mining cryptocurrency here, and because WireGuard public keys have a known, fixed length, the input will never be longer than a single block. (Single-block hashes benchmark at around 50 ms on my system, just for reference.)

I hope this helps accelerate the project, but I can understand that a shell implementation might seem too janky for long-term use. A potential solution would be to integrate the LLA calculation into the `wg` tool, similar to how the Curve25519 public key calculation is handled by `wg pubkey`. I'm imagining a `wg lla` command that takes a Base64-encoded public key and spits out a string of the form `fe8b:5ea9:9e65:3bc2:b593:db41:30d1:0a4e` (which happens to be the LLA associated with an all-zero public key under my proposed scheme).

[1]: https://gist.github.com/reidrankin/3a39210ce437680f5cf1ac549fd1f1ff

--Reid

On Wed, Jun 24, 2020 at 1:11 PM Chriztoffer Hansen <c...@ntrv.dk> wrote:
>
> On Wed, 24 Jun 2020 at 17:37, Florian Klink <flo...@flokli.de> wrote:
> > Deriving an IPv6 link-local address from the pubkey and adding it to the
> > interface should be a no-brainer and sane default, and already fix Babel
> > Routing (and most other issues) for "point-to-point tunnels"
> > (only one peer, both sides set AllowedIPs=::/0).
>
> An idea to implement as an option for e.g. wg-quick, rather than the
> base code-base itself?
>
> --
>
> Chriztoffer