Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
On Mon, 7 Jun 2021 16:46:17 +0500 Roman Mamedov wrote: > On Mon, 7 Jun 2021 13:27:10 +0200 > "Jason A. Donenfeld" wrote: > > > Can you walk me through your use case a bit more, so I can wrap my mind > > around the requirements? > > > > ingress --plain--> wireguard --wireguard[plain]--> vxlan > > --vxlan[wireguard[plain]]--> egress > > Not sure I understand your scheme correctly. In any case, the path of a > packet would be... > > On peer 1: > > * plain Ethernet -> wrapped into VXLAN -> encrypted into WireGuard > > On peer 2: > > * decrypted from WireGuard -> unwrapped from VXLAN -> plain Ethernet > > > So my question is, why can't you set wireguard's MTU to 80 bytes less > > than vxlan's MTU? What's preventing that or making it infeasible? > > To transparently bridge two Ethernet LANs, a VXLAN interface needs to join an > L2 bridge. All interfaces that are members of a bridge must have the same MTU. > > As such, br0 members on both sides: > eth0 (MTU 1500) > vx0 (MTU 1500) > > VXLAN transports full L2 frames encapsulating them into UDP. To fit the > full 1500-byte packet and accounting for VXLAN and related IP overheads, > the resulting packet size is 1574 bytes. > > So this same host that just generated the 1574-byte encapsulated VXLAN packet > with something it received via its eth0 port, now needs to send it further to > its WG peer(s). For this to succeed, the in-tunnel WG MTU needs to be 1574 or > more, not 1412 or 1420, as VXLAN itself can't be fragmented[1]; or even if it > could, that would mean a much worse overhead ratio than currently. > > [1] https://datatracker.ietf.org/doc/html/rfc7348#section-4.3 In case you are not convinced by this case, would you consider at least allowing fragmentation when WG's in-tunnel MTU is set to >=1500? Because this is the user effectively saying "yes I know this is not gonna fit in one packet, I want to rely on WG packets being fragmented", but without the need for extra knobs. -- With respect, Roman
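To make the fragmentation request above concrete, here is a rough sketch of what happens to one 1574-byte in-tunnel packet once it is encrypted and sent over a 1500-byte IPv4 link. It is only an illustration under stated assumptions: the outer transport is IPv4 without options, the plaintext is padded to a multiple of 16 bytes as the WireGuard protocol description specifies, and the tunnel MTU is large enough not to cap that padding. The function names are made up for this example.

```python
WG_HEADER = 16    # type (1) + reserved (3) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16  # Poly1305 tag appended to the ciphertext
UDP_HEADER = 8
IPV4_HEADER = 20  # assumption: no IP options


def wg_udp_datagram_len(inner_len: int) -> int:
    """Length of the UDP datagram (header included) carrying one inner packet."""
    padded = -(-inner_len // 16) * 16  # round up to a multiple of 16 (assumed padding rule)
    return UDP_HEADER + WG_HEADER + padded + WG_AUTH_TAG


def ipv4_fragment_sizes(ip_payload_len: int, link_mtu: int) -> list[int]:
    """On-wire size of each IPv4 fragment needed to carry ip_payload_len bytes."""
    if ip_payload_len + IPV4_HEADER <= link_mtu:
        return [ip_payload_len + IPV4_HEADER]  # fits, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_chunk = (link_mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    remaining = ip_payload_len
    while remaining > 0:
        chunk = min(remaining, max_chunk)
        sizes.append(chunk + IPV4_HEADER)
        remaining -= chunk
    return sizes


datagram = wg_udp_datagram_len(1574)
print(datagram)                             # 1624
print(ipv4_fragment_sizes(datagram, 1500))  # [1500, 164]
```

Under these assumptions the encrypted packet goes out as two IPv4 fragments, and only the first one carries the UDP and WireGuard headers.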
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
On 2021-06-06 21:03, Roman Mamedov wrote: > On Sun, 6 Jun 2021 11:13:36 +0200 > "Jason A. Donenfeld" wrote: > >> Specifically the change would be to not allow IP fragmentation of the >> encrypted UDP packets. This way, in the case of a loop, eventually the >> packet size exceeds MTU, and it gets dropped: dumb and effective. >> Depending on how this discussion goes, a compromise would be to not >> allow fragmentation, but only for forwarded and kernel-generated >> packets, not for locally generated userspace packets. That's more >> complex and I don't like it as much as just disallowing IP >> fragmentation altogether. >> >> Pros: >> - It solves the routing loop problem very simply. > > Doesn't TTL already solve this? > >> - Maybe people are running >> wireguard-over-gre-over-vxlan-over-l2tp-over-pppoe-over-god-knows-what-else, >> and this reduces the MTU to below 1280, yet they still want to put >> IPv6 through wireguard, and are willing to accept the performance >> implications. > > Not only that. Sometimes transparent bridging of 1500 MTU LANs is required. > > VXLAN does not allow tunnel endpoints to produce fragmented VXLAN packets. > > With WG we can fragment them one level lower, *and* gain a higher efficiency > compared to hypothetical VXLAN's fragmentation, due to less header overhead on > 2nd and further packets in a chain. > > It would be unfortunate if this will become no longer possible. > > It appears to me that people who might need to transparently join multiple > Ethernet LANs due to legacy network topologies they have to work with, weird > requirements, various legacy software etc, would outnumber those who even run > WG over WG at all, let alone getting themselves into a routing loop that way.
All of the above, really. Not allowing "full"-sized frames over WG breaks a huge number of use cases, even simple ones, because regardless of how much one wishes otherwise, in reality PMTU discovery isn't very useful and doesn't work in many cases, even in an environment where it isn't completely broken by firewalls or misconfiguration. A (probably common) simple example is where there are 1500-byte speakers on either side of a WG link (e.g. the internet, or some satellite site): having a <1500-byte link in the middle will break many applications, especially UDP-based protocols. Unfortunately, the better solution is likely to make it configurable, or to allow fragmentation for forwarded traffic (since the host already knows the MTU, this solves the problem without requiring any user config), although understandably you don't want to add more complexity. Thanks
Re: Certain private keys being mangled by wg on FreeBSD
On 6/7/21, Christian McDonald wrote: > One byproduct of this exercise was some code that I whipped > up that can at least detect a clamped vs unclamped key. This might > prove useful for informing a user of what is going on and thus > eliminating this class of erroneous bug report entirely. I'd recommend *not* introducing users to weird ideas like clamping or key transformation. While learning new concepts and bit masking in PHP is undoubtedly fun, those concerns shouldn't be user-facing. There's nothing wrong or dangerous about unclamped scalars passed to a proper 25519 implementation, because the implementation will clamp on input. Throwing an "X-vs-unX" distinction to users will just result in pointless fear mongering nonsense. Instead just communicate the identity of an interface by its public key, rather than its private key. If you're not willing to hide or mask private keys (which you really should), then at least deemphasize them?
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
On Sun, Jun 06, 2021 at 01:14:16PM +0200, Peter Linder wrote: > This would break things for me. We're doing a lot of L2 over L3 site to > site stuff and we are using wireguard as the outer layer. Inner layer is > vxlan or l2tpv3. > > In particular, people connect lots of stuff with no regard for MTU. For > some things it's also very hard to change so we just assume people > don't. Since the L3 network typically has the same MTU as the inner L2 > network, we need fragmentation. There is no practical way to be able to > tell hosts on the L2 network about the limited mtu, for all we know they > don't even run IP I've not looked into vxlan much, but for L2TP you always have recourse to RFC 4623, where the MRU & MRRU can be exchanged. DF
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
This is indeed the case for me, spot on. On 2021-06-07 13:46, Roman Mamedov wrote: So this same host that just generated the 1574-byte encapsulated VXLAN packet with something it received via its eth0 port, now needs to send it further to its WG peer(s). For this to succeed, the in-tunnel WG MTU needs to be 1574 or more, not 1412 or 1420, as VXLAN itself can't be fragmented[1]; or even if it could, that would mean a much worse overhead ratio than currently.
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
On Mon, 7 Jun 2021 13:27:10 +0200 "Jason A. Donenfeld" wrote: > Can you walk me through your use case a bit more, so I can wrap my mind > around the requirements? > > ingress --plain--> wireguard --wireguard[plain]--> vxlan > --vxlan[wireguard[plain]]--> egress Not sure I understand your scheme correctly. In any case, the path of a packet would be... On peer 1: * plain Ethernet -> wrapped into VXLAN -> encrypted into WireGuard On peer 2: * decrypted from WireGuard -> unwrapped from VXLAN -> plain Ethernet > So my question is, why can't you set wireguard's MTU to 80 bytes less > than vxlan's MTU? What's preventing that or making it infeasible? To transparently bridge two Ethernet LANs, a VXLAN interface needs to join an L2 bridge. All interfaces that are members of a bridge must have the same MTU. As such, br0 members on both sides: eth0 (MTU 1500) vx0 (MTU 1500) VXLAN transports full L2 frames encapsulating them into UDP. To fit the full 1500-byte packet and accounting for VXLAN and related IP overheads, the resulting packet size is 1574 bytes. So this same host that just generated the 1574-byte encapsulated VXLAN packet with something it received via its eth0 port, now needs to send it further to its WG peer(s). For this to succeed, the in-tunnel WG MTU needs to be 1574 or more, not 1412 or 1420, as VXLAN itself can't be fragmented[1]; or even if it could, that would mean a much worse overhead ratio than currently. [1] https://datatracker.ietf.org/doc/html/rfc7348#section-4.3 -- With respect, Roman
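The message above gives 1574 bytes as the encapsulated size without breaking it down. As a hedged illustration, the sketch below adds up the usual VXLAN encapsulation layers around a 1500-byte payload. Which combination the author had in mind is not stated; an IPv6 outer header plus an 802.1Q-tagged inner frame is simply one combination that happens to give 1574, and that reading is an assumption. The function name is invented for this example.

```python
INNER_ETHERNET = 14        # inner Ethernet header carried inside VXLAN
VLAN_TAG = 4               # optional 802.1Q tag on the inner frame
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IP = {4: 20, 6: 40}  # outer IP header, no options / extension headers


def vxlan_packet_size(inner_mtu: int, outer_ip_version: int,
                      vlan_tagged: bool = False) -> int:
    """Size of the outer IP packet that carries one full-size inner frame."""
    size = (inner_mtu + INNER_ETHERNET + VXLAN_HEADER
            + UDP_HEADER + OUTER_IP[outer_ip_version])
    if vlan_tagged:
        size += VLAN_TAG
    return size


print(vxlan_packet_size(1500, 4))                    # 1550
print(vxlan_packet_size(1500, 6))                    # 1570
print(vxlan_packet_size(1500, 6, vlan_tagged=True))  # 1574
```

Whichever variant applies, the result is well above 1420, which is the point being made: the in-tunnel WG MTU has to exceed the usual default for the VXLAN packet to pass unfragmented.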
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
Hi Roman, On Mon, Jun 7, 2021 at 1:13 PM Roman Mamedov wrote: > In the L2 tunneling scenario the large VXLAN packets are generated locally, as > it will be common for the same host (aka "the router") to be both a WG peer > and a VXLAN VTEP, so it is going to be affected. Can you walk me through your use case a bit more, so I can wrap my mind around the requirements? ingress --plain--> wireguard --wireguard[plain]--> vxlan --vxlan[wireguard[plain]]--> egress So my question is, why can't you set wireguard's MTU to 80 bytes less than vxlan's MTU? What's preventing that or making it infeasible? Jason
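For readers wondering where the "80 bytes" comes from, here is a small sketch of the per-packet overhead WireGuard adds, using the data message layout from the protocol description (16-byte header, 16-byte authentication tag) on top of UDP and the outer IP header. The helper names are invented for this example, and the 1492-byte link used to reproduce 1412 is an assumption (for instance a PPPoE uplink), not something stated in the thread.

```python
WG_HEADER = 16              # type (1) + reserved (3) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16            # Poly1305 authentication tag
UDP_HEADER = 8
IP_HEADER = {4: 20, 6: 40}  # outer IP header, no options / extension headers


def wg_overhead(outer_ip_version: int) -> int:
    """Bytes added around each inner packet for the given outer IP version."""
    return IP_HEADER[outer_ip_version] + UDP_HEADER + WG_HEADER + WG_AUTH_TAG


def wg_mtu(link_mtu: int, outer_ip_version: int = 6) -> int:
    """Largest in-tunnel MTU that never needs outer fragmentation on this link."""
    return link_mtu - wg_overhead(outer_ip_version)


print(wg_overhead(6))  # 80
print(wg_overhead(4))  # 60
print(wg_mtu(1500))    # 1420
print(wg_mtu(1492))    # 1412 (assumed 1492-byte link)
```

Budgeting for the IPv6 case by default means the same interface MTU is safe whether the endpoint is reached over IPv4 or IPv6.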
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
Hey Jason, Jason A. Donenfeld writes: > Hey folks, > > There seems to be a bit of confusion about *which* stage of > fragmentation would be affected by the proposal, so I drew some > diagrams to help illustrate what I'm talking about. Please take a > look: > > https://data.zx2c4.com/potential-wg-fragmentation-proposal.png I love the math: 2792 = 1420 + 1420 = 1500 + 1500 Joke aside, ... > 1) Ingress fragmentation would not be affected by this and is not > relevant for this discussion. This is the case in which a computer > gets a packet for forwarding out of the wireguard interface, and it's > larger than the interface's mtu, so the computer fragments it before > passing it onto that interface. I'm not suggesting any change in this > behavior. I believe this is something wireguard cannot influence *anyway* as the sending side can send any sized packet towards us. > 2) Local egress fragmentation WOULD be affected by this and is the > most relevant thing in this discussion. In this case, a packet that > gets encrypted and winds up being larger than the mtu of the interface > that the encrypted packet will go out of gets fragmented. In this > case, we could likely respond with an ICMP packet or similar in-path > error. But keep in mind this whole situation is local: it usually will > only happen out of misconfiguration. The best fix for the diagram I > drew would be for the administrator to decrease the MTU of the > wireguard interface to 1412. So how does that behave in the situation that the upstream interface or routes change? Let's say WG MTU = 1412, original PMTU = 1500, decreases to 1420. Would that reduce the WG mtu automatically to 1332? I guess not. So what happens when packets arrive with size = 1420? > 3) Path egress fragmentation COULD be affected by this, but doesn't > have to be. In this case, we simply set "don't fragment" on encrypted > egress packets, which means they won't be fragmented by other > computers along the path.
That's true, but then it would be required to fragment them locally, wouldn't it? I'm trying to wrap my head around this in comparison to IPv6/IPv4: In the IPv6 world we don't have fragmentation on the path, it's always client based. In the IPv4 world routers can dis/re-assemble packets on the way. If I understand it correctly, you are somewhat suggesting that wireguard behaves a bit like an IPv6 router, albeit for both the v6 and the v4 world. Does that comparison make sense somehow? I think it would be easier to understand if there was a demo case, a sample tunnel that rejects packets if fragmentation is needed. What would be the appropriate ICMP message for an IPv4 packet that does not include the DF bit? So far, I'm not fully convinced the approach is a smart one, especially not when it comes to handling network debugging and given that we do already have a TTL that should be a loop prevention as well. Best regards, Nico -- Sustainable and modern Infrastructures by ungleich.ch
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
On Mon, 7 Jun 2021 11:34:21 +0200 "Jason A. Donenfeld" wrote: > 2) Local egress fragmentation WOULD be affected by this and is the > most relevant thing in this discussion. In this case, a packet that > gets encrypted and winds up being larger than the mtu of the interface > that the encrypted packet will go out of gets fragmented. In this > case, we could likely respond with an ICMP packet or similar in-path > error. But keep in mind this whole situation is local: it usually will > only happen out of misconfiguration. The best fix for the diagram I > drew would be for the administrator to decrease the MTU of the > wireguard interface to 1412. In the L2 tunneling scenario the large VXLAN packets are generated locally, as it will be common for the same host (aka "the router") to be both a WG peer and a VXLAN VTEP, so it is going to be affected. > So, of those concerned about this, which concerns are actually about > (2) and (3)? Of those, which ones are about (2)? If you have concerns > specifically about (2) that couldn't be fixed with reasonable system > administration, I'd like to hear why and what the setup is that leads > to that situation. My described case is being able to transparently bridge two Ethernet LANs. Hopefully the answer isn't "you don't really need to do that" or "apply reasonable system administration and set up routing instead". > As an aside, Roman asked about TTL. When tunneling, the outer packet > header always must take the new TTL of the route to the tunnel > endpoint, and not do anything with the potentially much smaller inner > TTL. As far as I can see the inner TTL is not smaller than usual on WG tunnels (64). You could inherit it to the outside of the tunnel, like GRE does: https://serverfault.com/questions/827239/gre-tunnel-ttl-number But of course that's leaking a tiny bit of information about the encrypted tunnel, dunno how critical that would be. -- With respect, Roman
Re: Certain private keys being mangled by wg on FreeBSD
Ah that makes sense. I spent some quality time playing with the bit arithmetic and I see what you mean now. Thanks for that snippet and direction. One byproduct of this exercise was some code that I whipped up that can at least detect a clamped vs unclamped key. This might prove useful for informing a user of what is going on and thus eliminating this class of erroneous bug report entirely. I'm really not sure hiding the private keys entirely in the UI is the right thing to do, especially if it seems that key generators should really be pre-clamping keys (thanks for reaching out to Mullvad btw) and not dishing out unclamped keys to begin with. On Sun, Jun 6, 2021 at 12:21 PM Jason A. Donenfeld wrote: > > On 6/6/21, Christian McDonald wrote: > > Would it not be better for wg to just fail outright instead of > > transforming a poorly generated key entered by a user, regardless of > > where the key came from? Especially if that problematic key passes the > > regex validation that was provided in another thread in this email > > list? > > No, it would not be better. There is nothing wrong with using those > keys. They're not "poorly generated" or "problematic" or dangerous in > the least. This is only a concern with your UI. > > The kernel is doing the correct thing -- clamping keys -- and > displaying an unambiguous identifier to the user: the key that it will > actually be using. > > I suspect the best thing to do for your UI would be to hide private > (and preshared) keys, and only show public keys, unless explicitly > exported into a config file. This not only reduces potential confusion > with this issue, but mitigates another potential footgun down the > line. It's also what wg(8)'s show command does by default (while > showconf will export all). -- R. Christian McDonald M: (616) 856-9291 E: rcmcdonal...@gmail.com
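For readers following the clamping discussion, the sketch below shows the standard Curve25519 scalar clamping (as described in RFC 7748) and a check for whether a base64-encoded 32-byte key is already in clamped form. This is not the code mentioned in the message above, which was not posted; the function names are invented for illustration. Note that the reply elsewhere in this thread advises against surfacing this distinction to users at all, so treat this as a way to understand the bit arithmetic rather than as a suggested UI feature.

```python
import base64


def clamp(key: bytes) -> bytes:
    """Apply Curve25519 clamping to a 32-byte private scalar."""
    if len(key) != 32:
        raise ValueError("expected a 32-byte key")
    k = bytearray(key)
    k[0] &= 248   # clear the three lowest bits
    k[31] &= 127  # clear the highest bit
    k[31] |= 64   # set the second-highest bit
    return bytes(k)


def is_clamped(key_b64: str) -> bool:
    """True if the base64-encoded key is unchanged by clamping."""
    key = base64.b64decode(key_b64)
    return clamp(key) == key


raw = bytes([0xFF] * 32)                # deliberately unclamped example value
unclamped_b64 = base64.b64encode(raw).decode()
clamped_b64 = base64.b64encode(clamp(raw)).decode()

print(is_clamped(unclamped_b64))  # False
print(is_clamped(clamped_b64))    # True
```

Clamping is idempotent, so a key that has been clamped once compares equal to its clamped form, which is why an interface can report a private key that differs from the one the user typed while both yield the same public key.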
Re: potentially disallowing IP fragmentation on wg packets, and handling routing loops better
Hey folks, There seems to be a bit of confusion about *which* stage of fragmentation would be affected by the proposal, so I drew some diagrams to help illustrate what I'm talking about. Please take a look: https://data.zx2c4.com/potential-wg-fragmentation-proposal.png 1) Ingress fragmentation would not be affected by this and is not relevant for this discussion. This is the case in which a computer gets a packet for forwarding out of the wireguard interface, and it's larger than the interface's mtu, so the computer fragments it before passing it onto that interface. I'm not suggesting any change in this behavior. 2) Local egress fragmentation WOULD be affected by this and is the most relevant thing in this discussion. In this case, a packet that gets encrypted and winds up being larger than the mtu of the interface that the encrypted packet will go out of gets fragmented. In this case, we could likely respond with an ICMP packet or similar in-path error. But keep in mind this whole situation is local: it usually will only happen out of misconfiguration. The best fix for the diagram I drew would be for the administrator to decrease the MTU of the wireguard interface to 1412. 3) Path egress fragmentation COULD be affected by this, but doesn't have to be. In this case, we simply set "don't fragment" on encrypted egress packets, which means they won't be fragmented by other computers along the path. So, of those concerned about this, which concerns are actually about (2) and (3)? Of those, which ones are about (2)? If you have concerns specifically about (2) that couldn't be fixed with reasonable system administration, I'd like to hear why and what the setup is that leads to that situation. As an aside, Roman asked about TTL. When tunneling, the outer packet header always must take the new TTL of the route to the tunnel endpoint, and not do anything with the potentially much smaller inner TTL. 
So with tunneling, you can't quite rely on the TTL to drop to zero as you'd wish. Hence, I'm interested in using the natural packet size expansion instead. Thanks for the discussion so far. I'm very interested to read clarifying points about applicability to case (2) (and to a lesser extent, about case (3)). Thanks, Jason
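The "natural packet size expansion" argument can be sketched numerically. The model below is deliberately simplified and rests on assumptions: each pass through the loop adds a fixed 80 bytes (the IPv6 worst-case overhead), padding is ignored, and a packet is dropped as soon as its encrypted form would exceed the link MTU because fragmentation is not allowed. The function name is invented for this example.

```python
def passes_before_drop(initial_size: int, link_mtu: int, overhead: int = 80) -> int:
    """Count how many times a looping packet can be re-encapsulated
    before the encrypted result no longer fits the link MTU."""
    size = initial_size
    passes = 0
    while size + overhead <= link_mtu:
        size += overhead  # one more layer of encapsulation
        passes += 1
    return passes


print(passes_before_drop(100, 1500))   # 17
print(passes_before_drop(1400, 1500))  # 1
```

In this model a small packet survives at most a few dozen trips around the loop, and a near-full-size one is dropped almost immediately, without relying on the TTL at all.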