On 27/12/19 15:04 +0100, Valentin Vidić wrote:
> On Wed, Dec 04, 2019 at 02:44:49PM +0100, Jan Pokorný wrote:
>> For the record, based on my feedback, iptables-extensions man page is
>> headed to (finally) align with the actual in-kernel deprecation
>> message:
>> https://lore.kernel.org/netfilter-devel/20191204130921.2914-1-p...@nwl.cc/
>
> From a quick run of xt_cluster it seems to be working as expected
> for IPv4

FTR, when the "netfilter"/nftables backend is available, you can either
make use of the iptables-translate conversion utility, or deduce a
similar takeaway from

  https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.txlate?h=v1.8.4

possibly allowing you to ditch any dependency on iptables-* tooling,
and on xt_cluster.ko just as well!

As mentioned in a newer incarnation asking about xt_cluster:

  https://lists.clusterlabs.org/pipermail/users/2020-January/026718.html

for the envisioned agent, it would be a way to (optionally) allow for
a rather lightweight operation in the future (where iptables may not
get installed by default with some Linux distros at all; well, even a
firewalld-as-a-middleware variant controlled just via DBus calls might
be thinkable, meaning that the "nft" tool wouldn't be required,
either).

> It requires iptables rules and ARP reply rewrite like:
>
>   arptables -A OUTPUT -o eth1 --h-length 6 -j mangle --mangle-mac-s
> 01:00:5e:00:01:01

Pardon my ignorance, but you currently appear to be the greatest
expert with practical experience on this list regarding the topic.

* * *

1. Is it solely your experience with the xt_cluster extension that led
   you to this ARP-level rewrite, unique to using the netfilter
   backend, or would the same actually be needed with the true
   CLUSTERIP target?

Actually, I took a look at the code of the CLUSTERIP extension, and it
does in fact perform the very same ARP-level mangling, even though it
is slightly more precise, akin to (with stray in-line comments, which
would have to be stripped before actually running any of it):

  arptables -A OUTPUT \
    --h-type 1 \          # Ethernet
    --proto-type 0x800 \  # IPv4
    --h-length 6 \        # perhaps redundant to --h-type?
    \ # cannot express limitation on the size of network address
    \ # but that would perhaps be redundant to --proto-type
    --opcode 2 \          # this time for Reply
    -j mangle --mangle-mac-s CLUSTERMAC

  # see also
  # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4095ebf1e641b0f37ee1cd04c903bb85cf4ed25b
  arptables -A OUTPUT \
    --h-type 1 \          # Ethernet
    --proto-type 0x800 \  # IPv4
    --h-length 6 \        # perhaps redundant to --h-type?
    \ # cannot express limitation on the size of network address
    \ # but that would perhaps be redundant to --proto-type
    --opcode 1 \          # this time for Request
    -j mangle --mangle-mac-s CLUSTERMAC

What you've used appears to be akin to what this chunk of the manpage
suggests (amongst others):

  https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.man

which is (yet another) indicator to me that the xt_cluster extension
doesn't carry that functionality on its own (like the CLUSTERIP target
did, as mentioned).

*
* Anyway, I'd like to understand why this is necessary in the first
* place, getting to my second question.
*

2. Is the following explanation, viable to me, correct?

   That arrangement is there to prevent unexpected leaks of specific
   associations (I'd call them "fixations") of the interface's true
   (hence non-multicast) MAC address with the meant-to-be-shared IP
   address at hand.  Such a leak would cancel the effect of
   link-multicasted frames (to which at most a single recipient would
   respond per the firewall matching rules), and therefore botch the
   "shared IP" concept altogether from the perspective of network
   members that would undesirably learn a non-multicast address
   association for the particular meant-to-be-shared IP.

*
* But it doesn't explain the suggested destination MAC renormalization
* on INPUT, which is currently yet to be heard of for our purpose...
*

3. Are, perhaps, the following plausible explanations sound?
   - this is so as not to spoil the local ARP cache/dependent
     interactions, such as when an actual ARP request is sent with a
     link layer address identical to the particular multicast MAC in
     use -- likely feasible with the other host on the network
     configured the same way and with the already mentioned source-MAC
     rewriting on egress in place (so this would actually neutralize
     the harmful network effect it could otherwise be causing)

   - or any other reasons?

*
* Finally, when referring to the suggestive example above, there is
* one more question to ask...
*

4. Shouldn't even the existing IPaddr2 (whether in CLUSTERIP-based
   mode or not) actually verify that
   /proc/sys/net/netfilter/nf_conntrack_tcp_loose is cleared, at least
   unless told not to through configuration?

   - it looks like a good idea not to allow any after-cut packet
     interaction (this would only apply to anything outside of the
     critical cluster infrastructure, since that uses UDP), as a
     matter of safety precaution (there are no liveness aspects to
     wish for in such scenarios, which could otherwise interfere,
     I think)

> However for IPv6 I could not find an equivalent command to rewrite
> Neighbour Advertisement packets. Does anyone have an idea how this
> could be done?

5. Here, I had a closer look at the code as well and have an option to
   try -- does this help?

   It appears as if the response in the (solicited) Neighbour
   Advertisement is -- in the Linux kernel -- unconditionally picked
   from the very first address configured on the device (not to be
   confused with the "permanent address").

   Hence it looks to me that the way to go, so as to achieve feature
   parity between IPv4 and IPv6, would be to either:

   - give up on the sole identity of the interface, so that it either
     operates under the selected multicast link layer address or
     doesn't operate at all (rationale: better not to confuse the
     network with occasional MAC flips?)

   - stick with a new macvlan pseudointerface, surprise-surprise, yet
     another virtualization/mimicking/independence-increasing
     layer :-)

     No experience with macvlan on my side, but bridge mode looks
     appealing, and would retain the interface addressable through its
     standard MAC address as well.  And importantly, the newly created
     interface would have the correct (multicast) MAC address to
     respond with to the respective Neighbour Solicitations (which is
     exactly what's asked, IIUIC), and I expect it would be the one
     selected to respond for the very matching IP in question?

   Still, this doesn't resolve any concern around point 3. above
   (assuming it's not bogus, to begin with).

* * *

Sorry for any imprecision, it's all quite confusing to me, and the
WHYs are rather underdocumented/inaccessible for my taste.
But hopefully, we can put some knowledge and practice together.

-- 
Jan (Poki)
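
Regarding point 4. above, the check asked for there could look roughly
like the following.  This is only a minimal sketch assuming a POSIX
shell; the function name tcp_loose_cleared and its optional path
argument are made up for illustration and are not part of IPaddr2:

```shell
# Hypothetical helper, not part of IPaddr2: succeeds (exit status 0)
# only when the conntrack "loose" TCP pickup setting reads 0.
# $1 (optional) overrides the sysctl file, so the check can be tried
# out against an ordinary file.
tcp_loose_cleared() {
    file="${1:-/proc/sys/net/netfilter/nf_conntrack_tcp_loose}"
    # A missing/unreadable file (e.g. nf_conntrack not loaded) is
    # reported separately as "unknown" via exit status 2.
    [ -r "$file" ] || return 2
    [ "$(cat "$file")" = "0" ]
}
```

An agent could call it in its validation step, for instance as
`tcp_loose_cleared || echo "nf_conntrack_tcp_loose is not cleared" >&2`;
whether to merely warn or to refuse to start would be a policy
decision that is left open here.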