Hi Tobias and list, I was able to duplicate my issues around NAT remapping in Azure with StrongSwan at both ends, so now Racoon is completely out of the picture.
I captured some more detailed logs of the situation. In the logs below, stg-vault-zk04 is in AWS and has static NAT in front of it (i.e. 500->500, 4500->4500); stg-vault-zk06 (second log) is in Azure, with a less-static NAT service in front of it (it passes 500 and 4500 in to 500 and 4500 on the host, but outgoing traffic on 500/4500 will sometimes be remapped to ports around 1024). The problem is that sometimes ESP traffic is sent to the IKE port (and possibly vice versa). I’ve ^marked a few places in the logs where things happen that I think are relevant. One challenge is that we can’t force the remapping to happen, so we’re at the whim of Azure for trying to reproduce this. Config on stg-vault-zk06: connections { stg-vault-zk04_0 { include swanctl-ike.conf local_addrs = 10.0.0.59 remote_addrs = 54.186.244.242 children { stg-vault-zk04_0 { include swanctl-ipsec.conf } } } } On stg-vault-zk04: connections { stg-vault-zk06_0 { include swanctl-ike.conf local_addrs = 172.17.128.86 remote_addrs = 13.88.23.150 children { stg-vault-zk06_0 { include swanctl-ipsec.conf } } } } swanctl-ike.conf contains this on both: version = 1 local { auth = psk } remote { auth = psk } proposals = aes128-sha256-modp3072 encap = yes dpd_delay = 20 # This avoids interop issue between strongswan and racoon involving # fragmented IKE messages immediately after racoon restart. Once our whole # fleet is StrongSwan we may wish to return this to the default of "yes" # (or not, since our IKE messages are typically small enough to not require # fragmentation). See SRE-1203. fragmentation = no reauth_time = 23h over_time = 1h rand_time = 1h swanctl-ipsec.conf contains this on both: mode = transport start_action = trap esp_proposals = aes128-sha1-modp3072 life_time = 8h rekey_time = 7h rand_time = 1h As for PSKs: I’m confident that the PSKs are configured correctly because these hosts communicate successfully when port remapping doesn’t get in the way. After running “swanctl -i --child stg-vault-zk06” on stg-vault-zk04: Mar 27 15:47:35 stg-vault-zk04 charon: 15[CFG] vici initiate 'stg-vault-zk06_0' Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing ISAKMP_VENDOR task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing ISAKMP_CERT_PRE task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing MAIN_MODE task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing ISAKMP_CERT_POST task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing ISAKMP_NATD task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing QUICK_MODE task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> activating new tasks Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> activating ISAKMP_VENDOR task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> activating ISAKMP_CERT_PRE task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> activating MAIN_MODE task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> activating ISAKMP_CERT_POST task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> activating ISAKMP_NATD task Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending XAuth vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending DPD vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending NAT-T (RFC 3947) vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending draft-ietf-ipsec-nat-t-ike-02\n vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> initiating Main Mode IKE_SA stg-vault-zk06_0[318] to 13.88.23.150 Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> IKE_SA stg-vault-zk06_0[318] state change: CREATED => CONNECTING Mar 27 15:47:35 stg-vault-zk04 charon: 14[CFG] <stg-vault-zk06_0|318> configured proposals: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072 Mar 27 15:47:35 stg-vault-zk04 charon: 14[ENC] <stg-vault-zk06_0|318> generating ID_PROT request 0 [ SA V V V V ] Mar 27 15:47:35 stg-vault-zk04 charon: 14[NET] <stg-vault-zk06_0|318> sending packet: from 172.17.128.86[500] to 13.88.23.150[500] (160 bytes) Mar 27 15:47:35 stg-vault-zk04 charon: 07[NET] <stg-vault-zk06_0|318> received packet: from 13.88.23.150[1031] to 172.17.128.86[500] (140 bytes) ^ first non-port-500 packet from zk04's perspective. this says to me that nat port 1031 is mapped to port 500 on stg-vault-zk06 Mar 27 15:47:35 stg-vault-zk04 charon: 07[ENC] <stg-vault-zk06_0|318> parsed ID_PROT response 0 [ SA V V V ] Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> received XAuth vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> received DPD vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> received NAT-T (RFC 3947) vendor ID Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> selecting proposal: Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> proposal matches Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> received proposals: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072 Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> configured proposals: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072 Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> selected proposal: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072 Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> reinitiating already active tasks Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> ISAKMP_VENDOR task Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> MAIN_MODE task Mar 27 15:47:35 stg-vault-zk04 charon: 07[LIB] <stg-vault-zk06_0|318> size of DH secret exponent: 3071 bits Mar 27 15:47:35 stg-vault-zk04 charon: 07[ENC] <stg-vault-zk06_0|318> generating ID_PROT request 0 [ KE No NAT-D NAT-D ] Mar 27 15:47:35 stg-vault-zk04 charon: 07[NET] <stg-vault-zk06_0|318> sending packet: from 172.17.128.86[500] to 13.88.23.150[1031] (524 bytes) Mar 27 15:47:35 stg-vault-zk04 charon: 16[NET] <stg-vault-zk06_0|318> received packet: from 13.88.23.150[1031] to 172.17.128.86[500] (524 bytes) Mar 27 15:47:35 stg-vault-zk04 charon: 16[ENC] <stg-vault-zk06_0|318> parsed ID_PROT response 0 [ KE No NAT-D NAT-D ] Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> local host is behind NAT, sending keep alives Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> remote host is behind NAT Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> reinitiating already active tasks Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> ISAKMP_VENDOR task Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> MAIN_MODE task Mar 27 15:47:35 stg-vault-zk04 charon: 16[ENC] <stg-vault-zk06_0|318> generating ID_PROT request 0 [ ID HASH ] Mar 27 15:47:35 stg-vault-zk04 charon: 16[NET] <stg-vault-zk06_0|318> sending packet: from 172.17.128.86[4500] to 13.88.23.150[1031] (92 bytes) ^ we shouldn't be seeing traffic from 4500 to (1030->500)??? logs from stg-vault-zk06: Mar 27 15:47:35 stg-vault-zk06 charon: 05[NET] <80> received packet: from 54.186.244.242[500] to 10.0.0.59[500] (160 bytes) Mar 27 15:47:35 stg-vault-zk06 charon: 05[CFG] <80> looking for an ike config for 10.0.0.59...54.186.244.242 Mar 27 15:47:35 stg-vault-zk06 charon: 05[CFG] <80> candidate: 10.0.0.59...54.186.244.242, prio 3100 Mar 27 15:47:35 stg-vault-zk06 charon: 05[CFG] <80> found matching ike config: 10.0.0.59...54.186.244.242 with prio 3100 Mar 27 15:47:35 stg-vault-zk06 charon: 05[IKE] <80> 54.186.244.242 is initiating a Main Mode IKE_SA Mar 27 15:47:35 stg-vault-zk06 charon: 05[NET] <80> sending packet: from 10.0.0.59[500] to 54.186.244.242[500] (140 bytes) Mar 27 15:47:35 stg-vault-zk06 charon: 10[NET] <80> received packet: from 54.186.244.242[500] to 10.0.0.59[500] (524 bytes) Mar 27 15:47:35 stg-vault-zk06 charon: 10[CFG] <80> candidate "stg-vault-zk04_0", match: 1/1/3100 (me/other/ike) Mar 27 15:47:35 stg-vault-zk06 charon: 10[IKE] <80> no shared key found for '10.0.0.59'[10.0.0.59] - '(null)'[54.186.244.242] ^ I don't understand why this is '(null)' Mar 27 15:47:35 stg-vault-zk06 charon: 10[NET] <80> sending packet: from 10.0.0.59[500] to 54.186.244.242[500] (524 bytes) Mar 27 15:47:35 stg-vault-zk06 charon: 03[NET] sending packet: from 10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) ^ I don't know why this packet went from 500 to 4500 Mar 27 15:47:35 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION Mar 27 15:47:35 stg-vault-zk06 charon: 15[NET] received unencrypted informational: from 54.186.244.242[4500] to 10.0.0.59[500] ^ at this point I believe we're getting ESP-in-UDP packets to 1031->500. Mar 27 15:47:39 stg-vault-zk06 charon: 03[NET] sending packet: from 10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) Mar 27 15:47:39 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION Mar 27 15:47:39 stg-vault-zk06 charon: 10[NET] received unencrypted informational: from 54.186.244.242[4500] to 10.0.0.59[500] Mar 27 15:47:46 stg-vault-zk06 charon: 03[NET] sending packet: from 10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) Mar 27 15:47:46 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION Mar 27 15:47:46 stg-vault-zk06 charon: 13[NET] received unencrypted informational: from 54.186.244.242[4500] to 10.0.0.59[500] Mar 27 15:47:55 stg-vault-zk06 charon: 16[IKE] <80> sending keep alive to 54.186.244.242[500] Mar 27 15:47:59 stg-vault-zk06 charon: 03[NET] sending packet: from 10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) Mar 27 15:47:59 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION Mar 27 15:47:59 stg-vault-zk06 charon: 07[NET] received unencrypted informational: from 54.186.244.242[4500] to 10.0.0.59[500] Mar 27 15:48:05 stg-vault-zk06 charon: 14[JOB] <80> deleting half open IKE_SA with 54.186.244.242 after timeout Mar 27 15:48:23 stg-vault-zk06 charon: 03[NET] sending packet: from 10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) Mar 27 15:48:23 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION Mar 27 15:48:23 stg-vault-zk06 charon: 08[NET] received unencrypted informational: from 54.186.244.242[4500] to 10.0.0.59[500] Mar 27 15:49:05 stg-vault-zk06 charon: 03[NET] sending packet: from 10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) Mar 27 15:49:05 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION Mar 27 15:49:05 stg-vault-zk06 charon: 12[NET] received unencrypted informational: from 54.186.244.242[4500] to 10.0.0.59[500] > On Mar 23, 2018, at 1:20 PM, Tobias Brunner <tob...@strongswan.org> wrote: > > Hi Rich, > >> 1. IKE and ESP SAs are established normally with NAT-T, i.e. 500:4500. >> 2. NAT remapping occurs within Azure, at which point StrongSwan sees IKE >> packets come from port 1027 instead of 500. (i.e. instead of 500:500 it’s >> 500:1027). > > And what happens to port 4500? Why would there even be further messages > from port 500 for existing IKE_SAs? > > If a NAT is detected when the SA is established (even if the port is the > same, the IP changes so a NAT should be detected) racoon should switch > to port 4500 with the third Main Mode message and then keep sending > packets from/to that port (there is no reason to send messages from port > 500 later, unless a new SA is initiated). > >> 3. StrongSwan observes that the source port for IKE has changed to 1027, and >> starts sending both IKE and ESP-in-UDP packets to azure:1027. > > Are these DPD messages? Or what IKE messages trigger this? Or is this > actually triggered by ESP-in-UDP messages via a kernel event? (There is > one if the kernel notices different UDP ports than configured on the > installed IPsec SA.) > >> First, am I correctly understanding the behaviour of StrongSwan when NAT >> remapping occurs? Is this expected StrongSwan behaviour? > > Check the log for what's actually going on. > >> And: If the Azure end was StrongSwan, and ESP-in-UDP packets started >> arriving at port 500, would StrongSwan process them as ESP? > > No, it's not possible to process ESP-in-UDP packets on port 500 as IKE > messages to that port are not sent with non-ESP marker and thus would > get processed as ESP messages. > > Regards, > Tobias