Hi Tobias and list,

I was able to duplicate my issues around NAT remapping in Azure with StrongSwan 
at both ends, so now Racoon is completely out of the picture.

I captured some more detailed logs of the situation. In the logs below, 
stg-vault-zk04 is in AWS and has static NAT in front of it (i.e. 500->500, 
4500->4500); stg-vault-zk06 (second log) is in Azure, with a less-static NAT 
service in front of it (it passes 500 and 4500 in to 500 and 4500 on the host, 
but outgoing traffic on 500/4500 will sometimes be remapped to ports around 
1024).

The problem is that sometimes ESP traffic is sent to the IKE port (and possibly 
vice versa). I’ve ^marked a few places in the logs where things happen that I 
think are relevant.

One challenge is that we can’t force the remapping to happen, so we’re at the 
whim of Azure for trying to reproduce this.


Config on stg-vault-zk06: 

connections {
stg-vault-zk04_0 {
  include swanctl-ike.conf
  local_addrs = 10.0.0.59
  remote_addrs = 54.186.244.242
  children {
    stg-vault-zk04_0 {
      include swanctl-ipsec.conf
    }
  }
}
}

On stg-vault-zk04:

connections {
stg-vault-zk06_0 {
  include swanctl-ike.conf
  local_addrs = 172.17.128.86
  remote_addrs = 13.88.23.150
  children {
    stg-vault-zk06_0 {
      include swanctl-ipsec.conf
    }
  }
}
}

swanctl-ike.conf contains this on both:

version = 1
local { auth = psk }
remote { auth = psk }
proposals = aes128-sha256-modp3072
encap = yes
dpd_delay = 20
# This avoids interop issue between strongswan and racoon involving
# fragmented IKE messages immediately after racoon restart. Once our whole
# fleet is StrongSwan we may wish to return this to the default of "yes"
# (or not, since our IKE messages are typically small enough to not require
# fragmentation). See SRE-1203.
fragmentation = no
reauth_time = 23h
over_time = 1h
rand_time = 1h

swanctl-ipsec.conf contains this on both:

mode = transport
start_action = trap
esp_proposals = aes128-sha1-modp3072
life_time = 8h
rekey_time = 7h
rand_time = 1h

As for PSKs: I’m confident that the PSKs are configured correctly because these 
hosts communicate successfully when port remapping doesn’t get in the way.

After running “swanctl -i --child stg-vault-zk06” on stg-vault-zk04: 

Mar 27 15:47:35 stg-vault-zk04 charon: 15[CFG] vici initiate 'stg-vault-zk06_0'
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing 
ISAKMP_VENDOR task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing 
ISAKMP_CERT_PRE task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing 
MAIN_MODE task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing 
ISAKMP_CERT_POST task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing 
ISAKMP_NATD task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> queueing 
QUICK_MODE task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> 
activating new tasks
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318>   
activating ISAKMP_VENDOR task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318>   
activating ISAKMP_CERT_PRE task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318>   
activating MAIN_MODE task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318>   
activating ISAKMP_CERT_POST task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318>   
activating ISAKMP_NATD task
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending 
XAuth vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending 
DPD vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending 
NAT-T (RFC 3947) vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> sending 
draft-ietf-ipsec-nat-t-ike-02\n vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> 
initiating Main Mode IKE_SA stg-vault-zk06_0[318] to 13.88.23.150
Mar 27 15:47:35 stg-vault-zk04 charon: 14[IKE] <stg-vault-zk06_0|318> IKE_SA 
stg-vault-zk06_0[318] state change: CREATED => CONNECTING
Mar 27 15:47:35 stg-vault-zk04 charon: 14[CFG] <stg-vault-zk06_0|318> 
configured proposals: 
IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072
Mar 27 15:47:35 stg-vault-zk04 charon: 14[ENC] <stg-vault-zk06_0|318> 
generating ID_PROT request 0 [ SA V V V V ]
Mar 27 15:47:35 stg-vault-zk04 charon: 14[NET] <stg-vault-zk06_0|318> sending 
packet: from 172.17.128.86[500] to 13.88.23.150[500] (160 bytes)
Mar 27 15:47:35 stg-vault-zk04 charon: 07[NET] <stg-vault-zk06_0|318> received 
packet: from 13.88.23.150[1031] to 172.17.128.86[500] (140 bytes)

^ first non-port-500 packet from zk04's perspective. this says to me that nat 
port 1031 is mapped to port 500 on stg-vault-zk06

Mar 27 15:47:35 stg-vault-zk04 charon: 07[ENC] <stg-vault-zk06_0|318> parsed 
ID_PROT response 0 [ SA V V V ]
Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> received 
XAuth vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> received 
DPD vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> received 
NAT-T (RFC 3947) vendor ID
Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> selecting 
proposal:
Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318>   
proposal matches
Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> received 
proposals: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072
Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> 
configured proposals: 
IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072
Mar 27 15:47:35 stg-vault-zk04 charon: 07[CFG] <stg-vault-zk06_0|318> selected 
proposal: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_3072
Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318> 
reinitiating already active tasks
Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318>   
ISAKMP_VENDOR task
Mar 27 15:47:35 stg-vault-zk04 charon: 07[IKE] <stg-vault-zk06_0|318>   
MAIN_MODE task
Mar 27 15:47:35 stg-vault-zk04 charon: 07[LIB] <stg-vault-zk06_0|318> size of 
DH secret exponent: 3071 bits
Mar 27 15:47:35 stg-vault-zk04 charon: 07[ENC] <stg-vault-zk06_0|318> 
generating ID_PROT request 0 [ KE No NAT-D NAT-D ]
Mar 27 15:47:35 stg-vault-zk04 charon: 07[NET] <stg-vault-zk06_0|318> sending 
packet: from 172.17.128.86[500] to 13.88.23.150[1031] (524 bytes)
Mar 27 15:47:35 stg-vault-zk04 charon: 16[NET] <stg-vault-zk06_0|318> received 
packet: from 13.88.23.150[1031] to 172.17.128.86[500] (524 bytes)
Mar 27 15:47:35 stg-vault-zk04 charon: 16[ENC] <stg-vault-zk06_0|318> parsed 
ID_PROT response 0 [ KE No NAT-D NAT-D ]
Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> local 
host is behind NAT, sending keep alives
Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> remote 
host is behind NAT
Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318> 
reinitiating already active tasks
Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318>   
ISAKMP_VENDOR task
Mar 27 15:47:35 stg-vault-zk04 charon: 16[IKE] <stg-vault-zk06_0|318>   
MAIN_MODE task
Mar 27 15:47:35 stg-vault-zk04 charon: 16[ENC] <stg-vault-zk06_0|318> 
generating ID_PROT request 0 [ ID HASH ]
Mar 27 15:47:35 stg-vault-zk04 charon: 16[NET] <stg-vault-zk06_0|318> sending 
packet: from 172.17.128.86[4500] to 13.88.23.150[1031] (92 bytes)

^ we shouldn't be seeing traffic from 4500 to (1030->500)??? 




logs from stg-vault-zk06:


Mar 27 15:47:35 stg-vault-zk06 charon: 05[NET] <80> received packet: from 
54.186.244.242[500] to 10.0.0.59[500] (160 bytes)
Mar 27 15:47:35 stg-vault-zk06 charon: 05[CFG] <80> looking for an ike config 
for 10.0.0.59...54.186.244.242
Mar 27 15:47:35 stg-vault-zk06 charon: 05[CFG] <80>   candidate: 
10.0.0.59...54.186.244.242, prio 3100
Mar 27 15:47:35 stg-vault-zk06 charon: 05[CFG] <80> found matching ike config: 
10.0.0.59...54.186.244.242 with prio 3100
Mar 27 15:47:35 stg-vault-zk06 charon: 05[IKE] <80> 54.186.244.242 is 
initiating a Main Mode IKE_SA
Mar 27 15:47:35 stg-vault-zk06 charon: 05[NET] <80> sending packet: from 
10.0.0.59[500] to 54.186.244.242[500] (140 bytes)
Mar 27 15:47:35 stg-vault-zk06 charon: 10[NET] <80> received packet: from 
54.186.244.242[500] to 10.0.0.59[500] (524 bytes)
Mar 27 15:47:35 stg-vault-zk06 charon: 10[CFG] <80>   candidate 
"stg-vault-zk04_0", match: 1/1/3100 (me/other/ike)
Mar 27 15:47:35 stg-vault-zk06 charon: 10[IKE] <80> no shared key found for 
'10.0.0.59'[10.0.0.59] - '(null)'[54.186.244.242]

^ I don't understand why this is '(null)'

Mar 27 15:47:35 stg-vault-zk06 charon: 10[NET] <80> sending packet: from 
10.0.0.59[500] to 54.186.244.242[500] (524 bytes)
Mar 27 15:47:35 stg-vault-zk06 charon: 03[NET] sending packet: from 
10.0.0.59[500] to 54.186.244.242[4500] (36 bytes) 

^ I don't know why this packet went from 500 to 4500

Mar 27 15:47:35 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 
8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION
Mar 27 15:47:35 stg-vault-zk06 charon: 15[NET] received unencrypted 
informational: from 54.186.244.242[4500] to 10.0.0.59[500]

^ at this point I believe we're getting ESP-in-UDP packets to 1031->500.

Mar 27 15:47:39 stg-vault-zk06 charon: 03[NET] sending packet: from 
10.0.0.59[500] to 54.186.244.242[4500] (36 bytes)
Mar 27 15:47:39 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 
8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION
Mar 27 15:47:39 stg-vault-zk06 charon: 10[NET] received unencrypted 
informational: from 54.186.244.242[4500] to 10.0.0.59[500]
Mar 27 15:47:46 stg-vault-zk06 charon: 03[NET] sending packet: from 
10.0.0.59[500] to 54.186.244.242[4500] (36 bytes)
Mar 27 15:47:46 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 
8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION
Mar 27 15:47:46 stg-vault-zk06 charon: 13[NET] received unencrypted 
informational: from 54.186.244.242[4500] to 10.0.0.59[500]
Mar 27 15:47:55 stg-vault-zk06 charon: 16[IKE] <80> sending keep alive to 
54.186.244.242[500]
Mar 27 15:47:59 stg-vault-zk06 charon: 03[NET] sending packet: from 
10.0.0.59[500] to 54.186.244.242[4500] (36 bytes)
Mar 27 15:47:59 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 
8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION
Mar 27 15:47:59 stg-vault-zk06 charon: 07[NET] received unencrypted 
informational: from 54.186.244.242[4500] to 10.0.0.59[500]
Mar 27 15:48:05 stg-vault-zk06 charon: 14[JOB] <80> deleting half open IKE_SA 
with 54.186.244.242 after timeout
Mar 27 15:48:23 stg-vault-zk06 charon: 03[NET] sending packet: from 
10.0.0.59[500] to 54.186.244.242[4500] (36 bytes)
Mar 27 15:48:23 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 
8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION
Mar 27 15:48:23 stg-vault-zk06 charon: 08[NET] received unencrypted 
informational: from 54.186.244.242[4500] to 10.0.0.59[500]
Mar 27 15:49:05 stg-vault-zk06 charon: 03[NET] sending packet: from 
10.0.0.59[500] to 54.186.244.242[4500] (36 bytes)
Mar 27 15:49:05 stg-vault-zk06 charon: 03[NET] received unsupported IKE version 
8.14 from 54.186.244.242, sending INVALID_MAJOR_VERSION
Mar 27 15:49:05 stg-vault-zk06 charon: 12[NET] received unencrypted 
informational: from 54.186.244.242[4500] to 10.0.0.59[500]




> On Mar 23, 2018, at 1:20 PM, Tobias Brunner <tob...@strongswan.org> wrote:
> 
> Hi Rich,
> 
>> 1. IKE and ESP SAs are established normally with NAT-T, i.e. 500:4500.
>> 2. NAT remapping occurs within Azure, at which point StrongSwan sees IKE 
>> packets come from port 1027 instead of 500. (i.e. instead of 500:500 it’s 
>> 500:1027).
> 
> And what happens to port 4500?  Why would there even be further messages
> from port 500 for existing IKE_SAs?
> 
> If a NAT is detected when the SA is established (even if the port is the
> same, the IP changes so a NAT should be detected) racoon should switch
> to port 4500 with the third Main Mode message and then keep sending
> packets from/to that port (there is no reason to send messages from port
> 500 later, unless a new SA is initiated).
> 
>> 3. StrongSwan observes that the source port for IKE has changed to 1027, and 
>> starts sending both IKE and ESP-in-UDP packets to azure:1027.
> 
> Are these DPD messages?  Or what IKE messages trigger this?  Or is this
> actually triggered by ESP-in-UDP messages via a kernel event?  (There is
> one if the kernel notices different UDP ports than configured on the
> installed IPsec SA.)
> 
>> First, am I correctly understanding the behaviour of StrongSwan when NAT 
>> remapping occurs? Is this expected StrongSwan behaviour?
> 
> Check the log for what's actually going on.
> 
>> And: If the Azure end was StrongSwan, and ESP-in-UDP packets started 
>> arriving at port 500, would StrongSwan process them as ESP?
> 
> No, it's not possible to process ESP-in-UDP packets on port 500 as IKE
> messages to that port are not sent with non-ESP marker and thus would
> get processed as ESP messages.
> 
> Regards,
> Tobias

Reply via email to