Hey Tobias,

Thanks for your quick response.

> On Mar 23, 2018, at 1:20 PM, Tobias Brunner <tob...@strongswan.org> wrote:
> 
> Hi Rich,
> 
>> 1. IKE and ESP SAs are established normally with NAT-T, i.e. 500:4500.
>> 2. NAT remapping occurs within Azure, at which point StrongSwan sees IKE 
>> packets come from port 1027 instead of 500. (i.e. instead of 500:500 it’s 
>> 500:1027).
> 
> And what happens to port 4500?  Why would there even be further messages
> from port 500 for existing IKE_SAs?

Unfortunately Azure doesn’t let you inspect the NAT table so I can’t see what’s 
going on there, but we do have static inbound (pass-through) NAT for 500 and 
4500. 

The messages are indeed DPDs.

StrongSwan says:

Mar 23 18:42:15 aws-host charon-systemd: 14[NET] <azure-host|11> received 
packet: from 104.x.x.x[1987] to 172.17.128.205[4500] (108 bytes)
Mar 23 18:42:15 aws-host charon-systemd: 14[ENC] <azure-host|11> parsed 
INFORMATIONAL_V1 request 3041276986 [ HASH N(DPD) ]
Mar 23 18:42:15 aws-host charon-systemd: 14[ENC] <azure-host|11> generating 
INFORMATIONAL_V1 request 1272875000 [ HASH N(DPD_ACK) ]
Mar 23 18:42:15 aws-host charon-systemd: 14[NET] <azure-host|11> sending 
packet: from 172.17.128.205[4500] to 104.x.x.x[1987] (108 bytes)

Traffic from the Racoon host looks like:

18:42:15.587263 IP (tos 0x0, ttl 64, id 35423, offset 0, flags [DF], proto UDP 
(17), length 140)
    10.0.0.5.500 > 34.x.x.x.4500: isakmp 0.15 msgid 08100501:
18:42:15.610564 IP (tos 0x0, ttl 37, id 15016, offset 0, flags [DF], proto UDP 
(17), length 140)
    34.x.x.x.4500 > 10.0.0.5.500: isakmp 0.15 msgid 08100501:

I don’t understand why they’re originating at port 500 destined for 4500, I’d 
expect DPDs to be 4500-4500. And looking at an affected host, I now see two IKE 
SAs established, one of which is using the right ports for DPDs! I realize at 
this point that I’m troubleshooting Racoon on a StrongSwan mailing list, alas. 
I just want to get rid of Racoon! :-)

I suspect I’ve reached a dead end without being able to understand what Racoon 
is doing. Any insights you might have are appreciated.

> If a NAT is detected when the SA is established (even if the port is the
> same, the IP changes so a NAT should be detected) racoon should switch
> to port 4500 with the third Main Mode message and then keep sending
> packets from/to that port (there is no reason to send messages from port
> 500 later, unless a new SA is initiated).

I did find the initiation of the SA which is weird-looking on the racoon end — 
now I’m doubting that this NAT remapping happened while an SA was established, 
but it being a black box I’m not sure. 

Mar 22 19:41:33 azure-host racoon: INFO: IPsec-SA request for 34.x.x.x queued 
due to no phase1 found.
Mar 22 19:41:33 azure-host racoon: INFO: initiate new phase 1 negotiation: 
10.0.0.5[500]<=>34.x.x.x[500]
Mar 22 19:41:34 azure-host racoon: INFO: respond new phase 1 negotiation: 
10.0.0.5[500]<=>34.x.x.x[500]
Mar 22 19:41:34 azure-host racoon: [34.210.18.127] INFO: Selected NAT-T 
version: RFC 3947
Mar 22 19:41:34 azure-host racoon: [34.210.18.127] INFO: Hashing 34.x.x.x[500] 
with algo #4
Mar 22 19:41:34 azure-host racoon: [34.210.18.127] INFO: Hashing 34.x.x.x[500] 
with algo #4

Mar 22 19:41:34 azure-host racoon: INFO: NAT-T: ports changed to: 
34.x.x.x[4500]<->10.0.0.5[500]              <— this is weird

Mar 22 19:41:34 azure-host racoon: INFO: KA list add: 
10.0.0.5[500]->34.x.x.x[4500]
Mar 22 19:41:34 azure-host racoon: INFO: ISAKMP-SA established 
10.0.0.5[500]-34.x.x.x[4500] spi:2f4d3bcf1b59f765:c921f7666b0f8a09

(Do you find that highlighted line as weird as I do? I’d expect both ends to 
change to 4500, and that’s what other successful SAs look like from the Racoon 
side.)

The same from the Swan end — unfortunately we were running with pretty low 
logging there since with 100+ SAs per host, verbose NET or ENC logging gets big 
quickly.

Mar 22 19:41:34 aws-host charon-systemd: 05[KNL] creating acquire job for 
policy 172.17.128.205/32[tcp/51069] === 104.x.x.x/32[tcp/9092] with reqid {102}
Mar 22 19:41:34 aws-host charon-systemd: 06[IKE] <azure-host|11> initiating 
Main Mode IKE_SA azure-host[11] to 104.x.x.x
Mar 22 19:41:34 aws-host charon-systemd: 10[IKE] <azure-host|11> received XAuth 
vendor ID
Mar 22 19:41:34 aws-host charon-systemd: 10[IKE] <azure-host|11> received NAT-T 
(RFC 3947) vendor ID
Mar 22 19:41:34 aws-host charon-systemd: 10[IKE] <azure-host|11> received DPD 
vendor ID
Mar 22 19:41:34 aws-host charon-systemd: 10[IKE] <azure-host|11> remote host is 
behind NAT
Mar 22 19:41:34 aws-host charon-systemd: 07[IKE] <azure-host|11> IKE_SA 
azure-host[11] established between 
172.17.128.205[172.17.128.205]…104.x.x.x[10.0.0.5]
Mar 22 19:41:34 aws-host charon-systemd: 07[IKE] <azure-host|11> scheduling 
reauthentication in 80303s
Mar 22 19:41:34 aws-host charon-systemd: 07[IKE] <azure-host|11> maximum IKE_SA 
lifetime 83903s
Mar 22 19:41:34 aws-host charon-systemd: 12[IKE] <azure-host|11> CHILD_SA 
azure-host{513} established with SPIs ccf233d3_i 0ca019ad_o and TS 
172.17.128.205/32 === 104.x.x.x/32
Mar 22 19:41:35 aws-host charon-systemd: 13[IKE] <azure-host|11> detected 
rekeying of CHILD_SA azure-host{513}
Mar 22 19:41:35 aws-host charon-systemd: 16[IKE] <azure-host|11> CHILD_SA 
azure-host{973} established with SPIs c33a54cb_i 0c47c890_o and TS 
172.17.128.205/32 === 104.x.x.x/32


Thanks,
  -Rich

Reply via email to