Paul, thanks for sharing the state machine. Now I understand what is going on. I see the following sequence in the logs:

00:12:59 no rekeying on traffic selector override connection
00:17:29 deleting state (STATE_QUICK_R2) aged 3600.122s and sending notification
00:17:29 ESP traffic information in=513KB out=745KB
00:17:44 terminating SAs using this connection
00:17:44 deleting state (STATE_MAIN_R3) aged 7228.122s and sending notification
00:17:44 added connection description "xxxx"
00:17:44 listening for IKE messages
00:17:44 forgetting secrets

The VPN is established between Oracle OCI and Azure. I'm not sure which technology Oracle OCI uses, but it looks like Libreswan.

I received information from the partner operating Azure that they have a bug in the SA lifetime configuration for IKEv1: the provided setting is ignored and they always use a value of 27000 s. That is not compatible with the OCI-side configuration of 3600 s. The OCI side expects to renegotiate phase 2 after 1 h, waits 15 seconds for this to happen (17:29 - 17:44), and gives up. The whole connection is destroyed and recreated. Azure notices this. Finally, Azure, as the initiator (OCI is the responder, which is even visible in the *_R* states), tries to recreate the VPN. This succeeds after 5 minutes.

Mystery solved. The 15-second timeout may be custom tuning in the OCI implementation.

Thanks,
Ryszard

> On 9 Apr 2021, at 03:22, Paul Wouters <[email protected]> wrote:
>
> On Thu, 8 Apr 2021, Ryszard Styczynski wrote:
>
>> I'm looking for IPsec state machine implemented in Libreswan. I may guess
>> how states are correlated, but having a state machine will give me a final
>> answer.
>
> For IKEv1, the state machine is in programs/pluto/ikev1.c
>
>> My current question is what is a next state after STATE_QUICK_R2? Should
>> IPsec engine wait for rekeying? How long? How many times should repeat
>> waiting step? Should go back to STATE_MAIN and delete SA? When?
>>
>> I currently see i my system that:
>> 1. STATE_QUICK_R2 may go to STATE_MAIN_R3, delete SA, and reestablish
>> connection from Phase 1 - it happens after 15 seconds
>> 2. STATE_QUICK_R2 may go to STATE_QUICK_R1 and process rekeying - it happens
>> when peer responds quicker than 15 seconds
>>
>> How to understand why sometimes SA is deleted (what causes 5 minutes line
>> drop), and sometimes rekeying is completed? How to control time limits?
>
> A proper exchange looks like:
>
> paul@thinkpad:~/libreswan.git/testing/pluto/basic-pluto-01 (main=)$ grep STATE_ OUTPUT/east.pluto.log |grep transition
> | IKEv1: transition from state STATE_MAIN_R0 to state STATE_MAIN_R1
> | IKEv1: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2
> | IKEv1: transition from state STATE_MAIN_R2 to state STATE_MAIN_R3
> | IKEv1: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
> | IKEv1: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
>
> Nothing should really happen after 15 seconds, so perhaps you should
> show us your logs to see what is happening?
>
> Paul
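
For anyone who wants to repeat the timing check from the log excerpt at the top of this message, here is a minimal Python sketch. It only does the subtraction between the two "deleting state" lines; the helper name seconds_between and the abridged LOG string are mine, for illustration only, and are not part of Libreswan or OCI tooling. It assumes the log prefix is HH:MM:SS and that both events fall on the same day.

```python
from datetime import datetime

# Abridged copy of the log lines quoted in this message (assumption: the
# leading field is an HH:MM:SS wall-clock timestamp).
LOG = """\
00:17:29 deleting state (STATE_QUICK_R2) aged 3600.122s and sending notification
00:17:29 ESP traffic information in=513KB out=745KB
00:17:44 deleting state (STATE_MAIN_R3) aged 7228.122s and sending notification
"""


def seconds_between(log: str, first: str, second: str) -> float:
    """Return seconds from the first line containing `first`
    to the first line containing `second` (hypothetical helper)."""
    found = {}
    for line in log.splitlines():
        if not line.strip():
            continue
        stamp, _, text = line.partition(" ")
        when = datetime.strptime(stamp, "%H:%M:%S")
        for needle in (first, second):
            # Remember only the first occurrence of each marker.
            if needle in text and needle not in found:
                found[needle] = when
    return (found[second] - found[first]).total_seconds()


# Gap between the phase 2 SA being deleted and the phase 1 SA being deleted.
gap = seconds_between(LOG, "STATE_QUICK_R2", "STATE_MAIN_R3")
print(f"phase 2 delete -> phase 1 delete: {gap:.0f} s")

# Lifetimes as reported in this thread: OCI side 3600 s, Azure side 27000 s.
oci_lifetime_s, azure_lifetime_s = 3600, 27000
print("lifetime mismatch:", azure_lifetime_s != oci_lifetime_s)
```

Running the same function over a longer log should show whether the gap is consistently 15 seconds on every failed rekey, which would support the idea that it is a fixed timeout on the OCI side.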
_______________________________________________
Swan mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan
