Bill Burcham created GEODE-10122:
------------------------------------

             Summary: With TLSv1.3 and GCM-based cipher (the default), P2P 
Messaging Fails When Encrypted Data Limit is Reached
                 Key: GEODE-10122
                 URL: https://issues.apache.org/jira/browse/GEODE-10122
             Project: Geode
          Issue Type: Bug
    Affects Versions: 1.14.3, 1.13.7, 1.15.0, 1.16.0
            Reporter: Bill Burcham
         Attachments: patch-P2PMessagingConcurrencyDUnitTest.txt

TLSv1.3 introduced [1] the ability to set per-algorithm limits on symmetric key 
usage lifetimes. Once a certain number of bytes have been encrypted, a 
KeyUpdate post-handshake message is sent.

With default settings, on Liberica JDK 11, Geode's P2P framework will negotiate 
TLSv1.3 with the TLS_AES_256_GCM_SHA384 cipher suite. Geode P2P messaging will 
eventually fail, with a "Tag mismatch!" IOException in shared ordered 
receivers, after a session has been in heavy use for days.

We have not seen this failure on TLSv1.2.

The implementation of TLSv1.3 in the Java runtime provides a security property 
[2] to configure the encrypted data limit. The attached patch to 
P2PMessagingConcurrencyDUnitTest configures the limit large enough that the 
test makes it through the (P2P) TLS handshake but small enough so that the "Tag 
mismatch!" exception is encountered less than a minute later.
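
As a rough illustration of that configuration, the sketch below sets the limit
programmatically. It assumes the security property in question is
jdk.tls.keyLimits and that its value has the form
"AES/GCM/NoPadding KeyUpdate 2^N", as described in the JSSE reference guide
[2]. The class name, method names, and the exponent 20 are hypothetical and
are not taken from the attached patch.

```java
import java.security.Security;

// Hypothetical helper, not part of Geode or of the attached patch.
final class KeyLimitSketch {

  // Assumption: this is the JSSE security property referred to in [2].
  static final String KEY_LIMITS_PROPERTY = "jdk.tls.keyLimits";

  // Builds a value such as "AES/GCM/NoPadding KeyUpdate 2^20".
  static String limitFor(String algorithm, int powerOfTwo) {
    return algorithm + " KeyUpdate 2^" + powerOfTwo;
  }

  // Stores the limit in the security properties. To the best of my
  // knowledge the JDK reads the value once, when its TLS classes are first
  // initialized, so this should run before any TLS connection is created.
  static void apply(String limit) {
    Security.setProperty(KEY_LIMITS_PROPERTY, limit);
  }

  public static void main(String[] args) {
    // 2^20 bytes (1 MiB) is an illustrative value only.
    apply(limitFor("AES/GCM/NoPadding", 20));
    System.out.println(Security.getProperty(KEY_LIMITS_PROPERTY));
  }
}
```

A small exponent makes the KeyUpdate occur after seconds of traffic rather
than days, which is what makes the failure practical to reproduce in a test.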

The bug is caused by the failure of Geode’s NioSslEngine class to handle the 
“rehandshaking” phase of the TLS protocol [3]:

    Creation - ready to be configured.

    Initial handshaking - perform authentication and negotiate communication 
parameters.

    Application data - ready for application exchange.

    *Rehandshaking* - renegotiate communications parameters/authentication; 
handshaking data may be mixed with application data.

    Closure - ready to shut down connection.

Geode's tcp.Connection and NioSslEngine classes (particularly wrap() and 
unwrap()), as they are currently implemented, fail to fully attend to the 
handshake status from javax.net.ssl.SSLEngine. As a result these Geode classes 
fail to respond to the KeyUpdate message, resulting in the "Tag mismatch!" 
IOException.
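
A minimal sketch of what attending to the handshake status after unwrap()
might look like follows. It is not Geode's code and not the eventual fix. The
class, enum, and method names are hypothetical, and it assumes that the engine
reports pending post-handshake work (such as a KeyUpdate reply) through
SSLEngineResult.getHandshakeStatus().

```java
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

// Hypothetical helper, not part of Geode.
final class HandshakeStatusSketch {

  // What the caller should do next with this connection.
  enum NextStep { CONTINUE, WRAP_AND_SEND, READ_MORE }

  // Inspects the handshake status left behind by an unwrap() call.
  static NextStep afterUnwrap(SSLEngine engine, SSLEngineResult result) {
    HandshakeStatus status = result.getHandshakeStatus();
    if (status == HandshakeStatus.NEED_TASK) {
      // Run any delegated tasks, then ask the engine what it needs now.
      Runnable task;
      while ((task = engine.getDelegatedTask()) != null) {
        task.run();
      }
      status = engine.getHandshakeStatus();
    }
    switch (status) {
      case NEED_WRAP:
        // The engine has handshake data to send; the caller should call
        // wrap() and write the output to the peer.
        return NextStep.WRAP_AND_SEND;
      case NEED_UNWRAP:
        // The engine needs more data from the peer.
        return NextStep.READ_MORE;
      default:
        // FINISHED, NOT_HANDSHAKING, and any other status.
        return NextStep.CONTINUE;
    }
  }
}
```

A caller would invoke afterUnwrap() after every unwrap(), not only during the
initial handshake, and act on WRAP_AND_SEND before processing further
application data.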

When that exception is encountered, the Connection is destroyed and a new one 
created in its place. But users of the old Connection, waiting for 
acknowledgements, will never receive them. This can result in cluster-wide 
hangs.
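
The hang can be pictured with a small model of acknowledgement tracking. This
is a hypothetical sketch, not Geode's implementation. All names are invented
for illustration. It shows that a waiter registered against a connection
completes only if an acknowledgement arrives or if something explicitly fails
it when the connection is destroyed.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of per-connection acknowledgement waiters.
final class PendingAcksSketch {

  private final Map<Long, CompletableFuture<Void>> pending =
      new ConcurrentHashMap<>();

  // Registers a waiter for the acknowledgement of the given message.
  CompletableFuture<Void> awaitAck(long messageId) {
    return pending.computeIfAbsent(messageId, id -> new CompletableFuture<>());
  }

  // Called when an acknowledgement arrives on this connection.
  void onAck(long messageId) {
    CompletableFuture<Void> waiter = pending.remove(messageId);
    if (waiter != null) {
      waiter.complete(null);
    }
  }

  // Called when the connection is destroyed. Without a step like this,
  // the remaining waiters would never complete.
  void onConnectionDestroyed(IOException cause) {
    for (Long messageId : pending.keySet()) {
      CompletableFuture<Void> waiter = pending.remove(messageId);
      if (waiter != null) {
        waiter.completeExceptionally(cause);
      }
    }
  }
}
```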

[1] [https://datatracker.ietf.org/doc/html/rfc8446#section-5.5]

[2] [https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-B970ADD6-1E9F-4C18-A26E-0679B50CC946]

[3] [https://www.ibm.com/docs/en/sdk-java-technology/7.1?topic=sslengine-]


