Hello again,

I am working on getting the full SSL logs over to you, but I have one more
question in between: TLS v1.3 is enabled by default in JDK 11, and my app
is using TLS v1.2 specifically. There's a known issue that's recently
addressed by the JDK related to TLS v1.3 half close policy, details here:
https://bugs.java.com/view_bug.do?bug_id=8207009

Would you be able to confirm whether your SSL test runs successfully when
the connecting client/server use TLS v.12 specifically ?

FYI, I have tried specifically disabling TLS v1.3 using both the
"jdk.tls.client.protocols" and "jdk.tls.server.protocols" system
properties, and also set "jdk.tls.acknowledgeCloseNotify" to true on both
sides as indicated here - https://bugs.java.com/view_bug.do?bug_id=8208526

Based on my explorations so far, I think this may be a JDK issue
(specifically in the JSSE provider) that has not been addressed yet. Either
way, do you think three is anything that could be done in Ignite to
explicitly close the connection on both sides in a scenario like this ?

What I can safely share on the SSL logs so far is this (both nodes get
stuck, node 1 in failing to close the SSL connection, node 2 in waiting for
partition exchange)

Node 1:

"2018-10-23 14:18:40.981 PDT|SSLEngineImpl.java:715|Closing inbound of
SSLEngine
javax.net.ssl|ERROR|3F|grid-nio-worker-tcp-comm-1-#26%%|2018-10-23
14:18:40.981 PDT|TransportContext.java:313|Fatal (INTERNAL_ERROR): closing
inbound before receiving peer's close_notify (
"throwable" : {
  javax.net.ssl.SSLException: closing inbound before receiving peer's
close_notify
  at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:129)
  at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
  at
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:308)
  at
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:264)
  at
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:255)
  at
java.base/sun.security.ssl.SSLEngineImpl.closeInbound(SSLEngineImpl.java:724)
  at
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.shutdown(GridNioSslHandler.java:185)
  at
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.onSessionClosed(GridNioSslFilter.java:223)
  at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:95)
  at
org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3447)
  at
org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:149)
  at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2689)
  at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2002)
  at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
  at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
  at java.base/java.lang.Thread.run(Thread.java:834)}

)
2018-10-23 14:18:40.981 PDT|SSLSessionImpl.java:753|Invalidated session:
Session(1540329505125|TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
DEBUG 14:18:40,981-0700 [] communication.tcp.TcpCommunicationSpi: Unable to
correctly close inbound data stream (will ignore) "


Server 2

"javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
PDT|SSLSocketOutputRecord.java:310|WRITE: TLS12 application_data, length = 1
javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
PDT|SSLCipher.java:1727|Plaintext before ENCRYPTION (
  0000: 01                                                 .
)
javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
PDT|SSLSocketOutputRecord.java:324|Raw write (
  0000: 17 03 03 00 19 00 00 00   00 00 00 00 A1 45 E5 C8  .............E..
  0010: 27 3D EE E2 89 4A 0F DD   B6 9A 2A A8 5F 6B        '=...J....*._k
)
WARN 14:18:10,585-0700 [*Initialization*]
processors.cache.GridCachePartitionExchangeManager: Still waiting for
initial partition map exchange "

On Mon, Oct 22, 2018 at 12:31 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
wrote:

> Hello!
>
> I would suggest regular (or DEBUG) Ignite logs + SSL debug logs.
>
>
> https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пт, 19 окт. 2018 г. в 20:58, Loredana Radulescu Ivanoff <
> lradu...@tibco.com>:
>
>> Definitely - do you want debug logs, and if yes, should I enable them on
>> all Ignite packages, or just for certain ones? Any other diagnostic flags
>> that I should set?
>>
>> On Fri, Oct 19, 2018 at 7:48 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Apache Ignite does not claim support JDK 11 at this moment. However, I
>>> was able to run SSL test (TcpDiscoverySslTrustedUntrustedTest) and it
>>> turned out mostly fine.
>>>
>>> More info is needed from your side, such as full instances logs.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пт, 19 окт. 2018 г. в 0:56, Loredana Radulescu Ivanoff <
>>> lradu...@tibco.com>:
>>>
>>>> Hello,
>>>>
>>>> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and
>>>> SSL enabled:
>>>>
>>>>
>>>>    - the second node that I bring up joins, and then shortly after
>>>>    freezes and prints this message every minute:
>>>>
>>>> "WARN ...[*Initialization*]
>>>> processors.cache.GridCachePartitionExchangeManager: Still waiting for
>>>> initial partition map exchange"
>>>>
>>>>
>>>>    - once the second node joins, the first node starts experiencing
>>>>    very frequent 100% CPU spikes; these are the messages I see:
>>>>
>>>> WARN 2018-10-18T13:50:52,728-0700 []
>>>> communication.tcp.TcpCommunicationSpi: Communication SPI session write
>>>> timed out (consider increasing 'socketWriteTimeout' configuration property)
>>>> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
>>>> WARN 2018-10-18T13:50:52,737-0700 []
>>>> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
>>>> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
>>>> engine status after closeOutbound call [status=OK,
>>>> handshakeStatus=NEED_WRAP,
>>>> WARN 2018-10-18T13:51:01,441-0700 []
>>>> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
>>>> release latch within timeout: ServerLatch [permits=1,
>>>> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
>>>> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>>>>
>>>> Other observations:
>>>>
>>>> I can reproduce this every time I start the nodes, and it doesn't
>>>> matter which node comes up first.
>>>>
>>>>
>>>> The issue goes away if I disable SSL.
>>>>
>>>>
>>>> Increasing the socketWriteTimeout, networkTimeout or the
>>>> failureDetectionTimeout does not help.
>>>>
>>>> It seems to be happening only with JDK 11, and not with JDK 8.
>>>>
>>>>
>>>> Do you have any suggestions/known issues about this?
>>>>
>>>> Thank you,
>>>>
>>>> Loredana
>>>>
>>>>
>>>>
>>>>
>>>>

Reply via email to