Thanks for reporting this. Given that hostname verification seems to be the issue, I would assume that the TaskManager somehow advertises a hostname in a form that is incompatible with the verification in some setups.
While it would be interesting to dig deeper into why this happens, I think we need to move away from hostname verification for internal communication (RPC, TaskManager Netty, blob server) anyway, for the following reasons:

  - Hostname verification is hard (or pretty much incompatible) between containers in many container environments.

  - The verification is mainly useful if you use a certificate in a certificate chain with some other trusted root certificates.

  - For internal SSL between JM/TM and TM/TM, the recommended method is to generate a single-purpose certificate (may be self-signed) and add a key store and trust store with only that certificate. Given such a "single certificate truststore", hostname verification does not add any additional security (to my understanding).

For Flink 1.6, we are also adding transparent mutual authentication for internal communication (RPC, blob server, Netty data plane), which should be an additional level of security. If this is used with dedicated (self-signed) certificates, it should be very secure and not rely on hostname verification.

That said, for external communication (REST calls against JM/Dispatcher/...) clients should use hostname verification, because many users use certificates in a certificate chain for these external endpoints.

Best,
Stephan

On Thu, Jul 12, 2018 at 11:02 PM, PACE, JAMES <jp4...@att.com> wrote:

> I have the following SSL configuration for a 3 node HA flink cluster:
>
> #taskmanager.data.ssl.enabled: false
> security.ssl.enabled: true
> security.ssl.keystore: /opt/app/certificates/server-keystore.jks
> security.ssl.keystore-password: <redacted>
> security.ssl.key-password: <redacted>
> security.ssl.truststore: /opt/app/certificates/cacerts
> security.ssl.truststore-password: <redacted>
> security.ssl.verify-hostname: true
>
> The job we’re running is the sample WordCount.jar. The running version of flink is 1.4.0.
> It’s not the latest, but I didn’t see anything that looked like updating would solve this issue.
>
> If either security.ssl.verify-hostname is set to false or taskmanager.data.ssl.enabled is set to false, everything works fine.
>
> When flink is run in the configuration above, with ssl fully enabled and security.ssl.verify-hostname: true, the flink job fails. However, when going through the logs, SSL appears fine for akka, blob service, and jobmanager.
>
> The root cause looks to be Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address xxx.xxx.xxx.xxx found.
>
> I have tried setting taskmanager.hostname to the FQDN of the host, but that did not change anything.
>
> We don’t generate certificates with SAN fields.
>
> Any thoughts would be appreciated.
>
> This is the full stack trace
>
> Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: Sending the partition request failed.
>     at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
> Caused by: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Sending the partition request failed.
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClient$1.operationComplete(PartitionRequestClient.java:119)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClient$1.operationComplete(PartitionRequestClient.java:111)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>     at org.apache.flink.shaded.netty4.io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252)
>     at org.apache.flink.shaded.netty4.io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1040)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:934)
>     at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315)
>     at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
>     at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
>     at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
>     at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
>     at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>     at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1114)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:981)
>     ... 13 more
> Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
>     at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
>     at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
>     at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
>     at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
>     at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
>     at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
>     at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
>     at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
>     at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1148)
>     at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1003)
>     ... 13 more
> Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address xxx.xxx.xxx.xxx found
>     at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:167)
>     at sun.security.util.HostnameChecker.match(HostnameChecker.java:93)
>     at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
>     at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
>     at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
>     at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
>     at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
>     ... 21 more
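
For reference, the innermost exception in the trace is thrown from sun.security.util.HostnameChecker.matchIP. As far as I understand the JDK behaviour, when the peer is addressed by IP rather than by name, the check only considers iPAddress entries among the certificate's subject alternative names, so a certificate carrying only a CN or only DNS names cannot pass. Below is a minimal Java sketch of that rule, not the JDK code itself. The class SanIpCheck and the method hasIpSan are made up for illustration and exist in neither the JDK nor Flink. The input has the shape returned by X509Certificate.getSubjectAlternativeNames(), and the real check has further cases (for example IPv6 handling) that are left out here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: approximates the IP branch of JDK hostname verification. */
class SanIpCheck {
    // GeneralName type tags used by X509Certificate.getSubjectAlternativeNames()
    static final int ALTNAME_DNS = 2;
    static final int ALTNAME_IP = 7;

    /**
     * Returns true if the SAN collection contains an iPAddress entry equal to expectedIp.
     * Each SAN entry is a list of [Integer type, value]; a null collection means
     * the certificate has no SAN extension at all.
     */
    static boolean hasIpSan(Collection<List<?>> sans, String expectedIp) {
        if (sans == null) {
            return false; // no SAN extension: an IP peer can never match
        }
        for (List<?> entry : sans) {
            int type = (Integer) entry.get(0);
            if (type == ALTNAME_IP
                    && expectedIp.equalsIgnoreCase(String.valueOf(entry.get(1)))) {
                return true;
            }
        }
        return false; // only DNS (or other) entries: same outcome as in the trace
    }

    public static void main(String[] args) {
        // Placeholder host name and address, chosen only for this demo.
        List<?> dnsEntry = Arrays.asList(ALTNAME_DNS, "tm1.example.com");
        Collection<List<?>> dnsOnly = Collections.singletonList(dnsEntry);
        System.out.println(hasIpSan(dnsOnly, "10.0.0.5"));
    }
}
```

If this reading is right, it would explain why setting taskmanager.hostname to the FQDN did not help: the Netty data client still appears to connect by IP address, and the certificate has no matching iPAddress entry.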