Thanks for reporting this.

Given that hostname verification seems to be the issue, I would assume that
the TaskManager somehow advertises a hostname in a form that is incompatible
with the verification in some setups.
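The exception in the report comes from the JDK's HostnameChecker.matchIP. The
JDK takes that path when the peer host is an IP literal, and in that case it
only compares against iPAddress entries in the certificate's subjectAltName; a
DNS name or CN in the certificate does not help. A minimal diagnostic sketch
for listing those entries (SanInspector and its methods are hypothetical
helpers, not part of Flink):

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical diagnostic helper: inspects the iPAddress SANs of a certificate. */
final class SanInspector {
    // GeneralName tag for iPAddress in the X.509 subjectAltName extension (RFC 5280).
    private static final int SAN_TYPE_IP_ADDRESS = 7;

    private SanInspector() {}

    /** Returns the IP addresses listed as SANs; empty if there are none. */
    static List<String> ipSans(X509Certificate cert) throws CertificateParsingException {
        List<String> ips = new ArrayList<>();
        Collection<List<?>> sans = cert.getSubjectAlternativeNames();
        if (sans == null) {
            return ips; // the certificate has no subjectAltName extension at all
        }
        for (List<?> entry : sans) {
            // Each entry is [Integer type, value]; for iPAddress the value is a String.
            if (((Integer) entry.get(0)) == SAN_TYPE_IP_ADDRESS) {
                ips.add((String) entry.get(1));
            }
        }
        return ips;
    }

    /** Plain string comparison; the JDK itself compares parsed InetAddress values. */
    static boolean coversIp(X509Certificate cert, String ip) throws CertificateParsingException {
        return ipSans(cert).contains(ip);
    }
}
```

To check a keystore entry, pass the X509Certificate returned by
KeyStore.getCertificate(alias); an empty list means a peer addressed by IP
cannot pass the check.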

While it would be interesting to dig deeper into why this happens, I think
we need to move away from hostname verification for internal communication
(RPC, TaskManager Netty, blob server) anyway, for the following reasons:

  - In many container environments, hostname verification between
    containers is hard to set up (or practically impossible)
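  - The verification is mainly useful if you use a certificate in a
    certificate chain and your truststore contains other trusted root
    certificates: then many other certificates are trusted as well, and the
    hostname is what tells the legitimate peer apart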
  - For internal SSL between JM/TM and TM/TM, the recommended method is to
    generate a single-purpose certificate (possibly self-signed) and create a
    keystore and a truststore that contain only that certificate. Given such
    a "single-certificate truststore", hostname verification does not add any
    security (to my understanding), since only a peer holding that
    certificate's private key can complete the handshake.
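In plain JDK terms, a single-certificate truststore is just a KeyStore with one
trusted certificate entry. A minimal sketch, assuming the certificate is
already loaded as an X509Certificate (SingleCertTrust and the alias "internal"
are illustrative names; a real deployment would use a truststore file such as
the one configured via security.ssl.truststore rather than build it in memory):

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: a trust manager that trusts exactly one certificate. */
final class SingleCertTrust {
    private SingleCertTrust() {}

    static X509TrustManager trustOnly(X509Certificate cert)
            throws GeneralSecurityException, IOException {
        // Empty in-memory keystore; load(null, null) initializes it without a file.
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);
        // The only trusted entry: the dedicated internal certificate.
        ks.setCertificateEntry("internal", cert);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return (X509TrustManager) tm;
            }
        }
        throw new IllegalStateException("no X509TrustManager available");
    }
}
```

The resulting trust manager accepts a peer only if its certificate chain is
anchored in that one certificate, which is why the hostname adds little on top.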

For Flink 1.6, we are also adding transparent mutual authentication for
internal communication (RPC, blob server, Netty data plane), which adds
another level of security. If this is used with dedicated (self-signed)
certificates, it should be very secure without relying on hostname
verification.
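For illustration only (this is not Flink 1.6's actual code), mutual
authentication on a JDK SSLEngine comes down to the server side requiring a
client certificate, while the client side can leave endpoint identification
off and rely on the dedicated truststore. InternalSslEngines is a hypothetical
helper, and the SSLContext is assumed to be initialized with the internal
keystore and single-certificate truststore:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical helper for internal (JM/TM, TM/TM) connections. */
final class InternalSslEngines {
    private InternalSslEngines() {}

    /** Server side: the handshake fails unless the client presents a trusted certificate. */
    static SSLEngine serverEngine(SSLContext ctx) {
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false);
        engine.setNeedClientAuth(true);
        return engine;
    }

    /** Client side: no endpoint identification, so the peer host name is not checked. */
    static SSLEngine clientEngine(SSLContext ctx, String peerHost, int peerPort) {
        SSLEngine engine = ctx.createSSLEngine(peerHost, peerPort);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm(null); // trust comes only from the truststore
        engine.setSSLParameters(params);
        return engine;
    }
}
```

Both ends then authenticate each other with the same dedicated certificate,
and neither side depends on how the other's address is advertised.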

That said, for external communication (REST calls against
JM/Dispatcher/...), clients should use hostname verification, because many
users deploy certificates from a certificate chain for these external
endpoints.
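On the client side of an external REST connection, the JDK switches hostname
verification on through the endpoint identification algorithm "HTTPS" (RFC
2818 style matching). A minimal sketch; ExternalSslEngines is a hypothetical
helper, and the host and port are whatever the REST endpoint uses:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical helper for external (REST) client connections. */
final class ExternalSslEngines {
    private ExternalSslEngines() {}

    /** Client engine with hostname verification against the given host. */
    static SSLEngine restClientEngine(SSLContext ctx, String host, int port) {
        // The host passed here is the name the server certificate is checked against.
        SSLEngine engine = ctx.createSSLEngine(host, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }
}
```

The handshake then fails unless the server certificate matches the name
passed as host, which is the protection you want when the truststore contains
public or corporate CA roots.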

Best,
Stephan



On Thu, Jul 12, 2018 at 11:02 PM, PACE, JAMES <jp4...@att.com> wrote:

> I have the following SSL configuration for a 3-node HA Flink cluster:
>
> #taskmanager.data.ssl.enabled: false
> security.ssl.enabled: true
> security.ssl.keystore: /opt/app/certificates/server-keystore.jks
> security.ssl.keystore-password: <redacted>
> security.ssl.key-password: <redacted>
> security.ssl.truststore: /opt/app/certificates/cacerts
> security.ssl.truststore-password: <redacted>
> security.ssl.verify-hostname: true
>
> The job we’re running is the sample WordCount.jar. The running version of
> Flink is 1.4.0. It’s not the latest, but I didn’t see anything suggesting
> that updating would solve this issue.
>
> If either security.ssl.verify-hostname is set to false or
> taskmanager.data.ssl.enabled is set to false, everything works fine.
>
> When Flink is run in the configuration above, with SSL fully enabled and
> security.ssl.verify-hostname: true, the Flink job fails. However, when
> going through the logs, SSL appears fine for Akka, the blob service, and
> the JobManager.
>
> The root cause appears to be: Caused by:
> java.security.cert.CertificateException: No subject alternative names
> matching IP address xxx.xxx.xxx.xxx found.
>
> I have tried setting taskmanager.hostname to the FQDN of the host, but
> that did not change anything.
>
> We don’t generate certificates with SAN fields.
>
> Any thoughts would be appreciated.
>
> This is the full stack trace:
>
> Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: Sending the partition request failed.
>         at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
>
> Caused by: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Sending the partition request failed.
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient$1.operationComplete(PartitionRequestClient.java:119)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient$1.operationComplete(PartitionRequestClient.java:111)
>         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>         at org.apache.flink.shaded.netty4.io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252)
>         at org.apache.flink.shaded.netty4.io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1040)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:934)
>         at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315)
>         at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229)
>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>         at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:745)
>
> Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
>         at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
>         at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
>         at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
>         at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
>         at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1114)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:981)
>         ... 13 more
>
> Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
>         at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
>         at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
>         at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
>         at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
>         at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
>         at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
>         at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
>         at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
>         at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1148)
>         at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1003)
>         ... 13 more
>
> Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address xxx.xxx.xxx.xxx found
>         at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:167)
>         at sun.security.util.HostnameChecker.match(HostnameChecker.java:93)
>         at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
>         at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
>         at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
>         at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
>         at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
>         ... 21 more
>
