Hi Chesnay,

Thanks for the reply. Following your suggestion, I found that setting blob.service.ssl.enabled: false solves the issue, and all the pipelines now run as expected. So the issue is narrowed down to SSL for the blob service. I also checked the JobManager logs with blob SSL enabled and see the following error:

2017-10-03 23:28:50.459 [BLOB connection for /<jm_ip>:46932] ERROR org.apache.flink.runtime.blob.BlobServerConnection - Error while executing BLOB connection.
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
    at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2023)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1125)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:928)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:71)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:119)

So, are there additional steps I have to follow to enable SSL for the blob service?

On Wed, Oct 4, 2017 at 4:09 PM, Eron Wright <eronwri...@gmail.com> wrote:

> By following Chesnay's recommendation we will hopefully uncover an SSL
> error that is being masked. Another thing to try is to disable hostname
> verification (it is enabled by default) to see whether the certificate is
> being rejected.
>
> On Wed, Oct 4, 2017 at 5:15 AM, Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Something that would also help us narrow down the problematic area is to
>> enable SSL for one component at a time and see which one causes the job
>> to fail.
>>
>>
>> On 04.10.2017 14:11, Chesnay Schepler wrote:
>>
>> The configuration looks reasonable. Just to be sure, are the paths
>> accessible from all nodes?
>>
>> As a first step, could you set the logging level to DEBUG (by modifying
>> the 'conf/log4j.properties' file), resubmit the job (after a cluster
>> restart) and check the Job- and TaskManager logs for any exceptions?
>>
>> On 04.10.2017 03:15, Aniket Deshpande wrote:
>>
>> Background: We have a setup of Flink 1.3.1 along with a secure MAPR
>> cluster (Flink is running on MAPR client nodes). We run this Flink cluster
>> with the flink-jobmanager.sh foreground and flink-taskmanager.sh foreground
>> commands via Marathon. To make this work, we had to add
>> -Djavax.net.ssl.trustStore="$JAVA_HOME/jre/lib/security/cacerts" to
>> flink-console.sh as an extra JVM argument (otherwise Flink picked up MAPR's
>> ssl_truststore as the default truststore, and we then ran into issues with
>> third-party jars such as aws_sdk). This setup was working fine as it was:
>> we could submit our jars and the pipelines ran without any problem.
>>
>>
>> Problem: We started experimenting with enabling SSL for all Flink
>> communication. For this, we followed
>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/security-ssl.html
>> to generate the CA and keystore. I added the following properties to
>> flink-conf.yaml:
>>
>>
>> security.ssl.enabled: true
>> security.ssl.keystore: /opt/flink/certs/node1.keystore
>> security.ssl.keystore-password: <password>
>> security.ssl.key-password: <password>
>> security.ssl.truststore: /opt/flink/certs/ca.truststore
>> security.ssl.truststore-password: <password>
>> jobmanager.web.ssl.enabled: true
>> taskmanager.data.ssl.enabled: true
>> blob.service.ssl.enabled: true
>> akka.ssl.enabled: true
>>
>>
>> We then spun up a cluster and tried submitting the same job that was
>> working before. We get the following errors:
>> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
>> ClassLoader info: URL ClassLoader:
>> Class not resolvable through given classloader.
>>     at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:229)
>>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:95)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:230)
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>>
>> This error disappears when we remove the SSL config properties, i.e. when
>> we run the Flink cluster without SSL enabled.
>>
>>
>> So, did we miss any steps for enabling SSL?
>>
>>
>> P.S.: We tried removing the extra JVM arg mentioned above, but still get
>> the same error.
>>
>> --
>>
>> Aniket
>
--
Yours Sincerely,
Aniket S Deshpande.
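Chesnay's suggestion in the quoted thread, enabling SSL for one component at a time, can be scripted so that each attempt differs from the others in exactly one flag. The following is a minimal sketch, not part of Flink: `SslBisect` and `variants` are hypothetical names, the four component flags are the ones from the flink-conf.yaml above, and `security.ssl.verify-hostname` is assumed to be the Flink 1.3 key for the hostname check Eron mentions (please confirm it against the 1.3 configuration docs). The keystore and truststore lines are left out and would stay as they are.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Flink): builds flink-conf.yaml fragments in which
// SSL is switched on for exactly one component at a time.
public class SslBisect {

    // The per-component switches from the flink-conf.yaml in this thread.
    static final List<String> COMPONENT_FLAGS = Arrays.asList(
            "jobmanager.web.ssl.enabled",
            "taskmanager.data.ssl.enabled",
            "blob.service.ssl.enabled",
            "akka.ssl.enabled");

    // Maps each component flag to a config fragment in which only that flag is true.
    // Keystore/truststore settings are not generated; they stay unchanged.
    static Map<String, String> variants(boolean verifyHostname) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String only : COMPONENT_FLAGS) {
            StringBuilder yaml = new StringBuilder("security.ssl.enabled: true\n");
            // Assumed key for the hostname check (enabled by default per the thread).
            yaml.append("security.ssl.verify-hostname: ").append(verifyHostname).append('\n');
            for (String flag : COMPONENT_FLAGS) {
                yaml.append(flag).append(": ").append(flag.equals(only)).append('\n');
            }
            result.put(only, yaml.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // No argument keeps hostname verification on; pass "false" to turn it off.
        boolean verifyHostname = args.length == 0 || Boolean.parseBoolean(args[0]);
        for (Map.Entry<String, String> e : variants(verifyHostname).entrySet()) {
            System.out.println("# --- SSL only for " + e.getKey() + " ---");
            System.out.print(e.getValue());
        }
    }
}
```

Running it with `false` prints the same four fragments with hostname verification turned off. Since blob SSL is already the suspect, comparing the blob-only fragment with and without the check would show whether the certificate is being rejected on hostname grounds, which is what Eron's suggestion aims at.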
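The certificate_unknown alert in the JobManager log above was received by the BlobServer, which suggests the connecting side (presumably the TaskManager's blob client) did not accept the certificate presented to it. To see what that certificate actually contains, and to confirm that the store files open with the configured passwords on every node (Chesnay's question about the paths), a small check like the one below may help. It is a hypothetical diagnostic, not part of Flink, and it assumes JKS stores, which is keytool's default on Java 8; `keytool -list -v -keystore <file>` reports much the same information.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic (not part of Flink): opens a JKS keystore or truststore
// with the given password and describes every X.509 certificate in it.
public class KeystoreCheck {

    static List<String> describe(String path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(path)) {
            // Throws IOException if the password is wrong or the file is not a JKS store.
            ks.load(in, password);
        }
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            // For a key entry this is the first certificate of its chain.
            Certificate cert = ks.getCertificate(alias);
            if (!(cert instanceof X509Certificate)) {
                continue; // skips entries without an X.509 certificate
            }
            X509Certificate x509 = (X509Certificate) cert;
            String entryType = ks.isKeyEntry(alias) ? "key" : "trusted";
            lines.add(alias + " (" + entryType + "): subject="
                    + x509.getSubjectX500Principal().getName()
                    + ", issuer=" + x509.getIssuerX500Principal().getName()
                    + ", SAN=" + sanEntries(x509));
        }
        return lines;
    }

    // Subject alternative names as "type:value"; empty if the extension is absent.
    static List<String> sanEntries(X509Certificate cert) throws GeneralSecurityException {
        List<String> out = new ArrayList<>();
        Collection<List<?>> sans = cert.getSubjectAlternativeNames();
        if (sans != null) {
            for (List<?> san : sans) {
                // Type 2 = dNSName, type 7 = iPAddress (RFC 5280 GeneralName tags).
                out.add(san.get(0) + ":" + san.get(1));
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java KeystoreCheck <path-to-store> <store-password>
        for (String line : describe(args[0], args[1].toCharArray())) {
            System.out.println(line);
        }
    }
}
```

For example, running it against /opt/flink/certs/node1.keystore with the keystore password lists the key entry's subject, issuer and SAN entries. With hostname verification on, the name or IP the blob client uses to reach the JobManager would generally have to appear among the SAN entries (or in the CN when no DNS SAN is present), and running it against ca.truststore should show the CA that issued node1's certificate.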