Hi,

Not sure if this would be useful, but you can take a look:

https://github.com/fabric8io/kubernetes-client/issues/2168

Thanks & Regards,
Hitesh Tiwari
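P.S. I can't be sure this is the cause in your setup, but the trace below
looks like virtual-hosted-style bucket addressing: the AWS SDK prepends the
bucket name to the endpoint host, so it tries to resolve
mykidong.mykidong-tenant.minio.cloudchef-labs.com, which has no DNS record.
With an IP-address endpoint the SDK cannot prepend a bucket name and falls
back to path-style requests, which would explain why HTTP worked before. If
that is the case, forcing path-style access in S3A should keep the bucket
name out of the host, e.g. by adding one more conf to your spark-submit:

  --conf spark.hadoop.fs.s3a.path.style.access=true \

so that requests go to
https://mykidong-tenant.minio.cloudchef-labs.com/mykidong/... instead of
https://mykidong.mykidong-tenant.minio.cloudchef-labs.com/...
Alternatively, a wildcard DNS record and certificate covering
*.mykidong-tenant.minio.cloudchef-labs.com should also work.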
On Sun, 20 Sep 2020, 10:48 mykidong, <mykid...@gmail.com> wrote:

> Hi,
>
> I have already succeeded in submitting a Spark job to Kubernetes that
> accesses S3 object storage. At that time, my environment was:
>
> Spark: 3.0.0
> S3 Object Storage: Hadoop Ozone S3 object storage, accessed via an HTTP
> endpoint containing an IP address, not a host name.
> Resource Management: Kubernetes.
>
> My Spark job worked OK.
>
> Now, I want to replace unsecured HTTP with HTTPS to access S3 object
> storage:
>
> Spark: 3.0.0
> S3 Object Storage: MinIO S3 object storage, accessed via an HTTPS
> endpoint containing a host name.
> Resource Management: Kubernetes.
>
> I have already installed cert-manager and an ingress controller on
> Kubernetes, and added my S3 endpoint host name to a public DNS server.
> I have also tested the MinIO S3 object storage with the AWS CLI via
> HTTPS; it works fine, as I expected.
>
> But the problem is that when I submit my Spark job to Kubernetes and
> the dependency jar files are uploaded to my MinIO S3 object storage,
> spark-submit cannot find my S3 endpoint because it looks up the WRONG
> HOST NAME.
>
> Here is my spark-submit:
>
> export MASTER=k8s://https://10.0.4.5:6443;
> export NAMESPACE=ai-developer;
> export ENDPOINT=https://mykidong-tenant.minio.cloudchef-labs.com;
>
> spark-submit \
> --master $MASTER \
> --deploy-mode cluster \
> --name spark-thrift-server \
> --class io.spongebob.hive.SparkThriftServerRunner \
> --packages com.amazonaws:aws-java-sdk-s3:1.11.375,org.apache.hadoop:hadoop-aws:3.2.0 \
> --conf "spark.executor.extraJavaOptions=-Dnetworkaddress.cache.ttl=60" \
> --conf "spark.driver.extraJavaOptions=-Dnetworkaddress.cache.ttl=60" \
> --conf spark.kubernetes.file.upload.path=s3a://mykidong/spark-thrift-server \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> --conf spark.kubernetes.namespace=$NAMESPACE \
> --conf spark.kubernetes.container.image=mykidong/spark:v3.0.0 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.hadoop.hive.metastore.client.connect.retry.delay=5 \
> --conf spark.hadoop.hive.metastore.client.socket.timeout=1800 \
> --conf spark.hadoop.hive.metastore.uris=thrift://metastore.$NAMESPACE.svc.cluster.local:9083 \
> --conf spark.hadoop.hive.server2.enable.doAs=false \
> --conf spark.hadoop.hive.server2.thrift.http.port=10002 \
> --conf spark.hadoop.hive.server2.thrift.port=10016 \
> --conf spark.hadoop.hive.server2.transport.mode=binary \
> --conf spark.hadoop.metastore.catalog.default=spark \
> --conf spark.hadoop.hive.execution.engine=spark \
> --conf spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat \
> --conf spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat \
> --conf spark.sql.warehouse.dir=s3a://mykidong/apps/spark/warehouse \
> --conf spark.hadoop.fs.defaultFS=s3a://mykidong \
> --conf spark.hadoop.fs.s3a.access.key=bWluaW8= \
> --conf spark.hadoop.fs.s3a.secret.key=bWluaW8xMjM= \
> --conf spark.hadoop.fs.s3a.connection.ssl.enabled=true \
> --conf spark.hadoop.fs.s3a.endpoint=$ENDPOINT \
> --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
> --conf spark.hadoop.fs.s3a.fast.upload=true \
> --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
> --conf spark.executor.instances=4 \
> --conf spark.executor.memory=2G \
> --conf spark.executor.cores=2 \
> --conf spark.driver.memory=1G \
> --conf spark.jars=/home/pcp/delta-lake/connectors/dist/delta-core-shaded-assembly_2.12-0.1.0.jar,/home/pcp/delta-lake/connectors/dist/hive-delta_2.12-0.1.0.jar \
> file:///home/pcp/spongebob/examples/spark-thrift-server/target/spark-thrift-server-1.0.0-SNAPSHOT-spark-job.jar;
>
> After a little while, I got the following UnknownHostException:
>
> 20/09/20 03:29:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 20/09/20 03:29:24 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
> 20/09/20 03:29:25 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
> 20/09/20 03:29:26 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 20/09/20 03:29:26 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 20/09/20 03:29:26 INFO MetricsSystemImpl: s3a-file-system metrics system started
>
> Exception in thread "main" org.apache.spark.SparkException: Uploading file /home/pcp/delta-lake/connectors/dist/delta-core-shaded-assembly_2.12-0.1.0.jar failed...
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:289)
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>         at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>         at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>         at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:162)
>         at scala.collection.immutable.List.foreach(List.scala:392)
>         at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:160)
>         at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>         at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:89)
>         at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>         at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
>         at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on mykidong: com.amazonaws.SdkClientException: Unable to execute HTTP request: mykidong.mykidong-tenant.minio.cloudchef-labs.com: Unable to execute HTTP request: mykidong.mykidong-tenant.minio.cloudchef-labs.com
>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:189)
>         at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
>         at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
>         at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
>         at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:375)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:311)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>         at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>         at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:280)
>         ... 30 more
> Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: mykidong.mykidong-tenant.minio.cloudchef-labs.com
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
>         at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
>         at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:376)
>         at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>         ... 43 more
> Caused by: java.net.UnknownHostException: mykidong.mykidong-tenant.minio.cloudchef-labs.com
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>         at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
>         at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
>         at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
>         at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
>         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>         at com.amazonaws.http.conn.$Proxy18.connect(Unknown Source)
>         at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:394)
>         at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
>         at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>         at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>         at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1238)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
>         ... 55 more
> 20/09/20 03:41:25 INFO ShutdownHookManager: Shutdown hook called
> 20/09/20 03:41:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-35671300-f6a6-45c1-8f90-ca001d76eec6
>
> Take a look at the exception message. Even though my S3 endpoint is
> https://mykidong-tenant.minio.cloudchef-labs.com, the
> UnknownHostException names
> mykidong.mykidong-tenant.minio.cloudchef-labs.com.
> That is, the bucket name 'mykidong' has been prepended to the original
> endpoint host.
>
> To summarize:
> - With an HTTP endpoint using an IP address, the Spark job works fine
> on Kubernetes.
> - But with an HTTPS endpoint using a host name, spark-submit cannot
> find the S3 endpoint.
>
> Any ideas?
>
> Cheers,
>
> - Kidong.