Hi,

Not sure if this would be useful, but you can take a look.

https://github.com/fabric8io/kubernetes-client/issues/2168

Thanks & Regards,
Hitesh Tiwari

On Sun, 20 Sep 2020, 10:48 mykidong, <mykid...@gmail.com> wrote:

>
>
> Hi,
>
> I have already succeeded in submitting a Spark job to Kubernetes that accesses
> S3 object storage, in the following environment:
> Spark: 3.0.0.
> S3 Object Storage: Hadoop Ozone S3 Object Storage accessed by an HTTP endpoint
> containing an IP address, not a host name.
> Resource Management: Kubernetes.
> My spark job worked OK.
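>
> (A minimal sketch of what such an HTTP, IP-based S3A endpoint setting looks
> like; the address and port below are placeholders, not my real values:
>
> --conf spark.hadoop.fs.s3a.endpoint=http://10.0.4.100:9878 \
> --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
> )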
>
>
> Now, I want to replace the unsecured HTTP with HTTPS to access the S3 object
> storage:
> Spark: 3.0.0.
> S3 Object Storage: MinIO S3 Object Storage accessed by an HTTPS endpoint
> containing a host name.
> Resource Management: Kubernetes.
>
> I have already installed cert-manager and an ingress controller on Kubernetes,
> and added my S3 endpoint host name to a public DNS server.
> I have also tested the MinIO S3 object storage with the AWS CLI over HTTPS,
> and it works fine, as I expected.
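>
> (For example, roughly this kind of check succeeds; the exact command I ran may
> have differed slightly:
>
> aws s3 ls s3://mykidong/ --endpoint-url https://mykidong-tenant.minio.cloudchef-labs.com
> )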
>
> But the problem is that when I submit my Spark job to Kubernetes and the
> dependency jar files are uploaded to my MinIO S3 object storage, spark-submit
> cannot reach my S3 endpoint because it resolves the WRONG HOST NAME.
>
> Here is my spark-submit command:
>
> export MASTER=k8s://https://10.0.4.5:6443;
> export NAMESPACE=ai-developer;
> export ENDPOINT=https://mykidong-tenant.minio.cloudchef-labs.com;
>
> spark-submit \
> --master $MASTER \
> --deploy-mode cluster \
> --name spark-thrift-server \
> --class io.spongebob.hive.SparkThriftServerRunner \
> --packages com.amazonaws:aws-java-sdk-s3:1.11.375,org.apache.hadoop:hadoop-aws:3.2.0 \
> --conf "spark.executor.extraJavaOptions=-Dnetworkaddress.cache.ttl=60" \
> --conf "spark.driver.extraJavaOptions=-Dnetworkaddress.cache.ttl=60" \
> --conf spark.kubernetes.file.upload.path=s3a://mykidong/spark-thrift-server \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> --conf spark.kubernetes.namespace=$NAMESPACE \
> --conf spark.kubernetes.container.image=mykidong/spark:v3.0.0 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.hadoop.hive.metastore.client.connect.retry.delay=5 \
> --conf spark.hadoop.hive.metastore.client.socket.timeout=1800 \
> --conf spark.hadoop.hive.metastore.uris=thrift://metastore.$NAMESPACE.svc.cluster.local:9083 \
> --conf spark.hadoop.hive.server2.enable.doAs=false \
> --conf spark.hadoop.hive.server2.thrift.http.port=10002 \
> --conf spark.hadoop.hive.server2.thrift.port=10016 \
> --conf spark.hadoop.hive.server2.transport.mode=binary \
> --conf spark.hadoop.metastore.catalog.default=spark \
> --conf spark.hadoop.hive.execution.engine=spark \
> --conf spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat \
> --conf spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat \
> --conf spark.sql.warehouse.dir=s3a://mykidong/apps/spark/warehouse \
> --conf spark.hadoop.fs.defaultFS=s3a://mykidong \
> --conf spark.hadoop.fs.s3a.access.key=bWluaW8= \
> --conf spark.hadoop.fs.s3a.secret.key=bWluaW8xMjM= \
> --conf spark.hadoop.fs.s3a.connection.ssl.enabled=true \
> --conf spark.hadoop.fs.s3a.endpoint=$ENDPOINT \
> --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
> --conf spark.hadoop.fs.s3a.fast.upload=true \
> --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
> --conf spark.executor.instances=4 \
> --conf spark.executor.memory=2G \
> --conf spark.executor.cores=2 \
> --conf spark.driver.memory=1G \
> --conf spark.jars=/home/pcp/delta-lake/connectors/dist/delta-core-shaded-assembly_2.12-0.1.0.jar,/home/pcp/delta-lake/connectors/dist/hive-delta_2.12-0.1.0.jar \
> file:///home/pcp/spongebob/examples/spark-thrift-server/target/spark-thrift-server-1.0.0-SNAPSHOT-spark-job.jar;
>
>
>
> After a little while, I got the following UnknownHostException:
>
> 20/09/20 03:29:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 20/09/20 03:29:24 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
> 20/09/20 03:29:25 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
> 20/09/20 03:29:26 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 20/09/20 03:29:26 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 20/09/20 03:29:26 INFO MetricsSystemImpl: s3a-file-system metrics system started
>
>
>
> Exception in thread "main" org.apache.spark.SparkException: Uploading file /home/pcp/delta-lake/connectors/dist/delta-core-shaded-assembly_2.12-0.1.0.jar failed...
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:289)
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>         at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>         at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>         at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:162)
>         at scala.collection.immutable.List.foreach(List.scala:392)
>         at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:160)
>         at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>         at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:89)
>         at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>         at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
>         at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
>         at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on mykidong: com.amazonaws.SdkClientException: Unable to execute HTTP request: mykidong.mykidong-tenant.minio.cloudchef-labs.com: Unable to execute HTTP request: mykidong.mykidong-tenant.minio.cloudchef-labs.com
>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:189)
>         at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
>         at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
>         at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
>         at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:375)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:311)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>         at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>         at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>         at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:280)
>         ... 30 more
> Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: mykidong.mykidong-tenant.minio.cloudchef-labs.com
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
>         at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
>         at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:376)
>         at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>         ... 43 more
> Caused by: java.net.UnknownHostException: mykidong.mykidong-tenant.minio.cloudchef-labs.com
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>         at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
>         at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
>         at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
>         at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
>         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>         at com.amazonaws.http.conn.$Proxy18.connect(Unknown Source)
>         at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:394)
>         at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
>         at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>         at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>         at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1238)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
>         ... 55 more
> 20/09/20 03:41:25 INFO ShutdownHookManager: Shutdown hook called
> 20/09/20 03:41:25 INFO ShutdownHookManager: Deleting directory
> /tmp/spark-35671300-f6a6-45c1-8f90-ca001d76eec6
>
>
> Take a look at the exception message.
> Even though my S3 endpoint is https://mykidong-tenant.minio.cloudchef-labs.com,
> the UnknownHostException message shows
> mykidong.mykidong-tenant.minio.cloudchef-labs.com.
> That is, the bucket name 'mykidong' has been prepended to the original
> endpoint host name (see the sketch after the summary below).
>
> To summarize:
> - With an HTTP endpoint using an IP address, the spark job works fine on Kubernetes.
> - But with an HTTPS endpoint using a host name (of course), spark-submit
> cannot reach the S3 endpoint.
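>
> (I wonder whether this comes from virtual-hosted-style addressing, where the
> S3 client builds the request host as <bucket>.<endpoint-host>. As a hedged
> sketch only, which I have not verified against my setup, the standard Hadoop
> S3A option to force path-style requests would be added like this:
>
> --conf spark.hadoop.fs.s3a.path.style.access=true \
> )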
>
>
> Any ideas?
>
> Cheers,
>
> - Kidong.
>
>
>
>
>
>
