Seasonal Greetings,
     We are running TPC-DS query tests using Spark 3.0.2 on Kubernetes with data
on HDFS, and we are seeing longer query execution times than with
Spark 3.0.1 in the same environment. We have observed that some stages fail, but
Spark appears to take some time to detect the failure and re-trigger those
stages. I am also attaching the configuration we used for the Spark
driver. We observe the same behaviour with Spark 3.0.3.
Please let us know if anyone has observed similar issues.

Configuration we use for the Spark driver:
spark.io.compression.codec=snappy
spark.sql.parquet.filterPushdown=true

spark.sql.inMemoryColumnarStorage.batchSize=15000
spark.shuffle.file.buffer=1024k
spark.ui.retainedStages=10000
spark.kerberos.keytab=<keytab location>

spark.speculation=false
spark.submit.deployMode=cluster

spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true

spark.sql.orc.filterPushdown=true
spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.sql.crossJoin.enabled=true
spark.kubernetes.kerberos.keytab=<key-tab location>

spark.sql.adaptive.enabled=true
spark.kryo.unsafe=true
spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=<operator label>
spark.executor.cores=2
spark.ui.retainedTasks=200000
spark.network.timeout=2400


spark.rdd.compress=true
spark.executor.memoryoverhead=3G
spark.master=k8s\:<master ip>

spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=<label app name>
spark.kubernetes.driver.limit.cores=6144m
spark.kubernetes.submission.waitAppCompletion=false
spark.kerberos.principal=<principal>
spark.kubernetes.kerberos.enabled=true
spark.kubernetes.allocation.batch.size=5

spark.kubernetes.authenticate.driver.serviceAccountName=<serviceAccount name>

spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true
spark.reducer.maxSizeInFlight=1024m

spark.storage.memoryFraction=0.25

spark.kubernetes.namespace=<namespace name>
spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=<executor label>
spark.rpc.numRetries=5

spark.shuffle.consolidateFiles=true
spark.sql.shuffle.partitions=400
spark.kubernetes.kerberos.krb5.path=/<file path>
spark.sql.codegen=true
spark.ui.strictTransportSecurity=max-age\=31557600
spark.ui.retainedJobs=10000

spark.driver.port=7078
spark.shuffle.io.backLog=256
spark.ssl.ui.enabled=true
spark.kubernetes.memoryOverheadFactor=0.1

spark.driver.blockManager.port=7079
spark.kubernetes.executor.limit.cores=4096m
spark.submit.pyFiles=
spark.kubernetes.container.image=<image name>
spark.shuffle.io.numConnectionsPerPeer=10

spark.sql.broadcastTimeout=7200

spark.driver.cores=3
spark.executor.memory=9g
spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=dfbd9c75-3771-4392-928e-10bf28d94099

spark.driver.maxResultSize=4g
spark.sql.parquet.mergeSchema=false

spark.sql.inMemoryColumnarStorage.compressed=true
spark.rpc.retry.wait=5
spark.hadoop.parquet.enable.summary-metadata=false


spark.kubernetes.allocation.batch.delay=9
spark.driver.memory=16g
spark.sql.starJoinOptimization=true
spark.kubernetes.submitInDriver=true
spark.shuffle.compress=true
spark.memory.useLegacyMode=true
spark.jars=
spark.kubernetes.resource.type=java
spark.locality.wait=0s
spark.kubernetes.driver.ui.svc.port=4040
spark.sql.orc.splits.include.file.footer=true
spark.kubernetes.kerberos.principal=<principal>

spark.sql.orc.cache.stripe.details.size=10000

spark.executor.instances=22
spark.hadoop.fs.hdfs.impl.disable.cache=true
spark.sql.hive.metastorePartitionPruning=true
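
In case it helps anyone compare this against their own setup, a listing like the one above can be loaded and diffed against another run's settings with a short script. This is only a rough sketch: `parse_properties` and `diff_configs` are hypothetical helper names (not part of Spark), and the second listing in the example is made up purely for illustration.

```python
import re


def parse_properties(text):
    """Parse key=value lines (Java-properties style) into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; values may contain escaped '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Undo properties-style escapes such as "\:" and "\=".
        conf[key.strip()] = re.sub(r"\\([:=])", r"\1", value.strip())
    return conf


def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for settings that differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    # A few lines taken from the listing above.
    listing_a = r"""
    spark.sql.adaptive.enabled=true
    spark.network.timeout=2400
    spark.ui.strictTransportSecurity=max-age\=31557600
    """
    # Hypothetical second listing, invented only to show the diff.
    listing_b = r"""
    spark.sql.adaptive.enabled=false
    spark.network.timeout=2400
    spark.ui.strictTransportSecurity=max-age\=31557600
    """
    changed = diff_configs(parse_properties(listing_a), parse_properties(listing_b))
    for key, (old, new) in changed.items():
        print(f"{key}: {old} -> {new}")
```

The parser splits on the first `=` only and then unescapes `\:` and `\=`, which is enough for the values in this listing; it does not attempt to cover the full Java properties syntax.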

Thanks and Regards
Prakash
