Difference between 'cores' config params: spark submit on k8s

2019-03-07 Thread Battini Lakshman
Hello, I understand we need to specify the 'spark.kubernetes.driver.limit.cores' and 'spark.kubernetes.executor.limit.cores' config parameters while submitting Spark to a k8s namespace that has a resource quota applied. There are also the config parameters 'spark.driver.cores' and 'spark.executor.cores'. What is the difference between these?
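
For reference, a minimal sketch of how the two pairs of settings might appear together in a submission against a quota-enforced namespace. The master URL, namespace, image, class, and jar path below are placeholders, not taken from this thread:

    # spark.driver.cores / spark.executor.cores: how many cores the driver and each
    # executor use (on k8s these also become the pods' CPU requests).
    # spark.kubernetes.*.limit.cores: the hard CPU limit written into the pod spec,
    # which a namespace ResourceQuota typically requires to be present.
    spark-submit \
      --master k8s://https://<k8s-apiserver>:6443 \
      --deploy-mode cluster \
      --conf spark.kubernetes.namespace=<quota-namespace> \
      --conf spark.kubernetes.container.image=<spark-image> \
      --conf spark.driver.cores=1 \
      --conf spark.executor.cores=2 \
      --conf spark.kubernetes.driver.limit.cores=1 \
      --conf spark.kubernetes.executor.limit.cores=2 \
      --conf spark.executor.instances=2 \
      --class <main-class> \
      local:///path/to/app.jar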

Re: spark structured streaming crash due to decompressing gzip file failure

2019-03-07 Thread Lian Jiang
Thanks, it worked. On Thu, Mar 7, 2019 at 5:05 AM Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com> wrote: > Hi, > In your spark-submit command, try using the below config property and see if this solves the problem. > --conf spark.sql.files.ignoreCorruptFiles=true > For me this worked to …

Re: mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0

2019-03-07 Thread Akshay Mendole
Hi, no. It's a Java application that uses the RDD APIs. Thanks, Akshay. On Mon, Feb 25, 2019 at 7:54 AM Manu Zhang wrote: > Is your application using the Spark SQL / DataFrame API? If so, please try setting > spark.sql.files.maxPartitionBytes > to a larger value; it is 128MB by default.
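
For context, the two settings live in different code paths, which is why the question above matters: spark.sql.files.maxPartitionBytes only affects Spark SQL / DataFrame file scans, while RDD-based reads of Hadoop input formats take their split size from the Hadoop configuration. A rough sketch of how each might be passed at submit time (the values are illustrative, and the second property is the one the original post reports as not taking effect in 2.4.0):

    # DataFrame / Spark SQL file scans: target bytes per input partition (default 128 MB).
    spark-submit --conf spark.sql.files.maxPartitionBytes=268435456 ...

    # RDD reads through Hadoop input formats: the spark.hadoop. prefix forwards the
    # property into the Hadoop Configuration used by sc.textFile / sc.hadoopFile.
    spark-submit --conf spark.hadoop.mapreduce.input.fileinputformat.split.maxsize=268435456 ...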

Re: spark structured streaming crash due to decompressing gzip file failure

2019-03-07 Thread Akshay Bhardwaj
Hi, in your spark-submit command, try using the config property below and see if this solves the problem. --conf spark.sql.files.ignoreCorruptFiles=true For me this worked to ignore empty/partially uploaded gzip files in an S3 bucket. Akshay Bhardwaj +91-97111-33849
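
For reference, in a full submission the flag sits alongside the usual options; a minimal sketch, with the class and jar as placeholders and the effect of the setting noted as a comment:

    # spark.sql.files.ignoreCorruptFiles=true makes Spark SQL file sources skip files
    # that throw while being read (e.g. truncated or partially uploaded gzip parts)
    # instead of failing the whole query.
    spark-submit \
      --conf spark.sql.files.ignoreCorruptFiles=true \
      --class <streaming-main-class> \
      <application.jar>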

[SparkSQL, user-defined Hadoop, K8s] Hadoop free spark on kubernetes => NoClassDefFound

2019-03-07 Thread Sommer Tobias
Hi all, we are having problems using a custom Hadoop lib in a Spark image when running it on a Kubernetes cluster while following the steps in the documentation. Details in the description below. Has anyone else had similar problems? Is there something missing in the setup below?

Hadoop free spark on kubernetes => NoClassDefFound

2019-03-07 Thread Sommer Tobias
Hi, we are having problems using a custom Hadoop lib in a Spark image when running it on a Kubernetes cluster while following the steps in the documentation. Details in the description below. Has anyone else had similar problems? Is there something missing in the setup below?
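
For what it's worth, the Hadoop-free build expects the external Hadoop jars to be put on Spark's classpath explicitly via SPARK_DIST_CLASSPATH, and a NoClassDefFoundError for Hadoop classes is often a sign that this variable is not visible inside the driver and executor containers. A minimal sketch of the relevant line in the image's conf/spark-env.sh, assuming the custom Hadoop is unpacked under /opt/hadoop (that path is an assumption, not taken from this post):

    # conf/spark-env.sh in the Spark image: point Spark at the external Hadoop jars.
    # /opt/hadoop is an assumed install location; use wherever the custom lib lives.
    export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)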

Re: spark df.write.partitionBy run very slow

2019-03-07 Thread JF Chen
Yes, I agree. From the Spark UI I can confirm the data is not skewed. There is only about 100MB per task; most tasks take several seconds to write the data to HDFS, but some tasks take minutes. Regards, Junfeng Chen. On Wed, Mar 6, 2019 at 2:39 PM Shyam P wrote: > Hi JF, …