Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
Don't we have any property for it? One more quick question: if a file created by Spark is smaller than the HDFS block size, will the rest of the block's space become unavailable and remain unutilized, or will it be shared with other files? …
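A minimal PySpark sketch of the usual workaround, since Spark exposes no direct "target output file size" property: choose the partition count from an estimated data size and the HDFS block size. The DataFrame, size estimate, and output path below are stand-ins, not from the thread.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1_000_000)  # stand-in for the real DataFrame

    block_size = 128 * 1024 * 1024       # assumed 128 MB HDFS block size
    estimated_bytes = 512 * 1024 * 1024  # rough estimate of the data size
    # Aim for roughly one output file per HDFS block.
    num_files = max(1, estimated_bytes // block_size)

    df.repartition(num_files).write.mode("overwrite").parquet("/tmp/out")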

userClassPathFirst fails

2019-01-21 Thread Moein Hosseini
Hi everyone, I've a cluster of standalone Spark 2.4.0 (without-hadoop build) with both *spark.executor.userClassPathFirst* and *spark.driver.userClassPathFirst* set to true. This cluster runs on HDP (v3.1.0) with SPARK_DIST_CLASSPATH set to $(hadoop classpath). My application fails to run because …
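For reference, a sketch of how these two properties are set programmatically; in practice they are usually passed with --conf at spark-submit time, since they must be in place before the JVMs start. Only the two property names come from the message above.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Both properties must be set before the driver and executor JVMs
    # start, so use a fresh SparkConf (or spark-submit --conf).
    conf = (SparkConf()
            .set("spark.driver.userClassPathFirst", "true")
            .set("spark.executor.userClassPathFirst", "true"))
    spark = SparkSession.builder.config(conf=conf).getOrCreate()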

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Arnaud LARROQUE
Hi Shivam, In the end, the file takes up only its own space regardless of the block size. So if your file is just a few kilobytes, it will take only those few kilobytes. But I've noticed that while the file is being written, somehow a block is allocated and the NameNode considers the whole block size to be in use …
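One way to verify that finished files occupy only their actual size is to list their lengths through the Hadoop FileSystem API. The sketch below reaches it via PySpark's internal py4j gateway (a private interface, so treat it as best-effort) and assumes a hypothetical output directory:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Access the Hadoop FileSystem through the JVM gateway (internal API).
    hadoop_fs = spark._jvm.org.apache.hadoop.fs
    fs = hadoop_fs.FileSystem.get(spark._jsc.hadoopConfiguration())

    # Print each file's real length in bytes, independent of block size.
    for status in fs.listStatus(hadoop_fs.Path("/data/output")):
        print(status.getPath().getName(), status.getLen())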

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
Thanks, Arnaud.

How to Overwrite a saved PySpark ML Model

2019-01-21 Thread Aakash Basu
Hi, I am trying to overwrite a saved Spark ML Logistic Regression model, but it isn't working. Tried: a) lr_model.write.overwrite().save(input_dict["config"]["save_model_path"]) and b) lr_model.write.overwrite.save(input_dict["config"]["save_model_path"]). This works (if I do not want to overwrite): lr…

Re: How to Overwrite a saved PySpark ML Model

2019-01-21 Thread Aakash Basu
Hey all, the message seems to be a Java error message, not a Python one. So now I tried calling the write method first: lr_model.write().overwrite().save(input_dict["config"]["save_model_path"]) It is still running; I shall update if it works, otherwise I shall need your help. Thanks, Aakash
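For the archive, a minimal end-to-end sketch of the pattern this message lands on: in PySpark, write is a method and must be called before chaining overwrite(). The toy training data and save path below are hypothetical.

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Tiny stand-in training set; the real pipeline is not shown here.
    train = spark.createDataFrame(
        [(0.0, Vectors.dense(0.0)), (1.0, Vectors.dense(1.0))],
        ["label", "features"])
    lr_model = LogisticRegression().fit(train)

    # write() is a method in PySpark (unlike the Scala API, where write
    # is called without parentheses), so invoke it before overwrite():
    save_path = "/tmp/lr_model"  # hypothetical save location
    lr_model.write().overwrite().save(save_path)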

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread attilapiros
Hello, I was working on this area last year (I developed the YarnAllocatorBlacklistTracker), and if you haven't found any solution to your problem, I can introduce a new config which would contain a sequence of always-blacklisted nodes. This way blacklisting would improve a bit again :) …
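If such a config lands, usage could look like the sketch below. The property name here is purely illustrative, not from this thread; the PR linked in the reply below is the place to check what was actually merged.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Hypothetical property carrying the always-blacklisted node list.
    conf = SparkConf().set("spark.yarn.exclude.nodes",
                           "node1.example.com,node2.example.com")
    spark = SparkSession.builder.config(conf=conf).getOrCreate()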

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-21 Thread Serega Sheypak
Hi Apiros, thanks for your reply. Is it this one: https://github.com/apache/spark/pull/23223 ? Can I try to reach you through the Cloudera Support portal?