Heap Settings for History Server

2017-07-31 Thread N Sa
Hi folks, I couldn't find much literature on this so I figured I could ask here. Does anyone have experience in tuning the memory settings and interval times of the Spark History Server? Let's say I have 500 applications at 0.5 G each with a *spark.history.fs.update.interval* of 400s. Is there
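For reference, the History Server's heap is not controlled by *spark.history.fs.update.interval*; the daemon heap is usually raised via SPARK_DAEMON_MEMORY in spark-env.sh, and the number of application UIs held in memory via spark.history.retainedApplications. A minimal sketch — the 4g and 50 figures are illustrative guesses for a workload like the one described, not tested recommendations:

```
# conf/spark-env.sh -- heap for Spark daemons, including the History Server
export SPARK_DAEMON_MEMORY=4g

# conf/spark-defaults.conf
spark.history.fs.update.interval     400s
# cap how many application UIs are kept in memory at once
spark.history.retainedApplications   50
```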

RE: SPARK Issue in Standalone cluster

2017-07-31 Thread Mahesh Sawaiker
Gourav, Riccardo’s answer is spot on. What is happening is that one Spark node is writing to its own directory and telling a slave to read the data from there; when the slave goes to read it, the part is not found. Check the folder

Re: ClassNotFoundException for Workers

2017-07-31 Thread Noppanit Charassinvichai
I've included that in my build file for the fat jar already. libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155" libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155" libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155" Not sure if I need
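As an alternative to shading the SDK classes into the fat jar, the dependencies can be supplied at submit time with --packages, which resolves them from Maven and ships them to driver and executors. A sketch using the version from the message above — the main class and jar names are placeholders:

```
spark-submit \
  --packages com.amazonaws:aws-java-sdk-s3:1.11.155 \
  --class com.example.Main \
  my-app-assembly.jar
```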

Re: SPARK Issue in Standalone cluster

2017-07-31 Thread Gourav Sengupta
Hi Riccardo, I am grateful for your kind response. Also I am sure that your answer is completely wrong and erroneous. SPARK must have a mechanism so that different executors do not pick up the same files to process. You also did not answer the question of why the processing was successful in

Re: ALSModel.load not working on pyspark 2.1.0

2017-07-31 Thread Cristian Garcia
Thanks Irving, The problem was that I was using spark in cluster mode and had to resort to HDFS to properly save/load the model. On Mon, Jul 31, 2017 at 9:09 AM Irving Duran wrote: > I think the problem is because you are calling "model2 = >

Re: ALSModel.load not working on pyspark 2.1.0

2017-07-31 Thread Irving Duran
I think the problem is because you are calling "model2 = ALSModel.load("/models/als")" instead of "model2 = *model*.load("/models/als")". See my working sample below. >>> model.save('/models/als.test') SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to

transactional data in sparksql

2017-07-31 Thread luohui20001
hello guys: I have some transactional data as attached file 1.txt. A sequence of a single operation 1 followed by a few operations 0 is a transaction here. The transactions whose sum(Amount) of operation 0 is less than the sum(Amount) of operation 1 need to be found. There are
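The check described above can be sketched in plain Python (not Spark SQL) to make the logic concrete. The (op, amount) record layout is an assumption — the attached 1.txt was not included in the message:

```python
# A transaction is one operation-1 row followed by its operation-0 rows.
# We flag transactions where the 0-rows sum to less than the 1-row amount.

def find_short_transactions(rows):
    """rows: list of (op, amount) tuples in sequence order.
    Returns a list of (start_index, op1_amount, op0_sum) for flagged txns."""
    results = []
    current = None  # (start_index, op1_amount, running op0 sum)
    for i, (op, amount) in enumerate(rows):
        if op == 1:
            # close the previous transaction before starting a new one
            if current is not None and current[2] < current[1]:
                results.append(current)
            current = (i, amount, 0.0)
        elif current is not None:
            current = (current[0], current[1], current[2] + amount)
    if current is not None and current[2] < current[1]:
        results.append(current)
    return results

rows = [(1, 100.0), (0, 40.0), (0, 30.0),   # 0-rows sum to 70 < 100 -> flagged
        (1, 50.0),  (0, 30.0), (0, 30.0)]   # 0-rows sum to 60 >= 50 -> ok
print(find_short_transactions(rows))        # -> [(0, 100.0, 70.0)]
```

In Spark SQL the same grouping is typically expressed with a window function that assigns a transaction id by counting the operation-1 rows, then a GROUP BY over that id.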

Re: can I do spark-submit --jars [s3://bucket/folder/jar_file]? or --jars

2017-07-31 Thread 周康
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included on the driver and executor classpaths. Directory expansion does not work
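Putting that together, a comma-separated --jars list might look like the sketch below (paths and names are placeholders; whether an s3:// URL works depends on the Hadoop S3 connector being available on the cluster's classpath):

```
# jars after --jars are comma-separated; directories/globs are not expanded
spark-submit \
  --jars hdfs:///libs/dep1.jar,hdfs:///libs/dep2.jar \
  --class com.example.Main \
  app.jar
```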

Re: Spark parquet file read problem !

2017-07-31 Thread serkan taş
Thank you very much. Schema merge fixed the structure problem, but fields with the same name but different types are still an issue I should work on. Get Outlook for Android From: 萝卜丝炒饭 Sent: Monday, 31 July, 11:16 Subject: Re: Spark parquet file
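For context, Parquet schema merging is enabled with `spark.read.option("mergeSchema", "true").parquet(path)`; it can reconcile columns that are missing from some files, but not a column that appears with different types. The plain-Python sketch below (not Spark's actual merge code) illustrates why:

```python
# Toy schema merge: schemas are dicts of column name -> type name.
# Missing columns merge cleanly; same-name/different-type columns conflict.

def merge_schemas(a, b):
    """Returns (merged_schema, conflicts)."""
    merged, conflicts = dict(a), []
    for col, typ in b.items():
        if col in merged and merged[col] != typ:
            conflicts.append((col, merged[col], typ))
        else:
            merged[col] = typ
    return merged, conflicts

old_files = {"id": "long", "price": "int"}
new_files = {"id": "long", "price": "double", "ts": "timestamp"}
merged, conflicts = merge_schemas(old_files, new_files)
print(conflicts)   # -> [('price', 'int', 'double')]
```

The usual fixes are rewriting the offending files with a consistent type, or casting to a common type while reading each batch separately.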

Re: SPARK Issue in Standalone cluster

2017-07-31 Thread Riccardo Ferrari
Hi Gourav, The issue here is the location where you're trying to write/read from : /Users/gouravsengupta/Development/spark/sparkdata/test1/p... When dealing with clusters, all the paths and resources should be available to all executors (and the driver), and that is the reason why you generally use HDFS,
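One way to apply that advice: copy the input onto HDFS (or another shared store) first, then point the job at the shared URI instead of a node-local path. The HDFS target path below is a placeholder:

```
# make the input reachable from every executor, not just one node
hdfs dfs -mkdir -p /data/test1
hdfs dfs -put /Users/gouravsengupta/Development/spark/sparkdata/test1/* /data/test1/
# then read/write in the job via a shared URI, e.g. hdfs:///data/test1/
```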

Re: Spark parquet file read problem !

2017-07-31 Thread 萝卜丝炒饭
please add the mergeSchema option. ---Original--- From: "serkan taş" Date: 2017/7/31 13:54:14 To: "pandees waran"; Cc: "user@spark.apache.org"; Subject: Re: Spark parquet file read problem ! I checked and realised

Running several spark actions in parallel

2017-07-31 Thread Guy Harmach
Hi, I need to run a batch job written in Java that executes several SQL statements on different hive tables, and then process each partition result set in a foreachPartition() operator. I'd like to run these actions in parallel. I saw there are two approaches for achieving this: 1. Using
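One common pattern for this (sketched here in Python with placeholder functions standing in for the Spark calls; the original job is in Java, where an ExecutorService plays the same role) is to launch each action from its own driver-side thread, since actions submitted concurrently from one SparkSession can run in parallel on the cluster:

```python
# Sketch: run several independent Spark actions concurrently from the driver.
# run_query is a placeholder for spark.sql(...).foreachPartition(process_fn).
from concurrent.futures import ThreadPoolExecutor

def run_query(table):
    # placeholder for the real action, e.g.:
    #   spark.sql(f"SELECT ... FROM {table}").foreachPartition(process_partition)
    return f"processed {table}"

tables = ["hive_table_a", "hive_table_b", "hive_table_c"]
with ThreadPoolExecutor(max_workers=len(tables)) as pool:
    # map preserves input order in its results
    results = list(pool.map(run_query, tables))
print(results)
```

Whether the cluster actually interleaves the jobs also depends on the scheduler: with the default FIFO scheduler one large job can starve the others, so the FAIR scheduler (spark.scheduler.mode=FAIR) is often paired with this pattern.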