Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-24 Thread Matt Cheah
-1 because of SPARK-16181 which is a correctness regression from 1.6. Looks like the patch is ready though: https://github.com/apache/spark/pull/13884 – it would be ideal for this patch to make it into the release. -Matt Cheah From: Nick Pentreath

Re: Jar for Spark development

2016-06-24 Thread joshuata
With regard to the Spark JAR files, I have had really good success with the sbt plugin. You can set the desired Spark version along with any plugins, and it will automatically fetch your dependencies and put them on the classpath. -- View
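The plugin link did not survive the archive, so as a hedged illustration only, here is what a minimal sbt setup along these lines might look like, using the widely used sbt-assembly plugin. The plugin choice, project name, and version numbers are assumptions for the sketch, not details from the original post:

```scala
// build.sbt — hypothetical sketch; versions are illustrative, not from the post
name := "my-spark-app"
scalaVersion := "2.11.8"

// "provided" keeps Spark itself out of the assembled JAR, since the
// cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"

// project/plugins.sbt (separate file):
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
```

Running `sbt assembly` would then produce a single JAR containing the application and its non-provided dependencies.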

Re: Associating user objects with SparkContext/SparkStreamingContext

2016-06-24 Thread Evan Sparks
I would actually think about this the other way around. Move the functions you are passing to the streaming jobs out to their own object if possible. Spark's closure capture rules are necessarily far-reaching and serialize the object that contains these methods, which is a common cause of the
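A minimal sketch of the pattern Evan describes, with illustrative names (the class, field, and function here are hypothetical, not from the original thread):

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Anti-pattern: the function lives on the enclosing class, so the closure
// captures `this` and Spark tries to serialize the whole class, including
// any non-serializable driver-side state.
class StreamingJob(ssc: StreamingContext) {
  val driverOnlyState = new Object  // not Serializable
  def parse(line: String): Int = line.length
  def run(lines: DStream[String]): Unit = {
    lines.map(parse)  // drags in `this`, and with it driverOnlyState
  }
}

// Fix: move the function to a standalone object. Object members are
// resolved statically on each executor, so nothing from the driver-side
// class needs to be serialized with the closure.
object Parsers {
  def parse(line: String): Int = line.length
}
// in run(): lines.map(Parsers.parse)
```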

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-24 Thread Nick Pentreath
I'm getting the following when trying to run ./dev/run-tests (not happening on master) from the extracted source tar. Anyone else seeing this? error: Could not access 'fc0a1475ef' ** File "./dev/run-tests.py", line 69, in

Associating user objects with SparkContext/SparkStreamingContext

2016-06-24 Thread Simon Scott
Hi, I am developing a streaming application using checkpointing on Spark 1.5.1. I have just run into a NotSerializableException because some of the state that my streaming functions need cannot be serialized. This state is only used in the driver process; it is the checkpointing that requires
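One common workaround for driver-only state caught up in checkpoint serialization is to mark it `@transient` and rebuild it lazily on first use. This is a hedged sketch of that idiom, not a fix confirmed by the thread; the class and field names are illustrative:

```scala
// Hypothetical sketch: keep driver-only state out of serialization.
// A @transient lazy val is skipped when the enclosing object is
// serialized and re-initialized on demand after deserialization.
class JobContext extends Serializable {
  @transient lazy val connectionPool: AnyRef = createPool()

  private def createPool(): AnyRef = new Object  // placeholder for real setup
}
```

Whether this applies depends on whether the state can safely be recreated after a restore from checkpoint, which is the crux of the original question.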

Re: Partitioning in spark

2016-06-24 Thread Darshan Singh
Thanks, but the whole point is not to set it explicitly; it should be derived from its parent RDDs. Thanks On Fri, Jun 24, 2016 at 6:09 AM, ayan guha wrote: > You can change parallelism like the following: > > conf = SparkConf() > conf.set('spark.sql.shuffle.partitions', 10)
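For the RDD API, partitioning can indeed be derived from a parent: co-partitioned operations such as `join` reuse an existing partitioner rather than requiring an explicit setting. A small sketch of that behavior, with illustrative data (this assumes an existing `SparkContext` named `sc`):

```scala
import org.apache.spark.HashPartitioner

// Partition the left side explicitly once; the join result then
// derives its partitioner and partition count from this parent.
val left  = sc.parallelize(Seq((1, "a"), (2, "b"))).partitionBy(new HashPartitioner(8))
val right = sc.parallelize(Seq((1, "x"), (2, "y")))

val joined = left.join(right)
// joined.partitioner is Some(HashPartitioner(8)) — inherited from `left`,
// so no shuffle-partition setting is needed for this result.
```

Note that `spark.sql.shuffle.partitions`, as quoted above, governs DataFrame/SQL shuffles and is a separate mechanism from RDD partitioner inheritance.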