Re: Model parallelism with RDD

2015-07-10 Thread Shivaram Venkataraman
Yeah I can see that being the case -- caching implies creating objects that will be stored in memory. So there is a trade-off between storing data in memory but having to garbage collect it later vs. recomputing the data. Shivaram On Fri, Jul 10, 2015 at 9:49 PM, Ulanov, Alexander wrote: > Hi S

Re: Model parallelism with RDD

2015-07-10 Thread Ulanov, Alexander
Hi Shivaram, Thank you for the suggestion! If I do .cache and .count, each iteration takes much more time, which is spent in GC. Is this normal? On July 10, 2015, at 21:23, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote: I think you need to do `newRDD.cache()` and `newRDD.count` b

Foundation policy on releases and Spark nightly builds

2015-07-10 Thread Sean Busbey
Hi Folks! I noticed that Spark website's download page lists nightly builds and instructions for accessing SNAPSHOT maven artifacts[1]. The ASF policy on releases expressly forbids this kind of publishing outside of the dev@spark community[2]. If you'd like to discuss having the policy updated (i

Re: Model parallelism with RDD

2015-07-10 Thread Shivaram Venkataraman
I think you need to do `newRDD.cache()` and `newRDD.count` before you do `oldRDD.unpersist(true)` -- otherwise it might be recomputing all the previous iterations each time. Thanks Shivaram On Fri, Jul 10, 2015 at 7:44 PM, Ulanov, Alexander wrote: > Hi, > > > > I am interested how scalable can b
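The ordering advice above can be illustrated with a small pure-Python toy model. This is only a sketch under stated assumptions: `LazyDataset` and `run` are hypothetical names invented here, not Spark APIs, and the class merely imitates lazy evaluation with lineage, `cache`, `count` and `unpersist`. It counts how many transformation steps get executed when the new dataset is materialized before the old one is dropped, versus the other way round.

```python
class LazyDataset:
    """Hypothetical toy stand-in for an RDD: lazy, with lineage and optional caching."""

    compute_calls = 0  # total number of transformation steps executed

    def __init__(self, parent=None, step=None, source=None):
        self.parent = parent      # upstream dataset (lineage), or None for a source
        self.step = step          # per-element function applied to the parent
        self.source = source      # raw data for a source dataset
        self._want_cache = False
        self._cached = None

    def map(self, step):
        return LazyDataset(parent=self, step=step)

    def cache(self):
        # Only marks the dataset; nothing is stored until it is evaluated.
        self._want_cache = True
        return self

    def unpersist(self):
        self._want_cache = False
        self._cached = None
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        if self.parent is None:
            data = list(self.source)
        else:
            LazyDataset.compute_calls += 1
            data = [self.step(x) for x in self.parent.collect()]
        if self._want_cache:
            self._cached = data
        return data

    def count(self):
        # Acts as the action that forces evaluation.
        return len(self.collect())


def run(iterations, materialize_first):
    """Return how many transformation steps ran over all iterations."""
    LazyDataset.compute_calls = 0
    old = LazyDataset(source=[1.0, 2.0, 3.0]).cache()
    old.count()
    for _ in range(iterations):
        new = old.map(lambda w: w * 0.5).cache()
        if materialize_first:
            new.count()       # new is stored while old is still cached
            old.unpersist()
        else:
            old.unpersist()   # old is gone before new was ever evaluated
            new.count()       # so new must replay the whole lineage
        old = new
    return LazyDataset.compute_calls


if __name__ == "__main__":
    print("cache + count, then unpersist:", run(4, True))
    print("unpersist first:", run(4, False))
```

In this toy model the first ordering executes one step per iteration, while the second replays every earlier iteration each time, so the work grows quadratically with the iteration count. It says nothing about GC cost, which is the separate trade-off raised later in the thread.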

Model parallelism with RDD

2015-07-10 Thread Ulanov, Alexander
Hi, I am interested in how scalable model parallelism can be within Spark. Suppose the model contains N weights of type Double and N is so large that it does not fit into the memory of a single node. So, we can store the model in an RDD[Double] across several nodes. To train the model, one needs to

Re: language-independent RDD Spark core code?

2015-07-10 Thread Vasili I. Galchin
think I found this RDD code On Fri, Jul 10, 2015 at 7:00 PM, Vasili I. Galchin wrote: > I am looking at R side, but curious what the RDD core side looks like. > Not sure which directory to look inside. ?? > > Thanks, > > Vasili

language-independent RDD Spark core code?

2015-07-10 Thread Vasili I. Galchin
I am looking at R side, but curious what the RDD core side looks like. Not sure which directory to look inside. ?? Thanks, Vasili

Re: PySpark vs R

2015-07-10 Thread Shivaram Venkataraman
The R and Python implementations differ in how they communicate with the JVM, so there is no invariant there per se. Thanks Shivaram On Thu, Jul 9, 2015 at 10:40 PM, Vasili I. Galchin wrote: > Hello, > > Just trying to get up to speed ( a week .. pls be patient with me). > > I have been

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-10 Thread Sean McNamara
+1 Sean > On Jul 8, 2015, at 11:55 PM, Patrick Wendell wrote: > > Please vote on releasing the following candidate as Apache Spark version > 1.4.1! > > This release fixes a handful of known issues in Spark 1.4.0, listed here: > http://s.apache.org/spark-1.4.1 > > The tag to be voted on is v1

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-10 Thread Denny Lee
+1 (non binding) On Fri, Jul 10, 2015 at 9:12 PM Tom Graves wrote: > +1 > > Tom > > > > On Thursday, July 9, 2015 12:55 AM, Patrick Wendell > wrote: > > > Please vote on releasing the following candidate as Apache Spark version > 1.4.1! > > This release fixes a handful of known issues in Spar

Re: Questions about Fault tolerance of Spark

2015-07-10 Thread MIKE HYNES
Gentle bump on this topic; how to test the fault tolerance and previous benchmark results are both things we are interested in as well.  Mike Original message From: 牛兆捷 Date:07-09-2015 04:19 (GMT-05:00) To: dev@spark.apache.org, u...@spark.apache.org Subject: Questions abou

Re: The latest master branch didn't compile with -Phive?

2015-07-10 Thread Ted Yu
Compilation on master branch has been fixed. Thanks to Cheng Lian. On Thu, Jul 9, 2015 at 8:50 AM, Josh Rosen wrote: > Jenkins runs compile-only builds for Maven as an early warning system for > this type of issue; you can see from > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-10 Thread Tom Graves
+1 Tom On Thursday, July 9, 2015 12:55 AM, Patrick Wendell wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.