Re: GroupByKey implementation.

2014-01-26 Thread Archit Thakur
Thanks Mark, Reynold for the quick response.

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Sebastian Schelter
Hi Taka, please write a new mail for your problem and don't reply to an existing (unrelated) thread: https://people.apache.org/~hossman/#threadhijack

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Taka Shinagawa
A little clarification... I see the errors during sbt test after building Spark, not durin

[RESULT] [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Patrick Wendell
Voting is now closed. This vote passes with 5 binding +1 votes and no 0 or -1 votes. This vote will now go to the IPMC list for a second 72-hour vote. Spark developers are encouraged to comment on the IPMC vote as well. The totals are: +1 Patrick Wendell* Hossein Falaki Reynold Xin* Andy Konwinsk

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Taka Shinagawa
I'm always running sbt clean before building Spark. And I'm seeing the errors when I build Spark for the first time after downloading and extracting spark-0.9.0-incubating.tgz. Just in case, I deleted the test.jar fil

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Patrick Wendell
Hey Taka, if you build a second version, you need to clean the existing assembly jar. The reference implementations of the tests are the ones on the U.C. Berkeley Jenkins. These are passing for Branch 0.9 for both Hadoop 1 and Hadoop 2 versions, so I'm inclined to think it's an issue with your test

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Reynold Xin
It is possible that you have generated the assembly jar using one version of Hadoop, and then another assembly jar with another version. Those tests that failed are all using a local cluster that sets up multiple processes, which would require launching Spark worker processes using the assembly jar

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Taka Shinagawa
If I build Spark for Hadoop 1.0.4 (either "SPARK_HADOOP_VERSION=1.0.4 sbt/sbt assembly" or "sbt/sbt assembly") or use the binary distribution, 'sbt/sbt test' runs successfully. However, if I build Spark targeting any other Hadoop versions (e.g. "SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly", "SPAR

RE: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Shao, Saisai
+1! Thanks Jerry

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread prabeesh k
+1

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Stephen Haberman
+1 - Stephen

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Prashant Sharma
+1

Re: [DISCUSS] [VOTE] Graduation of Apache Spark

2014-01-26 Thread Chris Mattmann
Thanks, Matei. Cheers, Chris

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Rahul Chugh
+1

Re: [DISCUSS] [VOTE] Graduation of Apache Spark

2014-01-26 Thread Matei Zaharia
Hey Chris, this is a good point — I didn’t realize that the mentors would have other roles later in interacting with the ASF, I thought they were just focused on the incubation process. In that case I think I’d propose we add you, for example, once the project is up. For simplicity I don’t think

[DISCUSS] [VOTE] Graduation of Apache Spark

2014-01-26 Thread Chris Mattmann
Hi Matei, One thing I noticed that you did was to drop the mentors and Champion off of the PMC constituency. While I don't personally take any offense to that, I think you will want to keep at least a few of your mentors around that are ASF members. This has the dual benefit of: 1. Making sure th

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Tathagata Das
+1

Re: GroupByKey implementation.

2014-01-26 Thread Reynold Xin
While I echo Mark's sentiment, versioning has nothing to do with this problem. It has been the case even in Spark 0.8.0. Note that mapSideCombine is turned off for groupByKey, so there is no need to merge any combiners.
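
To make the mapSideCombine point concrete, here is a minimal plain-Scala sketch, assuming in-memory partitions and a "shuffle" that simply concatenates them in partition order. `CombineSketch` and its methods are hypothetical illustrations, not Spark's API. With `mapSideCombine = false`, raw records reach the reduce side and are folded with `createCombiner`/`mergeValue` only, so `mergeCombiners` is never invoked; with it on, each partition builds its own combiner first and the reduce side then has to merge them. Either way, values for the same key from different partitions end up in one group.

```scala
import scala.collection.mutable.ArrayBuffer

// In-memory simulation of combineByKey over a shuffle. NOT Spark's API: partitions are plain
// Seq[(K, V)] values and the "shuffle" is concatenation in partition order.
object CombineSketch {

  // Folds records into one combiner per key using createCombiner / mergeValue.
  private def combineValues[K, V, C](
      records: Seq[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Map[K, C] =
    records.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(c) => acc.updated(k, mergeValue(c, v))
        case None    => acc.updated(k, createCombiner(v))
      }
    }

  def combineByKey[K, V, C](
      partitions: Seq[Seq[(K, V)]],
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C,
      mapSideCombine: Boolean): Map[K, C] =
    if (mapSideCombine) {
      // Map side: each partition builds its own combiner per key before the shuffle...
      val partial: Seq[(K, C)] =
        partitions.flatMap(p => combineValues(p, createCombiner, mergeValue))
      // ...so the reduce side must merge combiners for the same key from different partitions.
      partial.foldLeft(Map.empty[K, C]) { case (acc, (k, c)) =>
        acc.get(k) match {
          case Some(existing) => acc.updated(k, mergeCombiners(existing, c))
          case None           => acc.updated(k, c)
        }
      }
    } else {
      // No map-side combine: raw (k, v) records are shuffled and the reduce side builds each
      // combiner straight from values, so mergeCombiners is never called.
      combineValues(partitions.flatten, createCombiner, mergeValue)
    }

  // groupByKey expressed with ArrayBuffer combiners and mapSideCombine = false.
  def groupByKey[K, V](partitions: Seq[Seq[(K, V)]]): Map[K, List[V]] =
    combineByKey[K, V, ArrayBuffer[V]](
      partitions,
      v => ArrayBuffer(v),
      (buf, v) => buf += v,
      (b1, b2) => b1 ++= b2,
      mapSideCombine = false
    ).map { case (k, buf) => k -> buf.toList }
}
```

Grouping cannot shrink the data on the map side (every value still has to be shuffled), so building per-partition buffers first would mostly add hashing work and an extra merge step; that is a plausible reason for the flag being off, though the message above does not spell it out.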

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Konstantin Boudnik
+1

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Xuefeng Wu
+1 Yours, Xuefeng Wu 吴雪峰 敬上

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Evan Sparks
+1

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Kay Ousterhout
+1

Re: GroupByKey implementation.

2014-01-26 Thread Mark Hamstra
That was run on 0.8.0-incubating ...which raises a question that has been recurring to me of late: Why are people continuing to use 0.8.0 months after 0.8.1 has been out and when 0.9.0 is in release candidates? It doesn't make a relevant difference in this case, but in general, chasing bugs in cod

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Shivaram Venkataraman
+1

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Aaron Davidson
+1

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Christopher Nguyen
+1

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Kostas Sakellis
+1!

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Sebastian Schelter
+1 (I'm also on the IPMC)

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Sandy Ryza
+1!

[VOTE] Graduation of Apache Spark

2014-01-26 Thread Matei Zaharia
Hi guys, Discussion has proceeded positively, so I'm calling for a community VOTE for the graduation of Apache Spark (incubating) into a top level project. If this VOTE is successful, then I'll call an Incubator PMC VOTE in 72 hours, and if that is successful, we’ll submit the project graduatio

Re: GroupByKey implementation.

2014-01-26 Thread Archit Thakur
Which spark version are you on?

Re: GroupByKey implementation.

2014-01-26 Thread Mark Hamstra
groupByKey does merge the values associated with the same key in different partitions: scala> val rdd = sc.parallelize(List(1, 1, 1, 1), 4).mapPartitionsWithIndex((idx, itr) => List(("foo", idx -> math.random),("bar", idx -> math.random)).toIterator) scala> rdd.collect.foreach(println) (foo,(0,0

GroupByKey implementation.

2014-01-26 Thread Archit Thakur
Hi, Below is the implementation for GroupByKey. (v, 0.8.0) def groupByKey(partitioner: Partitioner): RDD[(K, Seq[V])] = { def createCombiner(v: V) = ArrayBuffer(v) def mergeValue(buf: ArrayBuffer[V], v: V) = buf += v val bufs = combineByKey[ArrayBuffer[V]]( createCombiner _, me

Re: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?

2014-01-26 Thread Sean Owen
I think "it depends" a fair bit here. That's a good default absolute convergence cutoff, although it's not crazy to want to run to further convergence since +/- 0.001 can make a difference in top-N recommendations that is noticeable, and it can seem weird that it's 'converged' while answers are no

RE: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?

2014-01-26 Thread Xia, Junluan
Yes, I agree with Matei, but I think it is worthwhile to make the visit() function non-recursive to avoid a StackOverflowError in the driver.
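
The suggestion to rewrite a recursive visit() as a loop can be sketched on a toy lineage graph. `Node` and `LineageWalk` are hypothetical stand-ins, not Spark's scheduler classes; the point is only that an explicit stack moves traversal depth from the JVM thread stack to the heap, so a long lineage (for example, one built up by many ALS iterations) no longer adds one stack frame per level.

```scala
// Hypothetical lineage node: an id plus parent dependencies. Deliberately a plain class (not a
// case class) so hashing or printing a deep chain never recurses through its parents.
final class Node(val id: Int, val deps: Seq[Node])

object LineageWalk {
  // Recursive depth-first walk: one JVM stack frame per level of lineage, so a chain that is
  // thousands of levels deep can throw StackOverflowError.
  def visitRecursive(root: Node): Set[Int] = {
    val visited = scala.collection.mutable.HashSet[Int]()
    def visit(n: Node): Unit =
      if (visited.add(n.id)) n.deps.foreach(visit)
    visit(root)
    visited.toSet
  }

  // Same walk with an explicit stack kept on the heap: depth is limited by memory,
  // not by the thread's stack size.
  def visitIterative(root: Node): Set[Int] = {
    val visited = scala.collection.mutable.HashSet[Int]()
    var stack: List[Node] = List(root)
    while (stack.nonEmpty) {
      val n = stack.head
      stack = stack.tail
      // Only expand a node the first time it is seen, so shared parents are visited once.
      if (visited.add(n.id)) stack = n.deps.toList ::: stack
    }
    visited.toSet
  }

  // Builds a linear lineage of `depth` nodes, the shape produced when each iteration
  // derives a new dataset from the previous one.
  def chain(depth: Int): Node = {
    var n = new Node(0, Nil)
    for (i <- 1 until depth) n = new Node(i, List(n))
    n
  }
}
```

`LineageWalk.visitIterative(LineageWalk.chain(100000))` completes normally, whereas a recursive walk of a chain that deep may overflow depending on the JVM's stack size.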

Re: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?

2014-01-26 Thread Nick Pentreath
If you want to spend the time running 50 iterations, you're better off re-running 5x10 iterations with different random starts to get a better local minimum... On Sun, Jan 26, 2014 at 9:59 AM, Matei Zaharia wrote: > I looked into this after I opened that JIRA and i
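
The restart idea can be illustrated with a toy one-dimensional objective standing in for the ALS loss. This is an assumption made purely for illustration: ALS itself alternates least-squares solves rather than running gradient descent, and `RandomRestarts`, `f`, `descend`, and `bestOf` are hypothetical names. Several short runs from different random starts are compared and the lowest-objective result is kept, so a run that settles in the worse local minimum is simply discarded.

```scala
object RandomRestarts {
  // Toy non-convex objective with two local minima: near x = -1.04 (the better one)
  // and near x = 0.96.
  def f(x: Double): Double = math.pow(x * x - 1, 2) + 0.3 * x
  def grad(x: Double): Double = 4 * x * (x * x - 1) + 0.3

  // A fixed number of plain gradient-descent steps from one starting point.
  def descend(x0: Double, iters: Int, lr: Double = 0.01): Double = {
    var x = x0
    for (_ <- 0 until iters) x -= lr * grad(x)
    x
  }

  // Runs a short descent from each start and keeps the end point with the lowest objective.
  def bestOf(starts: Seq[Double], iters: Int): Double =
    starts.map(descend(_, iters)).minBy(x => f(x))

  // Random starting points drawn uniformly from [-2, 2] with a fixed seed.
  def randomStarts(n: Int, seed: Long): Seq[Double] = {
    val rng = new scala.util.Random(seed)
    Seq.fill(n)(rng.nextDouble() * 4 - 2)
  }
}
```

A single run started at 1.5 ends in the worse minimum near 0.96; adding a start at -1.5 lets `bestOf` pick the better one near -1.04.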

Re: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?

2014-01-26 Thread Nick Pentreath
Agree that it should be fixed if possible. But why run ALS for 50 iterations? It tends to pretty much converge (to within 0.001 or so RMSE) after 5-10 iterations, and even 20 is probably overkill.
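
As a sketch of stopping on convergence instead of running a fixed number of iterations, assume a hypothetical `step` callback that runs one ALS sweep and returns the training RMSE. The default tolerance of 1e-3 follows the "within 0.001 or so RMSE" figure in the message; as Sean points out, a tighter cutoff may be worth it when top-N recommendations still shift at that level. `ConvergenceStop` and the RMSE curve in the usage note are made up for illustration.

```scala
object ConvergenceStop {
  // Calls `step` once per iteration (hypothetically: one ALS sweep returning training RMSE)
  // and stops when the absolute change in RMSE drops below `tol`, or after `maxIter` iterations.
  // Returns (iterations run, last RMSE).
  def runUntilConverged(step: Int => Double, tol: Double = 1e-3, maxIter: Int = 50): (Int, Double) = {
    var iter = 1
    var prev = step(iter)
    var converged = false
    while (!converged && iter < maxIter) {
      iter += 1
      val cur = step(iter)
      converged = math.abs(prev - cur) < tol
      prev = cur
    }
    (iter, prev)
  }
}
```

For a synthetic curve such as `i => 0.8 + 0.5 * math.pow(0.5, i)`, the loop stops after 9 iterations, the first point at which the RMSE change drops below 0.001.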