Re: (test)

2014-05-16 Thread Aaron Davidson
No. Only 3 of the responses. On Fri, May 16, 2014 at 10:38 AM, Nishkam Ravi wrote: > Yes. > > > On Fri, May 16, 2014 at 8:40 AM, DB Tsai wrote: > > > Yes. > > On May 16, 2014 8:39 AM, "Andrew Or" wrote: > > > > > Apache has been having some problems lately. Do you guys see this > > message? >

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Xiangrui Meng
With 3x replication, we should be able to achieve fault tolerance. This checkPointed RDD can be cleared if we have another in-memory checkPointed RDD down the line. It can avoid hitting disk if we have enough memory to use. We need to investigate more to find a good solution. -Xiangrui On Fri, May

Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Michael Armbrust
-1 We found a regression in the way configuration is passed to executors. https://issues.apache.org/jira/browse/SPARK-1864 https://github.com/apache/spark/pull/808 Michael On Fri, May 16, 2014 at 3:57 PM, Mark Hamstra wrote: > +1 > > > On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell > wrote

Re: [VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Mridul Muralidharan
So was rc5 cancelled? Did not see a note indicating that or why ... [1] - Mridul [1] could have easily missed it in the email storm though! On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > > This pa

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Henry Saputra
Ah ok, thanks Aaron Just to make sure we VOTE the right RC. Thanks, Henry On Fri, May 16, 2014 at 11:37 AM, Aaron Davidson wrote: > It was, but due to the apache infra issues, some may not have received the > email yet... > > On Fri, May 16, 2014 at 10:48 AM, Henry Saputra > wrote: >> >> Hi P

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Mridul Muralidharan
Effectively this is persist without fault tolerance. Failure of any node means complete lack of fault tolerance. I would be very skeptical of truncating lineage if it is not reliable. On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" wrote: > Xiangrui Meng created SPARK-1855: >

Re: [VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Mark Hamstra
+1 On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell wrote: > [Due to ASF e-mail outage, I'm not sure if anyone will actually receive this.] > > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > This has only minor changes on top of rc7. > > The tag to be voted on is

Re: mllib vector templates

2014-05-16 Thread Xiangrui Meng
3) It is not designed for dense feature vectors. On Thu, May 15, 2014 at 8:33 PM, Xiangrui Meng wrote: > I submitted a PR for standardizing the text format for vectors and > labeled data: https://github.com/apache/spark/pull/685 > > Once it gets merged, saveAsTextFile and loading should be consis

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
Hey all, My vote threads seem to be running about 24 hours behind and/or getting swallowed by infra e-mail. I sent RC8 yesterday and we might send one tonight as well. I'll make sure to close all existing ones. There have been only small "polish" changes in the recent RCs since RC5. So testing a

Re: [VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Tom Graves
Yes, rc5 and rc6 were cancelled. There is now an rc7. Unfortunately the Apache mailing list issue has caused lots of emails not to come through. Here are the details (hopefully this goes through): Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has mino

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Patrick Wendell
Thanks for your feedback. Since it's not a regression, it won't block the release. On Wed, May 14, 2014 at 12:17 AM, witgo wrote: > SPARK-1817 will cause users to get incorrect results and RDD.zip is common > usage . > This should be the highest priority. I think we should fix the bug,and shoul

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Mark Hamstra
+1, but just barely. We've got quite a number of outstanding bugs identified, and many of them have fixes in progress. I'd hate to see those efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 -- in other words, I'd like to see 1.0.1 retain a high priority relative to 1.1.0.

[VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.0.0! This patch has minor documentation changes and fixes on top of rc6. The tag to be voted on is v1.0.0-rc7 (commit 9212b3e): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9212b3e5bb5545ccfce242da8d89108

Calling external classes added by sc.addJar needs to be through reflection

2014-05-16 Thread DB Tsai
Finally found a way out of the ClassLoader maze! It took me some time to understand how it works; I think it is worth documenting in a separate thread. We're trying to add an external utility.jar which contains CSVRecordParser, and we added the jar to executors through the sc.addJar API. If the insta
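The reflection approach named in the subject line can be sketched roughly as follows. This is a minimal illustration, not the code from the thread: the helper name newInstanceOf is hypothetical, java.util.ArrayList stands in for the external class (CSVRecordParser's package and constructor are not shown in the message), and the choice of the thread's context class loader is an assumption about where the added jar becomes visible.

```scala
object ReflectiveLoad {
  // Hypothetical helper: resolve a class by name at runtime instead of
  // referencing it statically, then call its no-argument constructor.
  def newInstanceOf(className: String): AnyRef = {
    // Assumption: the context class loader can see the dynamically added jar.
    // Fall back to this object's own loader if no context loader is set.
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Class.forName(className, true, loader)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[AnyRef]
  }

  def main(args: Array[String]): Unit = {
    // java.util.ArrayList is only a stand-in for the external class.
    val obj = newInstanceOf("java.util.ArrayList")
    println(obj.getClass.getName)
  }
}
```

The point of the pattern is that the class name is only a string at compile time, so which loader resolves it is decided at the call site rather than by whichever loader loaded the calling code.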

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Mark Hamstra
Sorry for the duplication, but I think this is the current VOTE candidate -- we're not voting on rc8 yet? +1, but just barely. We've got quite a number of outstanding bugs identified, and many of them have fixes in progress. I'd hate to see those efforts get lost in a post-1.0.0 flood of new fea

[VOTE] Release Apache Spark 1.0.0 (rc8)

2014-05-16 Thread Patrick Wendell
[Due to ASF e-mail outage, I'm not sure if anyone will actually receive this.] Please vote on releasing the following candidate as Apache Spark version 1.0.0! This has only minor changes on top of rc7. The tag to be voted on is v1.0.0-rc8 (commit 80eea0f): https://git-wip-us.apache.org/repos/asf?p=spa

Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread GlennStrycker
Why does the reduce function only work on sums of keys of the same type and does not support other functional forms? I am having trouble in another example where instead of 1s and 0s, the output of the map function is something like A=(1,2) and B=(3,4). I need a reduce function that can return so
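For the pair case in the question, a reduce function is not limited to numeric sums as long as it takes two values of the element type and returns a value of that same type. A minimal sketch on a plain Scala Seq rather than an RDD (the name sumPairs and the component-wise combination are illustrative assumptions, since the message does not say what result is wanted):

```scala
object PairReduce {
  // Combine two pairs component-wise; input and output share the type (Int, Int),
  // which is what reduce's (T, T) => T signature requires.
  def sumPairs(pairs: Seq[(Int, Int)]): (Int, Int) =
    pairs.reduce((a, b) => (a._1 + b._1, a._2 + b._2))

  def main(args: Array[String]): Unit = {
    val a = (1, 2)
    val b = (3, 4)
    println(sumPairs(Seq(a, b)))
  }
}
```

On an RDD the function passed to reduce should also be associative and commutative, because partial results from different partitions are combined in no fixed order.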

Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread Mark Hamstra
Sorry, looks like an extra line got inserted in there. One more try:
    val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
      val x = Math.random()
      val y = Math.random()
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
On Fri, May 16, 2014 at 12:36 PM, Mark Hamstra wrote: > Actually, the b

reduce only removes duplicates, cannot be arbitrary function

2014-05-16 Thread GlennStrycker
I am attempting to write a mapreduce job on a graph object to take an edge list and return a new edge list. Unfortunately I find that the current function is
    def reduce(f: (T, T) => T): T
not
    def reduce(f: (T1, T2) => T3): T
I see this because the following 2 commands give different results f
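When the desired result has a different type than the elements, two common ways to stay within the (T, T) => T shape are to map each element into the result type before reducing, or to fold with a separate accumulator type. The sketch below uses a plain Scala Seq, not an RDD; the Edge alias, the out-degree computation, and all method names are illustrative assumptions and not taken from the message. RDDs offer fold and aggregate for the accumulator style.

```scala
object ReduceShapes {
  type Edge = (Int, Int) // (source vertex, destination vertex)

  // Merge two count maps by adding the counts of shared keys.
  def merge(a: Map[Int, Int], b: Map[Int, Int]): Map[Int, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

  // Option 1: map every edge into the result type first, then reduce.
  // reduce throws on an empty collection, so edges must be non-empty here.
  def outDegreesViaReduce(edges: Seq[Edge]): Map[Int, Int] =
    edges.map { case (src, _) => Map(src -> 1) }.reduce(merge)

  // Option 2: fold with an accumulator whose type differs from the element type.
  def outDegreesViaFold(edges: Seq[Edge]): Map[Int, Int] =
    edges.foldLeft(Map.empty[Int, Int]) { case (acc, (src, _)) =>
      acc.updated(src, acc.getOrElse(src, 0) + 1)
    }

  def main(args: Array[String]): Unit = {
    val edges = Seq((1, 2), (1, 3), (2, 3))
    println(outDegreesViaReduce(edges))
    println(outDegreesViaFold(edges))
  }
}
```

Both variants give the same answer on the same input; the fold version also handles an empty edge list, returning an empty map.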

Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread Mark Hamstra
Actually, the better way to write the multi-line closure would be:
    val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
      val x = Math.random()
      val y = Math.random()
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
On Fri, May 16, 2014 at 9:41 AM, GlennStrycker wrote: > On the webpage http

Re: Scala examples for Spark do not work as written in documentation

2014-05-16 Thread Reynold Xin
Thanks for pointing it out. We should update the website to fix the code.
    val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random()
      val y = Math.random()
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
On Fri, May 16

Re: (test)

2014-05-16 Thread Reynold Xin
I didn't see the original message, but only a reply. On Fri, May 16, 2014 at 10:38 AM, Nishkam Ravi wrote: > Yes. > > > On Fri, May 16, 2014 at 8:40 AM, DB Tsai wrote: > > > Yes. > > On May 16, 2014 8:39 AM, "Andrew Or" wrote: > > > > > Apache has been having some problems lately. Do you guys

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Patrick Wendell
Hey Everyone, Just a heads up - I've sent other release candidates to the list, but they appear to be getting swallowed (i.e. they are not on nabble). I think there is an issue with Apache mail servers. I'm going to keep trying... if you get duplicate e-mails I apologize in advance. On Thu, May

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Aaron Davidson
It was, but due to the apache infra issues, some may not have received the email yet... On Fri, May 16, 2014 at 10:48 AM, Henry Saputra wrote: > Hi Patrick, > > Just want to make sure that VOTE for rc6 also cancelled? > > > Thanks, > > Henry > > On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell >

Scala examples for Spark do not work as written in documentation

2014-05-16 Thread GlennStrycker
On the webpage http://spark.apache.org/examples.html, there is an example written as
    val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
      val x = Math.random()
      val y = Math.random()
      if (x*x + y*y < 1) 1 else 0
    ).reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
This do
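The parenthesized form fails because the body of a lambda written inside plain parentheses must be a single expression, and a val definition is not an expression; a multi-statement body has to be a block in braces. A minimal local sketch of the brace form, using an ordinary Scala range in place of spark.parallelize so it runs without a cluster (the object name, the estimate helper, and the seeded Random are assumptions added for repeatability):

```scala
import scala.util.Random

object PiEstimate {
  // Sample points in the unit square and count those that fall inside
  // the quarter circle of radius 1; the hit ratio approximates Pi / 4.
  def estimate(numSamples: Int, rng: Random): Double = {
    val count = (1 to numSamples).map { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    4.0 * count / numSamples
  }

  def main(args: Array[String]): Unit =
    println("Pi is roughly " + estimate(100000, new Random(42)))
}
```

The closure passed to map is the same shape as in the Spark version; only the data source differs.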

Re: (test)

2014-05-16 Thread Nishkam Ravi
Yes. On Fri, May 16, 2014 at 8:40 AM, DB Tsai wrote: > Yes. > On May 16, 2014 8:39 AM, "Andrew Or" wrote: > > > Apache has been having some problems lately. Do you guys see this > message? > > >

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Henry Saputra
Hi Patrick, Just want to make sure: was the VOTE for rc6 also cancelled? Thanks, Henry On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell wrote: > I'll start the voting with a +1. > > On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell wrote: >> Please vote on releasing the following candidate as Apa

Re: (test)

2014-05-16 Thread DB Tsai
Yes. On May 16, 2014 8:39 AM, "Andrew Or" wrote: > Apache has been having some problems lately. Do you guys see this message? >

Re: (test)

2014-05-16 Thread Ted Yu
Yes. On Thu, May 15, 2014 at 10:34 AM, Andrew Or wrote: > Apache has been having some problems lately. Do you guys see this message? >

[RESULT][VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Patrick Wendell
This vote is cancelled in favor of rc7. On Wed, May 14, 2014 at 1:02 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > > This patch has a few minor fixes on top of rc5. I've also built the > binary artifacts with Hive support enabled

Re: mllib vector templates

2014-05-16 Thread Xiangrui Meng
I submitted a PR for standardizing the text format for vectors and labeled data: https://github.com/apache/spark/pull/685 Once it gets merged, saveAsTextFile and loading should be consistent. I didn't choose LibSVM as the default format for two reasons: 1) It doesn't contain feature dimension

(test)

2014-05-16 Thread Andrew Or
Apache has been having some problems lately. Do you guys see this message?

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Patrick Wendell
I'll start the voting with a +1. On Thu, May 15, 2014 at 1:14 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > > This patch has minor documentation changes and fixes on top of rc6. > > The tag to be voted on is v1.0.0-rc7 (commit 92

Can an RDD be shared across multiple Spark applications?

2014-05-16 Thread qingyang li

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-16 Thread Henry Saputra
Hi Sandy, Just curious if the Vote is for rc5 or rc6? Gmail shows me that you replied to the rc5 thread. Thanks, - Henry On Wed, May 14, 2014 at 1:28 PM, Sandy Ryza wrote: > +1 (non-binding) > > * Built the release from source. > * Compiled Java and Scala apps that interact with HDFS against i