ALS memory limits

2014-03-25 Thread Debasish Das
Hi, For our use cases we are looking into 20 x 1M matrices, which fall in ranges similar to those outlined in the paper over here: http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html Does the exponential runtime growth in Spark ALS outlined by the blog still exist in recommen

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 10:53 AM, Tathagata Das wrote: > PR 159 seems like a fairly big patch to me. And quite recent, so its impact > on the scheduling is not clear. It may also depend on other changes that > may have gotten into the DAGScheduler but not pulled into branch 0.9. I am > not sure it

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 11:04 AM, Kay Ousterhout wrote: > I don't think the blacklisting is a priority and the CPUS_PER_TASK issue > was still broken after this patch (so broken that I'm convinced no one > actually uses this feature!!), so agree with TD's sentiment that this > shouldn't go into 0.

Re: [VOTE] Release Apache Spark 0.9.1 (rc1)

2014-03-25 Thread Matei Zaharia
Actually I found one minor issue, which is that the support for Tachyon in make-distribution.sh seems to rely on GNU sed flags and doesn’t work on Mac OS X. But I’d be okay pushing that to a later release since this is a packaging operation that you do only once, and presumably you’d do it on a

Re: Spark 0.9.1 release

2014-03-25 Thread Kay Ousterhout
I don't think the blacklisting is a priority and the CPUS_PER_TASK issue was still broken after this patch (so broken that I'm convinced no one actually uses this feature!!), so agree with TD's sentiment that this shouldn't go into 0.9.1. On Tue, Mar 25, 2014 at 10:23 PM, Tathagata Das wrote: >

Re: [VOTE] Release Apache Spark 0.9.1 (rc1)

2014-03-25 Thread Matei Zaharia
+1 looks good to me. I tried both the source and CDH4 versions and looked at the new streaming docs. The release notes seem slightly incomplete, but I guess you’re still working on them? Anyway those don’t go into the release tarball so it’s okay. Matei On Mar 24, 2014, at 2:01 PM, Tathagata D

Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
PR 159 seems like a fairly big patch to me. And quite recent, so its impact on the scheduling is not clear. It may also depend on other changes that may have gotten into the DAGScheduler but not pulled into branch 0.9. I am not sure it is a good idea to pull that in. We can pull those changes later

Re: Suggest to workaround the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Prashant Sharma
I think we should upgrade sbt. I have been using sbt since 13.2-M1 and have not spotted any issues, so RC1 should be good, and it has the fast incremental compilation. Prashant Sharma On Wed, Mar 26, 2014 at 10:41 AM, Will Benton wrote: > - Original Message - > > > At last, I worked aroun

Re: Suggest to workaround the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Will Benton
- Original Message - > At last, I worked around this issue by updating my local SBT to 0.13.2-RC1. > If any of you are experiencing similar problem, I suggest you upgrade your > local SBT version. If this issue is causing grief for anyone on Fedora 20, know that you can install sbt via y

Suggest to workaround the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Cheng Lian
Hi all, Due to a bug in Ivy, SBT tries to download .orbit instead of .jar files, which causes problems. This bug has been fixed in Ivy 2.3.0, but SBT 0.13.1 still uses Ivy 2.0. Aaron has kindly provided a workaround in PR #183

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
Forgot to mention this in the earlier request for PRs. If there is another RC being cut, please add https://github.com/apache/spark/pull/159 to it too (if not done already!). Thanks, Mridul On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das wrote: > Hello everyone, > > Since the release of Spark

Re: Travis CI

2014-03-25 Thread Patrick Wendell
Ya, it's been a little bit slow lately because of a high error rate in interactions with the GitHub API. Unfortunately we are pretty slammed for the release and haven't had a ton of time to do further debugging. - Patrick On Tue, Mar 25, 2014 at 7:13 PM, Nan Zhu wrote: > I just found that the Je

Re: Travis CI

2014-03-25 Thread Nan Zhu
I just found that Jenkins has not been working since this afternoon for one PR: the first build failed after 90 minutes, and the second has run for more than 2 hours without returning a result. Best, -- Nan Zhu On Tuesday, March 25, 2014 at 10:06 PM, Patrick Wendell wrote: > That's not c

Re: Travis CI

2014-03-25 Thread Patrick Wendell
That's not correct - like Michael said the Jenkins build remains the reference build for now. On Tue, Mar 25, 2014 at 7:03 PM, Nan Zhu wrote: > I assume the Jenkins is not working now? > > Best, > > -- > Nan Zhu > > > On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote: > > Just a quick

Re: Travis CI

2014-03-25 Thread Nan Zhu
I assume the Jenkins is not working now? Best, -- Nan Zhu On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote: > Just a quick note to everyone that Patrick and I are playing around with > Travis CI on the Spark github repository. For now, travis does not run all > of the test cas

Travis CI

2014-03-25 Thread Michael Armbrust
Just a quick note to everyone that Patrick and I are playing around with Travis CI on the Spark GitHub repository. For now, Travis does not run all of the test cases, so it will only be turned on experimentally. Long term it looks like Travis might give better integration with GitHub, so we are goin

Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
@evan From the discussion in the JIRA, it seems that we still don't have a clear solution for SPARK-1138. Nor do we have a sense of whether the solution is going to be small enough for a maintenance release. So I don't think we should block the release of Spark 0.9.1 for this. We can make another Spark

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-25 Thread Gary Malouf
Can anyone verify the claims from Aureliano regarding the Akka dependency protobuf collision? Our team has a major need to upgrade to protobuf 2.5.0 up the pipe and Spark seems to be the blocker here. On Fri, Mar 21, 2014 at 6:49 PM, Aureliano Buendia wrote: > > > > On Tue, Mar 18, 2014 at 12:5

Re: Spark 0.9.1 release

2014-03-25 Thread Kevin Markey
TD: A correct shading of ASM should only affect Spark code unless someone is relying on ASM 4.0 in unrelated project code, in which case they can add org.ow2.asm:asm:4.x as a dependency. Our short term solution has been to repackage other libraries with a 3.2 dependency or to exclude ASM whe

Re: new Catalyst/SQL component merged into master

2014-03-25 Thread Evan Chan
Hi Michael, It's not publicly available right now, though we can probably chat about it offline. It's not a super novel concept or anything, in fact I had proposed it a long time ago on the mailing lists. -Evan On Mon, Mar 24, 2014 at 1:34 PM, Michael Armbrust wrote: > Hi Evan, > > Index supp

Re: Spark 0.9.1 release

2014-03-25 Thread Evan Chan
Hey guys, I think SPARK-1138 should be resolved before releasing Spark 0.9.1. It's affecting multiple users' ability to use Spark 0.9 with various versions of Hadoop. I have one fix but am not sure if it works for others. -Evan On Mon, Mar 24, 2014 at 5:30 PM, Tathagata Das wrote: > Hello Kevin, >

Re: Shark does not give any results with SELECT count(*) command

2014-03-25 Thread qingyang li
Spark is deployed on bigdata001, bigdata002, bigdata003, and bigdata004; bigdata001 is the master. I have also copied Shark's files to the four machines. When I run "select count(*) from b" in the Shark shell ("bin/shark") on bigdata003, I get the result, but when I run "select count(*) from b" on other nodes