[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35968583 Hmm okay I'm still wondering if there is a path for in-memory data where two copies are created: Cache manager runs: ``` val elements = new Arra

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10022161 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -25,7 +25,22 @@ import org.apache.spark.SparkConf pr

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10022071 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManag

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967757 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project do

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967767 One or more automated tests failed Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12841/

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967758 Merged build started.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967766 Merged build finished.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967547 Merged build finished.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967536 Merged build finished.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967549 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12838/

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35967538 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12836/

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10021460 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManag

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35966452 @kellrott - ah I see, this has just moved the copy from one location to another... I retract my comment.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10021400 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -65,17 +65,19 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35966227 Hey @kellrott - I started to do a review on this focused on the tests and smaller stuff. But I realized, this makes a fairly major change to the block manager A

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10020928 --- Diff: core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10020917 --- Diff: core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10020880 --- Diff: core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10020710 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -25,7 +25,22 @@ import org.apache.spark.SparkConf p

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10020704 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -25,7 +25,22 @@ import org.apache.spark.SparkConf p

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10020702 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -25,7 +25,22 @@ import org.apache.spark.SparkConf p

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35964570 Merged build started.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35964569 Merged build triggered.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35963965 Merged build triggered.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35963966 Merged build started.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35963553 (`sbt/sbt scalastyle` runs its namesake, by the way)

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread kellrott
Github user kellrott commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35961011 sbt/sbt assembly runs fine. But I haven't re-merged master since 0.9 was released (about 20 days...)

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960770 The style rules are defined in scalastyle-config.xml but I'm unsure how to run that test locally. Does it not fail when you compile with "sbt/sbt assembly"

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread kellrott
Github user kellrott commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960651 Thanks, I was trying to figure out what the heck that was all about. Must be a new code check they recently added.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960555 One or more automated tests failed Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12835/

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960554 Merged build finished.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960478 Merged build triggered.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960479 Merged build started.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960492 The test that failed was a line-too-long error: error file=/root/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/serializer/JavaSer

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960041 One or more automated tests failed Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12833/

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35960040 Merged build finished.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35959961 Merged build triggered.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35959962 Merged build started.

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10011961 --- Diff: core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software

Re: SPARK-942 patch review

2014-02-24 Thread Matei Zaharia
> Kyle identified a deficiency in Spark where generating iterators are unrolled into memory and then flushed to disk rather than sent straight to disk when possible. He's had a patch sitting ready for code review for quite some time now (100 days) but no response. Is this something that an admin would be able to review? I for one would find this quite valuable. Thanks! Andrew https://spark-project.atlassian.net/browse/SPARK-942 https://github.com/apache/incubator-spark/pull/180

Re: SPARK-942 patch review

2014-02-24 Thread Andrew Ash
> to disk when possible. He's had a patch sitting ready for code review for quite some time now (100 days) but no response. Is this something that an admin would be able to review? I for one would find this quite valuable. Thanks! Andrew https://spark-project.atlassian.net/browse/SPARK-942 https://github.com/apache/incubator-spark/pull/180

Re: SPARK-942 patch review

2014-02-24 Thread Nan Zhu
> 's had a patch sitting ready for code review for quite some time now (100 days) but no response. Is this something that an admin would be able to review? I for one would find this quite valuable. Thanks! Andrew https://spark-project.atlassian.net/browse/SPARK-942 https://github.com/apache/incubator-spark/pull/180

Re: SPARK-942 patch review

2014-02-24 Thread Andrew Ash
> ys) but no response. Is this something that an admin would be able to review? I for one would find this quite valuable. Thanks! Andrew https://spark-project.atlassian.net/browse/SPARK-942 https://github.com/apache/incubator-spark/pull/180

Re: SPARK-942 patch review

2014-02-24 Thread Nan Zhu
> ew for quite some time now (100 days) but no response. Is this something that an admin would be able to review? I for one would find this quite valuable. Thanks! Andrew https://spark-project.atlassian.net/browse/SPARK-942 https://github.com/apache/incubator-spark/pull/180

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread ash211
Github user ash211 commented on a diff in the pull request: https://github.com/apache/incubator-spark/pull/180#discussion_r10010480 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManager

SPARK-942 patch review

2014-02-24 Thread Andrew Ash
thing that an admin would be able to review? I for one would find this quite valuable. Thanks! Andrew https://spark-project.atlassian.net/browse/SPARK-942 https://github.com/apache/incubator-spark/pull/180

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-24 Thread kellrott
Github user kellrott commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35916887 Pull Request log, day 100. It’s been three months since I was submitted, and almost 2 months since I last heard from an admin. I pass all unit tests and fix a

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-16 Thread kellrott
Github user kellrott commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-35211241 Are there any other remaining issues that are preventing this pull request from being reviewed/merged?

[GitHub] incubator-spark pull request: Patch for SPARK-942

2014-02-10 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/incubator-spark/pull/180#issuecomment-34660329 I often use Spark for ETL and the ability to stream directly onto (HDFS) disk would be valuable for me. No need to buffer in memory if we don't have to.

Re: SPARK-942

2013-11-14 Thread Evan Chan
+1 for IteratorWithSizeEstimate. I believe today only HadoopRDDs are able to give fine grained progress; with an enhanced iterator interface (which can still expose the base Iterator trait) we can extend the possibility of fine grained progress to all RDDs that implement the enhanced iterator. O
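Evan's IteratorWithSizeEstimate suggestion can be sketched in Scala as a trait that still extends `Iterator`, so code written against the base trait keeps working, while size-aware consumers can ask for a hint. This is a hypothetical sketch, not a Spark API: the trait name comes from the thread, but `estimatedRemaining`, `fromSeq`, and `remainingHint` are illustrative names, and the estimate here counts elements rather than bytes.

```scala
// Hypothetical sketch: IteratorWithSizeEstimate is the name floated on the
// list, not an existing Spark API. It is still a plain Iterator, so existing
// callers keep working; size-aware callers can ask for a hint.
trait IteratorWithSizeEstimate[A] extends Iterator[A] {
  /** Estimated number of elements not yet returned by next(). */
  def estimatedRemaining: Long
}

object IteratorWithSizeEstimate {
  /** Wrap an indexed sequence, whose length is known exactly. */
  def fromSeq[A](xs: IndexedSeq[A]): IteratorWithSizeEstimate[A] =
    new IteratorWithSizeEstimate[A] {
      private var i = 0
      def hasNext: Boolean = i < xs.length
      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        val x = xs(i)
        i += 1
        x
      }
      def estimatedRemaining: Long = (xs.length - i).toLong
    }

  /** Size hint for any iterator: Some(estimate) if it carries one, else None. */
  def remainingHint(it: Iterator[_]): Option[Long] = it match {
    case s: IteratorWithSizeEstimate[_] => Some(s.estimatedRemaining)
    case _                              => None
  }
}
```

One design cost worth noting: standard combinators such as `map` or `filter` return a plain `Iterator`, so the hint is dropped after any transformation unless every operator is taught to propagate it.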

Re: SPARK-942

2013-11-13 Thread Aaron Davidson
could help me > > to review if you are free. > > > > -Original Message- > > From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu] > > Sent: Wednesday, November 13, 2013 8:44 AM > > To: dev@spark.incubator.apache.org > > Subject: Re: SPARK-942 > > > > I'

RE: SPARK-942

2013-11-12 Thread Kyle Ellrott
t [mailto:kellr...@soe.ucsc.edu] > Sent: Wednesday, November 13, 2013 8:44 AM > To: dev@spark.incubator.apache.org > Subject: Re: SPARK-942 > > I've posted a patch that I think produces the correct behavior at > > https://github.com/kellrott/incubator-spark/commit/efe1102

RE: SPARK-942

2013-11-12 Thread Xia, Junluan
Hi kely I also build a patch for this issue, and pass the test, you could help me to review if you are free. -Original Message- From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu] Sent: Wednesday, November 13, 2013 8:44 AM To: dev@spark.incubator.apache.org Subject: Re: SPARK-942 I&#x

Re: SPARK-942

2013-11-12 Thread Kyle Ellrott
I've posted a patch that I think produces the correct behavior at https://github.com/kellrott/incubator-spark/commit/efe1102c8a7436b2fe112d3bece9f35fedea0dc8 It works fine on my programs, but if I run the unit tests, I get errors like: [info] - large number of iterations *** FAILED *** [info] o

Re: SPARK-942

2013-11-12 Thread Alex Boisvert
On Tue, Nov 12, 2013 at 11:07 AM, Stephen Haberman < stephen.haber...@gmail.com> wrote: > Huge disclaimer that this is probably a big pita to implement, and > could likely not be as worthwhile as I naively think it would be. > My perspective on this is it's already big pita of Spark users today.

Re: SPARK-942

2013-11-12 Thread Stephen Haberman
> The problem is that the iterator interface only defines 'hasNext' and > 'next' methods. Just a comment from the peanut gallery, but FWIW it seems like being able to ask "how much data is here" would be a useful thing for Spark to know, even if that means moving away from Iterator itself, or som

Re: SPARK-942

2013-11-12 Thread Koert Kuipers
to make code complicated for only 0.1% > possibility before we get perfect solution. > > -Original Message- > From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu] > Sent: Tuesday, November 12, 2013 6:28 AM > To: dev@spark.incubator.apache.org > Subject: Re: SPARK-942 > > The

RE: SPARK-942

2013-11-11 Thread Xia, Junluan
sage- From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu] Sent: Tuesday, November 12, 2013 6:28 AM To: dev@spark.incubator.apache.org Subject: Re: SPARK-942 The problem is that the iterator interface only defines 'hasNext' and 'next' methods. So I don't think that there

Re: SPARK-942

2013-11-11 Thread Kyle Ellrott
before it become Arraybuffer. > > Is there any solution for this case? Could we just unroll first 10% of > total iterator into ArrayBuffer, and estimate this size, and total size is > equal to 10* size of 10%? apparently it is not perfect. > > -Original Message- > From: Kyle
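The sampling idea quoted in this message (unroll the first 10% into an ArrayBuffer, measure it, and scale up by 10) might look roughly like the following. It assumes the total element count is already known, which a bare `Iterator` cannot provide, and it takes a caller-supplied `sizeOf` function in place of a real object-size estimator; `PrefixSizeEstimate`, `estimate`, and `sizeOf` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object PrefixSizeEstimate {
  /** Unroll roughly the first `fraction` of an iterator whose element count
    * `totalCount` is assumed to be known, then extrapolate the total size.
    * Returns the unrolled prefix, the estimate, and the untouched remainder. */
  def estimate[A](it: Iterator[A], totalCount: Long, fraction: Double,
                  sizeOf: A => Long): (ArrayBuffer[A], Long, Iterator[A]) = {
    require(fraction > 0.0 && fraction <= 1.0, "fraction must be in (0, 1]")
    val sampleCount = math.max(1L, math.ceil(totalCount * fraction).toLong)
    val prefix = new ArrayBuffer[A]
    var sampledBytes = 0L
    while (prefix.size < sampleCount && it.hasNext) {
      val x = it.next()
      prefix += x                 // kept: an iterator cannot be rewound
      sampledBytes += sizeOf(x)
    }
    // Average size of the sampled elements, scaled to the assumed total count.
    val total =
      if (prefix.isEmpty) 0L
      else math.round(sampledBytes.toDouble / prefix.size * totalCount)
    (prefix, total, it)
  }
}
```

Because the iterator cannot be rewound, the sampled prefix has to be kept and replayed ahead of the remainder. The extrapolation is only as good as the prefix is representative: if element sizes drift across the partition, the estimate drifts with them, which is the "not perfect" caveat raised in the thread.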

RE: SPARK-942

2013-11-11 Thread Xia, Junluan
Ellrott [mailto:kellr...@soe.ucsc.edu] Sent: Thursday, November 07, 2013 2:59 AM To: dev@spark.incubator.apache.org Subject: Re: SPARK-942 I think the usage has to be calculated as the iterator is being put into the arraybuffer. Right now, the BlockManager, in it's put method when it ge
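Kyle's point quoted here, that usage has to be measured while the iterator is being copied into the ArrayBuffer, can be illustrated with an unroll loop that checks a memory budget after each element and bails out once it is exceeded. This is a sketch under stated assumptions, not the code from the patch: `SafeUnroll`, `UnrollResult`, `memoryLimit`, and `sizeOf` are hypothetical, and a real block manager would also have to coordinate such a budget across concurrently running tasks.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical result type: either everything fit under the budget,
// or the caller gets an iterator over all elements to write to disk.
sealed trait UnrollResult[A]
final case class FitsInMemory[A](values: ArrayBuffer[A]) extends UnrollResult[A]
final case class SpillToDisk[A](all: Iterator[A]) extends UnrollResult[A]

object SafeUnroll {
  /** Copy `it` into a buffer while tracking estimated usage; stop as soon as
    * the running total exceeds `memoryLimit` bytes. */
  def unroll[A](it: Iterator[A], memoryLimit: Long, sizeOf: A => Long): UnrollResult[A] = {
    val buffer = new ArrayBuffer[A]
    var used = 0L
    while (it.hasNext) {
      val x = it.next()
      buffer += x
      used += sizeOf(x)
      if (used > memoryLimit) {
        // Over budget: chain what was buffered in front of the untouched rest.
        return SpillToDisk(buffer.iterator ++ it)
      }
    }
    FitsInMemory(buffer)
  }
}
```

On the spill path the elements already buffered are placed in front of the unconsumed remainder, so nothing is lost or recomputed; the caller can then stream that combined iterator to disk.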

Re: SPARK-942

2013-11-06 Thread Kyle Ellrott
t >wrote: > > > I was wondering if anybody had any thoughts on the best way to tackle > > SPARK-942 ( https://spark-project.atlassian.net/browse/SPARK-942 ). > > Basically, Spark takes an iterator from a flatmap call and because I tell > > it that it needs to persist

Re: SPARK-942

2013-11-03 Thread Reynold Xin
you have any ideas on this one, Kyle? On Sat, Oct 26, 2013 at 10:53 AM, Kyle Ellrott wrote: > I was wondering if anybody had any thoughts on the best way to tackle > SPARK-942 ( https://spark-project.atlassian.net/browse/SPARK-942 ). > Basically, Spark takes an iterator from a flatmap

SPARK-942

2013-10-26 Thread Kyle Ellrott
I was wondering if anybody had any thoughts on the best way to tackle SPARK-942 ( https://spark-project.atlassian.net/browse/SPARK-942 ). Basically, Spark takes an iterator from a flatmap call and because I tell it that it needs to persist Spark proceeds to push it all into an array before
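The behaviour described in this message, where the output of a flatMap is pushed into an array before anything is persisted, can be contrasted with writing each element as it is produced. In this minimal sketch an `OutputStream` stands in for the disk store; `UnrollVsStream`, `unrollThenWrite`, and `streamToDisk` are hypothetical names, not Spark's BlockManager API.

```scala
import java.io.{DataOutputStream, OutputStream}
import scala.collection.mutable.ArrayBuffer

object UnrollVsStream {
  /** Unroll-first: every element is buffered on the heap before any write,
    * so peak memory grows with the size of the partition. */
  def unrollThenWrite(it: Iterator[Int], out: OutputStream): Unit = {
    val buffer = new ArrayBuffer[Int]
    buffer ++= it
    val data = new DataOutputStream(out)
    buffer.foreach(x => data.writeInt(x))
    data.flush()
  }

  /** Streaming: each element is written as soon as the iterator yields it,
    * so only the current element has to be held before it is written. */
  def streamToDisk(it: Iterator[Int], out: OutputStream): Unit = {
    val data = new DataOutputStream(out)
    it.foreach(x => data.writeInt(x))
    data.flush()
  }
}
```

Both methods write identical bytes; the only difference is peak memory while the iterator is drained, which is what the thread is trying to avoid paying when the data is headed for disk anyway.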