Github user pwendell commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35968583
Hmm okay I'm still wondering if there is a path for in-memory data where
two copies are created:
Cache manager runs:
```
val elements = new Arra
```
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10022161
--- Diff:
core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala ---
@@ -25,7 +25,22 @@ import org.apache.spark.SparkConf
pr
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10022071
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager:
BlockManag
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967757
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project do
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967767
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12841/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967758
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967766
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967547
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967536
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967549
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12838/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35967538
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12836/
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10021460
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager:
BlockManag
Github user pwendell commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35966452
@kellrott - ah I see, this has just moved the copy from one location to
another... I retract my comment.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10021400
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -65,17 +65,19 @@ private class MemoryStore(blockManager: BlockManager
Github user pwendell commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35966227
Hey @kellrott - I started to do a review on this focused on the tests and
smaller stuff. But I realized, this makes a fairly major change to the block
manager A
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10020928
--- Diff:
core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10020917
--- Diff:
core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10020880
--- Diff:
core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10020710
--- Diff:
core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala ---
@@ -25,7 +25,22 @@ import org.apache.spark.SparkConf
p
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10020704
--- Diff:
core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala ---
@@ -25,7 +25,22 @@ import org.apache.spark.SparkConf
p
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10020702
--- Diff:
core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala ---
@@ -25,7 +25,22 @@ import org.apache.spark.SparkConf
p
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35964570
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35964569
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35963965
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35963966
Merged build started.
Github user aarondav commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35963553
(`sbt/sbt scalastyle` runs its namesake, by the way)
Github user kellrott commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35961011
sbt/sbt assembly runs fine. But I haven't re-merged master since 0.9 was
released (about 20 days...)
Github user ash211 commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960770
The style rules are defined in scalastyle-config.xml but I'm unsure how to
run that test locally. Does it not fail when you compile with "sbt/sbt
assemble"
Github user kellrott commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960651
Thanks, I was trying to figure out what the heck that was all about. Must
be a new code check they recently added.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960555
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12835/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960554
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960478
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960479
Merged build started.
Github user ash211 commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960492
The test that failed was a line-too-long error:
error
file=/root/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/serializer/JavaSer
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960041
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12833/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35960040
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35959961
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35959962
Merged build started.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10011961
--- Diff:
core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software
>>>> Kyle identified a deficiency in Spark where generating iterators are
>>>>> unrolled into memory and then flushed to disk rather than sent
>> straight
>>>>>
>>>>
>>>> to
>>>>> disk when possible.
>>>>>
>>>>> He's had a patch sitting ready for code review for quite some time
>> now
>>>> (100
>>>>> days) but no response.
>>>>>
>>>>> Is this something that an admin would be able to review? I for one
>> would
>>>>> find this quite valuable.
>>>>>
>>>>> Thanks!
>>>>> Andrew
>>>>>
>>>>>
>>>>> https://spark-project.atlassian.net/browse/SPARK-942
>>>>> https://github.com/apache/incubator-spark/pull/180
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
> > to
> > > > disk when possible.
> > > >
> > > > He's had a patch sitting ready for code review for quite some time
> now
> > > (100
> > > > days) but no response.
> > > >
> > > > Is this something that an admin would be able to review? I for one
> would
> > > > find this quite valuable.
> > > >
> > > > Thanks!
> > > > Andrew
> > > >
> > > >
> > > > https://spark-project.atlassian.net/browse/SPARK-942
> > > > https://github.com/apache/incubator-spark/pull/180
> > > >
> > >
> > >
> >
> >
> >
>
>
>
's had a patch sitting ready for code review for quite some time now
> > (100
> > > days) but no response.
> > >
> > > Is this something that an admin would be able to review? I for one would
> > > find this quite valuable.
> > >
> > > Thanks!
> > > Andrew
> > >
> > >
> > > https://spark-project.atlassian.net/browse/SPARK-942
> > > https://github.com/apache/incubator-spark/pull/180
> > >
> >
> >
>
>
>
ys) but no response.
> >
> > Is this something that an admin would be able to review? I for one would
> > find this quite valuable.
> >
> > Thanks!
> > Andrew
> >
> >
> > https://spark-project.atlassian.net/browse/SPARK-942
> > https://github.com/apache/incubator-spark/pull/180
> >
> >
>
>
>
ew for quite some time now (100
> days) but no response.
>
> Is this something that an admin would be able to review? I for one would
> find this quite valuable.
>
> Thanks!
> Andrew
>
>
> https://spark-project.atlassian.net/browse/SPARK-942
> https://github.com/apache/incubator-spark/pull/180
>
>
Github user ash211 commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/180#discussion_r10010480
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager:
BlockManager
thing that an admin would be able to review? I for one would
find this quite valuable.
Thanks!
Andrew
https://spark-project.atlassian.net/browse/SPARK-942
https://github.com/apache/incubator-spark/pull/180
Github user kellrott commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35916887
Pull Request log, day 100. It's been three months since I was submitted,
and almost 2 months since I last heard from an admin. I pass all unit tests and
fix a
Github user kellrott commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-35211241
Are there any other remaining issues that are preventing this pull request
from being reviewed/merged?
Github user ash211 commented on the pull request:
https://github.com/apache/incubator-spark/pull/180#issuecomment-34660329
I often use Spark for ETL and the ability to stream directly onto (HDFS)
disk would be valuable for me. No need to buffer in memory if we don't have to.
+1 for IteratorWithSizeEstimate.
I believe today only HadoopRDDs are able to give fine grained
progress; with an enhanced iterator interface (which can still expose
the base Iterator trait) we can extend the possibility of fine grained
progress to all RDDs that implement the enhanced iterator.
O
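A minimal sketch of what an enhanced iterator along the lines of IteratorWithSizeEstimate might look like. Only the name comes from the thread; the member `estimatedRemainingBytes`, the helper `fromSeq`, and the caller-supplied `sizeOf` function are assumptions made for illustration, not anything in Spark. The point is that the type still extends the base Iterator trait, so existing code keeps working.

```scala
// Hypothetical sketch: the trait name is from the thread, every member is an assumption.
trait IteratorWithSizeEstimate[A] extends Iterator[A] {
  /** Estimated bytes still to be produced, or None when unknown. */
  def estimatedRemainingBytes: Option[Long]
}

object IteratorWithSizeEstimate {
  /** Wraps an in-memory sequence; the per-element size comes from the caller. */
  def fromSeq[A](items: Seq[A])(sizeOf: A => Long): IteratorWithSizeEstimate[A] =
    new IteratorWithSizeEstimate[A] {
      private val underlying = items.iterator
      private var remaining = items.map(sizeOf).sum

      def hasNext: Boolean = underlying.hasNext

      def next(): A = {
        val item = underlying.next()
        remaining -= sizeOf(item) // shrink the estimate as elements are consumed
        item
      }

      def estimatedRemainingBytes: Option[Long] = Some(remaining)
    }
}
```

A consumer that only knows about Iterator can ignore the extra method, while a storage layer could pattern match on the richer type to decide between memory and disk, or to report progress.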
could help me
> > to review if you are free.
> >
> > -Original Message-
> > From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu]
> > Sent: Wednesday, November 13, 2013 8:44 AM
> > To: dev@spark.incubator.apache.org
> > Subject: Re: SPARK-942
> >
> > I'
t [mailto:kellr...@soe.ucsc.edu]
> Sent: Wednesday, November 13, 2013 8:44 AM
> To: dev@spark.incubator.apache.org
> Subject: Re: SPARK-942
>
> I've posted a patch that I think produces the correct behavior at
>
> https://github.com/kellrott/incubator-spark/commit/efe1102
Hi Kyle,
I also built a patch for this issue, and it passes the tests; could you help me
review it if you are free?
-Original Message-
From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu]
Sent: Wednesday, November 13, 2013 8:44 AM
To: dev@spark.incubator.apache.org
Subject: Re: SPARK-942
I
I've posted a patch that I think produces the correct behavior at
https://github.com/kellrott/incubator-spark/commit/efe1102c8a7436b2fe112d3bece9f35fedea0dc8
It works fine on my programs, but if I run the unit tests, I get errors
like:
[info] - large number of iterations *** FAILED ***
[info] o
On Tue, Nov 12, 2013 at 11:07 AM, Stephen Haberman <
stephen.haber...@gmail.com> wrote:
> Huge disclaimer that this is probably a big pita to implement, and
> could likely not be as worthwhile as I naively think it would be.
>
My perspective on this is that it's already a big pita for Spark users today.
> The problem is that the iterator interface only defines 'hasNext' and
> 'next' methods.
Just a comment from the peanut gallery, but FWIW it seems like being
able to ask "how much data is here" would be a useful thing for Spark
to know, even if that means moving away from Iterator itself, or
som
to make code complicated for only 0.1%
> possibility before we get perfect solution.
>
> -Original Message-
> From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu]
> Sent: Tuesday, November 12, 2013 6:28 AM
> To: dev@spark.incubator.apache.org
> Subject: Re: SPARK-942
>
> The
sage-
From: Kyle Ellrott [mailto:kellr...@soe.ucsc.edu]
Sent: Tuesday, November 12, 2013 6:28 AM
To: dev@spark.incubator.apache.org
Subject: Re: SPARK-942
The problem is that the iterator interface only defines 'hasNext' and 'next'
methods. So I don't think that there
before it become Arraybuffer.
>
> Is there any solution for this case? Could we just unroll first 10% of
> total iterator into ArrayBuffer, and estimate this size, and total size is
> equal to 10* size of 10%? apparently it is not perfect.
>
> -Original Message-
> From: Kyle
Ellrott [mailto:kellr...@soe.ucsc.edu]
Sent: Thursday, November 07, 2013 2:59 AM
To: dev@spark.incubator.apache.org
Subject: Re: SPARK-942
I think the usage has to be calculated as the iterator is being put into the
arraybuffer.
Right now, the BlockManager, in its put method when it ge
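To illustrate the idea of calculating usage while the iterator is being put into the ArrayBuffer, here is a rough sketch. It is not the BlockManager code: `BoundedUnroll`, `maxBytes`, and the caller-supplied `sizeOf` estimator are all hypothetical names chosen for the example.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper; not part of Spark.
object BoundedUnroll {
  /**
   * Pulls elements from `it` into a buffer while the running size estimate
   * is below `maxBytes` (it may overshoot by at most one element).
   * Returns the buffer and a flag saying whether the iterator was drained.
   */
  def unroll[A](it: Iterator[A], maxBytes: Long)(
      sizeOf: A => Long): (ArrayBuffer[A], Boolean) = {
    val buffer = new ArrayBuffer[A]
    var used = 0L
    while (it.hasNext && used < maxBytes) {
      val item = it.next()
      buffer += item
      used += sizeOf(item) // account for each element as it is buffered
    }
    (buffer, !it.hasNext)
  }
}
```

When the flag comes back false, the caller still holds the partially consumed iterator and could send the buffered elements plus the remainder to disk instead of continuing to grow the buffer.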
t >wrote:
>
> > I was wondering if anybody had any thoughts on the best way to tackle
> > SPARK-942 ( https://spark-project.atlassian.net/browse/SPARK-942 ).
> > Basically, Spark takes an iterator from a flatmap call and because I tell
> > it that it needs to persist
you have any ideas on this one, Kyle?
On Sat, Oct 26, 2013 at 10:53 AM, Kyle Ellrott wrote:
> I was wondering if anybody had any thoughts on the best way to tackle
> SPARK-942 ( https://spark-project.atlassian.net/browse/SPARK-942 ).
> Basically, Spark takes an iterator from a flatmap
I was wondering if anybody had any thoughts on the best way to tackle
SPARK-942 ( https://spark-project.atlassian.net/browse/SPARK-942 ).
Basically, Spark takes an iterator from a flatmap call and because I tell
it that it needs to persist Spark proceeds to push it all into an array
before
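For contrast with pushing everything into an array first, the following sketch writes each element to disk as the iterator produces it. It uses plain Java serialization and is only an illustration under that assumption; `StreamToDisk` and the reset interval of 1000 are made up for the example and do not reflect how Spark's serializers or block stores are implemented.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream, ObjectOutputStream}

// Hypothetical helper; not part of Spark.
object StreamToDisk {
  /** Writes each element as it is produced; returns the number of elements written. */
  def write[A <: AnyRef](it: Iterator[A], file: File): Long = {
    val out = new ObjectOutputStream(
      new BufferedOutputStream(new FileOutputStream(file)))
    var count = 0L
    try {
      while (it.hasNext) {
        out.writeObject(it.next())
        count += 1
        // ObjectOutputStream remembers every object it has written;
        // resetting periodically lets those objects be garbage collected.
        if (count % 1000 == 0) out.reset()
      }
    } finally {
      out.close()
    }
    count
  }
}
```

Only one element needs to be live at a time here, so memory use stays roughly constant however many elements the iterator generates.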