[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-09-10 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-139421814 Now that #7403 is merged, I've rebased this PR on top of `master`. This PR is ready for review now, when someone has the time. Thanks. --- If your project is set u

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-10 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-139340910 @andrewor14 I agree the name `combineByKeyWithClassTag` is awkward. I'm open to changing it to a less awkward name, if you or anyone else can suggest one.

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-10 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-139301528 The class names are `private` now and all tests pass. To provide a little more explanation of why the `combineByKeyWithClasstag` methods are needed (instead of

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-139043827 Thanks for the response, @andrewor14. I'll update this PR to make the class names private. As you say, we can make them public in a future PR, if needed. You

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r39095321 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -70,12 +71,13 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) * In

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-138977028 I just rebased this on top of `upstream/master` in order to fix some conflicts that arose as master changed underneath this PR. @rxin @andrewor14 This PR has

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-08 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-138622760 @rxin and @andrewor14 - Is there anything more that needs to be done before this PR is ready to be merged? I've made all recommended changes. There is an

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-03 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38663395 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -76,6 +78,15 @@ class ShuffleDependency[K, V, C]( override def rdd: RDD

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-01 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-136879557 @andrewor14 I noticed that I missed a few of the style fixes that you recommended. I just pushed 41d2a3c which fixes them. Thanks for the reviewing this PR. I

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-01 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38474790 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala --- @@ -75,7 +76,8 @@ private[spark] class CoGroupPartition( * @param part

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-01 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38469436 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -565,12 +603,25 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-01 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38468044 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -565,12 +603,25 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-01 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38467056 --- Diff: core/src/test/scala/org/apache/spark/DependencySuite.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-09-01 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38463750 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -76,6 +78,15 @@ class ShuffleDependency[K, V, C]( override def rdd: RDD

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-08-28 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-135946761 @rxin All tests pass after the `combinerClassName` is changed to `Option[String]`. Let me know if there's anything else that needs to be changed to merge this. -

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-08-28 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r38234281 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -76,6 +78,18 @@ class ShuffleDependency[K, V, C]( override def rdd: RDD

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-08-04 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-127752526 @rxin I look forward to your comment. * This PR only adds `ClassTag`s to the following `@DeveloperApi` classes: `Dependency`, `CoGroupedRDD`, and `ShuffledRDD

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-08-04 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-127717361 Thanks, @JoshRosen. Appreciate the response. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-08-04 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-127679415 I should mention that this is the last change to Spark Core that I need to make for the Parquet shuffle to work. --- If your project is set up for it, you can reply to

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-08-04 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-127678841 @rxin @JoshRosen A week has passed without comment. When you have a moment, it would be great to hear your feedback on this PR. --- If your project is set up for it

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-28 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-125763381 @rxin @JoshRosen Sorry for all the Jenkins noise on this PR. Shane and I (mostly Shane!) tracked down the underlying issue on Jenkins that was causing the PRB to fail

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-28 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-125724386 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-28 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-125718360 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-28 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-125713253 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-28 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-125678927 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-27 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-125266698 Once #7403 is merged, I'll rebase this PR on and fix the conflicts. --- If your project is set up for it, you can reply to this email and have your reply appe

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-27 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-125264591 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-124795158 Still failing with... ``` [info] KinesisBackedBlockRDDSuite: [info] Exception encountered when attempting to run a suite with class name

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-124781545 The error isn't related to this PR... ``` [info] *** 1 SUITE ABORTED *** [error] Error: Total 21, Failed 0, Errors 1, Passed 20 [error] Error d

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-124781553 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r35471075 --- Diff: core/src/main/scala/org/apache/spark/rdd/SubtractedRDD.scala --- @@ -48,7 +48,7 @@ import org.apache.spark.serializer.Serializer * you can use

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-124625710 Jenkins tests pass. The other failures were Jenkins hiccups. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-124571785 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-124565062 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-23 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r35395219 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -103,13 +105,42 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-23 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r35392582 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -76,6 +78,18 @@ class ShuffleDependency[K, V, C]( override def rdd: RDD

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-22 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-123917486 This is ready for review when someone has the time. All unit tests pass. * Only classes marked `@DeveloperApi` or are `private[spark]` have been altered to

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-16 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-122134441 There is another approach that would work here too. Currently, `RDD` has following implicit method for the PairRDDFunctions... ```scala object RDD

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-15 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121776325 I just pushed an update (5c58b4df9b) which ensures that we keep binary compatibility in `PairRDDFunctions`. The old `combineByKey` methods now call into the new

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-15 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121740911 To answer my own question, adding a `ClassTag`, of course, breaks binary compatibility since it changes the method signature. Sorry about the unnecessary comment chatter

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-15 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121686068 @JoshRosen @rxin I just pushed an update that reverts the `PairRDDFunctions` `ClassTag` implicits to their original form. The only remaining sticking point is

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-15 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-121669031 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-8875] Remove BlockStoreShuffleFetcher c...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7268#issuecomment-121422806 @kayousterhout Would you consider making `ShuffleBlockFetcherIterator` a public `@DeveloperApi` instead of `private[spark]`? That would allow ShuffleManager

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/7403#discussion_r34630191 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -65,7 +67,7 @@ abstract class NarrowDependency[T](_rdd: RDD[T]) extends Dependency[T

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121408542 The `PairRDDFunctions` classes isn't labeled `DeveloperApi` so what is the assumed level of stability? Luckily, the original PairRDDFunctions has ClassTag

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121406752 Also, the comment in the `DeveloperApi` annotation reads... ``` /** * A lower-level, unstable API intended for developers. * * Developer API&#

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121405109 @JoshRosen What suggestions do you have for making the key, value and combiner class names available in the ShuffleDependency in a way that is binary compatible

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121397982 There was a timeout fetching from the git repo. Having Jenkins try again. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7403#issuecomment-121397864 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-121392211 I've opened PR #7403 which includes only the changes to the Spark shuffle, serializing the class paths. I'll rebase this PR to only have the Parque

[GitHub] spark pull request: [SPARK-9043] Serialize key, value and combiner...

2015-07-14 Thread massie
GitHub user massie opened a pull request: https://github.com/apache/spark/pull/7403 [SPARK-9043] Serialize key, value and combiner classes in ShuffleDependency ShuffleManager implementations are currently not given type information for the key, value and combiner classes

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-120095734 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-120095678 Looks like Jenkins was in a bad state. I'll kick it to test again. --- If your project is set up for it, you can reply to this email and have your reply appear on G

[GitHub] spark pull request: [SPARK-8875] Remove BlockStoreShuffleFetcher c...

2015-07-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7268#issuecomment-120094947 @kayousterhout I like that this PR deletes more lines than it adds; the Spark shuffle code needs that. I just looked over the code and it shouldn't derail my work i

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-08 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119776748 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-08 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119770304 I just rebased on master to fix a pom.xml conflict. Both `parquet-avro` and `parquet-thrift` were added in #7231, so they don't need to be added here. --- If

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-08 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119745577 I'm not sure why Jenkins is calling out changes to ``` class DecisionTreeClassificationModel(DecisionTreeModel): class RandomForestClassification

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119377204 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119376439 Jenkins has just been reconfigured to fix a testing infra bug. I'm going to kill the current build and start a new one. Jenkins, test this please. -

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119358731 Done. The Parquet shuffle reader behaves identically to the hash shuffle reader now and uses the defined Spark Serializer. --- If your project is set up for it, you can

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119349112 Map-side aggregation is supported in this pull request. If you look at the `ParquetShuffleManager.parquetShuffleCanBeUsed()` method, you'll see that it generates an

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119330028 This PR currently only has four new tests to show that: * the fallback shuffle for non-Avro objects works * the Parquet shuffle is able to shuffle *with and

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread massie
GitHub user massie opened a pull request: https://github.com/apache/spark/pull/7265 [SPARK-7263] Add new shuffle manager which stores shuffle blocks in Parquet This commit adds a new Spark shuffle manager which reads and writes shuffle data to Apache Parquet files. Parquet has a

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-115071736 @kayousterhout I'm happy to squash the commits and rebase them on `master`, if that helps. --- If your project is set up for it, you can reply to this email and

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-24 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-11503 @kayousterhout Is this still on track for merging today? Let me know if you see anything else that needs to be done. --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-23 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-114747817 Thanks, @kayousterhout. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-23 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-114676939 No worries. I'll fix it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-23 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-114563503 @kayousterhout @rxin All the tests passed. Let me know if you'd like any more changes made and I'll get to them immediately. --- If your project is set up f

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-22 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-114295684 @kayousterhout @rxin I just pushed an update which adds a test specifically to validate that the `HashShuffleReader.read()` method is properly reading values, saving

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-19 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-113553309 I make the updates first thing next week. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-18 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-113300714 @kayousterhout, Thanks for reviewing this PR. I agree that we should be more defensive moving forward. I think the right next step (no pun intended) is to [Update

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-18 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32774776 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -290,22 +287,15 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-18 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32774012 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -290,22 +287,15 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-18 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32758933 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -290,22 +287,15 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-18 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32746147 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -290,22 +287,15 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-14 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-111880817 @rxin I'm looking forward to hearing your thoughts about this change. Let me know if you have any questions or suggestions. --- If your project is set up for it

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-12 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-111593360 @kayousterhout I just pushed an update which checks that `BufferReleasingInputStream` is not just releasing memory but also closing the underlying delegate `InputStream

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-12 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-111515701 @kayousterhout I'll work on improving the unit tests today. --- If your project is set up for it, you can reply to this email and have your reply appear on GitH

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-11 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32278636 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -33,23 +34,55 @@ private[spark] class HashShuffleReader[K, C

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-11 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-111240310 WooHoo! All the tests passed. Thanks for all the help on this @kayousterhout, @squito, @JoshRosen and @sryza. Let me know if you see anything you'd like ch

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-11 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32242149 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -33,23 +34,55 @@ private[spark] class HashShuffleReader[K, C

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-110948681 I manually killed the old Jenkins job (34640) since it's was testing outdated code. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32177982 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -57,7 +58,16 @@ class ShuffleBlockFetcherIteratorSuite

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32177783 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -104,10 +113,13 @@ class

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32177476 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -17,21 +17,22 @@ package

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32176669 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -33,17 +34,52 @@ private[spark] class HashShuffleReader[K, C

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32174449 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -33,17 +34,52 @@ private[spark] class HashShuffleReader[K, C

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-10 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32173458 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala --- @@ -17,23 +17,21 @@ package

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-110538136 @kayousterhout I update the code to update shuffle read metrics in the shuffle reader and reorganized the imports as you requested. Let me know if I missed anything. I

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32076483 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -33,17 +34,54 @@ private[spark] class HashShuffleReader[K, C

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32075027 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala --- @@ -78,21 +77,13 @@ private[hash] object

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32074744 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -33,17 +34,54 @@ private[spark] class HashShuffleReader[K, C

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32074476 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -313,6 +314,35 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32074217 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -313,6 +314,35 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32074154 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -17,23 +17,23 @@ package org.apache.spark.storage

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32073465 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala --- @@ -17,10 +17,11 @@ package org.apache.spark.shuffle.hash

[GitHub] spark pull request: [SPARK-7884] Move block deserialization from B...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32072166 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala --- @@ -17,23 +17,22 @@ package

[GitHub] spark pull request: [SPARK-7884] Allow Spark shuffle APIs to be mo...

2015-06-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-110490995 @JoshRosen Makes sense: I'll update the title. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark pull request: [SPARK-7884] Allow Spark shuffle APIs to be mo...

2015-06-09 Thread massie
Github user massie commented on the pull request: https://github.com/apache/spark/pull/6423#issuecomment-110490313 @squito Just pushed an update which includes all your suggestions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request: [SPARK-7884] Allow Spark shuffle APIs to be mo...

2015-06-09 Thread massie
Github user massie commented on a diff in the pull request: https://github.com/apache/spark/pull/6423#discussion_r32057774 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -97,14 +96,12 @@ final class ShuffleBlockFetcherIterator

  1   2   >