[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/149 SPARK-1255: Allow user to pass Serializer object instead of class name for shuffle. This is more general than simply passing a string name and leaves more room for performance optimizations

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37720234 @marmbrus this is for you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10636356 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10636602 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd

[GitHub] spark pull request: Fix serialization of MutablePair. Also provide...

2014-03-14 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/141#issuecomment-37717648 I did: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=e19044cb1048c3755d1ea2cb43879d2225d49b54

[GitHub] spark pull request: [SPARK-1237, 1238] Improve the computation of ...

2014-03-13 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/131#issuecomment-37507055 Thanks. I've merged this.

[GitHub] spark pull request: SPARK-1240: handle the case of empty RDD when ...

2014-03-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/135#discussion_r10579726 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -310,6 +310,9 @@ abstract class RDD[T: ClassTag]( * Return a sampled subset

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-13 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-37607046 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-1236 - Upgrade Jetty to 9.1.3.v20140225.

2014-03-12 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37482277 Ok I pushed a new version with Maven build changes as well. This is ready to be merged from my perspective.

[GitHub] spark pull request: SPARK-1236 - Upgrade Jetty to 9.1.3.v20140225.

2014-03-12 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37483693 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552638 --- Diff: core/src/main/scala/org/apache/spark/util/TimeStampedWeakValueHashMap.scala --- @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552654 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552655 --- Diff: core/src/test/scala/org/apache/spark/CheckpointSuite.scala --- @@ -432,7 +432,7 @@ object CheckpointSuite { // This is a custom cogroup function

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552666 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkALS.scala --- @@ -54,7 +54,7 @@ object SparkALS { for (i <- 0 until M; j <- 0 until U

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552662 --- Diff: examples/src/main/scala/org/apache/spark/examples/LocalALS.scala --- @@ -53,7 +53,7 @@ object LocalALS { for (i <- 0 until M; j <- 0 until U

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552671 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkHdfsLR.scala --- @@ -34,8 +34,8 @@ object SparkHdfsLR { case class DataPoint(x: Vector, y

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552685 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/NetworkInputDStream.scala --- @@ -128,7 +128,7 @@ abstract class NetworkReceiver[T

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552681 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/impl/Serializers.scala --- @@ -298,7 +298,7 @@ abstract class ShuffleSerializationStream(s: OutputStream

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552680 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/impl/Serializers.scala --- @@ -391,7 +391,7 @@ abstract class ShuffleDeserializationStream(s

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552712 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -137,7 +137,7 @@ trait ClientBase extends Logging { } else

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552705 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -152,7 +152,7 @@ class InputStreamsSuite extends TestSuiteBase

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552723 --- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala --- @@ -206,12 +206,12 @@ class SendingConnection(val address: InetSocketAddress

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552716 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -278,7 +278,7 @@ private[spark] class Executor( // have left some

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552834 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552839 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552846 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -20,15 +20,15 @@ package org.apache.spark import java.io._ import

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552859 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552941 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552981 --- Diff: core/src/main/scala/org/apache/spark/util/BoundedHashMap.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10553024 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37501863 If you don't need high performance, why not just put a normal immutable hashmap so you don't have to worry about concurrency?

[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.

2014-03-11 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37377281 That sounds good. @pwendell should make the call here ...

[GitHub] spark pull request: Upgrade Jetty to 9.1.3.v20140225.

2014-03-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/113#discussion_r10420573 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -120,26 +120,25 @@ private[spark] object JettyUtils extends Logging

[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.

2014-03-10 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37213133 The main reason is that we are investigating upgrading some of the major dependencies prior to 1.0, after which we won't be able to upgrade for a while. Some users have

[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.

2014-03-10 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37230554 Yea, unfortunately that's been the case for the past few days. We should probably have another external repo to host artifacts only on Cloudera repo to have some

[GitHub] spark pull request: Upgrade Jetty to 9.1.3.v20140225.

2014-03-09 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/113 Upgrade Jetty to 9.1.3.v20140225. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark jetty9 Alternatively you can review and apply

[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/104 Update junitxml plugin to the latest version to avoid recompilation in every SBT command. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin

[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37109170 Ok I merged this. Not sure about Maven off the top of my head. All these build plugins are pretty arcane to me.

[GitHub] spark pull request: Allow sbt to use more than 1G of heap.

2014-03-07 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/103 Allow sbt to use more than 1G of heap. There was a mistake in the sbt build file (introduced by 012bd5fbc97dc40bb61e0e2b9cc97ed0083f37f6) in which we set the default to 2048 and then immediately reset

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36949572 Actually - no ...

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36949585 We should use the primitive hashmap - otherwise it is pretty slow.

[GitHub] spark pull request: SPARK-1164 Deprecated reduceByKeyToDriver as i...

2014-03-04 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/72#issuecomment-36656843 Thanks. Merged.

[GitHub] spark pull request: SPARK-1178: missing document of spark.schedule...

2014-03-04 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/74#issuecomment-36656962 Thanks. Merged.

[GitHub] spark pull request: Removed accidentally checked in comment

2014-03-03 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/61#issuecomment-36569896 I merged this. Thanks!

[GitHub] spark pull request: Remove remaining references to incubation

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/51#issuecomment-36448947 Jenkins, retest this please.

[GitHub] spark pull request: Update io.netty from 4.0.13 Final to 4.0.17.Fi...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/41#issuecomment-36449234 Sorry @ngbinh you misunderstood me. I think the problem is the git commit metadata doesn't actually contain the author information. It could be that the email or the author

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/55 SPARK-1158: Fix flaky RateLimitedOutputStreamSuite. There was actually a problem with the RateLimitedOutputStream implementation where the first second doesn't write anything because of integer

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/55#issuecomment-36473419 @tdas @pwendell

[GitHub] spark pull request: SPARK-1173. Improve scala streaming docs.

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/64#issuecomment-36486791 Thanks Aaron. I've merged this.

[GitHub] spark pull request: [Proposal] SPARK-1171: simplify the implementa...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/63#issuecomment-36486838 Jenkins, add to whitelist.

[GitHub] spark pull request: SPARK-1173. Improve scala streaming docs.

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/64#issuecomment-36487507 Actually you will need to submit another PR. I've already merged this one (but github is laggy because it is waiting for the asf git bot to synchronize). Sorry about

[GitHub] spark pull request: SPARK-1173. (#2) Fix typo in Java streaming ex...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/65#issuecomment-36487798 I merged this one too.

[GitHub] spark pull request: Initialized the regVal for first iteration in ...

2014-02-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/40#discussion_r10188555 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -149,7 +149,14 @@ object GradientDescent extends Logging

[GitHub] spark pull request: Initialized the regVal for first iteration in ...

2014-02-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/40#issuecomment-36413420 Jenkins, retest this please.

[GitHub] spark pull request: Initialized the regVal for first iteration in ...

2014-02-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/40#issuecomment-36413838 It is running now. Let's wait for Jenkins to come back.