[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691539 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691536 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13691546 --- Diff: python/pyspark/rdd.py --- @@ -400,6 +399,18 @@ def takeSample(self, withReplacement, num, seed=None): sampler.shuffle(samples

[GitHub] spark pull request: [SPARK-2088] fix NPE in toString

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1028#issuecomment-45939821 LGTM. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: SPARK-2085: [MLlib] Apply user-specific regula...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1026#issuecomment-45954737 Jenkins, test this please.

[GitHub] spark pull request: SPARK-2085: [MLlib] Apply user-specific regula...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1026#issuecomment-45954727 Jenkins, add to whitelist.

[GitHub] spark pull request: SPARK-2085: [MLlib] Apply user-specific regula...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1026#issuecomment-45954951 @coderxiang The merge was not clean. It contains changes from my PR. Could you re-merge the latest master and check that the diff is correct on this page?

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/916#discussion_r13732786 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -22,6 +22,9 @@ import scala.reflect.ClassTag import org.scalatest.FunSuite

[GitHub] spark pull request: SPARK-2085: [MLlib] Apply user-specific regula...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1026#issuecomment-45964506 LGTM. Thanks!

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/490#issuecomment-45965012 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/490#issuecomment-45965023 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45965544 LGTM. Thanks! Waiting for Jenkins ...

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45970681 Merged. Thanks!

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/490#issuecomment-45976550 @codeboyyong I've merged this. Could you please make a patch for branch-0.9? Thanks!

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742986 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742980 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742995 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13743003 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742972 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742973 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13743001 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742975 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742981 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742970 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742971 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742989 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742968 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13742976 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-45990973 @vrilleup This implementation looks good to me and thanks for the experiments! Besides the inline comments, we should think about when to switch from ARPACK to dense SVD. ARPACK

[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

2014-06-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/916#issuecomment-45992111 @colorant Tried the following with the new implementation: ~~~ val rdd = sc.parallelize(0 until 10, 1).flatMap(i => Iterator.fill(10)(0)) // 10

[GitHub] spark pull request: [HOTFIX] add math3 version to pom

2014-06-13 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1075 [HOTFIX] add math3 version to pom Passed `mvn package`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark takeSample-fix

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1099#discussion_r13843020 --- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -73,10 +73,18 @@ private[spark] class

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1099#discussion_r13843025 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -130,7 +129,8 @@ class Client(args: ClientArguments, conf

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1099#discussion_r13843041 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -160,15 +160,19 @@ class Client(args: ClientArguments, conf

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1099#discussion_r13843048 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -160,15 +160,19 @@ class Client(args: ClientArguments, conf

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1099#discussion_r13843046 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -160,15 +160,19 @@ class Client(args: ClientArguments, conf

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-46267457 @codeboyyong Thanks for submitting the patch! It looks good to me except for a few style issues.

[GitHub] spark pull request: MLlib documentation fix

2014-06-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1098#issuecomment-46270697 LGTM. Thanks!

[GitHub] spark pull request: HOTFIX: bug caused by #941

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1108#issuecomment-46372243 Verified that it is working now. I'm going to merge this.

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-06-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1110 [WIP][SPARK-2174][MLLIB] treeReduce and treeAggregate In `reduce` and `aggregate`, the driver node spends time linear in the number of partitions. It becomes a bottleneck when there are many
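
The idea behind a tree-style reduce can be sketched without Spark. The following is only an illustration under stated assumptions, not the implementation in the pull request: `TreeReduceSketch`, `treeReduce`, and `partials` are hypothetical names, and `partials` stands in for the per-partition results that a plain `reduce` would merge one by one on the driver.

```scala
// Hypothetical, Spark-free sketch: `partials` stands in for per-partition results.
object TreeReduceSketch {
  // Combine partial results pairwise, level by level, until one value remains.
  // With n partials this takes about log2(n) levels instead of n - 1
  // sequential merges in a single place.
  @annotation.tailrec
  def treeReduce[T](partials: Seq[T])(f: (T, T) => T): T = {
    require(partials.nonEmpty, "need at least one partial result")
    if (partials.size == 1) partials.head
    else {
      // Each group of (at most) two is merged independently of the others;
      // in a cluster these merges could run on executors, not on the driver.
      val next = partials.grouped(2).map(_.reduce(f)).toVector
      treeReduce(next)(f)
    }
  }

  def main(args: Array[String]): Unit = {
    // Pretend each element is the partial sum computed by one partition.
    val partials = (1 to 8).toVector
    println(treeReduce(partials)(_ + _))
  }
}
```

The sketch assumes `f` is associative, which is the same requirement `reduce` already places on its operator.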

[GitHub] spark pull request: [WIP][SPARK-1485][MLLIB] Implement Butterfly A...

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/506#issuecomment-46380865 Thanks all for reviewing this PR! I found that the butterfly pattern introduces complex dependencies that slow down the computation. In my tests, a good approach for Spark is

[GitHub] spark pull request: [WIP][SPARK-1485][MLLIB] Implement Butterfly A...

2014-06-17 Thread mengxr
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/506

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899771 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899776 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899792 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899798 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r1396 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46398707 @vrilleup Thanks for updating the PR! I made a comment on the explicit type checks. I'm a little confused about the new API. If `isDenseSVD` is true, `tol` doesn't

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13900396 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46398850 Btw, we shouldn't use default parameters in method definition. It is convenient in Scala but it is not Java friendly. Also, this is hard for us to maintain b
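
A minimal sketch of the Java-friendliness point, assuming hypothetical method names and parameters (`solveWithDefault`, `solve`, `k`, `tol`, and the value `1e-10` are made up for illustration and are not taken from the PR): a Scala default argument cannot be omitted from Java code, whereas explicit overloads can be called from both languages.

```scala
object OverloadSketch {
  // Scala-only convenience: a Scala caller may write solveWithDefault(3),
  // but a Java caller has to pass both arguments explicitly.
  def solveWithDefault(k: Int, tol: Double = 1e-10): String =
    s"k=$k, tol=$tol"

  // Java-friendly alternative: explicit overloads, where the shorter one
  // forwards to the full version with the chosen default value.
  def solve(k: Int, tol: Double): String = s"k=$k, tol=$tol"
  def solve(k: Int): String = solve(k, 1e-10)
}
```

With the overloads, `OverloadSketch.solve(3)` is callable from Java as well as from Scala.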

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46411430 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13905413 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13905479 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13905461 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46411779 The change looks good to me. Let us wait for Jenkins and MIMA.

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46503707 Jenkins, add to white list.

[GitHub] spark pull request: [SPARK-1112] use min akka frame size to decide...

2014-06-18 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1124 [SPARK-1112] use min akka frame size to decide how to send task results Task results are sent either via akka directly or block manager indirectly, based on whether the size of the serialized task
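
As a rough sketch of a size-based routing decision like the one described above: the names below (`TaskResultRouting`, `route`, `reservedBytes`) are hypothetical, and the exact threshold used in the PR is not visible in this preview, so the comparison is an assumption.

```scala
object TaskResultRouting {
  sealed trait Route
  case object DirectViaAkka extends Route
  case object IndirectViaBlockManager extends Route

  // Hypothetical helper: `reservedBytes` models headroom kept free for
  // message overhead. A result that does not fit under the effective limit
  // is routed indirectly instead of being sent as a single frame.
  def route(serializedSize: Long, frameSizeBytes: Long, reservedBytes: Long): Route =
    if (serializedSize >= frameSizeBytes - reservedBytes) IndirectViaBlockManager
    else DirectViaAkka
}
```

For example, with a 10 MB frame and 1024 reserved bytes, a 100-byte result would go directly and a 10 MB result would go through the block manager.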

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46517676 @ash211 If there is a way to deliver the configuration consistently to the backend, we can use `spark.akka.frameSize` consistently. It is then not necessary to set the

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1124#discussion_r13951639 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -212,7 +208,14 @@ private[spark] class Executor( val

[GitHub] spark pull request: Squishing a typo bug before it causes real har...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1125#issuecomment-46524600 Thanks! Merged.

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46527308 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46528231 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1124#discussion_r13955220 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -212,7 +208,12 @@ private[spark] class Executor( val

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1124#discussion_r13955317 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -212,7 +208,12 @@ private[spark] class Executor( val

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-46529144 @nevillelyh Is there a JIRA for it? Is it fixed in 0.8.1?

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46530221 For the first scenario, it won't make the performance worse because the system doesn't really work now for serialized task result of size betwee

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46531328 I don't get it. As long as the actor systems are created from AkkaUtils.createActorSystem, the minimum value of the max frame size is 10M. All unit tests pass

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46531776 Just tested `- 1024` in `SchedulerBackend`. The system did hang up when the task size is close to 10M - 1024 ... ~~~ scala> val random = new java.util.Ran

[GitHub] spark pull request: [SPARK-1112, 2156] Bootstrap to fetch the driv...

2014-06-19 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1132 [SPARK-1112, 2156] Bootstrap to fetch the driver's Spark properties. This is an alternative solution to #1124 . Before launching the executor backend, we first fetch driver's spark prop

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46541699 @pwendell @kayousterhout I put an alternative solution in #1132. Please let me know which one you prefer.

[GitHub] spark pull request: [SPARK-1112, 2156] Bootstrap to fetch the driv...

2014-06-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1132#discussion_r13982529 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -101,26 +106,33 @@ private[spark] object

[GitHub] spark pull request: [SPARK-1112, 2156] Bootstrap to fetch the driv...

2014-06-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1132#discussion_r13984220 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -101,26 +106,33 @@ private[spark] object

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46643277 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46694507 Merged. Thanks!

[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...

2014-06-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46769968 Jenkins, retest it please.

[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...

2014-06-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46769990 LGTM. I will merge it to branch-1.0 if Jenkins is happy.

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-21 Thread mengxr
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/1124

[GitHub] spark pull request: [SPARK-1112, 2156] use min akka frame size to ...

2014-06-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1124#issuecomment-46769996 Closing this in favor of #1132.

[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...

2014-06-22 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46775109 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-46814485 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-46924196 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...

2014-06-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-47134903 This looks good to me. I'm going to merge it since pyspark is broken without this patch.

[GitHub] spark pull request: [SPARK-2172] PySpark cannot import mllib modul...

2014-06-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1223#discussion_r14223221 --- Diff: mllib/pom.xml --- @@ -76,5 +76,16 @@ scalatest-maven-plugin + + +src/main/resources

[GitHub] spark pull request: SPARK-2281 [MLlib] Simplify the duplicate code...

2014-06-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1215#discussion_r14223272 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -37,7 +37,11 @@ abstract class Gradient extends Serializable

[GitHub] spark pull request: [SPARK-2172] PySpark cannot import mllib modul...

2014-06-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1223#issuecomment-47187655 LGTM and tested with `mvn install`. Thanks for fixing it!

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-47190528 No, just want to see Jenkins happy.

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-47190546 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2251] fix concurrency issues in random ...

2014-06-26 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1229 [SPARK-2251] fix concurrency issues in random sampler The following code is very likely to throw an exception: ~~~ val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1

[GitHub] spark pull request: fix concurrency issues in random sampler

2014-06-26 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1234 fix concurrency issues in random sampler The following code is very likely to throw an exception: ~~~ val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1) rdd.zip(rdd).count

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-47291043 I think `PYSPARK_PYTHON` is set to `/usr/local/bin/python2.7` in Jenkins but it doesn't exist. @pwendell ?

[GitHub] spark pull request: [SPARK-2251] fix concurrency issues in random ...

2014-06-26 Thread mengxr
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/1234

[GitHub] spark pull request: SPARK-2293. Replace RDD.zip usage by map with ...

2014-06-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1250#issuecomment-47598550 @srowen Thanks for fixing it! LGTM. Merged.

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-07-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-47681084 @vrilleup Just checked Matlab’s svd and svds. I don’t remember ever using options.{tol, maxit} before. I wonder whether it is useful to expose these to users. I did use

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-07-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47686100 @dbtsai Thanks for testing it! I'm going to move `treeReduce` and `treeAggregate` to `mllib.rdd.RDDFunctions`. For normal data processing, people generally use

[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...

2014-07-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-47813702 Jenkins, add to whitelist.

[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...

2014-07-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-47813772 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14500775 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -27,8 +27,12 @@ import scala.collection.Map import

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14500845 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -46,6 +48,8 @@ import org.apache.spark.Partitioner.defaultPartitioner

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-07-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-47878720 @yangliuyu What did you set for `k`, and how many iterations did it take?

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-07-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-47881287 @vrilleup Both approaches compute the truncated SVD. I still prefer putting both implementations under `computeSVD` for now. I'm going to implement a generic Paramet
