[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57358549 @jkbradley @manishamde --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-3701][MLLIB] update python linalg api a...

2014-09-30 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2548#discussion_r18245962 --- Diff: python/pyspark/mllib/linalg.py --- @@ -222,20 +283,33 @@ def dot(self, other): 0.0 a.dot(np.array([[1, 1], [2, 2], [3, 3

[GitHub] spark pull request: [SPARK-3701][MLLIB] update python linalg api a...

2014-09-30 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2548#discussion_r18245965 --- Diff: python/pyspark/mllib/linalg.py --- @@ -439,10 +531,11 @@ def toArray(self): arr = array.array('d', [float(i) for i in range(4

[GitHub] spark pull request: [SPARK-3701][MLLIB] update python linalg api a...

2014-09-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2548#issuecomment-57393902 test this please

[GitHub] spark pull request: [SPARK-3701][MLLIB] update python linalg api a...

2014-09-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2548#issuecomment-57402247 Merged into master. Thanks @jkbradley for review!

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57415215 @chouqin The performance gain is already significant. The aggregation time dropped from ~30s to 3s in my experiment. I just want to see whether we can optimize

[GitHub] spark pull request: [SPARK-3751] [mllib] DecisionTree: example upd...

2014-10-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2604#issuecomment-57431357 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57434938 @chouqin The trained model only contains a single node in the python test. Maybe there is a bug that caused early termination.

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18297053 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -159,161 +166,15 @@ private[tree] abstract class

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57579497 @buenrostro-oo @tdas We have seen several test failures from `NetworkReceiverSuite`. Do you have time to take a look? Thanks!

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57778527 LGTM. Merged into master. Thanks @chouqin, and @jkbradley and @manishamde for code review! Increasing `maxMemoryInMB` also increases the shuffle size. As long

[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57785126 @mdagost If you convert `(Int, Array[Double])` to a `java.util.List<Object>` (the id first and the features second, without converting to string), you should be able

[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57839878 @staple Sorry for the late response and thank you for working on this JIRA! As a best practice, before you start working on a JIRA, please first ask on the JIRA page

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-57841035 @derrickburns The `*ClusterSuite` was created to prevent unnecessary objects from being referenced in the task closure. You can try to remove `Serializable` from algorithms

[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57865710 @staple The conditional distribution matrix may not be sparse. That is why we use a dense format to store it. Maybe we can do a hard thresholding to make it sparse

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-57865917 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-57865927 test this please

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423427 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -375,7 +376,9 @@ abstract class RDD[T: ClassTag]( val sum = weights.sum

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423440 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423426 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -43,7 +43,8 @@ import org.apache.spark.partial.PartialResult import

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423449 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423438 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423429 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423444 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423475 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423477 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423453 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423474 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423459 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423463 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423478 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423454 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423479 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423461 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423433 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423437 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423464 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423484 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423470 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423457 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423473 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423487 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423485 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423495 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423492 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423489 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423468 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423448 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423443 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423499 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423504 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423498 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423491 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423493 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18423500 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-57874204 @erikerlandson I didn't check the test code. I will try to find another time to make a pass on the test. The implementation looks good to me except for minor inline comments

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-58091578 @rezazadeh Could you update the example using `scopt` to parse parameters? You can check other example code for its usage. We try to be consistent across example code

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486326 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -284,6 +285,58 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486384 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486359 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# --- End diff -- Please rename the file to `feature.py` to make `Word2Vec

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486387 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486381 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486395 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486413 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486524 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486706 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -284,6 +285,58 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18486738 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -284,6 +285,58 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58104576 @derrickburns The style test doesn't catch everything, unfortunately. The Spark Code Style Guide is the first place to check. I will mark a few examples inline. I

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488327 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansParallel.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488318 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -17,429 +17,57 @@ package org.apache.spark.mllib.clustering

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488296 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GeneralizedKMeansModel.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488309 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -17,429 +17,57 @@ package org.apache.spark.mllib.clustering

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488331 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansPlusPlus.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488320 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -17,429 +17,57 @@ package org.apache.spark.mllib.clustering

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488293 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GeneralizedKMeansModel.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488325 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala --- @@ -25,37 +25,28 @@ import org.apache.spark.mllib.linalg.Vector

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488297 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GeneralizedKMeansModel.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488364 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala --- @@ -255,4 +253,4 @@ class KMeansClusterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488334 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansPlusPlus.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488302 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -17,429 +17,57 @@ package org.apache.spark.mllib.clustering

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488348 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeansClusterer.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488358 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/FastEuclideanOps.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488316 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -17,429 +17,57 @@ package org.apache.spark.mllib.clustering

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488342 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/MultiKMeans.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488359 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/package.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2634#discussion_r18488352 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/metrics/EuclideanOps.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58106056 @derrickburns I marked a few style problems (not all of them). There are breaking changes in your PR, which we should avoid as much as possible. Even if we want to remove

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-10-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2356#issuecomment-58118924 @Ishiihara Another file to update is `python/docs/pyspark.mllib.rst`. We need a section for `pyspark.mllib.feature` module.

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-58122649 @derrickburns I don't know any formatter that can do the job nicely. This has to be done by hand at this moment, unfortunately. `KMeans` has a public constructor

[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58221886 @dbtsai Could you check whether there is any dependency change in breeze-0.10 and the number of files in breeze-0.10 jar? Is it compatible with both Scala 2.10 and 2.11

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536661 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536639 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -39,13 +42,46 @@ trait RandomSampler[T, U] extends Pseudorandom

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536657 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536653 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536673 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536693 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536679 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536671 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536659 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r18536697 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -18,96 +18,547 @@ package org.apache.spark.util.random
