[GitHub] spark pull request: [SPARK-2197] [mllib] Java DecisionTree bug fix...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1740#discussion_r15733162 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -60,4 +62,31 @@ class Strategy ( val

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733197 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733199 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733196 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733200 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733198 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733207 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733203 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733205 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733206 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733204 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733202 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733213 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733224 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-50983381 ... I have no idea. Let me check. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-50983421 @pwendell I didn't see `Closes #1379` in the merged commit. Is something wrong with asfgit?

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733253 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733270 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2197] [mllib] Java DecisionTree bug fix...

2014-08-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1740#issuecomment-50997066 LGTM. Merged into both master and branch-1.1. Thanks!!

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15735816 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,414 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15735818 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,414 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15735820 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,414 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15735823 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,414 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15735830 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,414 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15735833 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,414 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15738811 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15738819 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15738915 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15738983 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15738994 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739009 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739002 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739059 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739146 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739140 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739176 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739196 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739187 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739219 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15739777 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,426 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739933 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739940 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739936 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739934 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739945 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739935 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739963 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739972 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739969 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739974 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15739979 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1207#issuecomment-51017092 LGTM. Merged into both master and branch-1.1. Thanks!!

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15740911 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-04 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51021254 Jenkins, test this please.

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-04 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51023600 LGTM. Merged into both master and branch-1.1. @Ishiihara Thanks a lot for implementing word2vec! Please help improve its performance during the QA period. One task left

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2014-08-04 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1484#issuecomment-51024568 Sure. We had some transformers implemented under `mllib.feature`, similar to sk-learn's approach. For feature selection, we can follow the same approach if we view

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2014-08-04 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1484#issuecomment-51086836 @avulanov I have the same concern about calling `transform` before `fit`. There are two options: 1) throw an error, 2) fit on the same dataset and then transform
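
The two options named in this comment (raise an error, or fit on the incoming dataset and then transform) can be illustrated with a minimal sketch. This is plain Python, not MLlib code: `MeanCenterer`, `NotFittedError`, and the `fit_on_transform` flag are hypothetical names invented for the illustration, and the thread excerpt does not say which option was adopted.

```python
class NotFittedError(Exception):
    """Raised when transform() is called before fit()."""


class MeanCenterer:
    """Hypothetical transformer: subtracts the mean learned by fit()."""

    def __init__(self, fit_on_transform=False):
        # fit_on_transform=False -> option 1 (throw an error)
        # fit_on_transform=True  -> option 2 (fit on the same data, then transform)
        self.fit_on_transform = fit_on_transform
        self.mean = None

    def fit(self, values):
        values = list(values)
        if not values:
            raise ValueError("cannot fit on an empty dataset")
        self.mean = sum(values) / len(values)
        return self

    def transform(self, values):
        values = list(values)
        if self.mean is None:
            if not self.fit_on_transform:
                raise NotFittedError("call fit() before transform()")
            # Option 2: silently fit on the dataset being transformed.
            self.fit(values)
        return [v - self.mean for v in values]
```

Option 1 makes the misuse visible immediately; option 2 is more forgiving but ties the learned statistics to whatever dataset happens to be transformed first.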

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1790 [SPARK-2864][MLLIB] fix random seed in word2vec; move model to local It also moves the model to local in order to map `RDD[String]` to `RDD[Vector]`. You can merge this pull request into a Git

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1790#discussion_r15830029 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -246,22 +246,24 @@ class Word2Vec( } val

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1790#issuecomment-51246319 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1790#issuecomment-51253334 Jenkins, where are you?

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1790#issuecomment-51253367 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1790#issuecomment-51255025 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1790#issuecomment-51267394 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2864][MLLIB] fix random seed in word2ve...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1790#issuecomment-51274276 Merged into both master and branch-1.1.

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1775#discussion_r15848895 --- Diff: python/pyspark/mllib/classification.py --- @@ -73,11 +73,36 @@ def predict(self, x): class LogisticRegressionWithSGD(object

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1775#issuecomment-51274747 LGTM. Merged into both master and branch-1.1. Thanks!!

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-51287008 remove space between `@` and `jkbradley` ~ :)

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15853908 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurities.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15853902 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Algo.scala --- @@ -27,4 +27,10 @@ import org.apache.spark.annotation.Experimental

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15853946 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15853962 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854011 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854029 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854074 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854094 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1334,10 +1519,10 @@ object DecisionTree extends Serializable

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854129 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1334,10 +1519,10 @@ object DecisionTree extends Serializable

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854139 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1334,10 +1519,10 @@ object DecisionTree extends Serializable

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854160 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurities.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15854238 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854415 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -89,4 +90,76 @@ object Statistics { */ @Experimental

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854417 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -89,4 +90,76 @@ object Statistics { */ @Experimental

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854426 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -89,4 +90,76 @@ object Statistics { */ @Experimental

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854484 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854487 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854488 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15854511 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-51290954 I think we should either allow user to input the raw observations or use `Map[_, Long]` for input frequencies. I'm going to take a look at R's implementation

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-51298178 @dorx I checked R's implementation and finally figured out what is going on. 1. When only a vector `x` is given, it is treated as a vector containing frequency

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1733#discussion_r15857945 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLlib] Use this.type as return type in k-mean...

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1796#issuecomment-51298668 LGTM. Merged into both master and branch-1.1. Thanks!!

[GitHub] spark pull request: [MLlib] DIMSUM: Dimension Independent Matrix S...

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51301688 @rezazadeh Do you mind creating a JIRA for this and then adding `[SPARK-]` to the title? We also want to learn more about the theory, especially the relation between

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-51347255 The previous proposal may be hard to implement in Python. Another solution would be to separate the goodness-of-fit test from the independence test, e.g., `chiSqGofTest

[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1671#discussion_r15887125 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-2828][MLLIB] API consistency for `mllib...

2014-08-06 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1807 [SPARK-2828][MLLIB] API consistency for `mllib.feature` 1. added a Java-friendly fit method to Word2Vec with tests 2. change DeveloperApi to Experimental for Normalizer StandardScaler 3

[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-08-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1671#issuecomment-51375372 Thanks for explaining different approaches. What should be the best for `HashingTF`? Adding the counts or with random signs and bounded by 0? I know with random signs, we

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15892856 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15892945 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurities.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15898657 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +309,93 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2851] [mllib] DecisionTree Python consi...

2014-08-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1798#discussion_r15898661 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -300,6 +309,93 @@ object DecisionTree extends Serializable with Logging
