Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1740#discussion_r15733162
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala
---
@@ -60,4 +62,31 @@ class Strategy (
val
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733197
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733199
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733196
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733200
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733198
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733207
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733203
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733205
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733206
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733204
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733202
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733213
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733224
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-50983381
... I have no idea. Let me check.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-50983421
@pwendell I didn't see `Closes #1379` in the merged commit. Is something
wrong with asfgit?
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733253
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15733270
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1740#issuecomment-50997066
LGTM. Merged into both master and branch-1.1. Thanks!!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15735816
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,414 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15735818
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,414 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15735820
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,414 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15735823
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,414 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15735830
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,414 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15735833
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,414 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15738811
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15738819
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15738915
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15738983
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15738994
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739009
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739002
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739059
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739146
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739140
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739176
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739196
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739187
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739219
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15739777
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -0,0 +1,426 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739933
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739940
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739936
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739934
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739945
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739935
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739963
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/NormalizerSuite.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739972
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala
---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739969
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala
---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739974
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala
---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1207#discussion_r15739979
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala
---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1207#issuecomment-51017092
LGTM. Merged into both master and branch-1.1. Thanks!!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1719#discussion_r15740911
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1719#issuecomment-51021254
Jenkins, test this please.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1719#issuecomment-51023600
LGTM. Merged into both master and branch-1.1. @Ishiihara Thanks a lot for
implementing word2vec! Please help improve its performance during the QA
period. One task left
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1484#issuecomment-51024568
Sure. We had some transformers implemented under `mllib.feature`, similar
to sk-learn's approach. For feature selection, we can follow the same approach
if we view
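To illustrate the transformer pattern mentioned above (an object whose `transform` maps one feature vector to another, as in scikit-learn's preprocessing module), here is a minimal Python sketch of a stateless p-norm normalizer. It follows the idea behind `mllib.feature.Normalizer` but is not its implementation; the class name and the handling of all-zero vectors are assumptions.

```python
import math
from typing import List, Sequence


class Normalizer:
    """Hypothetical stateless transformer: scales each vector to unit p-norm."""

    def __init__(self, p: float = 2.0) -> None:
        if p < 1.0:
            raise ValueError("p must be >= 1")
        self.p = p

    def _norm(self, v: Sequence[float]) -> float:
        # p = inf means the max-norm.
        if math.isinf(self.p):
            return max((abs(x) for x in v), default=0.0)
        return sum(abs(x) ** self.p for x in v) ** (1.0 / self.p)

    def transform(self, v: Sequence[float]) -> List[float]:
        norm = self._norm(v)
        # Leave all-zero vectors unchanged rather than dividing by zero.
        if norm == 0.0:
            return list(v)
        return [x / norm for x in v]
```

Because nothing is learned from data, there is no `fit` step here, so the question of calling `transform` before `fit` does not arise for this kind of transformer.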
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1484#issuecomment-51086836
@avulanov I have the same concern about calling `transform` before `fit`.
There are two options: 1) throw an error, 2) fit on the same dataset and then
transform
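The two options listed above can be sketched on a toy stateful transformer. The selector below (keep the `k` features with the largest variance) is a hypothetical stand-in for a feature-selection method, chosen only because it needs a `fit` step; all names are assumptions. Option 1 is the error in `transform`; option 2 is the `fit_transform` convenience method familiar from scikit-learn.

```python
from typing import List, Optional, Sequence


class TopVarianceSelector:
    """Hypothetical selector that keeps the k features with the largest variance."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.indices: Optional[List[int]] = None  # set by fit()

    def fit(self, data: Sequence[Sequence[float]]) -> "TopVarianceSelector":
        n = len(data)
        dim = len(data[0])
        variances = []
        for j in range(dim):
            col = [row[j] for row in data]
            mean = sum(col) / n
            variances.append(sum((x - mean) ** 2 for x in col) / n)
        # Rank features by variance (descending), then keep the top k in original order.
        ranked = sorted(range(dim), key=lambda j: variances[j], reverse=True)
        self.indices = sorted(ranked[: self.k])
        return self

    def transform(self, v: Sequence[float]) -> List[float]:
        # Option 1: refuse to transform before fit has been called.
        if self.indices is None:
            raise RuntimeError("transform called before fit")
        return [v[j] for j in self.indices]

    def fit_transform(self, data: Sequence[Sequence[float]]) -> List[List[float]]:
        # Option 2: fit on the same dataset, then transform it.
        self.fit(data)
        return [self.transform(row) for row in data]
```

Option 1 keeps `transform` free of hidden side effects; option 2 is convenient but would silently fit on whatever data is passed in, which is the concern raised above.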
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/1790
[SPARK-2864][MLLIB] fix random seed in word2vec; move model to local
It also moves the model to local in order to map `RDD[String]` to
`RDD[Vector]`.
You can merge this pull request into a Git
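The two changes in this PR title (a fixed random seed, and a model held locally so words can be mapped to vectors with an ordinary `map`) can be sketched in plain Python. This is not MLlib's Word2Vec: the skip-gram training step is omitted, and the function names, the initialization range, and the error on unknown words are assumptions for illustration.

```python
import random
from typing import Dict, Iterable, List


def init_vectors(vocab: Iterable[str], dim: int, seed: int) -> Dict[str, List[float]]:
    """Hypothetical: initialize one small random vector per word from a seeded RNG.

    Sorting the vocabulary fixes the draw order, so the same seed gives identical vectors.
    """
    rng = random.Random(seed)
    return {w: [(rng.random() - 0.5) / dim for _ in range(dim)] for w in sorted(vocab)}


class LocalWord2VecModel:
    """Hypothetical local model: a plain word -> vector map with no distributed state."""

    def __init__(self, vectors: Dict[str, List[float]]) -> None:
        self.vectors = vectors

    def transform(self, word: str) -> List[float]:
        if word not in self.vectors:
            raise KeyError(f"{word} not in vocabulary")
        return self.vectors[word]
```

Since the model is a local object, `list(map(model.transform, words))` works directly; in Spark the same function could be passed to `rdd.map(...)` because the model is captured in the closure.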
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1790#discussion_r15830029
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -246,22 +246,24 @@ class Word2Vec(
}
val
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1790#issuecomment-51246319
Jenkins, test this please.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1790#issuecomment-51253334
Jenkins, where are you?
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1790#issuecomment-51253367
Jenkins, test this please.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1790#issuecomment-51255025
Jenkins, retest this please.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1790#issuecomment-51267394
Jenkins, test this please.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1790#issuecomment-51274276
Merged into both master and branch-1.1.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1775#discussion_r15848895
--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
class LogisticRegressionWithSGD(object
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1775#issuecomment-51274747
LGTM. Merged into both master and branch-1.1. Thanks!!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1733#issuecomment-51287008
remove space between `@` and `jkbradley` ~ :)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15853908
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurities.scala ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15853902
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Algo.scala ---
@@ -27,4 +27,10 @@ import org.apache.spark.annotation.Experimental
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15853946
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15853962
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854011
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854029
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854074
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854094
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1334,10 +1519,10 @@ object DecisionTree extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854129
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1334,10 +1519,10 @@ object DecisionTree extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854139
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1334,10 +1519,10 @@ object DecisionTree extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854160
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurities.scala ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15854238
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854415
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala
---
@@ -89,4 +90,76 @@ object Statistics {
*/
@Experimental
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854417
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala
---
@@ -89,4 +90,76 @@ object Statistics {
*/
@Experimental
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854426
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala
---
@@ -89,4 +90,76 @@ object Statistics {
*/
@Experimental
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854484
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854487
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854488
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15854511
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1733#issuecomment-51290954
I think we should either allow users to input the raw observations or use
`Map[_, Long]` for input frequencies. I'm going to take a look at R's
implementation
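The two input options above (raw observations vs. a category-to-count map) can both be reduced to one ordered vector of counts. A minimal Python sketch, with hypothetical function names; the `dict` plays the role of `Map[_, Long]`, and the explicit category list is an assumption about how counts would be aligned with expected probabilities.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def to_frequencies(observations: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Collapse raw categorical observations into a category -> count map."""
    return dict(Counter(observations))


def aligned_counts(freqs: Dict[Hashable, int], categories: List[Hashable]) -> List[int]:
    """Order counts by an explicit category list; categories never observed get 0."""
    unknown = set(freqs) - set(categories)
    if unknown:
        raise ValueError(f"observed categories not in the category list: {unknown}")
    return [freqs.get(c, 0) for c in categories]
```

Keying counts by category makes it explicit which count belongs to which expected probability, which a bare frequency vector leaves implicit.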
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1733#issuecomment-51298178
@dorx I checked R's implementation and finally figured out what is going on.
1. When only a vector `x` is given, it is treated as a vector containing
frequency
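For reference, R's `chisq.test` with only `x` performs a goodness-of-fit test, and its expected probabilities `p` default to the uniform distribution over the categories. A minimal Python sketch of Pearson's statistic under those semantics (the function name is hypothetical; the p-value is omitted because it needs the chi-squared CDF):

```python
from typing import Optional, Sequence, Tuple


def chi_sq_gof(observed: Sequence[float],
               expected_probs: Optional[Sequence[float]] = None) -> Tuple[float, int]:
    """Pearson's chi-squared goodness-of-fit statistic and degrees of freedom.

    `observed` is a vector of frequencies; without `expected_probs`,
    the uniform distribution over the k categories is assumed.
    """
    k = len(observed)
    if k < 2:
        raise ValueError("need at least two categories")
    if expected_probs is None:
        expected_probs = [1.0 / k] * k
    if len(expected_probs) != k:
        raise ValueError("observed and expected_probs must have the same length")
    if any(p <= 0 for p in expected_probs) or abs(sum(expected_probs) - 1.0) > 1e-9:
        raise ValueError("expected probabilities must be positive and sum to 1")
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, expected_probs):
        e = n * p  # expected count for this category
        stat += (o - e) ** 2 / e
    return stat, k - 1
```

For example, counts `[10, 20, 30]` against the uniform distribution have expected count 20 in each cell, giving a statistic of (100 + 0 + 100) / 20 = 10 with 2 degrees of freedom.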
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1733#discussion_r15857945
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1796#issuecomment-51298668
LGTM. Merged into both master and branch-1.1. Thanks!!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1778#issuecomment-51301688
@rezazadeh Do you mind creating a JIRA for this and then adding `[SPARK-]`
to the title? We also want to learn more about the theory, especially the
relation between
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1733#issuecomment-51347255
The previous proposal may be hard to implement in Python. Another solution
would be to separate the goodness-of-fit test from the independence test, e.g.,
`chiSqGofTest
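A separate independence test would take a contingency matrix rather than a frequency vector. A minimal Python sketch of Pearson's test of independence; the function name is hypothetical, and as in a goodness-of-fit sketch the p-value is left out.

```python
from typing import Sequence, Tuple


def chi_sq_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson's chi-squared test of independence for an r x c contingency table.

    Expected count for cell (i, j) is row_total[i] * col_total[j] / grand_total;
    degrees of freedom are (r - 1) * (c - 1).
    """
    r = len(table)
    c = len(table[0]) if r else 0
    if r < 2 or c < 2:
        raise ValueError("need at least a 2 x 2 table")
    if any(len(row) != c for row in table):
        raise ValueError("ragged table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(r):
        for j in range(c):
            e = row_totals[i] * col_totals[j] / grand if grand else 0.0
            if e == 0:
                raise ValueError("a row or column sums to zero")
            stat += (table[i][j] - e) ** 2 / e
    return stat, (r - 1) * (c - 1)
```

Keeping the two tests as separate entry points means each has a single, unambiguous input type, which may also be simpler to mirror in Python.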
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1671#discussion_r15887125
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/1807
[SPARK-2828][MLLIB] API consistency for `mllib.feature`
1. added a Java-friendly fit method to Word2Vec with tests
2. changed DeveloperApi to Experimental for Normalizer and StandardScaler
3
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1671#issuecomment-51375372
Thanks for explaining the different approaches. What would be best for
`HashingTF`: adding the counts, or using random signs bounded by 0? I know
with random signs, we
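The two hashing variants discussed above can be contrasted in a short Python sketch. The hash function (`zlib.crc32`) and the use of a second hash to pick the sign are assumptions for illustration, not what MLlib's `HashingTF` does; the "bounded by 0" variant is not shown.

```python
import zlib
from typing import Dict, Iterable


def _hash(term: str) -> int:
    # Deterministic hash; Python's built-in hash() is salted per process for str.
    return zlib.crc32(term.encode("utf-8"))


def hashing_tf(terms: Iterable[str], num_features: int) -> Dict[int, float]:
    """Variant 1: add up the counts of all terms that hash to the same index."""
    vec: Dict[int, float] = {}
    for t in terms:
        i = _hash(t) % num_features
        vec[i] = vec.get(i, 0.0) + 1.0
    return vec


def signed_hashing_tf(terms: Iterable[str], num_features: int) -> Dict[int, float]:
    """Variant 2: an independent hash picks +1 or -1, so collisions tend to cancel."""
    vec: Dict[int, float] = {}
    for t in terms:
        i = _hash(t) % num_features
        # A separate hash for the sign avoids tying the sign to the index parity.
        sign = 1.0 if (_hash("sign:" + t) & 1) == 0 else -1.0
        vec[i] = vec.get(i, 0.0) + sign
    return vec
```

With plain counts, every entry is a non-negative term frequency but collisions always inflate it; with random signs, collisions are unbiased on average, at the cost of entries that can be negative.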
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15892856
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +293,198 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15892945
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurities.scala ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15898657
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +309,93 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1798#discussion_r15898661
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -300,6 +309,93 @@ object DecisionTree extends Serializable with Logging