Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19072
Jenkins, test this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17014#discussion_r135695930
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -85,6 +86,10 @@ abstract class Predictor[
M <: PredictionMo
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19072
[SPARK-17133][ML][FOLLOW-UP] Add convenient method `asBinary` for casting
to BinaryLogisticRegressionSummary
## What changes were proposed in this pull request?
add an "asB
Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/19026
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
@felixcheung In the Jenkins log I only found failures in Random Forest and
Decision Tree; Random Forest failed more frequently. Thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17014#discussion_r135534873
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -85,6 +86,10 @@ abstract class Predictor[
M <: PredictionMo
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19065
Jenkins, test this please.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19065
[SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure
consistent output columns
## What changes were proposed in this pull request?
Add test for prediction using
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
ping @felixcheung We can make all R tests for trees deterministic (not only
random trees) and leave the other problems to a separate PR. It would be great
to fix it soon, thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins test this please
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins test this please
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19029#discussion_r135186430
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -438,6 +438,10 @@ private[ml] object SummaryBuilderImpl extends Logging
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
@felixcheung I think this error occurs in the OneHotEncoder inside RFormula.
After searching the project, I found that only OneHotEncoder prints this error
message...
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19029
[SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.variance
generate negative result
## What changes were proposed in this pull request?
Because of numerical error
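The snippet above is truncated, so the actual fix is not visible here. As a rough illustration of how floating-point error can push a computed variance slightly below zero, the sketch below uses the textbook formula E[x^2] - E[x]^2 and clamps the result at 0. Both function names are hypothetical, the formula is not claimed to be the one MultivariateOnlineSummarizer uses, and clamping is only an assumed remedy.

```python
def variance_naive(xs):
    """Population variance via E[x^2] - E[x]^2; prone to cancellation error."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    # For large, nearly identical values the two terms almost cancel,
    # so rounding error can leave a tiny negative remainder.
    return mean_sq - mean * mean


def variance_clamped(xs):
    """Same formula, but never reports a negative variance (assumed remedy)."""
    return max(variance_naive(xs), 0.0)
```

With well-separated inputs such as `[1.0, 2.0, 3.0]` both functions agree; the clamp only matters when rounding error dominates the true variance.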
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19026
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/16774
@BryanCutler @MLnick I agree to pick `HasParallel` into this PR because the
`trait` has very little code. Another feature is pending on this PR, so we hope
this gets merged soon! cc @jkbradley
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19026
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd
contains zero (backport PR for 2.2)
## What changes were proposed in this pull request?
This is backport PR of
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@jkbradley OK. (Can this be merged directly into 2.2?)
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r134449164
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18924
Thanks! I will take a look later.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17849
What do you think about this? @jkbradley
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18992#discussion_r134109929
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -57,6 +61,11 @@ private[ml] class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133895488
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1574,18 +1588,17 @@ sealed trait
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133895254
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1357,23 +1361,23 @@ sealed trait
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
@felixcheung So it does not cause bugs in SparkR; can we leave it to a
separate PR?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133883361
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1324,90 +1350,136 @@ private[ml] class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@jkbradley OK. So can I remove the test I added?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133636062
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -882,21 +882,28 @@ class LogisticRegression @Since
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@MLnick I debugged the test case you mentioned.
The reason is that the zero variance causes the computation to generate
`Infinity` and `NaN`, so the result is unpredictable; in this case, it happened to
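A small sketch of the failure mode described in the comment above: dividing a feature by a standard deviation of zero. On the JVM, double division by 0.0 yields Infinity or NaN rather than an error, which plain Python does not do, so `ieee_div` below emulates that behavior. All names here are hypothetical, and the guarded variant is only one possible remedy, not necessarily what the PR does.

```python
import math


def ieee_div(x: float, y: float) -> float:
    """Divide like a JVM double: x/0.0 is +/-Infinity and 0.0/0.0 is NaN.

    Plain Python raises ZeroDivisionError instead. Assumes a zero divisor
    is +0.0.
    """
    if y != 0.0:
        return x / y
    if x == 0.0 or math.isnan(x):
        return math.nan
    return math.copysign(math.inf, x)


def standardize_naive(row, std):
    """Scale each feature by its standard deviation with no zero check."""
    return [ieee_div(v, s) for v, s in zip(row, std)]


def standardize_guarded(row, std):
    """Assumed remedy: map a constant feature (std == 0) to 0.0."""
    return [v / s if s != 0.0 else 0.0 for v, s in zip(row, std)]
```

For a row `[2.0, 0.0, 3.0]` with standard deviations `[2.0, 0.0, 0.0]`, the naive version produces a NaN and an Infinity in the last two positions, while the guarded version keeps every value finite.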
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@MLnick Yes, it is always trained in the scaled space. But the test case you
mentioned does not take the "scale" step, so it does not trigger the bug...
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@MLnick That's because this bug is triggered only when we standardize the
features first and then train...
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r133191237
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,593 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
cc @jkbradley Code updated, thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r133121659
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,593 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18798
@yanboliang I will update ASAP, thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r133119397
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,593 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18798
@viirya Sure! comment updated.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18872#discussion_r133081682
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -109,14 +112,15 @@ class LibSVMRelationSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18872#discussion_r133082255
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -109,14 +112,15 @@ class LibSVMRelationSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18872#discussion_r133081995
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -109,14 +112,15 @@ class LibSVMRelationSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18736#discussion_r133080543
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
---
@@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Si
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18736#discussion_r133080201
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala ---
@@ -69,6 +69,20 @@ class HashingTFSuite extends SparkFunSuite with
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18736#discussion_r132253487
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
---
@@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17894#discussion_r132249512
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1722,25 +1723,22 @@ private class
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/18896
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd
contains zero
## What changes were proposed in this pull request?
fix bug of MLOR do not work correctly
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/16774
@BryanCutler You are right. Once the `Future` completes, the model can be
cleaned up by GC, so the memory cost of the code has already been optimized. I
didn't look at the code carefully a few day
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17894
I am also interested in an implementation using level-3 BLAS. Can you post a
design doc first?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/1#discussion_r132072527
--- Diff: python/pyspark/ml/pipeline.py ---
@@ -242,3 +327,65 @@ def _to_java(self):
JavaParams._new_java_obj
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/1#discussion_r132070100
--- Diff: python/pyspark/ml/pipeline.py ---
@@ -204,13 +282,20 @@ def copy(self, extra=None):
@since("2.0.0")
def
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17894#discussion_r132068663
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1722,25 +1723,22 @@ private class
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17894#discussion_r132069046
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1722,25 +1723,22 @@ private class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
cc @jkbradley @MrBago thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17583#discussion_r132058898
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18736#discussion_r132058692
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
---
@@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17583#discussion_r132052106
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17373#discussion_r132021227
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala
---
@@ -107,9 +103,9 @@ class
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17373#discussion_r131992917
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala
---
@@ -83,6 +83,36 @@ class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18798
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r131979173
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,593 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17373#discussion_r131824713
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala
---
@@ -82,6 +83,23 @@ class
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17373#discussion_r131790673
--- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
@@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private(
private
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131772274
--- Diff: python/pyspark/ml/util.py ---
@@ -61,33 +66,86 @@ def _randomUID(cls):
@inherit_doc
-class MLWriter(object):
+class
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131771002
--- Diff: python/pyspark/ml/tests.py ---
@@ -1158,6 +1165,33 @@ def test_decisiontree_regressor(self):
except OSError
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r131766119
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r131766525
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r131767248
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14326#discussion_r131763824
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,497 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14326#discussion_r131762320
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,497 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14326#discussion_r131764683
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r131735748
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -547,35 +533,11 @@ object SummaryBuilderImpl extends Logging
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18797
@srowen Great! thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18797
Thanks! Waiting for the author of the AFT test code to figure out how to
modify the test case.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17849
Thanks for your work on this, but I am curious what the benefit of doing this
is. In PySpark there is currently no param in Model itself; what problems or
bugs can it resolve after adding
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18797
@srowen Yeah, the third case is another problem (I think we can simply
change the iteration number from 7 to 6 in the test case).
I am curious about the first two cases: why do they trigger the `require` failure? By
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18797
Strange thing, the code failed this `require` at
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/StrongWolfe.scala#L73
in the three case
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18746#discussion_r130940500
--- Diff: python/pyspark/ml/base.py ---
@@ -116,3 +121,44 @@ class Model(Transformer):
"""
__metacl
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18313
@jkbradley
I think this is simple.
When the persist-model-list param is `false`, we just keep the code logic the
same and **it won't increase the memory cost** (this is the default
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130747756
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130746993
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130746893
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18798
@thunterdb
1) Deserializing the DataFrame from binary data adds overhead (there may or
may not be compaction, depending on the datatype; cc @liancheng), about
1x performance in my test
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r130745275
--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +289,124 @@ def numFeatures(self):
Returns the number of features the model was trained
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18798
Performance data attached. cc @thunterdb @jkbradley
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130683266
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130684686
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -0,0 +1,619 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130684135
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130680940
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130683584
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130682301
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18798#discussion_r130684437
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,633 @@
+/*
+ * Licensed to the Apache Software
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/18798
[SPARK-19634][ML] Multivariate summarizer - dataframes API
## What changes were proposed in this pull request?
This patch adds the DataFrames API to the multivariate summarizer (mean
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/18797
[SPARK-21523] update breeze to 0.13.1 for an emergency bugfix in strong
wolfe line search
## What changes were proposed in this pull request?
Update breeze to 0.13.1 for an emergency
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r130523538
--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +289,124 @@ def numFeatures(self):
Returns the number of features the model was trained
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r130519716
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -375,6 +375,18 @@ def copy(self, extra=None):
that._defaultParamMap