GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/14157
[SPARK-16500][ML][MLLib][Optimizer] add LBFGS convergence warning for all
used place in MLLib
## What changes were proposed in this pull request?
Add a warning for the following case
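The description above is cut off, so the exact cases are not shown. As a rough illustration only, here is a minimal sketch that assumes the warning fires when an iterative optimizer uses up its iteration budget without meeting its tolerance. Plain gradient descent stands in for L-BFGS, and `minimize`, `max_iter`, and `tol` are hypothetical names, not MLlib's API.

```python
import logging

logger = logging.getLogger("optimizer_sketch")


def minimize(grad, x0, step=0.1, max_iter=100, tol=1e-6):
    """Gradient descent on a 1-D function (a stand-in for L-BFGS).

    Returns (x, converged). Logs a warning when the iteration limit is
    reached before the gradient magnitude drops below `tol`.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, True
        x -= step * g
    # The last step may have converged; check once more before warning.
    if abs(grad(x)) < tol:
        return x, True
    logger.warning(
        "Optimizer failed to converge after %d iterations; "
        "consider increasing max_iter or loosening tol.", max_iter)
    return x, False
```

Returning the convergence flag lets callers decide whether a non-converged result is acceptable, while the log line makes the condition visible even when the flag is ignored.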
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133636062
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -882,21 +882,28 @@ class LogisticRegression @Since
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@jkkbradley OK. So can I remove the test I added?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133883361
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1324,90 +1350,136 @@ private[ml] class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
@felixcheung So it does not cause bugs in SparkR; can we leave it to a
separate PR?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133895254
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1357,23 +1361,23 @@ sealed trait
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r133895488
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1574,18 +1588,17 @@ sealed trait
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18992#discussion_r134109929
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -57,6 +61,11 @@ private[ml] class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17849
What do you think about this? @jkbradley
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18924
Thanks! I will take a look later.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r134449164
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18896
@jkbradley OK. (Can this be merged directly into 2.2?)
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19026
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd
contains zero (backport PR for 2.2)
## What changes were proposed in this pull request?
This is a backport PR of
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/16774
@BryanCutler @MLnick I agree with pulling `HasParallel` into this PR because
the `trait` has very little code. Another feature depends on this PR, so we hope
it gets merged soon! cc @jkbradley
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19026
Jenkins, test this please.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19029
[SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.variance
generate negative result
## What changes were proposed in this pull request?
Because of numerical error
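The description stops at "Because of numerical error". A minimal sketch of that failure mode and one common guard, assuming variance is computed from running sums (the textbook `E[x^2] - E[x]^2` form, not necessarily the summarizer's exact update): floating-point cancellation can push the result slightly below zero, and on the JVM `math.sqrt` of a negative number is NaN, which then propagates into any standard deviation built from it. Clamping at zero is one guard; `sample_variance` and `sample_std` are hypothetical helpers.

```python
import math


def sample_variance(values):
    """Unbiased sample variance from running sums, clamped at zero."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    if n < 2:
        return 0.0
    mean = total / n
    raw = (total_sq - n * mean * mean) / (n - 1)
    # Cancellation can make `raw` a tiny negative number when all values
    # are (nearly) equal; a true variance is never negative.
    return max(raw, 0.0)


def sample_std(values):
    # Safe because sample_variance never returns a negative value.
    return math.sqrt(sample_variance(values))
```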
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
@felixcheung I think this error occurs in the OneHotEncoder inside RFormula.
After searching the project, only OneHotEncoder prints this error
message...
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19029#discussion_r135186430
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -438,6 +438,10 @@ private[ml] object SummaryBuilderImpl extends Logging
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins test this please
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15435
Jenkins test this please
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
ping @felixcheung We can make all R tests for trees deterministic (not only
random trees). Leave other problems to a separate PR. It would be great to fix
this soon, thanks!
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19065
[SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure
consistent output columns
## What changes were proposed in this pull request?
Add test for prediction using
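The description is truncated, so the test's exact checks are not shown. One kind of consistency such a generic test could check is sketched below, assuming no class thresholds are set: the probability vector is non-negative and sums to 1, and `rawPrediction`, `probability`, and `prediction` all agree on the top class. `consistency_errors` and `argmax` are hypothetical helpers over plain lists, not the suite's actual code.

```python
def argmax(values):
    """Index of the largest element; the first one wins on ties."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


def consistency_errors(raw, prob, prediction, tol=1e-9):
    """Return a list of problems found in one classifier output row.

    Assumes no class thresholds, so `prediction` should be the argmax of
    both `raw` (rawPrediction) and `prob` (probability).
    """
    errors = []
    if any(p < 0.0 for p in prob):
        errors.append("negative probability")
    if abs(sum(prob) - 1.0) > tol:
        errors.append("probabilities do not sum to 1")
    if argmax(prob) != argmax(raw):
        errors.append("rawPrediction and probability disagree on the top class")
    if float(argmax(prob)) != prediction:
        errors.append("prediction is not the argmax of probability")
    return errors
```

Returning a list of messages rather than failing on the first problem makes it easier to see every inconsistency in a row at once.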
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19065
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17014#discussion_r135534873
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -85,6 +86,10 @@ abstract class Predictor[
M <: PredictionMo
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
@felixcheung In the Jenkins logs I only found Random Forest and Decision Tree
failures; Random Forest failed more frequently. Thanks!
Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/19026
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19072
[SPARK-17133][ML][FOLLOW-UP] Add convenient method `asBinary` for casting
to BinaryLogisticRegressionSummary
## What changes were proposed in this pull request?
add an "asB
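A Python analogue of what such a convenience cast might do, assuming it returns the same summary object when the model is binary and fails loudly otherwise, so callers do not need an explicit type cast. The class and method names below are hypothetical stand-ins, not the Scala or PySpark API.

```python
class LogisticRegressionSummary:
    """Hypothetical stand-in for a general (possibly multiclass) summary."""

    def as_binary(self):
        # Convenience cast: succeed only if this summary really is binary.
        if isinstance(self, BinaryLogisticRegressionSummary):
            return self
        raise TypeError("Cannot cast to a binary summary: model is not binary.")


class BinaryLogisticRegressionSummary(LogisticRegressionSummary):
    """Hypothetical binary summary exposing a binary-only metric."""

    def __init__(self, area_under_roc):
        self.area_under_roc = area_under_roc
```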
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17014#discussion_r135695930
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -85,6 +86,10 @@ abstract class Predictor[
M <: PredictionMo
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19072
Jenkins, test this please.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19078
[SPARK-21862] Add overflow check in PCA
## What changes were proposed in this pull request?
add overflow check in PCA, otherwise it is possible to throw
`NegativeArraySizeException
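As one illustration of why such a check helps, assume the covariance (Gram) matrix of `n` features is stored as a single packed upper-triangular array indexed by a 32-bit `Int`. That array needs n(n+1)/2 entries; for large `n` the count exceeds `Int.MaxValue`, and narrowing it to an `Int` wraps to a negative size. The sketch computes the size with Python's unbounded integers, emulates the 32-bit wrap, and rejects oversized inputs up front. `packed_size`, `to_int32`, and `check_pca_size` are hypothetical; the PR's actual bound is not shown in the source and may count other buffers.

```python
INT_MAX = 2**31 - 1


def packed_size(n):
    """Entries in a packed upper-triangular n x n matrix."""
    return n * (n + 1) // 2


def to_int32(x):
    """Emulate narrowing to a JVM Int (two's-complement wrap-around)."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x


def check_pca_size(num_features):
    """Fail early instead of attempting an impossible allocation."""
    size = packed_size(num_features)
    if size > INT_MAX:
        raise ValueError(
            f"{num_features} features need {size} matrix entries, "
            f"more than the {INT_MAX} an Int-indexed array can hold")
    return size
```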
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19078#discussion_r135751225
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -44,6 +44,13 @@ class PCA @Since("1.4.0") (@Since("1.4
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19078
cc @jkbradley
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19065#discussion_r135782045
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala
---
@@ -91,4 +94,54 @@ object
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19065
@smurching Code updated, thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
cc @zhengruifeng
I updated my comment; please check again, thanks!
I read the PR again; it still does not resolve the double-caching issue in KMeans.
In KMeans, your code
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@zhengruifeng OK. So the `KMeans` part of this PR still works. No change
needed, I think.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17862
+1 for adding tests on large-scale datasets.
Another thing I want to know: can you compare the final loss value of the
resulting coefficients between LIBLINEAR (scikit-learn), LBFGS
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19078#discussion_r136032375
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -44,6 +44,13 @@ class PCA @Since("1.4.0") (@Since("1.4
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r136069679
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r136067839
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r136072548
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -146,6 +161,8 @@ class LinearRegressionSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r136071530
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/HuberAggregatorSuite.scala
---
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r136243309
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -120,6 +120,33 @@ class CrossValidatorSuite
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
I have been thinking about this double-cache issue for a few days. One big
problem is that it is hard to get precise storage-level info. For example, we
may add a `map` transform on a cached dataset and then
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@smurching Yes, this should be added as an `ml.Param`, not as an
argument.
@zhengruifeng Would you mind updating the PR according to the outcome of our
discussion above?
Make
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r136482755
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -120,6 +120,33 @@ class CrossValidatorSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r136532646
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r136536168
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/16864
@Bcpoole Thanks for this PR. But I want to ask where in Spark this
extension can apply. E.g., can this algorithm be used in join cost estimation or
somewhere else? But if there is no
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19106
[SPARK-21770][ML] ProbabilisticClassificationModel fix corner case:
normalization of all-zero raw predictions
## What changes were proposed in this pull request?
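The description is truncated, so the exact fix is not shown. A minimal sketch, assuming the corner case is non-negative raw scores (for example, tree vote counts) that are all zero: normalizing by their sum computes 0.0 / 0.0, which is NaN on the JVM, for every class. Returning a uniform distribution is one reasonable fallback; `normalize_to_probabilities` is a hypothetical helper, not Spark's method.

```python
def normalize_to_probabilities(raw):
    """Scale non-negative raw scores so they sum to 1.

    If every score is zero, dividing by the sum is undefined, so fall back
    to a uniform distribution over the classes.
    """
    if any(x < 0.0 for x in raw):
        raise ValueError("raw scores must be non-negative")
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(raw)] * len(raw)
    return [x / total for x in raw]
```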
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19107
[SPARK-21799][ML] Fix `KMeans` performance regression caused by
double-caching
## What changes were proposed in this pull request?
Fix `KMeans` performance regression caused by
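A minimal sketch of the usual guard against double-caching, assuming the algorithm can read the storage level of the dataset it was handed: cache only when the caller's input is not already persisted, and release only what the algorithm itself cached. In Spark, an RDD derived from a cached DataFrame is a new RDD with no storage level of its own, so checking the derived RDD may report "not cached" even when the input is. `FakeDataset`, `storage_level`, and `persisted_if_needed` are hypothetical names for illustration.

```python
from contextlib import contextmanager

NONE = "NONE"
MEMORY_AND_DISK = "MEMORY_AND_DISK"


class FakeDataset:
    """Hypothetical stand-in for a distributed dataset with a storage level."""

    def __init__(self, storage_level=NONE):
        self.storage_level = storage_level
        self.persist_calls = 0

    def persist(self, level=MEMORY_AND_DISK):
        self.persist_calls += 1
        self.storage_level = level
        return self

    def unpersist(self):
        self.storage_level = NONE
        return self


@contextmanager
def persisted_if_needed(dataset):
    """Persist `dataset` for the block only if the caller has not already."""
    handle_persistence = dataset.storage_level == NONE
    if handle_persistence:
        dataset.persist()
    try:
        yield dataset
    finally:
        # Only undo what we did; never unpersist the caller's cache.
        if handle_persistence:
            dataset.unpersist()
```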
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@zhengruifeng @jkbradley I created PR #19107 as a quick fix for the `KMeans`
perf regression bug.
This PR can continue the work on adding a `handlePersistence` Param, which
is not so urgent
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19107
cc @jkbradley @smurching
This should be merged and backported to 2.2 ASAP!
Other improvements (e.g. adding the `handlePersistence` param) can be left to
PR #17014.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19108
[SPARK-21898][ML] Feature parity for KolmogorovSmirnovTest in MLlib
## What changes were proposed in this pull request?
Feature parity for KolmogorovSmirnovTest in MLlib
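For reference, the one-sample Kolmogorov-Smirnov statistic is D = sup_x |F_n(x) - F(x)|, the largest gap between the sample's empirical CDF and a reference CDF. A minimal sketch of the statistic only (no p-value), assuming a caller-supplied CDF such as a normal distribution; `ks_statistic` and `normal_cdf` are hypothetical helpers, not MLlib's API.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample KS statistic D = sup_x |F_n(x) - F(x)|.

    The supremum is attained at a sample point, just before or just after
    the empirical CDF steps up, so both sides of each step are checked.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n steps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```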
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19106#discussion_r136696592
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
---
@@ -245,6 +245,13 @@ private[ml] object
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19110
[SPARK-21027][ML][PYTHON] Added tunable parallelism to one vs. rest in both
Scala mllib and Pyspark
## What changes were proposed in this pull request?
Added tunable parallelism to
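One-vs-rest trains one independent binary classifier per class, which is what makes this step easy to parallelize. A minimal sketch, assuming a thread pool whose size is capped by a `parallelism` setting (`parallelism=1` runs sequentially); `one_vs_rest_fit` and the caller-supplied `train_binary` are hypothetical names, not the Scala or PySpark API.

```python
from concurrent.futures import ThreadPoolExecutor


def one_vs_rest_fit(labels, features, train_binary, parallelism=1):
    """Fit one binary model per class using at most `parallelism` threads.

    For class k, examples labeled k become positives (1.0) and all others
    negatives (0.0). Models are returned in sorted class order.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    classes = sorted(set(labels))

    def fit_one(k):
        # Relabel: class k is the positive class, everything else negative.
        binary_labels = [1.0 if y == k else 0.0 for y in labels]
        return train_binary(binary_labels, features)

    workers = max(1, min(parallelism, len(classes)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in class order, whatever order tasks finish in.
        return list(pool.map(fit_one, classes))
```

Keeping the output in class order means the result does not depend on the parallelism setting, only the wall-clock time does.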
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18281
I have taken this PR over in #19110 because the original author is busy, but
we need to merge this PR soon.
Thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19018
cc @felixcheung
I encountered an R test failure again even with this seed added.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81350/console
error:
```
Failed
```
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19110
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r136719383
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r136719561
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -87,37 +91,63 @@ class TrainValidationSplit @Since("
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r136719485
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0"
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18538
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19111
I found that `NaiveBayes` can also fail; I found it here: #18538. Hope this
works!
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81316/console
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18902
+1 for using the DataFrame-based version.
@zhengruifeng One thing I want to confirm: I checked your testing
code, and both the RDD-based and DataFrame-based versions will
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@zhengruifeng `KMeans` is regarded as a bugfix (SPARK-21799) because the
double-cache issue was introduced in 2.2 and causes a perf regression.
Other algorithms also have the same issue, but the issue
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18902
hmm... that's interesting. So I found a performance gap between DataFrame
codegen aggregation and simple RDD aggregation. I will discuss this with the SQL
team later. Thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/18902
Sure. I will create a JIRA after this perf gap is confirmed.
---
-
To unsubscribe, e-mail: reviews-unsubscr
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19122
[SPARK-21911][ML][PySpark] Parallel Model Evaluation for ML Tuning in
PySpark
## What changes were proposed in this pull request?
Add parallelism support for ML tuning in pyspark
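The core of parallel model evaluation is to fit and score each candidate parameter map independently, then pick the best. A minimal sketch using `multiprocessing.pool.ThreadPool` with a pool bounded by a `parallelism` setting, assuming a caller-supplied `fit_and_score(params)` that returns a metric where larger is better. `parallel_grid_search` and `fit_and_score` are hypothetical; the toy scorer in the test stands in for fitting an estimator and running an evaluator.

```python
from multiprocessing.pool import ThreadPool


def parallel_grid_search(param_maps, fit_and_score, parallelism=1):
    """Score every parameter map with up to `parallelism` threads.

    Returns (best_index, metrics); larger metrics are assumed better.
    """
    if not param_maps:
        raise ValueError("param_maps must not be empty")
    pool = ThreadPool(processes=max(1, min(parallelism, len(param_maps))))
    try:
        # pool.map returns results in the same order as param_maps.
        metrics = pool.map(fit_and_score, param_maps)
    finally:
        pool.close()
        pool.join()
    best_index = max(range(len(metrics)), key=lambda i: metrics[i])
    return best_index, metrics
```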
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r136850665
--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,23 @@ def _fit(self, dataset):
randCol = self.uid + "_rand"
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19108
cc @yanboliang Thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/13794
+1 @jkbradley For now it is better to keep the current implementation of
the 4 meta-algorithms in PySpark.
@yinxusen Would you mind closing this PR? I still appreciate your
contribution
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r136933807
--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,23 @@ def _fit(self, dataset):
randCol = self.uid + "_rand"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r136934638
--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,24 @@ def _fit(self, dataset):
randCol = self.uid + "_rand"
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19020
Looks good. cc @jkbradley Thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r137175343
--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,24 @@ def _fit(self, dataset):
randCol = self.uid + "_rand"
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19110
@MLnick Conflict resolved. Thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r137264588
--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,23 @@ def _fit(self, dataset):
randCol = self.uid + "_rand"
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19156
[SPARK-19634][FOLLOW-UP][ML] Improve interface of dataframe vectorized
summarizer
## What changes were proposed in this pull request?
Make several improvements in dataframe
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19156
cc @yanboliang @thunterdb Thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16158#discussion_r137542479
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala ---
@@ -85,6 +86,32 @@ private[ml] trait ValidatorParams extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16158#discussion_r137546848
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala ---
@@ -85,6 +86,32 @@ private[ml] trait ValidatorParams extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16158#discussion_r137545402
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala ---
@@ -85,6 +86,32 @@ private[ml] trait ValidatorParams extends
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17383
@facaiy Can you run a benchmark first (by generating random test data)?
Then we can see how much speedup this gives.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19107
cc @smurching Thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/15770
@wangmiao1981 Sorry for delay, I will take a look later, thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19156
Thanks @thunterdb, code updated.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19156#discussion_r137740578
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -94,46 +97,86 @@ object Summarizer extends Logging {
* - min
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r137805843
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15770#discussion_r137800867
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18748#discussion_r137815796
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -356,6 +371,40 @@ class ALSModel private[ml
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19172
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19172#discussion_r137922379
--- Diff: python/pyspark/ml/tests.py ---
@@ -1655,6 +1655,25 @@ def
test_multinomial_logistic_regression_with_bound(self
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19172#discussion_r137922397
--- Diff: python/pyspark/ml/tests.py ---
@@ -1655,6 +1655,25 @@ def
test_multinomial_logistic_regression_with_bound(self
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19172#discussion_r137922474
--- Diff: python/pyspark/ml/classification.py ---
@@ -1425,11 +1425,13 @@ class MultilayerPerceptronClassifier(JavaEstimator,
HasFeaturesCol