Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/22236
Take it and good luck.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/22236
just FYI about another related PR:
https://github.com/apache/spark/pull/17280
and maybe I should close it? @srowen
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21981
BTW, @HyukjinKwon, do you know who's still reviewing the ML PRs? I have a
few old PRs and I really want to know which are considered
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21981
Thanks for the review @HyukjinKwon.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21981
Ah, this triggers the doc check. Updating.
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/21981
[SPARK-25011][ML]add prefix to __all__ in fpm.py
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-25011
add prefix to __all__
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21942
I think it's better to move the code and unit test in one PR. But since
it's not a trivial change, I suggest you wait for committer
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21942#discussion_r207103066
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala ---
@@ -160,15 +160,89 @@ class StandardScalerModel private[ml
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
Updated to support backward model loading compatibility.
@MLnick @jkbradley
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/16158
Gentle ping @MLnick. Thanks for the review. I'd appreciate it if you have some
time for further comments.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/20028
Thanks for the comments @zhengruifeng @felixcheung
It's been nearly 8 months and it took me a while to recall what this PR
does. While the PR did provide some improvement for the cu
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21501#discussion_r194098947
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21501#discussion_r194099298
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189465979
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189466147
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189466112
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189464881
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189464891
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189464861
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189465580
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189465534
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189465567
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/20028
Please advise if this is a good feature to add. If not I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
Please advise if this is a good feature to add. If not I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17583
Please advise if this is a good feature to add. If not I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
Please advise if this is a good feature to add. If not I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/16158
Please advise if this is a good feature to add. If not I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19565
It's probably better to wait for the opinion from a committer.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17280#discussion_r164942458
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
Thanks for taking a look @MLnick
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17280#discussion_r161962624
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/16158#discussion_r159016507
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -133,7 +134,10 @@ class CrossValidator @Since("1.2.0") (@Si
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158344862
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,31 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158170154
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala
---
@@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158154050
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala
---
@@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158153277
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala
---
@@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158153048
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,31 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19993
To make it available for other classes, we need to support checking for
both `fit` and `transform`, that means we also need a sample input Dataset, so
we may have to add the explicit test in each of
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/20028
[SPARK-19053][ML]Supporting multiple evaluation metrics in DataFrame-based
API
## What changes were proposed in this pull request?
As an initial step, the PR creates
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157870176
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157871042
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157867496
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -137,18 +137,17 @@ final class Bucketizer @Since("1.4.0")
(@Si
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157870214
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157869596
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19993
I would suggest developing the common infrastructure and unit tests first;
then other PRs can build on it or we can send follow-up fixes.
cc @MLnick for advice
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
Updated, use $lc and add a new unit test for doc and exception.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
> One option that came to my mind was that $ returns lowercase, so this is
used at most places but when you really need it you can access the original
(not necessarily lowercase) value.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156772807
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156771298
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156770135
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r156758913
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -480,10 +640,14 @@ object LinearRegression extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r156758353
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
---
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19020
LGTM.
One thing I noticed is that we did not really compare the loss with other
libraries (such as sklearn), which is also missing for other linear
algorithms. Do you think it would be a
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
Many thanks for the review @smurakozi and @attilapiros.
> The PR is not complete (did not convert all Param[String] instances to
StringParam consistently) so it should be marked as
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156556262
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156556136
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156555847
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19525
Thanks for the review @yanboliang
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19894
LGTM. Good fix.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/10803
No it's not merged. Feel free to use the code as you wish.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r151854676
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r151029101
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19588
Also, we need a JIRA for Python.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150748259
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r150432943
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r150432221
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r150430257
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148969162
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148968172
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148968255
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967559
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967956
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967766
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967853
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148968530
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/impl/Word2VecCBOWSolver.scala
---
@@ -0,0 +1,371 @@
+/*
+ * Licensed to the Apache
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148966706
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148634321
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala
---
@@ -49,8 +49,8 @@ final class RegressionEvaluator @Since("
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148634479
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -722,6 +722,17 @@ class LinearRegressionSummary private
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148440930
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148442070
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148440879
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444910
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444535
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148442709
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -287,9 +315,12 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444218
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148440785
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
I used two ways to switch String params among different options:
1. In NaiveBayes: convert StringParam and String constants to lowercase.
2. In LinearRegression: .equalsIgnoreCase
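For illustration only, the two approaches listed above could be sketched in plain Python roughly as follows. This is not the code from the PR and not Spark's API: the function names `resolve_by_lowercasing` and `matches_ignore_case` are hypothetical, and the option strings are used only as sample values.

```python
from typing import Iterable


def resolve_by_lowercasing(value: str, options: Iterable[str]) -> str:
    """Approach 1: lowercase the user value once, then match lowercase constants."""
    lowered = value.lower()
    valid = {opt.lower() for opt in options}
    if lowered not in valid:
        raise ValueError(
            f"unsupported option {value!r}; expected one of {sorted(valid)}"
        )
    return lowered


def matches_ignore_case(value: str, constant: str) -> bool:
    """Approach 2: keep the original value and compare case-insensitively at each use."""
    return value.lower() == constant.lower()


# Approach 1: the stored value is normalized, so later code can use plain equality.
model_type = resolve_by_lowercasing("Multinomial", ["multinomial", "bernoulli"])

# Approach 2: the stored value keeps its original casing; every comparison
# has to remember to ignore case.
solver = "L-BFGS"
uses_lbfgs = matches_ignore_case(solver, "l-bfgs")
```

The trade-off in this sketch is that the first approach normalizes once and loses the user's original casing, while the second preserves it but relies on every call site comparing case-insensitively.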
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19565#discussion_r148438581
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -497,40 +481,46 @@ final class OnlineLDAOptimizer extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19565#discussion_r148438759
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19565#discussion_r148437931
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19565
Let me know if I missed anything, but I don't quite catch the part
> all the batches have the same length
IMO
`docs.sample(withReplacement = sampleWithRep
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19565
@akopich I'm actually leaning towards "filter after sample".
1. so we don't need to change `miniBatchFraction` in
` docs.sample(withReplacement = s
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562600
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562823
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -133,7 +134,7 @@ class NaiveBayes @Since("1.5.0") (
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562645
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562530
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc:
String
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/19599
[SPARK-22381] [ML] Add StringParam that supports valid options
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-22381
During
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r147324011
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -142,6 +221,9 @@ class LinearRegression @Since("1.3.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r147326457
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -344,33 +449,58 @@ class LinearRegression @Since("
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r147321334
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache