Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150717860
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150719166
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19666
Btw, this is going to conflict with
https://github.com/apache/spark/pull/19433 a lot. @WeichenXu123 and @smurching
have you planned for merging one before the other
Repository: spark
Updated Branches:
refs/heads/master c8b7f97b8 -> d8741b2b0
[SPARK-21911][ML][FOLLOW-UP] Fix doc for parallel ML Tuning in PySpark
## What changes were proposed in this pull request?
Fix doc issue mentioned here:
https://github.com/apache/spark/pull/19122#issuecomment-340111
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19641
merging with master
Thanks!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Thanks for the explanation! Given the complexity here, I'm OK with the
random seed approach but recommend we add a warning about sampling being more
efficient but potentially non-determin
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
@liancheng I see you've worked with PathFilters in Spark SQL, so I'll ask
here: We're uncertain about how PathFilters are used in Hadoop, and it would be
helpful to understand (a
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r150621371
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -298,8 +385,21 @@ object TrainValidationSplitModel extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r150621222
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -301,11 +395,29 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r150619165
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -265,23 +317,58 @@ class TrainValidationSplitModel private[ml
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r150620648
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -187,6 +191,55 @@ class CrossValidatorSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r150619215
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -177,7 +202,9 @@ class TrainValidationSplit @Since("
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r150373516
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -117,6 +123,12 @@ class CrossValidator @Since("1.2.0"
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Thanks for the updates! My only remaining comments are about:
* Default arguments for readImages in Scala not being Java-friendly (I'd
still recommend taking the easy route by hav
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r150367190
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala
---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r150361985
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150349566
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -852,6 +662,41 @@ private[spark] object RandomForest extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150159113
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/SplitUtils.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150159513
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/SplitUtils.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150158027
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -627,221 +621,37 @@ private[spark] object RandomForest extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150160368
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/SplitUtils.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150352747
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/InformationGainStats.scala
---
@@ -112,7 +113,7 @@ private[spark] object ImpurityStats
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150154413
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/AggUpdateUtils.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19433#discussion_r150309552
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/SplitUtils.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19433
CC @dbtsai in case you're interested b/c of Sequoia forests
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149228980
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -101,6 +101,20 @@ class TrainValidationSplit @Since("
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149227862
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -323,39 +338,40 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149226734
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -108,11 +111,19 @@ abstract class MLWriter extends BaseReadWrite with
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149225714
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -323,39 +338,40 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149217598
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -262,15 +273,26 @@ class CrossValidatorModel private[ml
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149226550
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -323,39 +338,40 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149227049
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -108,11 +111,19 @@ abstract class MLWriter extends BaseReadWrite with
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r149225109
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -262,15 +273,26 @@ class CrossValidatorModel private[ml
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148909027
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148908923
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19208
Done with review. I mainly reviewed CrossValidator since some comments will
apply to TrainValidationSplit as well. Thanks for the PR
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148885057
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -271,6 +303,20 @@ class CrossValidatorModel private[ml
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148885486
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -271,6 +303,20 @@ class CrossValidatorModel private[ml
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148886008
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -282,12 +328,40 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148886190
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -282,12 +328,40 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148885817
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -282,12 +328,40 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148908525
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -187,6 +191,50 @@ class CrossValidatorSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148859381
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -252,19 +252,29 @@ object CrossValidator extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148859543
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -325,14 +328,19 @@ object CrossValidatorModel extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148859043
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -252,19 +252,29 @@ object CrossValidator extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148859168
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -252,19 +252,29 @@ object CrossValidator extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148860542
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -108,6 +108,13 @@ abstract class MLWriter extends BaseReadWrite with
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148857274
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -276,12 +315,32 @@ object TrainValidationSplitModel extends
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19641
LGTM pending tests
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148699685
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148698771
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148698295
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148694771
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala
---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148696592
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148695824
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148695893
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148695760
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148695558
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148693252
--- Diff: python/pyspark/ml/tests.py ---
@@ -1818,6 +1819,24 @@ def tearDown(self):
del self.data
+class ImageReaderTest
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148695330
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148695505
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148692919
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148692696
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148691448
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -187,6 +191,50 @@ class CrossValidatorSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148375136
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -236,12 +252,17 @@ object CrossValidator extends
MLReadable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148371489
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
---
@@ -82,7 +82,11 @@ private[shared] object
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19208#discussion_r148690866
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -261,17 +290,40 @@ class CrossValidatorModel private[ml
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Will do as soon as I can!
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/18538
@yanboliang @mgaido91 I just saw this PR. It creates a new test data
directory. Could you please send a quick update to move the data to the
existing data directory: https://github.com/apache
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Quick comment: I see that data are being added under
mllib/src/test/resources/. That appears to be a new directory, created
recently. The standard directory is
https://github.com/apache/spark
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19208
taking a look...
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19122
Whoops, could you please send a follow-up PR to do 1 doc update?
* Update https://github.com/apache/spark/blob/master/docs/ml-tuning.md to
say this is supported in Python. (Search for "
Repository: spark
Updated Branches:
refs/heads/master b3d8fc3dc -> 20eb95e5e
[SPARK-21911][ML][PYSPARK] Parallel Model Evaluation for ML Tuning in PySpark
## What changes were proposed in this pull request?
Add parallelism support for ML tuning in pyspark.
## How was this patch tested?
Test
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19122
LGTM
Merging with master
Thanks @WeichenXu123, @BryanCutler and @viirya!
Repository: spark
Updated Branches:
refs/heads/branch-2.2 9ed64048a -> 35725f735
[SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionly fail (cause by test
dataset not deterministic)
## What changes were proposed in this pull request?
Fix the NaiveBayes unit test that occasionally fails:
Set seed f
Repository: spark
Updated Branches:
refs/heads/master b377ef133 -> 841f1d776
[SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionly fail (cause by test
dataset not deterministic)
## What changes were proposed in this pull request?
Fix the NaiveBayes unit test that occasionally fails:
Set seed for `
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19558
LGTM
Tested locally, and it fixed the non-determinism.
Merging with master and branch-2.2
Thanks @WeichenXu123
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r146940974
--- Diff: python/pyspark/ml/tests.py ---
@@ -836,6 +836,27 @@ def test_save_load_simple_estimator(self):
loadedModel
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r146941252
--- Diff: python/pyspark/ml/tests.py ---
@@ -836,6 +836,27 @@ def test_save_load_simple_estimator(self):
loadedModel
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19122
Discussed elsewhere: We'll delay the multi-model fitting optimization in
favor of getting this in for now. Taking a loo
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/18924
I'll update JIRA later; it seems like Apache JIRA is having problems right
now.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/18924
@akopich I'm afraid pings on Git don't work for me; I just have too many to
keep up with. Again, sorry for the delays; I have very limited bandwidth
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/18924
Merging with master
Repository: spark
Updated Branches:
refs/heads/master 1f25d8683 -> 52facb006
[SPARK-14371][MLLIB] OnlineLDAOptimizer should not collect stats for each doc
in mini-batch to driver
Hi,
# What changes were proposed in this pull request?
as it was proposed by jkbradley , ```gammat``` are
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/18924
LGTM
Sorry for the delay!
I'll merge it after re-running tests
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19433
add to whitelist
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18924#discussion_r142826379
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18924#discussion_r142826326
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -462,31 +462,54 @@ final class OnlineLDAOptimizer extends
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19020
> We have two candidate names: epsilon or m
I see; that seems fine then, though I worry that we use "epsilon" in MLlib
(tests) for "a very small positive number."
Repository: spark
Updated Branches:
refs/heads/master 3e6a714c9 -> f180b6534
[SPARK-22060][ML] Fix CrossValidator/TrainValidationSplit param persist/load bug
## What changes were proposed in this pull request?
Currently the param of CrossValidator/TrainValidationSplit persist/loading is
hard
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19278
LGTM
Merging with master
Thanks @WeichenXu123 for the fix and for testing for backwards
compatibility!
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19278#discussion_r140621871
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala
---
@@ -160,11 +160,13 @@ class TrainValidationSplitSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18924#discussion_r140621444
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r140375759
--- Diff: python/pyspark/ml/tests.py ---
@@ -836,6 +836,27 @@ def test_save_load_simple_estimator(self):
loadedModel
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r140375921
--- Diff: python/pyspark/ml/tuning.py ---
@@ -14,15 +14,16 @@
# See the License for the specific language governing permissions and
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r140375849
--- Diff: python/pyspark/ml/tests.py ---
@@ -986,6 +1007,25 @@ def test_save_load_simple_estimator(self):
loadedModel
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19278#discussion_r140373060
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala
---
@@ -160,11 +160,13 @@ class TrainValidationSplitSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19278#discussion_r140373382
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -396,17 +396,24 @@ private[ml] object DefaultParamsReader
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19278#discussion_r140373339
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -396,17 +396,24 @@ private[ml] object DefaultParamsReader
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19278#discussion_r140374060
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -303,16 +302,17 @@ object CrossValidatorModel extends
MLReadable