[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

2018-06-08 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r194143768 --- Diff: python/pyspark/ml/classification.py --- @@ -1251,26 +1256,33 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
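
Fitting GBTs "with validation" means boosting against a held-out validation set and stopping once an extra round no longer helps enough. Below is a minimal pure-Python sketch of that stopping rule. It assumes the per-round validation errors have already been computed. The rule used here is to stop when the improvement over the best error so far falls below `validation_tol * max(current_error, 0.01)`. It is modelled on the RDD-based `GradientBoostedTrees.runWithValidation`, so treat it as an approximation rather than the code in this PR. `rounds_to_keep` is a hypothetical helper.

```python
def rounds_to_keep(val_errors, validation_tol=0.01):
    """Return how many boosting rounds to keep, given per-round validation errors.

    Early-stopping sketch: stop at the first round whose improvement over the
    best validation error so far is smaller than
    validation_tol * max(current_error, 0.01).
    """
    best_error = float("inf")
    best_rounds = 0
    for i, err in enumerate(val_errors):
        if best_error - err < validation_tol * max(err, 0.01):
            break  # not enough improvement: stop boosting here
        best_error, best_rounds = err, i + 1
    return best_rounds


print(rounds_to_keep([0.5, 0.4, 0.35, 0.349, 0.2]))
```

With the default tolerance, the step from 0.35 to 0.349 is too small a gain, so only three rounds are kept, even though a later round would have been better. The rule is greedy by design: it trades some accuracy for not fitting trees that barely help.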

[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20629 Right - so while it’s perhaps a lower quality metric it is different. So I wonder if deprecation is the right approach (vs say putting the within cluster sum squares into ClusteringEvaluator

[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20629 Sorry, I mean putting the metric in the evaluator and then also deprecating `computeCost`.

[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20629 Just want to check - does `computeCost` do the same thing as the silhouette metric?
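
The two metrics do not measure the same thing. `computeCost` is the within-cluster sum of squared distances from each point to its assigned center. That quantity is unbounded and tends to shrink as k grows. Silhouette instead compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster, and it always lies in [-1, 1].

The pure-Python sketch below computes both on toy data. It is an illustration, not Spark code. Silhouette here defaults to squared Euclidean distance, which is assumed to match `ClusteringEvaluator`'s default.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wssse(points, labels, centers):
    """Within-cluster sum of squared errors, i.e. the quantity computeCost reports."""
    return sum(sq_dist(p, centers[k]) for p, k in zip(points, labels))


def silhouette(points, labels, dist=sq_dist):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, (p, k) in enumerate(zip(points, labels)):
        # mean distance from p to the other members of each cluster
        mean_dist = {}
        for c in clusters:
            others = [q for j, (q, l) in enumerate(zip(points, labels)) if l == c and j != i]
            mean_dist[c] = sum(dist(p, q) for q in others) / len(others) if others else 0.0
        a = mean_dist[k]
        b = min(mean_dist[c] for c in clusters if c != k)
        # a point alone in its cluster (or a degenerate a == b == 0) scores 0
        if labels.count(k) == 1 or max(a, b) == 0:
            continue
        total += (b - a) / max(a, b)
    return total / len(points)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
lbl = [0, 0, 1, 1]
print(wssse(pts, lbl, [(0, 0.5), (10, 0.5)]), silhouette(pts, lbl))
```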

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165575680 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software
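
`Summarizer` computes column-wise statistics over a `Vector` column of a DataFrame. The sketch below is a plain-Python reference for what a few of those metrics mean: mean, variance, count, number of non-zeros, max and min. It works over lists of equal-length vectors, ignores weights, and is not the Spark implementation. Treating variance as the unbiased (n - 1) sample variance is an assumption of this sketch. `summarize` is a hypothetical helper.

```python
def summarize(vectors):
    """Column-wise statistics over equal-length vectors (unweighted sketch)."""
    n = len(vectors)
    dim = len(vectors[0])
    cols = [[v[j] for v in vectors] for j in range(dim)]
    mean = [sum(c) / n for c in cols]
    # unbiased sample variance; a single row has zero variance by convention here
    variance = [
        sum((x - m) ** 2 for x in c) / (n - 1) if n > 1 else 0.0
        for c, m in zip(cols, mean)
    ]
    return {
        "count": n,
        "mean": mean,
        "variance": variance,
        "numNonZeros": [sum(1 for x in c if x != 0) for c in cols],
        "max": [max(c) for c in cols],
        "min": [min(c) for c in cols],
    }


print(summarize([[2.0, 1.0, 3.0], [4.0, 0.0, 3.0]]))
```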

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165568368 --- Diff: docs/ml-statistics.md --- @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165568014 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165567614 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165362568 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165362364 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165360692 --- Diff: docs/ml-statistics.md --- @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165362533 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165362148 --- Diff: docs/ml-statistics.md --- @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165362440 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-02-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20459 Merged to master / branch-2.3. Thanks @yanboliang!

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-31 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20421 Didn't this go into 2.2.1? On Wed, 31 Jan 2018 at 20:37 WeichenXu wrote: > @MLnick <https://github.com/mlnick> > Forget one fix: #18797 <https://github.c

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-31 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20421 Merged to master / branch-2.3. Thanks!

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20421 @felixcheung just added a few more behavior changes I found. Should be final now.

[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-29 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20332 Merged to master / branch-2.3. Thanks @sethah, and @WeichenXu123 for review.
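
The multiclass summary reports per-label metrics, all derived from (prediction, label) pairs. Spark 2.3's logistic regression training summary exposes metrics such as `accuracy`, `precisionByLabel`, `recallByLabel` and `weightedPrecision`.

The plain-Python sketch below shows how those quantities are defined on a toy example. It assumes the weighted metric weights each label by its frequency in the ground truth. It is not Spark's implementation, and `multiclass_metrics` is a hypothetical helper.

```python
from collections import Counter


def multiclass_metrics(predictions, labels):
    """Accuracy, per-label precision/recall and weighted precision."""
    pairs = list(zip(predictions, labels))
    classes = sorted(set(labels) | set(predictions))
    tp = Counter(p for p, l in pairs if p == l)  # true positives per class
    predicted = Counter(predictions)
    actual = Counter(labels)
    precision = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in classes}
    recall = {c: tp[c] / actual[c] if actual[c] else 0.0 for c in classes}
    accuracy = sum(tp.values()) / len(pairs)
    # weight each label's precision by how often it occurs in the ground truth
    weighted_precision = sum(precision[c] * actual[c] for c in classes) / len(pairs)
    return accuracy, precision, recall, weighted_precision


print(multiclass_metrics([0, 1, 1, 2], [0, 1, 2, 2]))
```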

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164654897 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api/java

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164479596 --- Diff: docs/ml-classification-regression.md --- @@ -125,7 +123,8 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api/python

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164387272 --- Diff: docs/ml-classification-regression.md --- @@ -125,7 +123,8 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api/python

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164384660 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api/java

[GitHub] spark pull request #20421: [SPARK-23112][DOC] Update ML migration guide with...

2018-01-29 Thread MLnick
GitHub user MLnick opened a pull request: https://github.com/apache/spark/pull/20421 [SPARK-23112][DOC] Update ML migration guide with breaking changes. Add breaking change note to ML migration guide. ## How was this patch tested? Doc only.

[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 I reverted this (see #20410 for details) - we can re-open it once that issue is solved.

[GitHub] spark issue #20410: [SPARK-23234][ML][PYSPARK] Remove setting defaults on Ja...

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20410 I reverted #19892 in master (f5911d4894700eb48f794133cbd363bf3b7c8c8e) / branch-2.3 (a8a3e9b7cf7b9346c43cfbbf7b26fd2fd28dd521), so that other test runs can be unblocked

[GitHub] spark issue #20410: [SPARK-23234][ML][PYSPARK] Remove setting defaults on Ja...

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20410 I think this is somewhat related to #15113 cc @BryanCutler

[GitHub] spark issue #20410: [SPARK-23234][ML][PYSPARK] Remove setting defaults on Ja...

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20410 We should just revert SPARK-22797 for now to unblock others. SPARK-22799 itself is not the cause per se (it passed tests) but after it was merged SPARK-22797 causes the failure

[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 Merged to master / branch-2.3. Thanks!

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19993 Merged to master / branch-2.3

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19993 Thanks @mgaido91 and @jkbradley for working on this and others for review

[GitHub] spark issue #20363: [SPARK-23112][DOC] Add highlights and migration guide fo...

2018-01-25 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20363 Merged to master/branch-2.3

[GitHub] spark issue #20363: [SPARK-23112][DOC] Add highlights and migration guide fo...

2018-01-25 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20363 `SPARK-20047` was in 2.2 (and mentioned in the previous highlights). We could also mention `SPARK-20619` but I've tried to limit the list. I checked through the issues for 2.3 and cou

[GitHub] spark issue #20363: [SPARK-23112][WIP][DOC] Add highlights and migration gui...

2018-01-24 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20363 It’s just listing any breaking changes, if I missed them. Will do a pass to check and then remove WIP. If any folks know of breaking changes ping the JIRA issue.

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-24 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r163562784 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -401,15 +390,24 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-24 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r163561075 --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala --- @@ -20,8 +20,11 @@ package org.apache.spark.ml.param import java.io

[GitHub] spark pull request #20363: [SPARK-23112][WIP][DOC] Add highlights and migrat...

2018-01-23 Thread MLnick
GitHub user MLnick opened a pull request: https://github.com/apache/spark/pull/20363 [SPARK-23112][WIP][DOC] Add highlights and migration guide for 2.3 Update ML user guide with highlights and migration guide for `2.3`. ## How was this patch tested? Doc only.

[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-23 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 RC2 has been cut - @jkbradley do you see #19993 as a blocker? I think it should be merged for `2.3`. And also there are QA JIRAs (sub-tasks of [SPARK-23105](https://issues.apache.org/jira/browse

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-22 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19993 Well yes it would - but the method checks inputCols/inputCol first so will always fail for that reason here, i.e. we aren’t actually testing the full code path.
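
The point about check ordering can be made concrete with a hypothetical validator for a Bucketizer-like set of params held in a plain dict. The param names mirror the ones discussed in this thread (`inputCol`/`inputCols`, `outputCol`/`outputCols`, `splits`/`splitsArray`). The functions and messages are illustrative, not Spark's `ParamValidators` code.

Because the input-column check runs first, a test that sets both `inputCol` and `inputCols` never reaches the later checks. Asserting on the error message shows which branch actually fired.

```python
def check_single_vs_multi(params):
    """Hypothetical validator for Bucketizer-style params held in a dict."""
    if "inputCol" in params and "inputCols" in params:
        # checked first: a test that sets both never reaches the checks below
        raise ValueError("inputCol and inputCols are both set")
    if "inputCols" in params:
        for name in ("outputCol", "splits"):
            if name in params:
                raise ValueError(f"{name} cannot be set with inputCols")
        n = len(params["inputCols"])
        for name in ("outputCols", "splitsArray"):
            if len(params.get(name, [])) != n:
                raise ValueError(f"{name} must have one entry per input column")
    elif "inputCol" in params:
        for name in ("outputCols", "splitsArray"):
            if name in params:
                raise ValueError(f"{name} cannot be set with inputCol")
    else:
        raise ValueError("one of inputCol or inputCols must be set")


def first_error(params):
    """Return the message of the check that fails, or None if params are valid."""
    try:
        check_single_vs_multi(params)
    except ValueError as e:
        return str(e)
    return None


print(first_error({"inputCols": ["a"], "outputCols": ["b"], "splits": [0, 1]}))
```

That last call exercises a later branch. Only the multi-column input param is set, so the conflicting `splits` is what gets reported.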

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-22 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19993 Overall looks good with @jkbradley's changes. I just left a comment on the param test cases as I think they're not quit

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162940665 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -401,15 +390,14 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 @holdenk everything except my comment in https://github.com/apache/spark/pull/19892#discussion_r162900053

[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r162900053 --- Diff: python/pyspark/ml/feature.py --- @@ -315,13 +315,19 @@ class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable
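
Here is a minimal sketch of multi-column bucketing, assuming the usual `Bucketizer` semantics:

- each column has its own strictly increasing splits;
- bucket *i* covers `[splits[i], splits[i+1])`;
- the last bucket also includes its upper bound.

Out-of-range and NaN values simply raise here; the real transformer's `handleInvalid` options are not modelled. `bucket_index` and `bucketize_columns` are hypothetical helpers.

```python
import math
from bisect import bisect_right


def bucket_index(x, splits):
    """Bucket for x given sorted splits; the last bucket is closed on the right."""
    if math.isnan(x):
        raise ValueError("NaN value (a real Bucketizer would consult handleInvalid)")
    if x < splits[0] or x > splits[-1]:
        raise ValueError(f"{x} is outside [{splits[0]}, {splits[-1]}]")
    if x == splits[-1]:
        return float(len(splits) - 2)
    return float(bisect_right(splits, x) - 1)


def bucketize_columns(rows, input_cols, splits_array):
    """Apply one splits array per input column to each dict row."""
    return [
        [bucket_index(row[c], s) for c, s in zip(input_cols, splits_array)]
        for row in rows
    ]


inf = float("inf")
rows = [{"a": -0.5, "b": 5.0}, {"a": 0.5, "b": 10.0}]
print(bucketize_columns(rows, ["a", "b"], [[-inf, 0.0, inf], [0.0, 5.0, 10.0]]))
```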

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162873193 --- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py --- @@ -43,6 +43,43 @@ # Print the coefficients and

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162873036 --- Diff: docs/ml-classification-regression.md --- @@ -97,10 +97,6 @@ only available on the driver. [`LogisticRegressionTrainingSummary`](api/scala

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162872261 --- Diff: docs/ml-classification-regression.md --- @@ -125,7 +117,6 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api/python

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162873388 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala --- @@ -49,6 +49,48 @@ object

[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-21 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 If it is going to get merged to `branch-2.3` the `since` tags need to be `2.3.0` again.

[GitHub] spark issue #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator document an...

2018-01-19 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20257 Merged to master / branch-2.3, thanks!

[GitHub] spark issue #20293: [SPARK-23127][DOC] Update FeatureHasher guide for catego...

2018-01-19 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20293 Merged to master / branch-2.3

[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 I’m generally OK with these small Python API wrapper additions getting merged as long as the risk of breaking anything is low, and here it is, since it’s just API parity.

[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...

2018-01-18 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20275#discussion_r162292944 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala --- @@ -113,6 +113,13 @@ class VectorsSuite extends SparkFunSuite with

[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...

2018-01-18 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20275#discussion_r162292520 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala --- @@ -113,6 +113,13 @@ class VectorsSuite extends SparkFunSuite with

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162043704 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,27 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162042318 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,27 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark issue #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator document an...

2018-01-17 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20257 A couple minor comments, otherwise looks fine. I see we are changing the example names, so effectively removing the old examples. I'm ok with this, unless others have an obje

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r162040939 --- Diff: docs/ml-features.md --- @@ -783,11 +783,11 @@ Because this existing `OneHotEncoder` is a stateless transformer, it is not usab
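
The doc change hinges on the difference between the stateless `OneHotEncoder` and the new estimator. An estimator learns the number of categories during `fit`, so transforming later data yields vectors of the same size. The plain-Python sketch below illustrates that split. It assumes:

- string-indexed inputs (category indices 0, 1, 2, ...);
- `dropLast`-style behavior, where the last category encodes as all zeros.

`fit_one_hot` and `transform_one_hot` are hypothetical helpers. Unseen indices simply raise instead of honoring a `handleInvalid` option.

```python
def fit_one_hot(rows, input_cols):
    """'Fit' step: learn the category count of each column from training data."""
    return {c: int(max(row[c] for row in rows)) + 1 for c in input_cols}


def transform_one_hot(value, num_categories, drop_last=True):
    """Encode a category index as a (size, indices, values) sparse triple."""
    idx = int(value)
    if not 0 <= idx < num_categories:
        raise ValueError(f"unseen category index {idx}")
    size = num_categories - 1 if drop_last else num_categories
    if idx == size:  # only possible with drop_last: last category -> all zeros
        return (size, [], [])
    return (size, [idx], [1.0])


sizes = fit_one_hot([{"x": 0.0}, {"x": 2.0}, {"x": 1.0}], ["x"])
print(sizes, transform_one_hot(0.0, sizes["x"]), transform_one_hot(2.0, sizes["x"]))
```

Because the size comes from `fit`, a later batch that happens to contain only categories 0 and 1 still produces vectors of size 2. A stateless encoder would have had to infer a smaller size from that batch alone.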

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r162038403 --- Diff: docs/ml-features.md --- @@ -783,11 +783,11 @@ Because this existing `OneHotEncoder` is a stateless transformer, it is not usab

[GitHub] spark pull request #20293: [SPARK-23127][DOC] Update FeatureHasher guide for...

2018-01-17 Thread MLnick
GitHub user MLnick opened a pull request: https://github.com/apache/spark/pull/20293 [SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter Update user guide entry for `FeatureHasher` to match the Scala / Python doc, to describe the `categoricalCols` parameter

[GitHub] spark issue #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator document an...

2018-01-17 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20257 Added a few more small comments

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161989945 --- Diff: docs/ml-features.md --- @@ -777,17 +777,17 @@ for more details on the API. ## OneHotEncoder (Deprecated since 2.3.0) -Because

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161989107 --- Diff: docs/ml-features.md --- @@ -777,17 +777,17 @@ for more details on the API. ## OneHotEncoder (Deprecated since 2.3.0) -Because

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-17 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161988396 --- Diff: docs/ml-features.md --- @@ -777,17 +777,17 @@ for more details on the API. ## OneHotEncoder (Deprecated since 2.3.0) -Because

[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-01-16 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19876 Do we want to think about an `options` / `option` interface too? I'm not that familiar with whether there could be important options for PMML export, but custom user formats may need it (I

[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-01-16 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19876 So to be clear this doesn't handle the `read` path at all? Would there be a plan to implement a similar read API? Overall I like the idea of an open API for plugging in model serializ
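
The questions above (an `option`/`options` interface, and whether a read path exists) are easier to discuss against a concrete shape. The following is a hypothetical, plain-Python builder:

- writers for named formats are registered in a table;
- `format(...)` and `option(...)` are chained;
- `save` dispatches to the registered function.

None of these names are Spark's API. A matching reader registry would be needed for the read side.

```python
class ModelWriter:
    """Hypothetical builder: choose a format, set options, then save."""

    _formats = {}  # format name -> save function(model, path, options)

    @classmethod
    def register(cls, name, save_fn):
        cls._formats[name] = save_fn

    def __init__(self, model):
        self.model = model
        self._format = "internal"
        self._options = {}

    def format(self, name):
        self._format = name
        return self

    def option(self, key, value):
        self._options[key] = str(value)  # options are passed along as strings
        return self

    def save(self, path):
        try:
            save_fn = self._formats[self._format]
        except KeyError:
            raise ValueError(f"no writer registered for format {self._format!r}") from None
        return save_fn(self.model, path, dict(self._options))


ModelWriter.register("pmml", lambda model, path, opts: (model, path, opts))
print(ModelWriter("m").format("pmml").option("version", 4.3).save("/tmp/x"))
```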

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161722235 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161740882 --- Diff: examples/src/main/python/ml/onehot_encoder_estimator_example.py --- @@ -18,32 +18,31 @@ from __future__ import print_function

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161739866 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161741274 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161740612 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderEstimatorExample.java --- @@ -35,41 +34,37 @@ import

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161722104 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161739788 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161740927 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderEstimatorExample.scala --- @@ -19,38 +19,34 @@ package

[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r161719111 --- Diff: python/pyspark/ml/feature.py --- @@ -317,26 +317,34 @@ class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable

[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r161683821 --- Diff: python/pyspark/ml/feature.py --- @@ -317,13 +317,19 @@ class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable

[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r161683714 --- Diff: python/pyspark/ml/feature.py --- @@ -347,6 +353,28 @@ class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasHandleInvalid

[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r161684641 --- Diff: python/pyspark/ml/param/__init__.py --- @@ -134,6 +134,16 @@ def toListFloat(value): return [float(v) for v in value
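
The hunk above touches the type converters around `toListFloat`. For a multi-column `Bucketizer`, `splitsArray` is a list of float lists, so presumably a nested converter is needed on the Python side. The sketch below is hypothetical: its name and error message are not PySpark's. It only shows the coercion such a converter has to do:

- accept nested lists or tuples of real numbers;
- reject anything else;
- return plain float lists.

```python
import numbers


def to_list_list_float(value):
    """Hypothetical converter: coerce a list of numeric lists to float lists."""

    def is_real(v):
        # bool is a numbers.Real subclass in Python; reject it explicitly
        return isinstance(v, numbers.Real) and not isinstance(v, bool)

    if isinstance(value, (list, tuple)) and all(
        isinstance(inner, (list, tuple)) and all(is_real(v) for v in inner)
        for inner in value
    ):
        return [[float(v) for v in inner] for inner in value]
    raise TypeError(f"Could not convert {value!r} to a list of lists of floats")


print(to_list_list_float([[0, 1], (2.5, float("inf"))]))
```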

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r161681586 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,27 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-16 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r161682506 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -401,15 +390,9 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark pull request #17280: [SPARK-19939] [ML] Add support for association ru...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17280#discussion_r161679593 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel
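
Some general background on association rules: for a rule X => Y, confidence is count(X ∪ Y) / count(X), and lift is that confidence divided by the support of Y. The sketch below derives single-item-consequent rules, with both measures, from a dict of frequent-itemset counts. It is a generic plain-Python illustration, not the FPGrowth code under review. `association_rules` is a hypothetical helper that assumes the counts are downward-closed, as frequent itemsets are.

```python
def association_rules(freq, num_transactions, min_confidence):
    """Rules (antecedent, consequent, confidence, lift) from {frozenset: count}."""
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = count / freq[antecedent]
            if confidence >= min_confidence:
                support_consequent = freq[frozenset([consequent])] / num_transactions
                lift = confidence / support_consequent  # > 1 means positive association
                rules.append((sorted(antecedent), consequent, confidence, lift))
    return rules


freq = {frozenset(["a"]): 3, frozenset(["b"]): 2, frozenset(["a", "b"]): 2}
print(association_rules(freq, num_transactions=4, min_confidence=0.7))
```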

[GitHub] spark pull request #20164: [SPARK-22971][ML] OneVsRestModel should use tempo...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20164#discussion_r161535696 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -170,21 +170,24 @@ final class OneVsRestModel private[ml

[GitHub] spark issue #18904: [SPARK-21624] Optimize RF communication cost

2018-01-15 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/18904 @mpjlu could you post the actual results of test runs (timing numbers and shuffle data)?

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161472191 --- Diff: docs/ml-features.md --- @@ -775,7 +775,9 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder (Deprecated

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161477464 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderEstimatorExample.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161475879 --- Diff: docs/ml-features.md --- @@ -775,7 +775,9 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder (Deprecated

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161473460 --- Diff: docs/ml-features.md --- @@ -807,6 +809,36 @@ for more details on the API. +## OneHotEncoderEstimator + +[One-hot

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-15 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161472954 --- Diff: docs/ml-features.md --- @@ -775,7 +775,9 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder (Deprecated

[GitHub] spark issue #20241: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python A...

2018-01-12 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20241 LGTM thanks. Merged to master / branch-2.3

[GitHub] spark issue #19991: [SPARK-22801][ML][PYSPARK] Allow FeatureHasher to treat ...

2017-12-31 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19991 Merged to master.

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-31 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19715 Merged to master. If there are any further small comments / clean-ups, we can do that during QA for 2.3. Thanks @huaxingao and all others for review

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r159100390 --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala --- @@ -430,4 +433,45 @@ object ParamsSuite extends SparkFunSuite

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r159100191 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,27 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r159099688 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -137,18 +137,10 @@ final class Bucketizer @Since("1.4.0") (@Si

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-29 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r159100299 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -401,15 +401,9 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-29 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19715 Jenkins retest this please

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-29 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19715 Thanks for the changes @huaxingao. This LGTM now - any further comments from others?

[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...

2017-12-24 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19527 Agree on keeping the new OneHotEncoderEstimator as an alias for 3.0 On Fri, 1 Dec 2017 at 23:29, jkbradley wrote: > *@jkbradley* commented on this pull requ

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2017-12-21 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19993 Yes, FeatureHasher doesn’t need inputCol and inputCols, since it is a new multi-column transformer. We may see this more in future, as I think new transformers should be able to work on
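The single- vs multi-column parameter check discussed in this thread can be sketched in plain Python. This is a hypothetical helper (`resolve_input_mode` is not a Spark API), assuming params are held in a dict keyed by the Spark param names `inputCol` and `inputCols`. A transformer that supports both modes rejects having both or neither set, while a multi-column-only transformer such as FeatureHasher simply requires `inputCols`.

```python
from typing import Mapping


def resolve_input_mode(params: Mapping[str, object], multi_column_only: bool = False) -> str:
    """Return "single" or "multi" for a transformer's input params (hypothetical helper).

    Raises ValueError when the combination of inputCol / inputCols is invalid.
    """
    has_single = params.get("inputCol") is not None
    has_multi = params.get("inputCols") is not None

    if multi_column_only:
        # A multi-column-only transformer (e.g. FeatureHasher) defines no inputCol,
        # so the only requirement is that inputCols is set.
        if not has_multi:
            raise ValueError("inputCols must be set")
        return "multi"

    # Transformers supporting both modes must have exactly one of the two set.
    if has_single and has_multi:
        raise ValueError("Only one of inputCol and inputCols may be set")
    if not has_single and not has_multi:
        raise ValueError("One of inputCol or inputCols must be set")
    return "multi" if has_multi else "single"
```

For example, `resolve_input_mode({"inputCol": "hour"})` returns `"single"`, while passing both keys raises a `ValueError` instead of silently picking one.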

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-21 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r158239692 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -386,19 +382,16 @@ class QuantileDiscretizerSuite
