[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107801822 --- Diff: python/pyspark/ml/tests.py --- @@ -1243,6 +1245,43 @@ def test_tweedie_distribution(self): self.assertTrue(np.isclose(model2

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17218 @hhbyyh OK that seems reasonable; I could see us adding support for multiple items in the future as well. Thanks for confirming! --- If your project is set up for it, you can reply

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17130 Noting here: Please check out the "Issue this PR brought up" here: https://github.com/apache/spark/pull/17218 It may affect this PR. Thanks!

[GitHub] spark issue #17108: [SPARK-19636][ML] Feature parity for correlation statist...

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17108 LGTM will merge after tests Thanks!

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107773640 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107774888 --- Diff: python/pyspark/ml/tests.py --- @@ -1243,6 +1244,45 @@ def test_tweedie_distribution(self): self.assertTrue(np.isclose(model2

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107773828 --- Diff: python/pyspark/ml/tests.py --- @@ -60,6 +60,7 @@ from pyspark.ml.regression import LinearRegression, DecisionTreeRegressor

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107770880 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107774188 --- Diff: python/pyspark/ml/tests.py --- @@ -1243,6 +1244,45 @@ def test_tweedie_distribution(self): self.assertTrue(np.isclose(model2

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17218 Issue this PR brought up: * Background: AssociationRules currently return a 1-element array for the consequent (predicted item). This makes sense b/c, even though multiple consequents could

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107757886 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107757685 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107761241 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107759052 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107758576 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107758565 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107758054 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107757520 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107757555 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107757391 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17218#discussion_r107757033 --- Diff: python/pyspark/ml/fpm.py --- @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17218 Sure, I can take a look. Let me ping @mlnick too since he marked himself as shepherd

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17130 I'll be happy to help get this merged now that the column renaming is done

[GitHub] spark issue #16722: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16722 Hi all, I can try to track this work now. > This patch maintains the meaning of minInstancesPerNode, in that the parameter still corresponds to raw, unweighted counts. It also adds a

[GitHub] spark issue #17108: [SPARK-19636][ML] Feature parity for correlation statist...

2017-03-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17108 LGTM except for the one doc nit. When you update this, could you also please make and link JIRAs for the Python wrapper and doc update?

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107718109 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlation.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107717895 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107717637 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/util/TestingUtils.scala --- @@ -32,6 +32,10 @@ object TestingUtils { * the relative

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107284141 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/LinalgUtils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107283647 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107074556 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107074472 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107075473 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17108#discussion_r107283840 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

spark git commit: [SPARK-20039][ML] rename ChiSquare to ChiSquareTest

2017-03-21 Thread jkbradley
est, distribution, or what. This PR renames it to ChiSquareTest to clarify this. ## How was this patch tested? Existing unit tests Author: Joseph K. Bradley <jos...@databricks.com> Closes #17368 from jkbradley/SPARK-20039. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

[GitHub] spark issue #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTest

2017-03-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17368 Yep, thanks for confirming that @srowen and checking it out @imatiach-msft and @MLnick ! Merging with master

[GitHub] spark issue #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTest

2017-03-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17368 CC @thunterdb @imatiach-msft What do you think?

[GitHub] spark pull request #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTe...

2017-03-20 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17368 [SPARK-20039][ML] rename ChiSquare to ChiSquareTest ## What changes were proposed in this pull request? I realized that since ChiSquare is in the package stat, it's pretty unclear

[GitHub] spark issue #17108: [SPARK-19636][ML] Feature parity for correlation statist...

2017-03-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17108 Taking a look now

spark git commit: [SPARK-19899][ML] Replace featuresCol with itemsCol in ml.fpm.FPGrowth

2017-03-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master fc7554599 -> bec6b16c1 [SPARK-19899][ML] Replace featuresCol with itemsCol in ml.fpm.FPGrowth ## What changes were proposed in this pull request? Replaces `featuresCol` `Param` with `itemsCol`. See

[GitHub] spark issue #17321: [SPARK-19899][ML] Replace featuresCol with itemsCol in m...

2017-03-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17321 LGTM Thanks for the PR! Merging with master

[GitHub] spark pull request #17321: [SPARK-19899][ML] Replace featuresCol with itemsC...

2017-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17321#discussion_r106970923 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -37,7 +37,20 @@ import org.apache.spark.sql.types._ /** * Common

spark git commit: [SPARK-19635][ML] DataFrame-based API for chi square test

2017-03-16 Thread jkbradley
sed API Author: Joseph K. Bradley <jos...@databricks.com> Closes #17110 from jkbradley/df-hypotests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c320054 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c32

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 OK merging with master Thanks @imatiach-msft and @thunterdb ! @imatiach-msft I agree about sparse testing. This has all of the MLlib tests, but we should add more in the future

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17237 Thanks for the PR! I just merged the fix for https://issues.apache.org/jira/browse/SPARK-11569 which will affect this PR. Would you mind updating this PR to include SPARK-11569's handling

spark git commit: [SPARK-11569][ML] Fix StringIndexer to handle null value properly

2017-03-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master d4a637cd4 -> 85941ecf2 [SPARK-11569][ML] Fix StringIndexer to handle null value properly ## What changes were proposed in this pull request? This PR is to enhance StringIndexer with NULL values handling. Before the PR, StringIndexer will

[GitHub] spark issue #17233: [SPARK-11569][ML] Fix StringIndexer to handle null value...

2017-03-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17233 LGTM Merging with master Thanks for the improvement!

spark git commit: [SPARK-19940][ML][MINOR] FPGrowthModel.transform should skip duplicated items

2017-03-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 5e96a57b2 -> d4a637cd4 [SPARK-19940][ML][MINOR] FPGrowthModel.transform should skip duplicated items ## What changes were proposed in this pull request? This commit moved `distinct` in its intended place to avoid duplicated predictions

[GitHub] spark issue #17283: [SPARK-19940][ML][MINOR] FPGrowthModel.transform should ...

2017-03-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17283 Thanks for fixing this issue! LGTM Merging with master Stating the JIRA number for a bug fix is reasonable, though it's most useful if the bug appears in an actual release

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 Ping @imatiach-msft any more comments after the update?

spark git commit: [MINOR][ML] Improve MLWriter overwrite error message

2017-03-13 Thread jkbradley
hor: Joseph K. Bradley <jos...@databricks.com> Closes #17215 from jkbradley/write-err-msg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/72c66dbb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/72c66dbb Diff: h

[GitHub] spark issue #17215: [MINOR][ML] Improve MLWriter overwrite error message

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17215 Thanks! Merging with master

[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17218 True, if minSupport can be shared, then that's OK. confidence won't be shared though.

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r10571 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -122,6 +122,86 @@ class StringIndexerSuite assert

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105706598 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -39,20 +39,21 @@ import

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105719023 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -122,6 +122,86 @@ class StringIndexerSuite assert

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105719645 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -122,6 +122,86 @@ class StringIndexerSuite assert

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105721463 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -188,35 +189,45 @@ class StringIndexerModel

[GitHub] spark issue #17233: [SPARK-11569][ML] Fix StringIndexer to handle null value...

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17233 I'll take a look

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17130 The updated transform looks good; thanks for pinging!

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r105702725 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -56,8 +56,8 @@ private[fpm] trait FPGrowthParams extends Params

[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17218 Thanks for the PR! I'll wait until this isn't "WIP" to review it thoroughly, but I'll make two comments now: * The params should not be added to shared.py since they are not shar

[GitHub] spark issue #16002: [SPARK-18341][ML] Eliminate use of SingularMatrixExcepti...

2017-03-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16002 @yanboliang Sorry for missing earlier discussion. I'm OK with declaring defeat here, though I still disagree about using exceptions. I agree that passing an obscure error code up is not ideal

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 I just reversed my opinion about a shared "Statistics" object. See https://github.com/apache/spark/pull/17108#issuecomment-285200613 for details. I pushed an update per y

[GitHub] spark issue #17108: [SPARK-19636][ML] Feature parity for correlation statist...

2017-03-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17108 Given further thought, I'd prefer we stick to the API specified in the design doc, with a Correlations object instead of a generic Statistics object. In the future, we may want optional Params

[GitHub] spark pull request #17215: [MINOR][ML] Improve MLWriter overwrite error mess...

2017-03-08 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17215 [MINOR][ML] Improve MLWriter overwrite error message ## What changes were proposed in this pull request? Give proper syntax for Java and Python in addition to Scala. ## How

spark git commit: [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe

2017-03-07 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.0 0cc992c89 -> e69902806 [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe ## What changes were proposed in this pull request? The `keyword_only` decorator in PySpark is not thread-safe. It writes kwargs to a

[GitHub] spark issue #17195: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17195 LGTM Merging with branch-2.0 Thank you again!

spark git commit: [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe

2017-03-07 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.1 3b648a626 -> 0ba9ecbea [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe ## What changes were proposed in this pull request? The `keyword_only` decorator in PySpark is not thread-safe. It writes kwargs to a
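
The commit message above is cut off, so the exact storage location it names is not visible here. As a rough illustration only, the following sketch assumes the kwargs are kept in a single slot on the decorator's wrapper function, and contrasts that with keeping them on the instance. The decorator names, the two estimator classes, and the `gate` test hook are all hypothetical and are not PySpark's code.

```python
import functools
import threading


def keyword_only_shared(func):
    """Unsafe sketch: the kwargs live in one slot on the wrapper function."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        wrapper._input_kwargs = kwargs  # shared by every instance and thread
        return func(self, **kwargs)
    return wrapper


def keyword_only_per_instance(func):
    """Safer sketch: each instance keeps its own copy of the kwargs."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


def _pause(kwargs):
    """Hypothetical test hook: block inside __init__ so two calls interleave."""
    gate = kwargs.get("gate")
    if gate is not None:
        entered, resume = gate
        entered.set()
        resume.wait(timeout=5)


class SharedEstimator:
    @keyword_only_shared
    def __init__(self, **kwargs):
        _pause(kwargs)
        # Reads whichever call wrote the shared slot last.
        self.name = self.__init__._input_kwargs["name"]


class PerInstanceEstimator:
    @keyword_only_per_instance
    def __init__(self, **kwargs):
        _pause(kwargs)
        # Reads the kwargs stored on this instance only.
        self.name = self._input_kwargs["name"]


def interleave(cls):
    """Start building "a" in a thread, build "b" meanwhile, return a's name."""
    entered, resume = threading.Event(), threading.Event()
    result = {}

    def build_a():
        result["a"] = cls(name="a", gate=(entered, resume)).name

    worker = threading.Thread(target=build_a)
    worker.start()
    entered.wait(timeout=5)  # "a" has stored its kwargs and is paused
    cls(name="b")            # a second call runs while "a" is paused
    resume.set()
    worker.join()
    return result["a"]
```

With the shared slot, `interleave(SharedEstimator)` returns the name passed by the second call, because that call overwrote the stored kwargs before the first one read them. `interleave(PerInstanceEstimator)` returns the first call's own name.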

[GitHub] spark issue #17193: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17193 LGTM Merging with branch-2.1 Thank you!

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16811 Thanks! I made a follow-up JIRA for updating the Python API: https://issues.apache.org/jira/browse/SPARK-19866

spark git commit: [SPARK-17629][ML] methods to return synonyms directly

2017-03-07 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master d8830c503 -> 56e1bd337 [SPARK-17629][ML] methods to return synonyms directly ## What changes were proposed in this pull request? provide methods to return synonyms directly, without wrapping them in a dataframe In performance sensitive

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16811 LGTM Merging with master Thank you!

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 Thanks!

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16739 I've commented elsewhere, but wanted to here just to make more people aware: Let's refrain from backporting new APIs into patch versions unless they are really critical. We do not do

[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16512 It would definitely be considered a new API, though I agree with you that it's probably safe. That said, I'm not a fan of such changes in patch versions unless they really are necessary

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 Btw, are you interested in updating the Python API too? https://issues.apache.org/jira/browse/SPARK-19852

spark git commit: [SPARK-17498][ML] StringIndexer enhancement for handling unseen labels

2017-03-07 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master c05baabf1 -> 4a9034b17 [SPARK-17498][ML] StringIndexer enhancement for handling unseen labels ## What changes were proposed in this pull request? This PR is an enhancement to ML StringIndexer. Before this PR, String Indexer only supports

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 LGTM Merging with master Thanks a lot!

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 Thanks @MLnick for the explanation. This is what I'd understood from your similar description on the JIRA, but definitely more in-depth. (It might be good to copy to JIRA, or even a design doc

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick OK I think I misunderstood some of your comments above then. I see the proposal in SPARK-14409 differs from this PR, so I agree it'd be nice to resolve it. We can make changes

spark git commit: [SPARK-19382][ML] Test sparse vectors in LinearSVCSuite

2017-03-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 9991c2dad -> 926543664 [SPARK-19382][ML] Test sparse vectors in LinearSVCSuite ## What changes were proposed in this pull request? Add unit tests for testing SparseVector. We can't add mixed DenseVector and SparseVector test case, as

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16784 LGTM Thanks @wangmiao1981 ! Merging with master

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489270 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,9 +92,8 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489264 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489244 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489280 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -142,18 +167,17 @@ class StringIndexerModel

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16784 Thanks for the updates. This LGTM pending the conflict resolution. Sorry for the delay!

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15435 +1 for moving implementation to traits, as long as the public methods are still Java-friendly. (Methods which are implemented in traits often can't be called from Java.)

[GitHub] spark issue #16623: [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set opti...

2017-03-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16623 Thanks for the comments. I definitely agree with many of your combined statements: * R has not been declared stable. (Though where in the docs is this even stated? I was unable to find

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-03-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r104331119 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

spark git commit: [SPARK-19535][ML] RecommendForAllUsers RecommendForAllItems for ALS on Dataframe

2017-03-05 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 369a148e5 -> 70f9d7f71 [SPARK-19535][ML] RecommendForAllUsers RecommendForAllItems for ALS on Dataframe ## What changes were proposed in this pull request? This is a simple implementation of RecommendForAllUsers & RecommendForAllItems

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 I'll merge this with master now Thanks @sueann and @MLnick for feedback. I'll prioritize helping with your work on transform, metrics, and tuning for ALS next.

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-03-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r104330610 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread jkbradley
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/17165

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296156 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -142,18 +166,18 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296099 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -105,7 +125,11 @@ class StringIndexer @Since("

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296562 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +187,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296367 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +92,17 @@ class StringIndexer @Since("1.4.0") (
