[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95858814 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -365,7 +365,7 @@ class LogisticRegression @Since("

[GitHub] spark issue #16415: [SPARK-19063][ML]Speedup and optimize the GradientBooste...

2017-01-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16415 Thanks for the update. By "specify the storage level via a Param," I meant a public ```Param``` type in the GBT API. Can you please check out other Params for examples and u

[GitHub] spark issue #15018: [SPARK-17455][MLlib] Improve PAVA implementation in Isot...

2017-01-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15018 @neggert I'll ask @mengxr about the negative weights since he oversaw the original work here. --- If your project is set up for it, you can reply to this email and have your reply appe

[GitHub] spark pull request #15018: [SPARK-17455][MLlib] Improve PAVA implementation ...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15018#discussion_r95914074 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala --- @@ -312,90 +313,120 @@ class IsotonicRegression private

[GitHub] spark pull request #15018: [SPARK-17455][MLlib] Improve PAVA implementation ...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15018#discussion_r95914179 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala --- @@ -312,90 +313,120 @@ class IsotonicRegression private

[GitHub] spark pull request #15018: [SPARK-17455][MLlib] Improve PAVA implementation ...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15018#discussion_r95914244 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala --- @@ -312,90 +313,120 @@ class IsotonicRegression private

[GitHub] spark pull request #15018: [SPARK-17455][MLlib] Improve PAVA implementation ...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15018#discussion_r95914526 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala --- @@ -312,90 +313,120 @@ class IsotonicRegression private

[GitHub] spark issue #15018: [SPARK-17455][MLlib] Improve PAVA implementation in Isot...

2017-01-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15018 @zapletal-martin Pinging since you wrote the original PR: There's discussion here about whether IsotonicRegression should support negative weights. Is there a good reason to? I haven't
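The negative-weights question in this thread concerns the pool-adjacent-violators algorithm (PAVA). Below is a minimal sketch of weighted PAVA for non-decreasing isotonic regression, written in Python for illustration only; it is not the Spark code under review, and the function name `isotonic_pava` is hypothetical. It assumes strictly positive weights. With a negative weight, a pooled block's weighted mean is no longer a convex combination of its values (and the total weight can even be zero), which is one reason to treat negative weights with caution.

```python
def isotonic_pava(y, w):
    """Weighted non-decreasing isotonic regression via pool adjacent violators.

    Hypothetical helper; assumes strictly positive weights.
    Returns one fitted value per input point.
    """
    if len(y) != len(w):
        raise ValueError("y and w must have the same length")
    if any(wi <= 0 for wi in w):
        raise ValueError("this sketch assumes strictly positive weights")

    # Each block is [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand each block's mean back to its points.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, `isotonic_pava([1, 3, 2, 4], [1, 1, 1, 1])` pools the violating pair (3, 2) into a single block at their weighted mean.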

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95928035 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -160,6 +162,17 @@ object KMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95928023 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,21 @@ class BisectingKMeansSuite

[GitHub] spark issue #16524: [SPARK-19110][MLLIB][FollowUP]: Add a unit test for test...

2017-01-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16524 LGTM. Merging with master. Thanks!

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15671 Thanks for the updates! For docConcentration and quantileProbabilities, I agree it could be problematic if these are too large. How about: * We don't log docConcentration since

[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-01-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16607 Please see comment on JIRA; thanks!

[GitHub] spark issue #15311: [SPARK-17721][MLlib][backport] Fix for multiplying trans...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15311 Since this is a backport, you'll need to close this manually. Thanks!

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/13493 Ping --- let me know if you'd like someone to take it over. Thanks!

[GitHub] spark issue #15144: [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ sh...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15144 test this please

[GitHub] spark issue #15144: [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ sh...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15144 LGTM, merging with master and branch-2.0. Thank you @zero323 for the PR and @BryanCutler for reviewing!

[GitHub] spark issue #15124: [SPARK-17559][MLLIB]persist edges if their storage level...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15124 LGTM. Will merge after re-running tests.

[GitHub] spark issue #15339: Branch 2.0

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15339 I assume this is a mistake? Please close this issue or fix it. Thanks!

[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14653 ok to test. Sorry for the delay on this, but it'd be great to fix now!

[GitHub] spark issue #14233: [SPARK-16490] [Examples] added a python example for chis...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14233 ok to test

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datetypes

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15314 @zhengruifeng Can you please update the PR title? It says "datetypes" instead of "datatypes" : )

[GitHub] spark issue #15312: [SPARK-17744][ML] Parity check between the ml and mllib ...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15312 This LGTM. @yanboliang to confirm, this is what you had in mind, right?

[GitHub] spark issue #15124: [SPARK-17559][MLLIB]persist edges if their storage level...

2016-10-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15124 Merging with master and branch-2.0. Thanks @dding3!

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81826499 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827276 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81826622 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827319 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81826887 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827371 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81810303 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827378 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827348 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827339 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827154 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827572 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81827397 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445404 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445385 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445726 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445890 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445897 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445477 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445506 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445715 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445670 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445698 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445744 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445915 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445482 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445651 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445909 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445919 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445466 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445905 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445623 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82445705 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 Related to the docs, some more comments defining terminology would be useful for non-experts: * OR-amplification * probing buckets * false positives/negatives (w.r.t. finding nearest
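The terms requested above can be illustrated with MinHash. This is a minimal Python sketch, assuming the common hash family h(x) = (a*x + b) mod p with a random nonzero a; the helper names are hypothetical and this is not the implementation in the PR. With L hash functions, OR-amplification makes a pair a candidate if any one of the L MinHash values agrees, so a pair with Jaccard similarity J is reported with probability roughly 1 - (1 - J)^L. Larger L reduces false negatives (similar pairs that are missed) at the cost of more false positives (dissimilar pairs that happen to collide).

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, a Mersenne prime (assumed modulus)


def make_hash_funcs(num_hashes, seed=0):
    """Draw (a, b) pairs for h(x) = (a * x + b) mod PRIME, with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(elements, hash_funcs):
    """MinHash signature of a non-empty set of ints in [0, PRIME)."""
    if not elements:
        raise ValueError("MinHash is undefined for an empty set")
    return [min((a * x + b) % PRIME for x in elements) for a, b in hash_funcs]


def is_candidate_pair(sig1, sig2):
    """OR-amplification: a candidate pair if ANY hash value agrees."""
    return any(h1 == h2 for h1, h2 in zip(sig1, sig2))


def estimate_jaccard(sig1, sig2):
    """Fraction of agreeing hash values, an estimate of Jaccard similarity."""
    matches = sum(1 for h1, h2 in zip(sig1, sig2) if h1 == h2)
    return matches / len(sig1)
```

Because each h is a bijection on [0, PRIME), two disjoint sets can never share a minimum, so they are never reported as candidates; overlapping sets agree on each hash with probability close to their Jaccard similarity.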

[GitHub] spark pull request #13762: [SPARK-14926] [ML] OneVsRest labelMetadata uses i...

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13762#discussion_r82465493 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -196,8 +196,13 @@ final class OneVsRestModel private[ml

[GitHub] spark issue #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12930 It's still a problem. (I just tried the unit test on master.) Sorry for the delay @yanboliang!

[GitHub] spark issue #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of Vertex...

2016-10-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15396 I'm no expert on checkpointing, but the tests look fine to me. You could eliminate duplicated code in the tests by putting the shared code in a helper method. Tested locally to co

[GitHub] spark issue #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12930 Oh! Now I see the JIRA's linked JIRA. I agree that updating RFormula is a better solution and that we can close this PR. I'll check out the other PR.

[GitHub] spark pull request #13675: [SPARK-15957] [ML] RFormula supports forcing to i...

2016-10-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13675#discussion_r82491913 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -97,6 +97,26 @@ class RFormula(override val uid: String

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82641617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82641642 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #13675: [SPARK-15957] [ML] RFormula supports forcing to index la...

2016-10-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/13675 I just noticed that last item, but otherwise, this looks ready to me. Thanks!

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82695010 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693177 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693200 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693981 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693129 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82694406 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693156 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693541 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82693111 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #12374: [SPARK-14610][ML] Remove superfluous split for continuou...

2016-10-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12374 LGTM. Merging with master. Thanks!

[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15431 LGTM2 Thanks! Is it fine with you if this just gets fixed in master, not branch-2.0 (since the other PR, which adds a new public API, is not in branch-2.0)?

[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15431 I'll go ahead and merge with master. Thanks!

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15428 Thanks! I'll take a look. Could you please fix the typo in the title? "enchanced" -> "enhanced"

[GitHub] spark issue #15501: Branch 2.0

2016-10-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15501 please close this issue, thanks!

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83760037 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -128,8 +145,11 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911419 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -66,11 +67,13 @@ private[feature] trait

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911437 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -66,11 +67,13 @@ private[feature] trait

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83760784 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -270,10 +270,10 @@ private[ml] trait HasFitIntercept extends

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911300 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0") (@Si

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911357 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0") (@Si

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911408 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -128,8 +145,11 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911464 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -66,11 +67,13 @@ private[feature] trait

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911574 --- Diff: python/pyspark/ml/feature.py --- @@ -1157,9 +1157,11 @@ class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadab

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911282 --- Diff: docs/ml-features.md --- @@ -1104,9 +1104,11 @@ for more details on the API. `QuantileDiscretizer` takes a column with continuous features

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911542 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -83,10 +83,20 @@ class QuantileDiscretizerSuite

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911537 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -83,10 +83,20 @@ class QuantileDiscretizerSuite

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83760190 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0") (@Si

[GitHub] spark pull request #15428: [SPARK-17219][ML] enhanced NaN value handling in ...

2016-10-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r83911499 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -114,6 +115,7 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 OK, merging with master. Thanks @imatiach-msft and @thunterdb! @imatiach-msft I agree about sparse testing. This has all of the MLlib tests, but we should add more in the future

[GitHub] spark pull request #17321: [SPARK-19899][ML] Replace featuresCol with itemsC...

2017-03-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17321#discussion_r106970923 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -37,7 +37,20 @@ import org.apache.spark.sql.types._ /** * Common

[GitHub] spark issue #17321: [SPARK-19899][ML] Replace featuresCol with itemsCol in m...

2017-03-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17321 LGTM Thanks for the PR! Merging with master

[GitHub] spark issue #17108: [SPARK-19636][ML] Feature parity for correlation statist...

2017-03-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17108 Taking a look now

[GitHub] spark pull request #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTe...

2017-03-20 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17368 [SPARK-20039][ML] rename ChiSquare to ChiSquareTest ## What changes were proposed in this pull request? I realized that since ChiSquare is in the package stat, it's pretty unclea
