[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15671 Merging with master Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15018: [SPARK-17455][MLlib] Improve PAVA implementation in Isot...

2017-01-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15018 Thanks for the updates! This LGTM, except for deciding about negative weights. Responding to your comment above, negative weights are just as problematic as 0 weights. See my comment

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15314 Thanks for pinging! LGTM pending fresh tests

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96707360 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -177,6 +177,8 @@ class

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96708089 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +72,156 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96708082 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -67,3 +66,12 @@ trait Loss extends Serializable

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96707275 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +158,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96708053 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -315,8 +368,9 @@ object GBTClassificationModel extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96708064 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -52,4 +51,10 @@ object LogLoss extends Loss { // The

[GitHub] spark issue #16539: [SPARK-8855][MLlib][PySpark] Python API for Association ...

2017-01-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16539 @zhengruifeng You're correct. @aray Thanks for the PR, but it will be best if we add this to the DataFrame-based API instead. Could you please close this issue? In the future, I'd

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r96756901 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r96756358 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r96756712 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r96756732 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16441 LGTM Merging with master Thanks @imatiach-msft and @sethah for reviewing!

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-19 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15730 The API looks good to me. I have not reviewed the internals carefully. One comment: Let's add a check to verify that numMidDimSplits is > 0.
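A check of the kind requested above could look roughly like the sketch below. It is illustrative only: the object and method names (SplitCheckSketch, validateNumMidDimSplits) are hypothetical and are not taken from the PR. Only the parameter name numMidDimSplits and the "> 0" condition come from the comment.

```scala
// Hypothetical helper, not Spark's actual code: fail fast on a bad split count
// before any expensive distributed work starts.
object SplitCheckSketch {
  def validateNumMidDimSplits(numMidDimSplits: Int): Int = {
    // require throws IllegalArgumentException when the condition is false.
    require(numMidDimSplits > 0,
      s"numMidDimSplits must be > 0, but got $numMidDimSplits")
    numMidDimSplits
  }
}
```

Using require keeps the failure at the call site with a readable message, rather than surfacing later as an obscure partitioning error.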

[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost

2017-01-19 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14547 I'd recommend overriding setImpurity in the relevant concrete classes. In those, you can add warnings in the Scala doc and also add logWarning messages about deprecation. That's almo
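The suggestion above (override the setter in the concrete class, document the deprecation in the Scala doc, and log a warning) might be sketched as follows. Everything here is a stand-in written for illustration: SketchTreeParams, SketchBoostedEstimator, and the in-memory logWarning are assumptions, not Spark's classes or its logging API.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical shared trait holding the parameter and its default setter.
trait SketchTreeParams {
  private var impurityValue: String = "variance"
  def setImpurity(value: String): this.type = { impurityValue = value; this }
  def getImpurity: String = impurityValue
}

// Hypothetical concrete class that overrides the setter to flag deprecation.
class SketchBoostedEstimator extends SketchTreeParams {
  // Stand-in for a real logger: collects messages so they can be inspected.
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]
  private def logWarning(msg: String): Unit = { warnings += msg; () }

  /**
   * Sets the impurity.
   * Deprecated in this sketch: the setting is assumed to be ignored by this estimator.
   */
  override def setImpurity(value: String): this.type = {
    logWarning("setImpurity is deprecated for this estimator (sketch warning)")
    super.setImpurity(value)
    this
  }
}
```

Overriding only in the concrete class leaves the shared trait untouched, so other estimators that still use the parameter see no warning.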

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r97126808 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r97126805 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r97405721 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -161,6 +160,18 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r97405668 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -828,8 +828,27 @@ private[spark] object RandomForest extends

[GitHub] spark issue #15018: [SPARK-17455][MLlib] Improve PAVA implementation in Isot...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15018 Sounds good. I'll run fresh tests before merging to be safe though.

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15211 LGTM Thanks @hhbyyh and also @yanboliang and @zhengruifeng for helping with review! Merging with master One more step towards feature parity for the DataFrame-based API!

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15211 I'll create follow-up JIRAs (linked from this PR's JIRA). @hhbyyh Can I assign one or more to you?

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 LGTM Thanks! Will merge after fresh tests

[GitHub] spark issue #15018: [SPARK-17455][MLlib] Improve PAVA implementation in Isot...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15018 Merging with master Thanks!

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 Merging with master. Will try to backport to branch-2.1 as well. Thanks!

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 I was able to check out this commit and test it with branch-2.1, but now I can't get the merge script to merge it for branch-2.1. @srowen would you mind trying? Thanks!

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15314 Thanks @zhengruifeng and sorry for the delay. Merging with master now

[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2017-01-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14872 @smurching Sorry we haven't had time to continue with this. Please don't delete the branch; I'd like to pick it up eventually!

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 Oh OK! Thanks @srowen

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2017-01-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16377 LGTM Thanks! Merging with master

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-01-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16715 @yanboliang Would you have time to take a look? Thanks!

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98141066 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -63,7 +63,7 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98141074 --- Diff: python/pyspark/ml/classification.py --- @@ -60,6 +61,137 @@ def numClasses(self): @inherit_doc +class LinearSVC

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98141071 --- Diff: python/pyspark/ml/classification.py --- @@ -60,6 +61,137 @@ def numClasses(self): @inherit_doc +class LinearSVC

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98141073 --- Diff: python/pyspark/ml/classification.py --- @@ -60,6 +61,137 @@ def numClasses(self): @inherit_doc +class LinearSVC

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98141079 --- Diff: python/pyspark/ml/classification.py --- @@ -60,6 +61,137 @@ def numClasses(self): @inherit_doc +class LinearSVC

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98141077 --- Diff: python/pyspark/ml/classification.py --- @@ -60,6 +61,137 @@ def numClasses(self): @inherit_doc +class LinearSVC

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98309772 --- Diff: python/pyspark/ml/classification.py --- @@ -60,6 +61,137 @@ def numClasses(self): @inherit_doc +class LinearSVC

[GitHub] spark issue #15768: [SPARK-18080][ML][PySpark] Locality Sensitive Hashing (L...

2017-01-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15768 Btw, @yanboliang and @Yunni did you sync? I'm fine with the takeover, but don't want to stomp on toes. Both can be listed as authors when this gets merged. Should we close this issu

[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16694 LGTM, thank you! Merging with master

[GitHub] spark pull request #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes fo...

2017-01-27 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/16723 [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Python Params and LinearSVC ## What changes were proposed in this pull request? * Removed Since tags in Python Params since they

[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-01-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16723 @wangmiao1981 Would you mind checking this? It has small fixes I noticed when reviewing your PR for Python LinearSVC.

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959562 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959556 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959519 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959540 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959496 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959499 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959524 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959585 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959506 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959414 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959536 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959530 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98959548 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes fo...

2017-02-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16723#discussion_r99015614 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -47,7 +47,7 @@ private[classification] trait LinearSVCParams

[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-02-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16723 I delayed too! I just pushed a fix. I couldn't test it since it looks like the Java 8 doc gen has already been broken again. (Thanks a lot for the efforts to fix it! Btw, are you pingin

[GitHub] spark issue #12420: [SPARK-14585][ML][WIP] Provide accessor methods for Pipe...

2017-02-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12420 I missed the ClassTag question above. Let me take a look

[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-02-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16723 OK thanks a lot @HyukjinKwon and @wangmiao1981 ! Merging with master

[GitHub] spark issue #12420: [SPARK-14585][ML][WIP] Provide accessor methods for Pipe...

2017-02-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12420 Well, after spending a while looking around, I haven't found a good way to write this and make it Java friendly (i.e., not use ClassTag, Type, or TypeTag). Does anyone else have ideas? I

[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-02-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16646#discussion_r99253729 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala --- @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends

[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16607 Sorry for the delay; will take a look now!

[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16607 ok to test

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r99263525 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -320,14 +340,29 @@ object Word2VecModel extends MLReadable[Word2VecModel

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r99259617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -18,10 +18,9 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16607#discussion_r99263532 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -302,16 +302,36 @@ class Word2VecModel private[ml] ( @Since("

[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16607 LGTM Merging with master Thanks @Krimit !

[GitHub] spark issue #16741: [SPARK-19402][DOCS] Support LaTex inline formula correct...

2017-02-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16741 Thanks for these many cleanups! It's a shame to lose links. Do you think we should use fully qualified names rather than abandoning the links?

[GitHub] spark issue #16814: [SPARK-19467][ML][PYTHON]Remove cyclic imports from pysp...

2017-02-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16814 LGTM Merging with master Thanks!

[GitHub] spark issue #16814: [SPARK-19467][ML][PYTHON]Remove cyclic imports from pysp...

2017-02-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16814 Btw, do you have a need to backport this to previous releases? Or is master sufficient?

[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...

2017-02-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16495 @mhmoudr Will you be able to update this?

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15435 Sorry for the delay! This sounds like an involved discussion, so I put my thoughts on the JIRA. Let me know what you think.

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99977340 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -335,6 +335,9 @@ class GeneralizedLinearRegression

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-02-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16811 Thanks for the PR! What about findSynonymsArray? That still implies a local value and is more specific. Also, can you please add a unit test for this?

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16740 LGTM Merging with master Thank you + @sethah for reviewing!

[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...

2017-02-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100138206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private

[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...

2017-02-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100138241 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private

[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...

2017-02-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100138230 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private

[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...

2017-02-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16495 Thanks @mhmoudr As far as the stress test, I'd recommend posting instructions as a Github gist and linking it to wherever you post results on JIRA or a PR. We wouldn't want to add

[GitHub] spark pull request #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scal...

2016-11-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16009#discussion_r90330019 --- Diff: docs/ml-features.md --- @@ -1188,7 +1188,9 @@ categorical features. The number of bins is set by the `numBuckets` parameter. I that the

[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-30 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16009 LGTM Merging with master and branch-2.1 Thank you!

[GitHub] spark pull request #16076: [SPARK-18324][ML][DOC] Update ML programming and ...

2016-11-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16076#discussion_r90346927 --- Diff: docs/ml-guide.md --- @@ -60,152 +60,37 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in

[GitHub] spark pull request #16076: [SPARK-18324][ML][DOC] Update ML programming and ...

2016-11-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16076#discussion_r90347884 --- Diff: docs/ml-guide.md --- @@ -60,152 +60,37 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in

[GitHub] spark pull request #16076: [SPARK-18324][ML][DOC] Update ML programming and ...

2016-11-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16076#discussion_r90347291 --- Diff: docs/ml-guide.md --- @@ -60,152 +60,37 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in

[GitHub] spark pull request #16076: [SPARK-18324][ML][DOC] Update ML programming and ...

2016-11-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16076#discussion_r90348160 --- Diff: docs/ml-guide.md --- @@ -60,152 +60,37 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in

[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 LGTM too Thanks a lot! Merging with master, branch-2.1, branch-2.0 Has anyone heard of complaints of this in current use cases of earlier branches? If not, I won't backpo

[GitHub] spark pull request #16076: [SPARK-18324][ML][DOC] Update ML programming and ...

2016-12-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16076#discussion_r90542680 --- Diff: docs/ml-guide.md --- @@ -60,152 +60,37 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in

[GitHub] spark issue #16076: [SPARK-18324][ML][DOC] Update ML programming and migrati...

2016-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16076 Sounds good about SPARK-18291. I responded inline above about SPARK-18481. Apart from this update, this looks ready to me. Thank you!

[GitHub] spark issue #15795: [SPARK-18081] Add user guide for Locality Sensitive Hash...

2016-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15795 Could you please add tags "[ML][DOCS]" to the PR title?

[GitHub] spark issue #15795: [SPARK-18081] Add user guide for Locality Sensitive Hash...

2016-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15795 +1 for consolidating the examples. The boilerplate of creating a dataset and setting algorithm parameters takes up most of the example. I would create 1 example per algorithm which does

[GitHub] spark issue #16118: [SPARK-18291][SPARKR][ML] Revert "[SPARK-18291][SPARKR][...

2016-12-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16118 LGTM Merging with master and branch-2.1 Thanks a lot for understanding & reverting this for now!

[GitHub] spark issue #16076: [SPARK-18324][ML][DOC] Update ML programming and migrati...

2016-12-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16076 LGTM I'll merge this with master and branch-2.1 Thank you!

[GitHub] spark issue #15795: [SPARK-18081][ML][DOCS] Add user guide for Locality Sens...

2016-12-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15795 I can take a look

[GitHub] spark issue #15795: [SPARK-18081][ML][DOCS] Add user guide for Locality Sens...

2016-12-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15795 I found myself wanting to make a number of tiny comments, so I thought it'd be easier to send a PR. Could you please take a look at this one? Thanks!

[GitHub] spark issue #15795: [SPARK-18081][ML][DOCS] Add user guide for Locality Sens...

2016-12-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15795 LGTM merging with master and branch-2.1 Thanks all!

[GitHub] spark issue #16169: [SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API ...

2016-12-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16169 I don't really see the harm in letting users specify probabilityCol beforehand, except that they may not have a good way to map the indices to String labels. I'm OK with removing

[GitHub] spark pull request #16139: [SPARK-18705][ML][DOC] Update user guide to refle...

2016-12-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16139#discussion_r91422901 --- Diff: docs/ml-advanced.md --- @@ -59,17 +59,25 @@ Given $n$ weighted observations $(w_i, a_i, b_i)$: The number of features for each

[GitHub] spark issue #16064: [SPARK-18633][ML][Example]: Add multiclass logistic regr...

2016-12-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16064 LGTM I just tested it locally I'll rerun tests before merging test this please
