[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93973973 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93972115 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93972699 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93981193 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93974759 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93974324 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93974634 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93981206 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93973211 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93982097 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r93995903 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94062847 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r94073643 --- Diff: python/pyspark/ml/feature.py --- @@ -2629,8 +2629,28 @@ class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol, Ja

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r94073623 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala --- @@ -79,6 +79,12 @@ class ChiSqSelectorSuite extends SparkFunSuite

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML]add feature selector meth...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r94073566 --- Diff: docs/ml-features.md --- @@ -1423,12 +1423,12 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection. It

[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...

2016-12-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15413 I'll take a look, thanks for pinging!

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94078123 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94085951 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,6 +141,106 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94086048 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -18,22 +18,37 @@ package

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94084918 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94085211 --- Diff: python/pyspark/ml/clustering.py --- @@ -95,15 +95,10 @@ class GaussianMixture(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIte

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94084884 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94083559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94086090 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,6 +141,106 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94082859 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94083864 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r94084799 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -323,27 +326,95 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94088566 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15211 Thanks! The updates look good. I'll check out the unit tests now. Thanks for looking into the default intercept. Also, let me know if you find literature about convergence analys

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94091205 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94090777 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94090877 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94091453 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94091151 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94091025 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94090785 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,554 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2016-12-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94093958 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,558 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16415 predErrorCheckpointer should already be persisting and unpersisting predError. This PR's changes will mean: * persist will use MEMORY_AND_DISK instead of MEMORY_ONLY * 1 (instead

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15212 Will do!

[GitHub] spark pull request #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document mino...

2016-12-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16434#discussion_r94165525 --- Diff: python/pyspark/ml/feature.py --- @@ -2629,6 +2629,8 @@ class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol, Ja

[GitHub] spark pull request #16436: [SPARK-18698][ML][MLLIB] Adding public constructo...

2016-12-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16436#discussion_r94174586 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -219,6 +219,16 @@ class StringIndexerSuite

[GitHub] spark issue #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor chang...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16434 Also, could you please change the PR description to be self-contained (rather than just referencing another PR)? The description becomes the commit message.

[GitHub] spark issue #16436: [SPARK-18698][ML] Adding public constructor that takes u...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16436 LGTM Merging with master Thanks @imatiach-msft !

[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16415 Thanks for checking! Does changing the storageLevel in predErrorCheckpointer fix the problem? "other use cases": Well, I remember thinking about this a lot when a

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16355 @yu-iskw Pinging on this since you wrote bisecting k-means originally. Do you have time to take a look? Thanks!

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2016-12-30 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16441 Thanks for the PR; I do want to get this fixed. However, I don't think this is the right way to make predictions of probabilities for GBTs. I believe it should depend on the loss used.

[GitHub] spark issue #12823: [SPARK-14985][ML] Update LinearRegression, LogisticRegre...

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12823 @BenFradet I'm sorry for dropping the ball on this one. Did you close this due to inactivity? If you're willing, it would be nice to do this cleanup. To answer your

[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16415 @zdh2292390 Thanks for the update. Given that this will change behavior for existing workloads, I'll ask that we specify it via a Param. Also, I'm going to create a new JIRA for thi

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16457 +1 for @sethah 's comment: Algorithms should validate input data. Some already do: https://github.com/apache/spark/blob/b67b35f76b684c5176dc683e7491fd01b43f4467/mllib/src/main/scala/org/a

[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16453 LGTM except for the style nit

[GitHub] spark pull request #16453: [SPARK-19054][ML] Eliminate extra pass in NB

2017-01-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16453#discussion_r94496997 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -127,13 +127,11 @@ class NaiveBayes @Since("

[GitHub] spark issue #15768: [SPARK-18080][ML][PySpark] Locality Sensitive Hashing (L...

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15768 Pinging on this: What's a reasonable ETA for updating the PR? Thanks @yanboliang !

[GitHub] spark issue #14394: [SPARK-16786] [Python] [WIP] LDA topic distributions API...

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14394 @supremekai Thanks for the PR! I'm sorry about the inactivity on this. However, now that it has been added to the DataFrame-based API (in pyspark.ml), we will not be adding it to the RDD-

[GitHub] spark issue #12491: [SPARK-14712][ML]spark.ml.LogisticRegressionModel.toStri...

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12491 @hujy Sorry for the delay! ok to test

[GitHub] spark issue #11520: [SPARK-13677][MLLIB] Support Tree-Based Feature Transfor...

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/11520 Sorry about the inaction on this! As you said on the JIRA, let's redo this for the DataFrame-based API. In the meantime, could you please close this issue? Thanks a lot.

[GitHub] spark pull request #12135: [SPARK-14352][SQL] approxQuantile should support ...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12135#discussion_r94640047 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1364,18 +1364,41 @@ def approxQuantile(self, col, probabilities, relativeError): Space

[GitHub] spark pull request #12135: [SPARK-14352][SQL] approxQuantile should support ...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12135#discussion_r94640052 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1364,18 +1364,41 @@ def approxQuantile(self, col, probabilities, relativeError): Space

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15671 Taking a look now!

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94640747 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -225,7 +230,7 @@ class LinearRegression @Since("

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94641380 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala --- @@ -227,6 +227,11 @@ class AFTSurvivalRegression @Since

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94641388 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -905,7 +911,10 @@ class LDA @Since("1.6.0") (

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94641394 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -888,6 +888,12 @@ class LDA @Since("1.6.0") ( @Si

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-04 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15314 Sorry for the delay, will look now

[GitHub] spark pull request #15314: [SPARK-17747][ML] WeightCol support non-double nu...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15314#discussion_r94645280 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -47,18 +47,47 @@ object MLTestingUtils extends SparkFunSuite

[GitHub] spark pull request #15314: [SPARK-17747][ML] WeightCol support non-double nu...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15314#discussion_r94645495 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -137,10 +169,11 @@ object MLTestingUtils extends SparkFunSuite

[GitHub] spark pull request #15314: [SPARK-17747][ML] WeightCol support non-double nu...

2017-01-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15314#discussion_r94645270 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -118,12 +148,14 @@ object MLTestingUtils extends SparkFunSuite

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94861192 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -344,6 +344,10 @@ final class OneVsRest @Since("

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94862191 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -85,9 +86,27 @@ private[spark] class Instrumentation[E <: Estima

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94861627 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala --- @@ -116,13 +116,17 @@ class TrainValidationSplit @Since("

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94860559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -344,6 +344,10 @@ final class OneVsRest @Since("

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94861932 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -85,9 +86,27 @@ private[spark] class Instrumentation[E <: Estima

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94860863 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -344,6 +344,10 @@ final class OneVsRest @Since("

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94860281 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -344,6 +344,10 @@ final class OneVsRest @Since("

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94861991 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -85,9 +86,27 @@ private[spark] class Instrumentation[E <: Estima

[GitHub] spark pull request #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document mino...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16434#discussion_r94867603 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala --- @@ -35,22 +35,63 @@ class ChiSqSelectorSuite extends SparkFunSuite

[GitHub] spark issue #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor chang...

2017-01-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16434 Thanks @mpjlu ! The changes look good, except that I'd like to have a code snippet for verifying with R.

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94871938 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -76,6 +77,18 @@ private[ml] trait ValidatorParams extends HasSeed

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94872466 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -339,11 +344,13 @@ final class OneVsRest @Since("

[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...

2017-01-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16480#discussion_r94872199 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -76,6 +77,18 @@ private[ml] trait ValidatorParams extends HasSeed

[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...

2017-01-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16480 ok to test

[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...

2017-01-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16480 add to whitelist

[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...

2017-01-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16480 LGTM pending Jenkins tests Thanks @sueann !

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95001542 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -160,6 +162,17 @@ object KMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95001382 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -29,9 +29,12 @@ class BisectingKMeansSuite final

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95001534 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,23 @@ class BisectingKMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95001517 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,23 @@ class BisectingKMeansSuite

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95031363 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,9 +143,104 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95031388 --- Diff: python/pyspark/ml/clustering.py --- @@ -95,15 +95,10 @@ class GaussianMixture(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIte

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95031349 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95031946 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,9 +143,104 @@ class GaussianMixtureSuite extends

[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...

2017-01-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15413 This LGTM @sethah Any further comments before we merge it?

[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...

2017-01-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16480 Merging with master. Not backporting unless people request it since this memory leak is very minor. Thanks @sueann !

[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...

2017-01-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15413 OK, I'll just wait so @sethah can make a final pass and so @yanboliang can merge the 2 tests.

[GitHub] spark issue #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel returns dif...

2017-01-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16491 Yikes, thanks for fixing this! LGTM Merging with master I'll also try to merge it with branch-2.1, branch-2.0, branch-1.6 but will say if I run into issues.

[GitHub] spark pull request #15018: [SPARK-17455][MLlib] Improve PAVA implementation ...

2017-01-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15018#discussion_r95094550 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala --- @@ -328,74 +336,80 @@ class IsotonicRegression private

[GitHub] spark pull request #15018: [SPARK-17455][MLlib] Improve PAVA implementation ...

2017-01-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15018#discussion_r95094551 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala --- @@ -328,74 +336,80 @@ class IsotonicRegression private

[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...

2017-01-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16495 The key point @vlad17 made was that an operation which should be O(N) is taking O(N^2) in the current implementation. Let's fix that, regardless of whether or not we add a stress

[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...

2017-01-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16494 Thanks for the patch. This sounds like it may be the same bug being addressed in https://issues.apache.org/jira/browse/SPARK-14804 so I'll CC @tdas If so, then I believe the b

[GitHub] spark pull request #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel w...

2017-01-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12064#discussion_r95858775 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -130,6 +130,25 @@ class GaussianMixtureModel private[ml
