[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550356 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,110 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550502 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") (

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107689306 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") (

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-23 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 Jenkins retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-23 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 Jenkins retest this please.

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107801115 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107801091 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107801133 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107803088 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107832982 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107842023 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-23 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 @dbtsai Thanks for the good suggestions. I rearranged the test suites, removed redundancies, and filled in some gaps. Things got a bit jumbled when changing some of the methods around. I also added

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107954866 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-03-24 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r107961224 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -644,21 +644,29 @@ class LogisticRegression @Since("

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-24 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 Thanks for finally cooperating, Jenkins!

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-24 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 Thanks for all the time reviewing!

[GitHub] spark pull request #17426: [SPARK-17137][ML][WIP] Compress logistic regressi...

2017-03-24 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/17426 [SPARK-17137][ML][WIP] Compress logistic regression coefficients ## What changes were proposed in this pull request? Use the new `compressed` method on matrices to store the logistic

[GitHub] spark issue #17419: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-03-27 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17419 Is this being targeted for Spark 2.2?

[GitHub] spark issue #17478: [SPARK-18901][ML]:Require in LR LogisticAggregator is re...

2017-03-30 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17478 I guess I don't see the harm in keeping these checks. Yes, in this case we always call `LogisticAggregator` after we have gone through the same data with `MultivariateOnlineSummarizer`, but i

[GitHub] spark issue #17478: [SPARK-18901][ML]:Require in LR LogisticAggregator is re...

2017-03-30 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17478 Yeah, I would support adding a unit test to the logistic aggregator (well, all aggregators) for these types of things. I do think it's better to keep them and add a couple tests, but I don'

[GitHub] spark issue #17478: [SPARK-18901][ML]:Require in LR LogisticAggregator is re...

2017-03-30 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17478 Checking the size is a constant-time operation, but in `add` we also do a linear-time dot product. I do not think this affects performance. I don't exactly mind removing it, but not che
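
To illustrate the cost argument in the comment above, here is a minimal Scala sketch. It is not Spark's `LogisticAggregator`; `SizeCheckSketch`, `DotAggregator`, and `total` are hypothetical names chosen for illustration. It places a `require` size check (constant time, since an array's length is a stored field) next to a dot product loop (linear in the number of features) inside `add`.

```scala
object SizeCheckSketch {
  // Hypothetical stand-in for an aggregator; not Spark's LogisticAggregator.
  final class DotAggregator(coefficients: Array[Double]) {
    private var sum = 0.0

    def add(features: Array[Double]): this.type = {
      // O(1): the array length is a stored field, so this check is constant time.
      require(features.length == coefficients.length,
        s"Expected ${coefficients.length} features but got ${features.length}.")
      // O(n): the dot product touches every feature once.
      var dot = 0.0
      var i = 0
      while (i < features.length) {
        dot += features(i) * coefficients(i)
        i += 1
      }
      sum += dot
      this
    }

    // Running sum of all dot products added so far.
    def total: Double = sum
  }

  def main(args: Array[String]): Unit = {
    val agg = new DotAggregator(Array(1.0, 2.0))
    agg.add(Array(3.0, 4.0))
    println(agg.total) // 1*3 + 2*4 = 11.0
  }
}
```

Under these assumptions the per-instance cost of `add` is dominated by the loop, which is why the extra size check is unlikely to matter for performance.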

[GitHub] spark issue #16722: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2017-03-30 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16722 I don't think I'll have enough time before 2.2. Please feel free to take it over. I will try to help with review. Otherwise I could pick it back up if it doesn't make 2.2

[GitHub] spark issue #17501: [SPARK-20183][ML] Added outlierRatio arg to MLTestingUti...

2017-04-01 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17501 LGTM

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111410877 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -112,9 +124,9 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111411026 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -126,9 +138,10 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17556 If we are attempting to match R GBM, it would be great to show, at least on the PR, that we get the same results.

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111434055 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -104,6 +104,18 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17556 Seems like a reasonable change. Just left some minor comments.

[GitHub] spark issue #17416: [SPARK-20075][CORE][WIP] Support classifier, packaging i...

2017-04-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17416 @srowen Can you confirm what happens when the jars are not found in your local m2 cache? Do you still find the `-models` jar in the ivy2 cache?

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-01 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94288799 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -713,6 +713,15 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-01 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94288738 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -713,6 +713,15 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-01 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94288852 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -57,6 +58,29 @@ class DecisionTreeRegressorSuite

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-01 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94288764 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -713,6 +713,15 @@ private[spark] object RandomForest extends Logging

[GitHub] spark issue #16452: [ML] fix getThresholds logic error

2017-01-02 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16452 What is not right? Could you be more specific? The behavior for master branch seems to align with the comments, but maybe I'm missing it.

[GitHub] spark issue #16452: [ML] fix getThresholds logic error

2017-01-02 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16452 @mpjlu This is the behavior I get: scala> import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.classification.LogisticRegress

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16457 I tend to think the actual algorithms should handle invalid weights, instead of adding that check into instance creation. Also, this will add overhead each time an instance is created

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94618477 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -2006,7 +2075,7 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94638095 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -149,15 +149,34 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94621426 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -1762,51 +1781,101 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94615848 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1095,6 +1131,89 @@ private[classification] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94615417 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1095,6 +1131,89 @@ private[classification] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94617024 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1159,17 +1388,23 @@ class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94521336 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1095,6 +1131,89 @@ private[classification] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94520622 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -788,19 +797,39 @@ class LogisticRegressionModel private

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94521006 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1095,6 +1131,89 @@ private[classification] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94615507 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1095,6 +1131,89 @@ private[classification] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94613945 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -788,19 +797,39 @@ class LogisticRegressionModel private

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94616155 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1095,6 +1131,89 @@ private[classification] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94640068 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -1762,51 +1781,101 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94638575 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -1762,51 +1781,101 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94616573 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1120,21 +1239,129 @@ sealed trait

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94520673 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -788,19 +797,39 @@ class LogisticRegressionModel private

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94643188 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -2006,7 +2075,7 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94640294 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -1756,55 +1765,105 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94614489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -652,18 +652,27 @@ class LogisticRegression @Since("

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94620487 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1120,21 +1239,129 @@ sealed trait

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-01-04 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r94520863 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1120,21 +1239,129 @@ sealed trait

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-01-04 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 I still don't know what the plan _was_ for the downcasting issue, and it doesn't seem to me like there's really a good solution. `evaluate` returns a `LogisticRegressionSummary`

[GitHub] spark issue #16477: [Minor] Correct LogisticRegression test case for probabi...

2017-01-05 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16477 LGTM

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-01-05 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 pinging other potential reviewers @jkbradley @srowen @MLnick I think this is an important patch for multiclass logistic regression.

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94867743 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +66,39 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94855997 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +223,23 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94861988 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +66,39 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94868112 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94868309 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +66,39 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94854198 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +223,23 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94857458 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94861226 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94901794 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94902030 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95039050 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -356,13 +427,243 @@ class GaussianMixture @Since("

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95039573 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,9 +143,104 @@ class GaussianMixtureSuite extends

[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...

2017-01-06 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15413 I did a quick pass and it looks pretty good. I'll take a more thorough look at the tests this weekend, but if you want to merge it I think any of those items could be addressed in follo

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95064130 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,9 +143,93 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95064061 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,9 +143,93 @@ class GaussianMixtureSuite extends

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95064361 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -141,4 +242,37 @@ object GaussianMixtureSuite

[GitHub] spark pull request #15413: [SPARK-17847][ML] Reduce shuffled data size of Ga...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15413#discussion_r95195793 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala --- @@ -126,9 +143,93 @@ class GaussianMixtureSuite extends

[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...

2017-01-09 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15413 Left one small comment which isn't a blocker. LGTM otherwise.

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95275614 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -176,6 +203,18 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95182716 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -828,8 +828,27 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95183469 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -161,6 +161,33 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95275814 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -176,6 +203,18 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95275971 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -161,6 +161,33 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95276132 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95286979 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +268,38 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95284176 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95284948 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95286255 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95286413 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95285636 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,6 +321,13 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95284126 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95285904 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +223,23 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95287318 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95276987 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -17,20 +17,23 @@ package

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95284606 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +268,38 @@ class GBTClassificationModel private[ml

[GitHub] spark issue #16279: [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sendi...

2017-01-09 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16279 Can we close it?

[GitHub] spark issue #16471: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardSc...

2017-01-09 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16471 Is this a duplicate of [SPARK-18385](https://issues.apache.org/jira/browse/SPARK-18385)? Can you please see the discussion on the PR: https://github.com/apache/spark/pull/15831/files

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2017-01-09 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/9 @yinxusen Do you think you'll have time to work on this?

[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-09 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15831 I think we decided to go a different direction than what is proposed here? Actually, I still think there's merit in fixing the problem without having to do full feature ports. Either way, I&

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95429487 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,6 +321,13 @@ class GBTClassificationModel private[ml
