[GitHub] spark pull request #18733: [SPARK-21535][ML]Reduce memory requirement for Cr...

2017-07-25 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/18733 [SPARK-21535][ML]Reduce memory requirement for CrossValidator and TrainValidationSplit ## What changes were proposed in this pull request? CrossValidator and TrainValidationSplit both use

[GitHub] spark issue #18728: [SPARK-21524] [ML] unit test fix: ValidatorParamsSuiteHe...

2017-07-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18728 That appears to be all right. Sending update. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #18728: [SPARK-21524] [ML] unit test fix: ValidatorParamsSuiteHe...

2017-07-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18728 Yes, that's good. But I just found there's one rule in the Scala style check: "Tests must extend org.apache.spark.SparkFunSuite instead." Should we try to ignore it?

[GitHub] spark issue #18728: [SPARK-21524] [ML] unit test fix: ValidatorParamsSuiteHe...

2017-07-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18728 Thanks for your attention. @srowen The temp dir cleanup function is implemented in trait `DefaultReadWriteTest` which extends `TempDirectory`, not from `SparkFunSuite`. And as you said, the

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18313 The current implementation of `CrossValidator` (with or without this PR) **NEVER** holds all the trained models in the driver memory at the same time. It collects models sequentially and allows GC to
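
As a rough illustration of that sequential pattern, the Python sketch below (not Spark's implementation) fits one candidate at a time and keeps only its metric, so at most one model is referenced at any moment. `tune_sequentially`, `fit` and `evaluate` are hypothetical names; the real `CrossValidator` also averages each candidate's metric over folds, which is omitted here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def tune_sequentially(
    param_grid: Sequence[Dict[str, float]],
    fit: Callable[[Dict[str, float]], object],
    evaluate: Callable[[object], float],
    larger_is_better: bool = True,
) -> Tuple[Dict[str, float], List[float]]:
    """Fit and score one candidate at a time, keeping only the metrics.

    fit and evaluate are hypothetical callbacks standing in for an
    estimator and an evaluator; they are not Spark APIs.
    """
    metrics: List[float] = []
    for params in param_grid:
        model = fit(params)          # only this one model is referenced here
        metrics.append(evaluate(model))
        del model                    # drop the reference so it can be garbage-collected
    pick = max if larger_is_better else min
    best_index = pick(range(len(metrics)), key=metrics.__getitem__)
    return param_grid[best_index], metrics
```

Keeping every sub-model instead (as SPARK-21087 proposes) would presumably mean appending each `model` to a list rather than dropping it, which is what raises peak driver memory.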

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-24 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/18313

[GitHub] spark pull request #18728: [SPARK-21524] [ML] fix temp dir

2017-07-24 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/18728 [SPARK-21524] [ML] fix temp dir ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-21524 ValidatorParamsSuiteHelpers.testFileMove() is

[GitHub] spark issue #12533: [SPARK-14760] [ML] Feature transformers should always in...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/12533 Thanks for the attention @MLnick. I'm closing most of my stale PRs. For this one, I found all of the transformers in the PR already have ` transformSchema(dataset.schema, logging =

[GitHub] spark issue #12037: [SPARK-14239] [ML] Add load for LDAModel that supports b...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/12037 Closing stale PR.

[GitHub] spark pull request #12037: [SPARK-14239] [ML] Add load for LDAModel that sup...

2017-07-24 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/12037

[GitHub] spark pull request #10803: [SPARK-12875] [ML] Add Weight of Evidence and Inf...

2017-07-24 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/10803

[GitHub] spark issue #10803: [SPARK-12875] [ML] Add Weight of Evidence and Informatio...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/10803 Closing stale PR.
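
For context on what PR #10803 proposed: Weight of Evidence and Information Value are commonly defined per category i as WoE_i = ln(dist_good_i / dist_bad_i) and IV = sum_i (dist_good_i - dist_bad_i) * WoE_i, where dist_good_i is the category's share of all "good" outcomes. The sketch below uses that convention (some sources invert the ratio). `woe_iv` is a hypothetical helper, not the PR's API, and it assumes every category has at least one good and one bad outcome.

```python
import math
from typing import Dict, Tuple


def woe_iv(counts: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, float], float]:
    """Compute Weight of Evidence per category and the total Information Value.

    counts maps category -> (goods, bads). Assumes no zero counts; real
    implementations usually add smoothing to avoid log(0) and division by 0.
    """
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    woe: Dict[str, float] = {}
    iv = 0.0
    for category, (good, bad) in counts.items():
        dist_good = good / total_good
        dist_bad = bad / total_bad
        woe[category] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[category]
    return woe, iv
```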

[GitHub] spark pull request #12533: [SPARK-14760] [ML] Feature transformers should al...

2017-07-24 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/12533

[GitHub] spark issue #12533: [SPARK-14760] [ML] Feature transformers should always in...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/12533 Close it since it's been overlooked for some time. Thanks for the review and comments.

[GitHub] spark pull request #11102: [SPARK-13223] [ML] Add stratified sampling to ML ...

2017-07-24 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/11102

[GitHub] spark issue #11102: [SPARK-13223] [ML] Add stratified sampling to ML feature...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11102 Close it since it's been overlooked for some time and can be implemented with https://github.com/apache/spark/pull/17583 easily. Thanks for the review and comments.
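
For reference, per-stratum Bernoulli sampling of the kind SPARK-13223 targets fits in a few lines of Python; Spark's `DataFrame.stat.sampleBy(col, fractions, seed)` takes the same kind of inputs. `stratified_sample` and the `(label, payload)` row layout are assumptions for illustration, and sampled counts are only approximately `fraction * stratum_size`.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, object]  # (stratum label, payload)


def stratified_sample(
    rows: Sequence[Row], fractions: Dict[Hashable, float], seed: int = 0
) -> List[Row]:
    """Keep each row independently with the probability given for its stratum.

    Strata missing from `fractions` are dropped (probability 0.0).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [row for row in rows if rng.random() < fractions.get(row[0], 0.0)]
```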

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-24 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r129107041 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-21 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r128886371 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark issue #17461: [SPARK-20082][ml][WIP] LDA incremental model learning

2017-07-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17461 For the initial model, I think you can just use a String param for the model path. refer to https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/clustering

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-18 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r128043915 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software
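
The hashing trick behind a FeatureHasher-style transformer can be sketched as below. It follows my understanding of the documented behavior of Spark's `FeatureHasher`: numeric columns hash the column name and keep the raw value, while string and boolean columns hash `column=value` with an indicator value of 1.0. The sketch swaps in `zlib.crc32` for MurmurHash3, so indices will not match Spark's, and `hash_features` is a hypothetical name.

```python
import zlib
from typing import Dict, Union


def hash_features(
    row: Dict[str, Union[float, int, str, bool]], num_features: int = 16
) -> Dict[int, float]:
    """Hashing-trick sketch returning a sparse vector as {index: value}.

    zlib.crc32 is a stand-in hash; Spark's FeatureHasher uses MurmurHash3.
    """
    vector: Dict[int, float] = {}
    for column, value in row.items():
        # bool is checked first because bool is a subclass of int in Python
        if isinstance(value, (bool, str)):
            # categorical: hash "column=value" and use an indicator of 1.0
            key, feature_value = f"{column}={value}", 1.0
        else:
            # numeric: hash the column name and keep the raw value
            key, feature_value = column, float(value)
        index = zlib.crc32(key.encode("utf-8")) % num_features
        # colliding features are summed into the same slot
        vector[index] = vector.get(index, 0.0) + feature_value
    return vector
```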

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-15 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r127590053 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-15 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r127589970 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r126508854 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r126503993 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r126503728 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r126505794 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r126507840 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-07-06 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Added tuning summary for CrossValidator.

[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-07-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16158#discussion_r125782939 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala --- @@ -226,6 +230,29 @@ class TrainValidationSplitModel private[ml

[GitHub] spark pull request #17280: [SPARK-19939] [ML] Add support for association ru...

2017-07-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17280#discussion_r125736603 --- Diff: python/pyspark/ml/fpm.py --- @@ -186,29 +186,29 @@ class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol, |[z

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-07-05 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Yes, both LBFGS and OWLQN generate models similar to sklearn's when there is no intercept. Regarding replacing OWLQN with LBFGS, I noticed that with hinge loss, OWLQN sometimes uses fewer iterations

[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-06-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 @MLnick Thanks for your attention. I'm not sure if SPARK-19053 is still active and maybe it's not a blocking issue for this change. If you don't mind, I'll extend the jira

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 @yanboliang Without intercept, sklearn and Spark LinearSVC will get the same coefficients on several datasets I tested.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 On many large datasets, LinearSVC does not get results similar to sklearn's. E.g., sklearn may get coefficients (5, 10, 15, 20), while Spark LinearSVC gets (10, 20, 30, 40). It's different b

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-06-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r124576026 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -272,36 +272,16 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-06-26 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 This is ready for review.

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-06-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r124138183 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/RDDLossFunction.scala --- @@ -50,7 +50,7 @@ private[ml] class RDDLossFunction[ Agg

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-06-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r124133920 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregator.scala --- @@ -0,0 +1,364 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-06-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r124135629 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -38,34 +40,39 @@ private[ml] trait

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-06-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r124135824 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -38,34 +40,39 @@ private[ml] trait
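
The aggregator idea discussed in PR #18305 (accumulate loss and gradient per instance, merge partial results, then add regularization) can be sketched in plain Python. This is a simplified, hypothetical analogue of `LogisticAggregator` and `RDDLossFunction`, assuming binary labels in {0, 1}, no intercept, no feature standardization, unit instance weights, and an L2 term 0.5 * regParam * ||w||^2; partitions are plain lists rather than an RDD.

```python
import math
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], float]  # (features, label in {0, 1})


@dataclass
class LogisticAggregatorSketch:
    """Accumulates binary logistic loss and gradient sums over instances."""
    coefficients: Sequence[float]
    loss_sum: float = 0.0
    count: int = 0
    gradient_sum: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.gradient_sum = [0.0] * len(self.coefficients)

    def add(self, features: Sequence[float], label: float) -> "LogisticAggregatorSketch":
        margin = sum(w * x for w, x in zip(self.coefficients, features))
        # negative log-likelihood log(1 + e^m) - y * m (not hardened against overflow)
        self.loss_sum += math.log1p(math.exp(margin)) - label * margin
        multiplier = 1.0 / (1.0 + math.exp(-margin)) - label  # sigmoid(m) - y
        for j, x in enumerate(features):
            self.gradient_sum[j] += multiplier * x
        self.count += 1
        return self

    def merge(self, other: "LogisticAggregatorSketch") -> "LogisticAggregatorSketch":
        self.loss_sum += other.loss_sum
        self.count += other.count
        for j, g in enumerate(other.gradient_sum):
            self.gradient_sum[j] += g
        return self


def regularized_loss(
    coefficients: Sequence[float],
    partitions: Sequence[Sequence[Instance]],
    reg_param: float,
) -> Tuple[float, List[float]]:
    """Mean data loss and gradient over all partitions, plus an L2 penalty."""
    partials = []
    for part in partitions:  # one aggregator per partition, like a seqOp
        agg = LogisticAggregatorSketch(coefficients)
        for features, label in part:
            agg.add(features, label)
        partials.append(agg)
    total = reduce(lambda a, b: a.merge(b), partials)  # like a combOp
    loss = total.loss_sum / total.count
    loss += 0.5 * reg_param * sum(w * w for w in coefficients)
    gradient = [g / total.count + reg_param * w
                for g, w in zip(total.gradient_sum, coefficients)]
    return loss, gradient
```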

[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

2017-06-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17864 Shall we pay extra attention to the Int case? E.g., the input column contains Double.NaN, 1, 2. The current implementation will return 1.5 as the surrogate. I'm not sure if it
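
To make the Int case concrete, here is a minimal Python sketch of mean imputation that treats None and NaN as missing (an assumption for illustration; Spark's `Imputer` has its own `missingValue` param). It shows where 1.5 comes from and what would happen if the surrogate were forced back to an integer. Both helper names are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Union

Number = Union[int, float]


def mean_surrogate(values: Iterable[Optional[Number]]) -> float:
    """Mean of the non-missing values; None and NaN are treated as missing."""
    present = [float(v) for v in values if v is not None and not math.isnan(v)]
    return sum(present) / len(present)


def impute(values: List[Optional[Number]], surrogate: float) -> List[float]:
    """Replace missing entries with the surrogate; the output is always float."""
    return [surrogate if v is None or math.isnan(v) else float(v) for v in values]
```

With the values from the comment, the surrogate is 1.5; casting it back to an integer column would truncate it to 1, whereas returning Double for every input type (the option argued for in another comment on #17864) keeps 1.5.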

[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-06-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 The error looks irrelevant.

[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-06-23 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 Changed the constructor's func parameter to UserDefinedFunction. This helps resolve the type issue during save/load and makes it adaptable to Python. Thanks for the suggestion from @yanboliang
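
A function-wrapping transformer of the kind PR #17583 proposes can be sketched without Spark: the transformer holds a user function plus input and output column names, and `transform` applies the function row by row. `FuncTransformerSketch` is hypothetical, uses a plain Python callable instead of a `UserDefinedFunction`, and models a dataset as a list of dicts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class FuncTransformerSketch:
    """Applies func to input_col and writes the result to output_col."""
    func: Callable[[Any], Any]
    input_col: str
    output_col: str

    def transform(self, rows: List[Row]) -> List[Row]:
        # validate the "schema" up front, before producing any output
        missing = [i for i, row in enumerate(rows) if self.input_col not in row]
        if missing:
            raise KeyError(f"input column {self.input_col!r} missing in rows {missing}")
        # copy each row so the input data is left unchanged
        return [{**row, self.output_col: self.func(row[self.input_col])} for row in rows]
```

In Spark itself a similar effect is usually obtained with `withColumn` and a UDF; wrapping it as a transformer is what lets it sit in a `Pipeline` and go through save/load, which is where the type issue mentioned above arises.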

[GitHub] spark issue #18315: [SPARK-21108] [ML] [WIP] convert LinearSVC to aggregator...

2017-06-15 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18315 Will send a further update after https://github.com/apache/spark/pull/18305 is merged.

[GitHub] spark pull request #18315: [SPARK-21108] [ML] [WIP] convert LinearSVC to agg...

2017-06-15 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/18315 [SPARK-21108] [ML] [WIP] convert LinearSVC to aggregator framework ## What changes were proposed in this pull request? convert LinearSVC to new aggregator framework ## How was this

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-06-14 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/18313 [SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala ## What changes were proposed in this pull request? Allow `CrossValidatorModel` and

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-14 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Sure. That's reasonable. I'll move the hingeAggregator to a new PR.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

2017-06-13 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Merge the change from https://github.com/apache/spark/pull/17645 into a single change.

[GitHub] spark pull request #18154: [SPARK-20932][ML]CountVectorizer support handle p...

2017-06-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18154#discussion_r121836024 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -154,13 +155,19 @@ class CountVectorizer @Since("1.5.0"

[GitHub] spark issue #17645: [SPARK-20348] [ML] Support squared hinge loss (L2 loss) ...

2017-06-11 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17645 OK. I'll close it for now and try to merge it with https://github.com/apache/spark/pull/17862. Thanks for the comment from @yanboliang

[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...

2017-06-11 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/17645

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-06-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r121312651 --- Diff: mllib/src/main/scala/org/apache/spark/ml/FuncTransformer.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-06-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r121312459 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-06-09 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r121183230 --- Diff: mllib/src/main/scala/org/apache/spark/ml/FuncTransformer.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #17645: [SPARK-20348] [ML] Support squared hinge loss (L2 loss) ...

2017-06-07 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17645 Hi @HyukjinKwon I think this is a feature we need, but we are currently still discussing the optimizer interface.

[GitHub] spark pull request #18034: [SPARK-20797][MLLIB]fix LocalLDAModel.save() bug.

2017-05-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18034#discussion_r117792674 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala --- @@ -468,7 +469,16 @@ object LocalLDAModel extends Loader[LocalLDAModel

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r117592116 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,16 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116351996 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,24 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116139174 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,20 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116139610 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -46,6 +46,26 @@ class MatricesSuite extends SparkFunSuite

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116139038 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,20 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark pull request #10466: [SPARK-12375] [ML] add handleinvalid for vectorin...

2017-05-11 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/10466

[GitHub] spark issue #10466: [SPARK-12375] [ML] add handleinvalid for vectorindexer

2017-05-11 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/10466 Closing this for now until I get some time for it. We would need to evaluate the performance and see what the best option is. Thanks for pinging @HyukjinKwon

[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...

2017-05-10 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17910 @zhengruifeng That may be the best solution I see for now.

[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115669376 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -526,7 +526,7 @@ class LogisticRegression @Since("
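
Case-insensitive handling of a string param like `family` (SPARK-20669) usually amounts to normalizing the value before validating it. A minimal Python sketch, assuming the supported values are `auto`, `binomial` and `multinomial` as in Spark's `LogisticRegression`; `normalize_family` is a hypothetical helper, not the PR's code.

```python
SUPPORTED_FAMILIES = ("auto", "binomial", "multinomial")


def normalize_family(value: str) -> str:
    """Validate a family name case-insensitively and return its lowercase form."""
    normalized = value.lower()
    if normalized not in SUPPORTED_FAMILIES:
        raise ValueError(f"family must be one of {SUPPORTED_FAMILIES}, got {value!r}")
    return normalized
```

Returning the canonical lowercase form means downstream code can keep comparing against lowercase literals only.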

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115668587 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-09 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115657829 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -223,6 +229,25 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-09 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115656752 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115416085 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -526,7 +526,7 @@ class LogisticRegression @Since("

[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17910#discussion_r115416204 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -890,7 +890,7 @@ object LogisticRegression extends

[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17912 cc @srowen @jkbradley @felixcheung

[GitHub] spark pull request #17912: [SPARK-20670] [ML] Simplify FPGrowth transform

2017-05-08 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17912 [SPARK-20670] [ML] Simplify FPGrowth transform ## What changes were proposed in this pull request? As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform
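
For readers unfamiliar with what `FPGrowthModel.transform` computes: as I understand the documented behavior, a rule applies to a basket when the basket contains every antecedent item, and the prediction is the union of the consequents of all applicable rules, minus items already in the basket. A plain Python sketch with hypothetical names, where rules are given directly rather than mined:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def predict_items(basket: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Union of consequents of applicable rules, excluding items already present."""
    items = set(basket)
    prediction: Set[str] = set()
    for antecedent, consequent in rules:
        if antecedent <= items:  # rule applies only if every antecedent item is present
            prediction |= consequent
    return prediction - items  # never predict items the basket already has
```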

[GitHub] spark issue #17894: [SPARK-17134][ML] Use level 2 BLAS operations in Logisti...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17894 I'm not sure how much speedup we can get from Level 2 BLAS. For the benchmark, we would also need to evaluate the performance on sparse data.
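
The idea behind SPARK-17134, as I read its title, is to compute margins for a block of instances as one matrix-vector product (a Level 2 BLAS `gemv`) rather than one dot product per instance (Level 1). The pure-Python sketch below only shows that the two formulations agree; it is not faster in Python, since any speedup in Spark would come from a native BLAS. Stacking rows into a dense block also wastes memory for sparse data, which is one reason sparse inputs need their own benchmark. Both function names are hypothetical.

```python
from typing import List, Sequence


def margins_level1(rows: Sequence[Sequence[float]], coef: Sequence[float]) -> List[float]:
    """One dot product (a Level 1 style operation) per instance."""
    return [sum(x * w for x, w in zip(row, coef)) for row in rows]


def margins_level2(
    block: Sequence[float], num_rows: int, num_cols: int, coef: Sequence[float]
) -> List[float]:
    """Rows stacked into one dense row-major block; margins = A @ coef (gemv style)."""
    out = [0.0] * num_rows
    for i in range(num_rows):
        base = i * num_cols  # offset of row i in the flat row-major array
        acc = 0.0
        for j in range(num_cols):
            acc += block[base + j] * coef[j]
        out[i] = acc
    return out
```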

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 @debasish83 There are several approaches to smoothing the hinge loss. https://en.wikipedia.org/wiki/Hinge_loss. For the one you're proposing, do you know if it's used in other SVM
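
For reference, three common forms on the signed margin z = y * f(x), with y in {-1, +1}: the standard hinge max(0, 1 - z), which has a kink at z = 1 (awkward for quasi-Newton methods such as L-BFGS that assume a differentiable objective); the squared hinge max(0, 1 - z)^2 used by L2-SVMs; and the quadratically smoothed hinge of Rennie and Srebro listed on the Wikipedia page above. A minimal Python sketch; which variant the proposal in this thread uses is not stated in the snippet.

```python
def hinge(z: float) -> float:
    """Standard hinge loss; not differentiable at z = 1."""
    return max(0.0, 1.0 - z)


def squared_hinge(z: float) -> float:
    """Squared hinge (L2-SVM) loss; differentiable everywhere."""
    return max(0.0, 1.0 - z) ** 2


def smoothed_hinge(z: float) -> float:
    """Quadratically smoothed hinge (Rennie and Srebro); differentiable everywhere."""
    if z <= 0.0:
        return 0.5 - z               # linear region, slope -1
    if z < 1.0:
        return 0.5 * (1.0 - z) ** 2  # quadratic bridge between the two pieces
    return 0.0
```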

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115303395 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

2017-05-05 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17864 I imagine most Int features will need to be converted to Double for a Vector, so returning Double regardless of the input type makes sense, which also makes the implementation more straightforward

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

2017-05-05 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Update: switched to the HasSolver trait and use OWLQN as the default optimizer

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115050353 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

2017-05-04 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 ping @jkbradley Sorry, I know this is the last minute for 2.2, but the change may be important for user experience. If we're not comfortable making an API change right now, how about we just c

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-04 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17862 [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSVC ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-20602 Currently

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-04-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 @felixcheung, reverted the code change of `transform` as requested. Please check the update. Thanks.

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r114002856 --- Diff: docs/ml-frequent-pattern-mining.md --- @@ -0,0 +1,87 @@ +--- +layout: global +title: Frequent Pattern Mining +displayTitle

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r114002811 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -82,8 +81,8 @@ private[fpm] trait FPGrowthParams extends Params with

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r114002784 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -268,12 +269,8 @@ class FPGrowthModel private[ml] ( val predictUDF

[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...

2017-04-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17645#discussion_r113836900 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -42,15 +44,35 @@ import org.apache.spark.sql.functions.{col, lit

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113783993 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -268,12 +269,8 @@ class FPGrowthModel private[ml] ( val predictUDF

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113782786 --- Diff: docs/ml-frequent-pattern-mining.md --- @@ -0,0 +1,87 @@ +--- +layout: global +title: Frequent Pattern Mining +displayTitle

[GitHub] spark issue #17767: Refactoring of the ALS code

2017-04-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17767 Preparing a PR like this takes a lot of effort. Please try to follow the guidelines in http://spark.apache.org/contributing.html (create a JIRA and rename the title). Like you said, I

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113220960 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -268,12 +269,8 @@ class FPGrowthModel private[ml] ( val predictUDF

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r112278365 --- Diff: docs/ml-frequent-pattern-mining.md --- @@ -0,0 +1,80 @@ +--- +layout: global +title: Frequent Pattern Mining +displayTitle

[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-18 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17673 Thanks for sharing the work. To help make the review easier, I would recommend: 1. Provide some background info. Is the new algorithm better than the existing one and in which cases
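For background on the Continuous Bag of Words (CBOW) model named in this PR: CBOW predicts each target word from the words around it, whereas skip-gram predicts the surrounding words from the target. A minimal sketch of how CBOW forms its training examples and its averaged hidden layer, assuming a fixed symmetric window (the original word2vec samples a reduced window per position) and hypothetical helper names `cbow_pairs` and `average_context`:

```python
def cbow_pairs(tokens, window):
    """Build (context_words, target_word) pairs as used by CBOW: each target
    is predicted from the words within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:
            pairs.append((context, target))
    return pairs


def average_context(context, embeddings):
    """CBOW hidden layer: the element-wise mean of the context word vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in context:
        for k, v in enumerate(embeddings[word]):
            total[k] += v
    return [x / len(context) for x in total]


print(cbow_pairs(["a", "b", "c", "d"], 1))
```

The averaged vector is then fed to the output layer (hierarchical softmax or negative sampling in typical implementations) to score the target word.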

[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...

2017-04-17 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 I'll update this after the FPGrowth examples and doc are merged (https://github.com/apache/spark/pull/17130), since there will be some conflicts.
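Association rules are derived from frequent itemsets via confidence: for a rule X => Y, confidence is count(X ∪ Y) / count(X). A minimal plain-Python sketch of that derivation, assuming the input counts are downward closed (every subset of a frequent itemset is also present, as a frequent-itemset miner guarantees). The helper `association_rules` is hypothetical; it enumerates every antecedent/consequent split, and Spark's implementation may restrict which rules it emits.

```python
from itertools import combinations


def association_rules(freq, min_confidence):
    """Derive rules (antecedent, consequent, confidence) from itemset counts.

    freq maps frozenset itemsets to occurrence counts.
    confidence(X => Y) = count(X | Y) / count(X).
    """
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                confidence = count / freq[x]
                if confidence >= min_confidence:
                    rules.append((x, itemset - x, confidence))
    return rules


counts = {
    frozenset({"1"}): 3, frozenset({"2"}): 3, frozenset({"5"}): 2,
    frozenset({"1", "2"}): 3, frozenset({"1", "5"}): 2,
    frozenset({"2", "5"}): 2, frozenset({"1", "2", "5"}): 2,
}
for rule in association_rules(counts, 0.8):
    print(rule)
```

Here `{"5"} => {"1", "2"}` has confidence 2/2 = 1.0, while `{"1"} => {"5"}` has only 2/3 and is filtered out at `min_confidence=0.8`.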

[GitHub] spark pull request #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary t...

2017-04-16 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17654 [SPARK-20351] [ML] Add trait hasTrainingSummary to replace the duplicate code ## What changes were proposed in this pull request? Add a trait HasTrainingSummary to avoid code duplicate
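The idea in this PR is to move the summary members that several models duplicate into one shared trait. Spark's trait is Scala; as a hedged illustration of the same pattern only, here is a hypothetical Python mixin. The names `HasTrainingSummary`, `has_summary`, `summary`, and `set_summary` here are illustrative assumptions, not the actual Spark or PySpark API.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class HasTrainingSummary(Generic[T]):
    """Hypothetical mixin holding an optional training summary.

    Instead of every model re-implementing the same accessors, each model
    mixes this in once.
    """

    _training_summary: Optional[T] = None  # class-level default: no summary

    @property
    def has_summary(self) -> bool:
        return self._training_summary is not None

    @property
    def summary(self) -> T:
        if self._training_summary is None:
            raise RuntimeError(
                f"No training summary available for this {type(self).__name__}")
        return self._training_summary

    def set_summary(self, summary: Optional[T]) -> "HasTrainingSummary[T]":
        self._training_summary = summary  # stored per instance
        return self


class ToyLinearModel(HasTrainingSummary[dict]):
    """Hypothetical model class: gains the summary members by inheritance."""


model = ToyLinearModel().set_summary({"objectiveHistory": [0.5, 0.3]})
print(model.has_summary, model.summary["objectiveHistory"])
```

Returning `self` from the setter mirrors the fluent setter style common in ML pipeline APIs.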

[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...

2017-04-15 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17645 [SPARK-20348] [ML] Support squared hinge loss (L2 loss) for LinearSVC ## What changes were proposed in this pull request? While Hinge loss is the standard loss function for linear SVM
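To make the difference between the two losses concrete: with margin m = y(w·x + b) and y in {-1, +1}, hinge loss is max(0, 1 - m) and squared hinge (L2) loss is max(0, 1 - m)^2. The squared form is differentiable everywhere (its derivative -2·max(0, 1 - m) is continuous at m = 1, where the plain hinge has a kink) and penalizes large violations more heavily. A minimal per-example sketch, assuming 0/1 labels are mapped to -1/+1 as is common for SVMs; the helper names are hypothetical and this is not Spark's aggregator code.

```python
def margin(weights, intercept, features, label):
    """Signed margin y * (w.x + b) for a 0/1 label mapped to y in {-1, +1}."""
    y = 2.0 * label - 1.0
    score = sum(w * x for w, x in zip(weights, features)) + intercept
    return y * score


def hinge_loss(m):
    """Standard hinge loss: max(0, 1 - m). Not differentiable at m = 1."""
    return max(0.0, 1.0 - m)


def squared_hinge_loss(m):
    """Squared hinge (L2) loss: max(0, 1 - m)^2. Smooth at m = 1."""
    return max(0.0, 1.0 - m) ** 2


for label in (1, 0):
    m = margin([1.0, -1.0], 0.0, [2.0, 1.0], label)
    print(label, m, hinge_loss(m), squared_hinge_loss(m))
```

For the misclassified case (label 0, margin -1.0) the hinge loss is 2.0 while the squared hinge loss is 4.0, showing the heavier penalty on large violations.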

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r111312845 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -287,6 +290,27 @@ class LinearSVCModel private[classification

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r111313220 --- Diff: python/pyspark/ml/classification.py --- @@ -172,6 +172,59 @@ def intercept(self): """ return self._call_

[GitHub] spark issue #6000: [Spark-7475][MLlib] adjust ldaExample for online LDA

2017-04-12 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/6000 @redsofa I would advise copying the MLlib LDAOptimizer code to your own project, adding related logging to next(), and running it with your application code.

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r110990099 --- Diff: examples/src/main/python/ml/fpgrowth_example.py --- @@ -0,0 +1,48 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110982571 --- Diff: python/pyspark/ml/classification.py --- @@ -172,6 +172,47 @@ def intercept(self): """ return self._call_

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110982055 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable
