[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19110 Thanks @MLnick @BryanCutler. Would you mind helping review another, similar PR, #19122? Some other features we need are blocked on that PR. Thanks

[GitHub] spark issue #9183: [SPARK-11215] [ML] Add multiple columns support to String...

2017-09-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/9183 @minixalpha Sorry for the delay; I have been very busy recently. I will try to finish and commit my new PR once I have time. Thanks

[GitHub] spark pull request #19107: [SPARK-21799][ML] Fix `KMeans` performance regres...

2017-09-11 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19107 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19107 OK. Thanks @zhengruifeng. I will close this PR.

[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r138249937 --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py --- @@ -152,6 +152,8 @@ def get$Name(self): ("varianceCol", "

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19208 [SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala ## What changes were proposed in this pull request? 1. We add a parameter

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16774 @BryanCutler @MLnick I found a bug in this PR: after saving an estimator (CV or TVS) and loading it again, the "Parallelism" setting is lost. I have fixed this in #19208.

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18313 @hhbyyh My apologies: your PR is valuable (when the model list is very big), but it is now stale, so I have integrated it into my new PR #19208. Would you mind taking a

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19208 cc @jkbradley

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r138393318 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -212,14 +238,12 @@ object CrossValidator extends MLReadable

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r138389265 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -261,17 +290,40 @@ class CrossValidatorModel private[ml

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r138391134 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -150,20 +150,14 @@ private[ml] object ValidatorParams

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh Test results look good! OWLQN takes longer for each iteration because the line search in each iteration makes more passes on da

[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r138518235 --- Diff: python/pyspark/ml/tuning.py --- @@ -208,23 +210,23 @@ class CrossValidator(Estimator, ValidatorParams, MLReadable, MLWritable

[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r138518283 --- Diff: python/pyspark/ml/tuning.py --- @@ -193,7 +194,8 @@ class CrossValidator(Estimator, ValidatorParams, MLReadable, MLWritable

[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19122 @BryanCutler Code updated. Thanks!

[GitHub] spark pull request #19110: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19110#discussion_r138519719 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -297,6 +298,16 @@ final class OneVsRest @Since("

[GitHub] spark pull request #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since...

2017-09-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19214 [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag ## What changes were proposed in this pull request? add missing since tag for `setParallelism` in #19110 ## How was

[GitHub] spark issue #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19214 cc @srowen Thanks!

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138577518 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -483,24 +488,24 @@ class LogisticRegression @Since

[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19204 Jenkins, test this please.

[GitHub] spark issue #19156: [SPARK-19634][FOLLOW-UP][ML] Improve interface of datafr...

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19156 ping @yanboliang Any other comments? We need to merge this before the 2.3 release.

[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18924 ping @akopich This is a very useful improvement. Can you update the code while you're at it?

[GitHub] spark pull request #19204: [SPARK-21981][PYTHON][ML] Added Python interface ...

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19204#discussion_r138763970 --- Diff: python/pyspark/ml/evaluation.py --- @@ -328,6 +329,87 @@ def setParams(self, predictionCol="prediction", label

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19208 Oh... sorry about that. I integrated @hhbyyh's old PR into this new one because the "dump models to disk" and "collect models" code paths seem to be cohesive and s

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19208 @jkbradley I split this PR and removed the code for "dump models to disk", so the PR is smaller and easier to review. When this PR is merged, I will create a follow-up PR for "dump

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19208 Jenkins, test this please.

[GitHub] spark pull request #18748: [SPARK-20679][ML] Support recommending for a subs...

2017-09-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18748#discussion_r139161851 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -356,6 +371,40 @@ class ALSModel private[ml

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-04-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 @nicodri Hi, I am modifying this PR and will commit this week! Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @yanboliang thanks!

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Thanks! I have merged your updates and fixed the MiMa file conflicts. @yanboliang has just come back from a trip and will help review and merge it into 2.2, so don't worry about it!

[GitHub] spark pull request #17706: fix MLOR coeffs centering when reg == 0

2017-04-20 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/17706 fix MLOR coeffs centering when reg == 0 ## What changes were proposed in this pull request? When reg == 0, MLOR has multiple solutions and we need to center the coeffs to get
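Why reg == 0 leaves MLOR with multiple solutions: adding the same constant to every class's coefficient for a given feature shifts all class margins by the same amount, and the softmax cancels that shift, so the loss is unchanged. Centering picks the one solution whose coefficients average to zero across classes for each feature. The Python sketch below assumes a numClasses × numFeatures coefficient layout and omits intercepts; `center_coefficients` and `softmax_probs` are hypothetical helpers for illustration, not Spark's implementation.

```python
import math


def center_coefficients(coef):
    """Subtract, for each feature (column), the mean over classes (rows).

    coef: list of rows, one row per class (assumed numClasses x numFeatures).
    """
    num_classes = len(coef)
    num_features = len(coef[0])
    col_means = [sum(row[j] for row in coef) / num_classes
                 for j in range(num_features)]
    return [[row[j] - col_means[j] for j in range(num_features)]
            for row in coef]


def softmax_probs(coef, x):
    """Class probabilities for feature vector x (no intercepts)."""
    margins = [sum(w * v for w, v in zip(row, x)) for row in coef]
    m = max(margins)  # subtract the max margin for numerical stability
    exps = [math.exp(z - m) for z in margins]
    total = sum(exps)
    return [e / total for e in exps]


coef = [[2.0, -1.0], [0.5, 3.0], [1.0, 1.0]]
centered = center_coefficients(coef)
x = [0.3, -1.2]
p_orig = softmax_probs(coef, x)
p_cent = softmax_probs(centered, x)
print(centered)
print(p_orig)
print(p_cent)
```

The centered matrix has (numerically) zero column sums, and the two probability vectors agree up to floating-point rounding, which is exactly the non-uniqueness that centering resolves.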

[GitHub] spark issue #17706: [SPARK-20423][ML] fix MLOR coeffs centering when reg == ...

2017-04-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17706 Jenkins test this please

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-04-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r113867765 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1231,6 +1295,109 @@ class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-04-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 `update v7` fixes the earlier `LogisticRegressionSuite` conflicts, and the `fix nits` commit addresses some nits.

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-04-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r114043519 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1231,6 +1295,109 @@ class

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @jkbradley @yanboliang

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128645326 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -337,8 +353,13 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128648171 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -271,7 +273,7 @@ object OneVsRestModel extends MLReadable

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @jkbradley I think it's OK now.

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 jenkins, test this please.

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r128684617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128815421 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -271,7 +273,7 @@ object OneVsRestModel extends MLReadable

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r12283 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r129122995 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-07-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r129173570 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,799 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-25 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r129413407 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r129652515 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val

[GitHub] spark issue #9183: [SPARK-11215] [ML] Add multiple columns support to String...

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/9183 @yanboliang I will take over this feature and create a new PR soon.

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r129698281 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -527,9 +544,21 @@ private[ml] class FeedForwardModel private

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r129697890 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -527,9 +544,21 @@ private[ml] class FeedForwardModel private

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r129697649 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private( private

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @yanboliang @jkbradley

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r130015435 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val

[GitHub] spark pull request #18746: [ML][Python] Implemented UnaryTransformer in Pyth...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r130518773 --- Diff: python/pyspark/ml/base.py --- @@ -116,3 +121,44 @@ class Model(Transformer): """ __metacl

[GitHub] spark pull request #18746: [ML][Python] Implemented UnaryTransformer in Pyth...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r130518147 --- Diff: python/pyspark/ml/base.py --- @@ -116,3 +121,44 @@ class Model(Transformer): """ __metacl

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130520335 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130522066 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130521964 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130523538 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130519716 --- Diff: python/pyspark/ml/param/__init__.py --- @@ -375,6 +375,18 @@ def copy(self, extra=None): that._defaultParamMap

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130521214 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18797: [SPARK-21523] update breeze to 0.13.1 for an emer...

2017-08-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18797 [SPARK-21523] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.1 for an emergency

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18798 [SPARK-19634][ML] Multivariate summarizer - dataframes API ## What changes were proposed in this pull request? This patch adds the DataFrames API to the multivariate summarizer (mean

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130683584 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130682301 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130684437 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130680940 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130684135 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130684686 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -0,0 +1,619 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130683266 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 performance data attached. cc @thunterdb @jkbradley

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130745275 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @thunterdb 1) Deserializing the DataFrame from binary data adds overhead (there may or may not be compaction, depending on the data type; cc @liancheng), about 1x performance in my test

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130746893 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130746993 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130747756 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18313 @jkbradley I think this is simple. When the persist-model-list param is `false`, the code logic stays the same and **it won't increase the memory cost** (this is the default
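The memory argument can be illustrated with a small, sequential Python sketch of a cross-validation loop that retains fitted models only when a flag is set. With the flag off (the default here), nothing keeps a reference to a fitted model after it has been evaluated, so memory use matches a loop that never stores models. `cross_validate`, `keep_models`, `fit`, and `evaluate` are hypothetical names; this is not Spark's `CrossValidator` code.

```python
def cross_validate(folds, param_maps, fit, evaluate, keep_models=False):
    """Return (average metric per param map, sub-models or None).

    folds: list of (train, valid) pairs.
    fit(train, params) -> model and evaluate(model, valid) -> float are
    caller-supplied callables (hypothetical interface).
    """
    num_folds = len(folds)
    metrics = [0.0] * len(param_maps)
    # Allocate storage for sub-models only when the caller asks for it.
    sub_models = ([[None] * len(param_maps) for _ in range(num_folds)]
                  if keep_models else None)
    for i, (train, valid) in enumerate(folds):
        for j, params in enumerate(param_maps):
            model = fit(train, params)
            metrics[j] += evaluate(model, valid) / num_folds
            if keep_models:
                sub_models[i][j] = model
            # Otherwise the previous model becomes unreferenced when `model`
            # is rebound on the next iteration and can be garbage-collected.
    return metrics, sub_models


# Toy usage: the "model" is the training mean plus an offset parameter.
def fit(train, offset):
    return sum(train) / len(train) + offset


def evaluate(model, valid):
    return abs(model - valid[0])


folds = [([1.0, 2.0, 3.0], [4.0]), ([2.0, 3.0, 4.0], [1.0])]
param_maps = [0.0, 1.0]
metrics, kept = cross_validate(folds, param_maps, fit, evaluate)
metrics_all, kept_all = cross_validate(folds, param_maps, fit, evaluate,
                                       keep_models=True)
print(metrics, kept)
print(metrics_all, kept_all)
```

The metrics are identical either way; only the `keep_models=True` path holds every fitted model (one per fold and param map) in memory.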

[GitHub] spark pull request #18746: [ML][Python] Implemented UnaryTransformer in Pyth...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r130940500 --- Diff: python/pyspark/ml/base.py --- @@ -116,3 +121,44 @@ class Model(Transformer): """ __metacl

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 Strangely, the code failed this `require` at https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/StrongWolfe.scala#L73 in the three case

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 @srowen Yeah, the third case is a different problem (I think we can simply change the iteration number from 7 to 6 in the test case). I am curious about the first two cases: why do they trigger the `require` failure? By

[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17849 Thanks for your work on this, but I am curious what the benefit of doing this is. In PySpark there are currently no params in the Model itself, so what problems or bugs can it resolve after adding

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 Thanks! Waiting for the AFT test code author to figure out how to modify the test case.

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 @srowen Great! thanks!

[GitHub] spark pull request #18798: [WIP] [SPARK-19634][ML] Multivariate summarizer -...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r131735748 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -547,35 +533,11 @@ object SummaryBuilderImpl extends Logging

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r131762320 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r131764683 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r131763824 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r131767248 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to the

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r131766119 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to the

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r131766525 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to the

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131771002 --- Diff: python/pyspark/ml/tests.py --- @@ -1158,6 +1165,33 @@ def test_decisiontree_regressor(self): except OSError

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131772274 --- Diff: python/pyspark/ml/util.py --- @@ -61,33 +66,86 @@ def _randomUID(cls): @inherit_doc -class MLWriter(object): +class

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131790673 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private( private

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131824713 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -82,6 +83,23 @@ class

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r131979173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 Jenkins, test this please.

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131992917 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -83,6 +83,36 @@ class

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r132021227 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -107,9 +103,9 @@ class

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132052106 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software
