[GitHub] spark pull request #16574: [SPARK-19189] Optimize CartesianRDD to avoid pare...

2017-01-15 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/16574 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #16574: [SPARK-19189] Optimize CartesianRDD to avoid parent RDD'...

2017-01-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16574 I need to make a survey for better Cartesian implementation, especially in shuffle way. Close this PR for now and when the new solution is done I will reopen it. --- If your project is set

[GitHub] spark issue #16576: [SPARK-19215] Add necessary check for `RDD.checkpoint` t...

2017-01-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16576 @mridulm code updated. thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...

2017-01-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15730 @brkyvz I update code and attach a running result screenshot, waiting for your review, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-01-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Oh, thanks very much! I will update code immediately. As I commit this PR I will work on it in the first priority. Happy new year! --- If your project is set up for it, you can reply

[GitHub] spark pull request #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data siz...

2017-01-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15730#discussion_r96131333 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -459,14 +464,155 @@ class BlockMatrix @Since("

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-03-21 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/17373 [SPARK-12664] Expose probability in mlp model ## What changes were proposed in this pull request? Modify MLP model to inherit `ProbabilisticClassificationModel` and so that it can

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Don't worry, I will update code ASAP and @yanboliang will also help review it. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r107326057 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -1786,51 +1793,98 @@ class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Code updated. cc @yanboliang Pls help boost this PR thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r107325971 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -194,15 +207,9 @@ class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Thanks! I have merged your updates and fix mima file conflicts. @yanboliang has just come back from trip and will help review and merge it into 2.2 so don't worry about

[GitHub] spark issue #17706: [SPARK-20423][ML] fix MLOR coeffs centering when reg == ...

2017-04-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17706 Jenkins test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-04-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 @nicodri Hi, I am modifying this PR and will commit this week! Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #17706: fix MLOR coeffs centering when reg == 0

2017-04-20 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/17706 fix MLOR coeffs centering when reg == 0 ## What changes were proposed in this pull request? When reg == 0, MLOR has multiple solutions and we need to centralize the coeffs to get

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @yanboliang thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Done. cc @sethah @jkbradley thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 jenkins, test please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah OK no problem! I can move the method implementations into trait. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah I think `truePositiveRate` is equivalent to `recall` so I directly implement another one in the trait, but if you don't like this way I can modify it. Is there anything else need

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 @srowen Yeah, the third case is another problem (I think we can simply change the iter num 7 to 6 in testcase) I am curious about the first two cases, why trigger the require fail

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18313 @jkbradley I think the thing is simple. When persist model list param is `false`, just keep the code logic the same and **it won't increase the memory cost** (This is the default case

[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17849 Thanks your work on this but I am curious what is the benefit of doing this? In pyspark there is no param in Model itself currently, what is the problem or bugs it can resolve after adding

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130746993 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130745275 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130746893 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @thunterdb 1) The dataframe deserialize from binary data will add overhead, (maybe there is compaction or not, it depends on the datatype, cc @liancheng ) about 1x performance in my test

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 performance data attached. cc @thunterdb @jkbradley --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130747756 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 Thanks! Waiting AFT testcode author to figure out how to modify the testcase. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @MLnick That's because, this bug will be triggered only when we standardize feature first then do training... --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @MLnick Yes it is always trained in scaled space. But the testcase you mentioned do not take the "scale" step, so do not trigger the bug... --- If your project is set up for it, you

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @yanboliang I will update ASAP, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r133121659 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r133119397 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18872#discussion_r133081995 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -109,14 +112,15 @@ class LibSVMRelationSuite

[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18872#discussion_r133081682 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -109,14 +112,15 @@ class LibSVMRelationSuite

[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18872#discussion_r133082255 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -109,14 +112,15 @@ class LibSVMRelationSuite

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r133080543 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r133080201 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala --- @@ -69,6 +69,20 @@ class HashingTFSuite extends SparkFunSuite

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @viirya Sure! comment updated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @jkbradley Code updated, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r133191237 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133636062 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -882,21 +882,28 @@ class LogisticRegression @Since

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-07-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18281 @ajaysaini725 @jkbradley Can we avoid python-side to re-implement the logic of OneVsRest? It can simply python-side code I think. Just let the wrapper inherit `JavaEstimator`, and when we

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 @LeoIV sorry for delay! I will update code soon! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 The probability should always between 0 and 1 Send me your test code and test data to help me find out where is wrong. In my own test the result is ok. Sent from my iPhone

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 RawPrediction is not probability It's range is from -inf to inf Softmax(raw predictions) get probabilities It's range is from 0 to 1 Thanks! Sent from my iPhone

[GitHub] spark issue #16571: [SPARK-19208][ML][WIP] MaxAbsScaler and MinMaxScaler are...

2017-07-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16571 This PR is very similar to my early PR. Is that right? @jkbradley #14950 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 They are, I think some values are something like 4.7532244532E-10 the display truncate them. Thanks Sent from my iPhone On 15 Jul 2017, at 12:35 AM, Leonard Hövelmann

[GitHub] spark issue #17419: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17419 As the dataframe version is much slower than RDD version (currently test against vector of size 1) I also guess there is some performance issue in `ObjectAggregationIterator.processInput

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r128429389 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -0,0 +1,406 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r128428434 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,746 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r128429254 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,799 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r128428604 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,746 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128648171 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -271,7 +273,7 @@ object OneVsRestModel extends MLReadable

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @jkbradley I think it's OK now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r128684617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128645326 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -337,8 +353,13 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128815421 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -271,7 +273,7 @@ object OneVsRestModel extends MLReadable

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r12283 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-25 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r129413407 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r129122995 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-07-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r129173570 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,799 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r130015435 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r129652515 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val

[GitHub] spark issue #9183: [SPARK-11215] [ML] Add multiple columns support to String...

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/9183 @yanboliang I will take over this feature and create a new PR soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r129698281 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -527,9 +544,21 @@ private[ml] class FeedForwardModel private

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r129697890 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -527,9 +544,21 @@ private[ml] class FeedForwardModel private

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r129697649 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private( private

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @yanboliang @jkbradley --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @jkbradley @yanboliang --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-04-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r114043519 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1231,6 +1295,109 @@ class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-04-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 `update v7` fix previous `LogisticRegressionSuite` conflicts and `fix nits` commit for some nits update. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-04-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r113867765 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1231,6 +1295,109 @@ class

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 @srowen Great! thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132052106 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r132058692 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132058898 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @jkbradley @MrBago thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131824713 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -82,6 +83,23 @@ class

[GitHub] spark pull request #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operation...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r132069046 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1722,25 +1723,22 @@ private class

[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17894 I am also interested in implementation by level-3 BLAS. Can you post a design doc first? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operation...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r132068663 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1722,25 +1723,22 @@ private class

[GitHub] spark pull request #18888: [Spark-17025][ML][Python] Persistence for Pipelin...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r132072527 --- Diff: python/pyspark/ml/pipeline.py --- @@ -242,3 +327,65 @@ def _to_java(self): JavaParams._new_java_obj

[GitHub] spark pull request #18888: [Spark-17025][ML][Python] Persistence for Pipelin...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r132070100 --- Diff: python/pyspark/ml/pipeline.py --- @@ -204,13 +282,20 @@ def copy(self, extra=None): @since("2.0.0") def

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16774 @BryanCutler You are right. Once `Future` complete the model can be cleaned by GC. So the memory cost of the code has been optimized already. I didn't look at the code carefully a few days ago

[GitHub] spark pull request #18797: [SPARK-21523] update breeze to 0.13.1 for an emer...

2017-08-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18797 [SPARK-21523] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.1 for an emergency

[GitHub] spark pull request #18746: [ML][Python] Implemented UnaryTransformer in Pyth...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r130518773 --- Diff: python/pyspark/ml/base.py --- @@ -116,3 +121,44 @@ class Model(Transformer): """ __metacl

[GitHub] spark pull request #18746: [ML][Python] Implemented UnaryTransformer in Pyth...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r130518147 --- Diff: python/pyspark/ml/base.py --- @@ -116,3 +121,44 @@ class Model(Transformer): """ __metacl

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130520335 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130522066 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130521964 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130523538 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130519716 --- Diff: python/pyspark/ml/param/__init__.py --- @@ -375,6 +375,18 @@ def copy(self, extra=None): that._defaultParamMap

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130521214 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18798 [SPARK-19634][ML] Multivariate summarizer - dataframes API ## What changes were proposed in this pull request? This patch adds the DataFrames API to the multivariate summarizer (mean

<    1   2   3   4   5   6   7   8   9   10   >