[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @facaiy Your idea looks also reasonable. So we can use the condition "exclude the first bin" to do the pruning (filter out the other half symmetric splits). This condition looks si

[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-11-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149956415 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -527,27 +570,28 @@ private[ml] object SummaryBuilderImpl extends

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-11-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15770 LGTM. ping @yanboliang --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...

2017-11-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18624#discussion_r150170451 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -286,40 +288,119 @@ object

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-11-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18624 But, I agree the issue @MLnick mentioned, the code now looks convoluted, can you try to simplify

[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-11-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149855295 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -197,14 +240,14 @@ private[ml] object SummaryBuilderImpl extends

[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-11-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149854985 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -94,46 +97,86 @@ object Summarizer extends Logging { * - min

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-11-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r148706148 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19641: [SPARK-21911][ML][FOLLOW-UP] Fix doc for parallel ML Tun...

2017-11-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19641 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19586: [SPARK-22367][WIP][CORE] Separate the serialization of c...

2017-11-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19586 We can config the class to register by config `spark.kryo.classesToRegister`, does it need to add into spark code

[GitHub] spark issue #19586: [SPARK-22367][WIP][CORE] Separate the serialization of c...

2017-11-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19586 and in `ml`, if we want to register class before running algos, Some other classes like `LabeledPoint`, `Instance` also need registered. and there're some class temporary defined in some

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-11-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r148926895 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -117,6 +123,12 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19904 @sethah To verify the memory issue, you can add one line test code against current master at here: ``` val modelFutures = ... // Unpersist training data only when

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r155695280 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,31 +146,34 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r155715665 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r154594381 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r154594735 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r154598540 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19843 add UT for MLTest and change to use PipelineModel. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator fitting memory...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19904 @BryanCutler @MLnick @MrBago @hhbyyh --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator fitting...

2017-12-05 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19904 [SPARK-22707][ML] Optimize CrossValidator fitting memory occupation by models ## What changes were proposed in this pull request? Via some test I found CrossValidator still exists

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r155141551 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r155141609 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #19927: [SPARK-22737][ML] OVR transform optimization

2017-12-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19927#discussion_r155996272 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -156,54 +153,22 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r156010806 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -129,34 +156,102 @@ final class QuantileDiscretizer @Since

[GitHub] spark issue #19889: [SPARK-22690][ML] Imputer inherit HasOutputCols

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19889 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung Yes, the spark.mlp test result changed because of indexer order changed. That's because, StringIndexer when item frequency equal, there's no definite rule for index order

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155403743 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19927: [SPARK-22737][ML] OVR transform optimization

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19927#discussion_r156249173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -156,54 +153,22 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r156375242 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung "iris" is a built-in dataset in R, used in many algo testing, so is it proper

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156257397 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156268308 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18390: [SPARK-21178][ML] Add support for label specific ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18390#discussion_r156295173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -80,17 +102,42 @@ class

[GitHub] spark pull request #18390: [SPARK-21178][ML] Add support for label specific ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18390#discussion_r156292467 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -38,17 +38,39 @@ class

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2017-12-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @jkbradley I think it is better to review #19857 (fix python model specific optimization) and merge it first and then I rebase & update thi

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r147075121 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,258 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r147036693 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTree.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19433 > We'll actually only have to run an O(n log n) sort on continuous feature values once (i.e. in the FeatureVector constructor), since once the continuous features are sorted we can update t

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 @akopich If you want to cache the input dataset, create JIAR to discuss it first. It's another issue I think. This JIAR also related to input caching issues: https://issues.apache.org/jira

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 @akopich IMO the filter won't cost too much, don't worry about the performance. (Or you can make a test to make sure

[GitHub] spark issue #19558: [SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasio...

2017-10-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19558 cc @jkbradley @MrBago --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r146735946 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTree.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r146799989 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -497,40 +495,38 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r146810442 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -497,40 +495,38 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #10466: [SPARK-12375] [ML] add handleinvalid for vectorindexer

2017-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/10466 @hhbyyh OK. i will take this over. Our team need this feature now. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19621: [SPARK-11215][ml] Add multiple columns support to...

2017-10-31 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19621 [SPARK-11215][ml] Add multiple columns support to StringIndexer ## What changes were proposed in this pull request? Add multiple columns support to StringIndexer. ## How

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-10-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r148047597 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-10-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15770 @wangmiao1981 oh, not a big deal, what I thought is that, user is possible to use `graphx` package to get the `Graph[Double, Double]`, and in `ml` package it cannot accept this format, require

[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-10-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19621#discussion_r148174902 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -130,21 +152,33 @@ class StringIndexer @Since("

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2018-05-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r186381507 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,52 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-05-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13493 LGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21270: [SPARK-24213][ML]Power Iteration Clustering in SparkML t...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21270 @shahidki31 Seemingly what you said above is anothor issue ? You can create another jira for that. :) --- - To unsubscribe

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r186994754 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21272: [MINOR][ML][DOC] Improved Naive Bayes user guide explana...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21272 LGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21274#discussion_r186986006 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -232,7 +232,7 @@ class PowerIterationClustering

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-07 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21265 [SPARK-24146][PySpark][ML] spark.ml parity for sequential pattern mining - PrefixSpan: Python API ## What changes were proposed in this pull request? spark.ml parity for sequential

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21129 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184620855 --- Diff: python/pyspark/ml/util.py --- @@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184620777 --- Diff: python/pyspark/ml/util.py --- @@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184626842 --- Diff: python/pyspark/ml/util.py --- @@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata): """

[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21274 LGTM. ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21163 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-05-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17086 LGTM. @jkbradley @mengxr Would you mind take a look ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21129 Jenkins test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-05-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r186037589 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +365,20 @@ class GBTClassifierSuite extends

[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-17 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21163 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r188853310 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21393: [SPARK-20114][ML][FOLLOW-UP] spark.ml parity for ...

2018-05-22 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21393 [SPARK-20114][ML][FOLLOW-UP] spark.ml parity for sequential pattern mining - PrefixSpan ## What changes were proposed in this pull request? Change `PrefixSpan` into a class

[GitHub] spark issue #21393: [SPARK-20114][ML][FOLLOW-UP] spark.ml parity for sequent...

2018-05-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21393 @mengxr @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r188491670 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21513 LGTM. Thanks! @mengxr Would you mind take a look ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194167552 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1159,216 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21119 @huaxingao Create a new PR is better I think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214516 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214535 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214831 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194215008 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214431 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r191996249 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,105 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-30 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r191995667 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r192000596 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,105 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark issue #21265: [SPARK-24146][PySpark][ML] spark.ml parity for sequentia...

2018-05-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21265 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21493: [SPARK-15784] Add Power Iteration Clustering to s...

2018-06-04 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21493 [SPARK-15784] Add Power Iteration Clustering to spark.ml ## What changes were proposed in this pull request? According to the discussion on JIRA. I rewrite the Power Iteration

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20973 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20261: [SPARK-22885][ML][TEST] ML test for StructuredStreaming:...

2018-05-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20261 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21163: [SPARK-24097][ML] Instruments improvements - Rand...

2018-04-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21163 [SPARK-24097][ML] Instruments improvements - RandomForest and GradientBoostedTree ## What changes were proposed in this pull request? Instruments improvements for `RandomForest

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184584878 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -55,44 +60,128 @@ class MulticlassMetricsSuite

[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17086 overall good, @jkbradley Would you mind take a look ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184566012 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-30 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r185149879 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -44,26 +43,37 @@ object PrefixSpan { * * @param dataset

[GitHub] spark issue #20261: [SPARK-22885][ML][TEST] ML test for StructuredStreaming:...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20261 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185970925 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -423,6 +423,8 @@ class GaussianMixture @Since("

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184343934 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184342231 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184346287 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184344901 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184345688 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184344777 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20973 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

<    4   5   6   7   8   9   10   11   12   >