[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21119 @huaxingao Create a new PR is better I think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194167552 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1159,216 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214516 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214535 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214831 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194215008 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214431 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark issue #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21513 LGTM. Thanks! @mengxr Would you mind take a look ?

[GitHub] spark pull request #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expec...

2018-03-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20852#discussion_r175380424 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -119,9 +119,15 @@ trait MLTest extends StreamTest with TempDirectory

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r175970711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,73 @@ private[ml] object Node { /** * Create a new

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 Jenkins, retest this please.

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, retest this please.

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r176009765 --- Diff: python/pyspark/ml/stat.py --- @@ -132,6 +134,172 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r176039913 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1076,6 +1076,16 @@ class SessionCatalog

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r176039540 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -175,6 +175,8 @@ private[sql] class HiveSessionCatalog

[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20795 And I don't think it need to split into builtin and external function exist check in this case. Just following code works fine: ``` object LookupFunctions extends Rule[Logica

[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20795 Yea, I understand the reason to split built-in and external because you only want to cache external function name. But cache all used function names in a query do not cost too much so that
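The #20795 comments above argue for caching every function name a query uses instead of splitting built-in and external existence checks. Below is a minimal Python sketch of that caching idea only; the real `LookupFunctions` rule is Scala code in Spark's analyzer, and `lookup_functions`, `FunctionLookupCache`, and the `function_exists` callback are hypothetical names standing in for the session catalog. It also assumes function names compare case-insensitively.

```python
class FunctionLookupCache:
    """Per-query cache of function names already verified to exist (sketch)."""

    def __init__(self, function_exists):
        # function_exists: hypothetical callback standing in for a catalog lookup.
        self._function_exists = function_exists
        self._seen = set()

    def check(self, name):
        key = name.lower()  # assumption: names compare case-insensitively
        if key in self._seen:
            return  # already verified; no second catalog round-trip
        if not self._function_exists(key):
            raise LookupError(f"Undefined function: {name}")
        self._seen.add(key)


def lookup_functions(function_names, function_exists):
    """Verify every function name used in one query, hitting the catalog once per name."""
    cache = FunctionLookupCache(function_exists)
    for name in function_names:
        cache.check(name)
```

With this shape, a query that calls the same function many times triggers only one catalog lookup for it, whether the function is built-in or external, which is the cost argument made in the comment.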

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r176299569 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -175,6 +175,8 @@ private[sql] class HiveSessionCatalog

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, retest this please.

[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176631255 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -699,3 +699,88 @@ abstract class

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-03-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20904 [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Python API in pyspark.ml ## What changes were proposed in this pull request? Kolmogorov-Smirnoff test Python API in `pyspark.ml
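As a rough illustration of what the Kolmogorov-Smirnov test in this PR computes, here is a plain-Python sketch of the one-sample statistic D = sup_x |F_n(x) - F(x)| against a normal CDF. It does not call Spark and computes no p-value; `normal_cdf` and `ks_statistic` are hypothetical helper names. The Spark implementation (the Scala `KolmogorovSmirnovTest` object referenced later in this thread) works over a distributed DataFrame column instead.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of N(mean, std^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

For example, `ks_statistic([-1.0, 0.0, 1.0], normal_cdf)` is about 0.1747 (that is, 1/3 - Phi(-1)).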

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-29 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20934 [SPARK-23818][SQL][WIP] an official UDF interface for Spark SQL ## What changes were proposed in this pull request? API: (to be discussed), use 2-args as example

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20934#discussion_r178217425 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -217,6 +217,27 @@ class UDFRegistration private[sql

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please.

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20934#discussion_r178446367 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/JavaUDF.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20934

[GitHub] spark issue #20934: [SPARK-23818][SQL][WIP] an official UDF interface for Sp...

2018-03-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20934 Will be open again when interface decision made for this. Thanks.

[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...

2018-04-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20313#discussion_r178517391 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -264,7 +265,9 @@ class CountVectorizerModel

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178784391 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NGramSuite.scala --- @@ -84,7 +84,7 @@ class NGramSuite extends MLTest with

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178783980 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala --- @@ -167,4 +166,20 @@ class MinHashLSHSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178784053 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala --- @@ -48,8 +46,8 @@ class MinMaxScalerSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178778285 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala --- @@ -76,6 +75,28 @@ class ImputerSuite extends SparkFunSuite with

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178780101 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MaxAbsScalerSuite.scala --- @@ -45,9 +44,9 @@ class MaxAbsScalerSuite extends SparkFunSuite

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 @jkbradley Thanks!

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-03 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20973 [SPARK-20114][ML] spark.ml parity for sequential pattern mining - PrefixSpan ## What changes were proposed in this pull request? PrefixSpan API for spark.ml. New implementation
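To make the PrefixSpan discussion concrete, below is a simplified, single-machine sketch of prefix-projection mining. Assumptions: every sequence element is a single item (Spark's PrefixSpan handles sequences of itemsets), support is an absolute count rather than Spark's fractional `minSupport`, and `prefix_span` is a hypothetical name, not the spark.ml API.

```python
def prefix_span(sequences, min_count):
    """Frequent sequential patterns via prefix projection (simplified sketch)."""
    patterns = []

    def project(db, item):
        # Keep the suffix after the first occurrence of `item` in each sequence.
        return [seq[seq.index(item) + 1:] for seq in db if item in seq]

    def mine(prefix, db):
        counts = {}
        for seq in db:
            for item in set(seq):  # count each item at most once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] >= min_count:
                new_prefix = prefix + [item]
                patterns.append((new_prefix, counts[item]))
                # Grow the pattern by mining the projected (suffix) database.
                mine(new_prefix, project(db, item))

    mine([], sequences)
    return patterns
```

For `[[1, 2, 3], [1, 3], [2, 3]]` with `min_count=2`, this yields `[1]`, `[1, 3]`, `[2]`, `[2, 3]` (each with count 2) and `[3]` (count 3).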

[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20810 According to @jkbradley 's opinion. I create a new PR which only use a static method.

[GitHub] spark pull request #20810: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20810

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please.

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please.

[GitHub] spark issue #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20837 No problem. I will take over this. Thanks!

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179311446 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,63 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark pull request #20982: [SPARK-23859][ML] Initial PR for Instrumentation ...

2018-04-05 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20982 [SPARK-23859][ML] Initial PR for Instrumentation improvements: UUID and logging levels ## What changes were proposed in this pull request? Initial PR for Instrumentation improvements

[GitHub] spark issue #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.

2018-04-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20994 LGTM. Thanks! cc @jkbradley

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please.

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-04-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20319 @smurakozi Thanks for the PR! Could you resolve conflicts first? and then I will make a review. If you're busy I can also take ov

[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20235#discussion_r180027926 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with

[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20904 Jenkins, test this please.

[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20964 LGTM. 👍

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 Because of codebase changing, I will create new PR to replace this one.

[GitHub] spark pull request #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValid...

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19627

[GitHub] spark pull request #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValid...

2018-04-10 Thread WeichenXu123
GitHub user WeichenXu123 reopened a pull request: https://github.com/apache/spark/pull/19627 [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API ## What changes were proposed in this pull request? CrossValidator

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @MrBago @yogeshg @jkbradley Updated and ready for review now!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15770 @wangmiao1981 If you're busy I can help take over this. -:)

[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17092#discussion_r180998595 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala --- @@ -137,6 +136,9 @@ class

[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17092#discussion_r180999421 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala --- @@ -119,6 +118,9 @@ class MinHashLSH(override val uid: String) extends

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r181015190 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala --- @@ -192,12 +192,12 @@ abstract class ClassificationModel

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181015525 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181018223 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #21051: [SPARK-23751][FOLLOW-UP] fix build for scala-2.12

2018-04-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21051 [SPARK-23751][FOLLOW-UP] fix build for scala-2.12 ## What changes were proposed in this pull request? fix build for scala-2.12 ## How was this patch tested? Manual

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181270142 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181287383 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181286908 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-16 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21078 [SPARK-23990][ML] Instruments logging improvements - ML regression package ## What changes were proposed in this pull request? Instruments logging improvements - ML regression package

[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21078 @MrBago @jkbradley Thanks!

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 @dbtsai Good idea! Is there a related JIRA or could you open one for it ? cc @jkbradley

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182003965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -75,11 +80,16 @@ class

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182002432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -67,6 +68,10 @@ class

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182004759 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -27,10 +27,11 @@ import

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182367186 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -27,10 +27,11 @@ import

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-18 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21097 [SPARK-14682][ML] Provide evaluateEachIteration method or equivalent for spark.ml GBTs ## What changes were proposed in this pull request? Provide evaluateEachIteration method or
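The spark.mllib `GradientBoostedTreesModel` already offers `evaluateEachIteration`; this PR proposes an equivalent for spark.ml GBTs. Below is a minimal sketch of the underlying idea, assuming squared-error loss and precomputed per-tree predictions (both hypothetical simplifications; `evaluate_each_iteration` is not Spark code). It keeps a running weighted sum of tree outputs, so the loss after each iteration costs one pass over the data instead of re-predicting with the whole prefix of the ensemble.

```python
def evaluate_each_iteration(tree_predictions, tree_weights, labels):
    """Loss of the boosted ensemble after each iteration (sketch, squared error).

    tree_predictions[m][i] is tree m's prediction for example i; the ensemble
    prediction after m trees is the weighted sum of the first m trees' outputs.
    """
    n = len(labels)
    running = [0.0] * n  # partial ensemble prediction per example
    losses = []
    for preds, weight in zip(tree_predictions, tree_weights):
        for i in range(n):
            running[i] += weight * preds[i]
        # Mean squared error of the ensemble built so far.
        losses.append(sum((labels[i] - running[i]) ** 2 for i in range(n)) / n)
    return losses
```

Usage: `evaluate_each_iteration([[1.0, 1.0], [0.0, 1.0]], [1.0, 1.0], [1.0, 2.0])` returns `[0.5, 0.0]`, showing the loss dropping as the second tree is added.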

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161857103 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderEstimatorExample.java --- @@ -35,41 +34,37 @@ import

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161854406 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161859425 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark issue #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator document an...

2018-01-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20257 Nice, LGTM. Thanks!

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162703633 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162703711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -171,23 +176,23 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17123 But, pls resolve conflicts first. :) Bucketizer add multiple column support so the code is different now.
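The #17123 thread concerns handling NULLs the same way as NaNs in `Bucketizer`. Below is a minimal Python sketch of single-value bucketing under that behavior, assuming the `handleInvalid` semantics Spark documents for `Bucketizer` ("error" raises, "skip" drops the row, "keep" puts invalid values into an extra bucket). `bucketize` is a hypothetical helper, not Spark code, and returning `None` here only stands in for dropping a row.

```python
import bisect
import math


def bucketize(value, splits, handle_invalid="error"):
    """Map a value to a bucket index given sorted split points (sketch).

    Buckets are [splits[i], splits[i+1]), except the last bucket, which also
    includes its upper bound. Both NULL (None) and NaN count as invalid.
    """
    n_buckets = len(splits) - 1
    if value is None or math.isnan(value):
        if handle_invalid == "keep":
            return float(n_buckets)  # extra bucket reserved for invalid values
        if handle_invalid == "skip":
            return None  # stands in for filtering the row out
        raise ValueError("Bucketizer encountered a NULL/NaN value")
    if value == splits[-1]:
        return float(n_buckets - 1)  # upper bound belongs to the last bucket
    idx = bisect.bisect_right(splits, value) - 1
    if idx < 0 or idx >= n_buckets:
        raise ValueError(f"value {value} is outside the splits range")
    return float(idx)
```

With `splits = [-inf, 0.0, 10.0, inf]`, `5.0` maps to bucket 1.0, and both `None` and `nan` map to the extra bucket 3.0 when `handle_invalid="keep"`.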

[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20324 LGTM. Thanks! 👍

[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...

2018-01-23 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r163338180 --- Diff: docs/ml-features.md --- @@ -1283,6 +1283,56 @@ for more details on the API. +## VectorSizeHint + +It can

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19993 +1 merge this to 2.3

[GitHub] spark pull request #20411: [SPARK-17139][ML][FOLLOW-UP] update LogisticRegre...

2018-01-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20411 [SPARK-17139][ML][FOLLOW-UP] update LogisticRegressionSummaryExample code ## What changes were proposed in this pull request? New method `trainingSummary.asBinary` added so in this

[GitHub] spark issue #20411: [SPARK-17139][ML][FOLLOW-UP] update LogisticRegressionSu...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20411 @sethah ok thanks!

[GitHub] spark pull request #20411: [SPARK-17139][ML][FOLLOW-UP] update LogisticRegre...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20411

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164237753 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164531329 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-01-30 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20446 [SPARK-23254][ML] Add user guide entry for DataFrame multivariate summary ## What changes were proposed in this pull request? Add user guide and scala/java examples for

[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry for DataFrame mul...

2018-01-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20446 @MLnick @MrBago Thanks!
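The #20446 user guide entry covers DataFrame multivariate summary statistics (Spark's `Summarizer`). For reference, here is a plain-Python sketch of a few column-wise metrics of that kind (count, mean, variance, numNonZeros) over a list of dense vectors. `summarize` is a hypothetical name, and the sketch assumes variance means the sample (n - 1) variance, which we believe matches Spark's summarizer; it is not the Spark implementation.

```python
def summarize(vectors):
    """Column-wise summary of equal-length numeric vectors (sketch)."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    # Sample (unbiased) variance: divide by n - 1; defined as 0.0 for one row.
    variance = [
        sum((v[j] - mean[j]) ** 2 for v in vectors) / (n - 1) if n > 1 else 0.0
        for j in range(dim)
    ]
    num_nonzeros = [sum(1 for v in vectors if v[j] != 0) for j in range(dim)]
    return {"count": n, "mean": mean, "variance": variance, "numNonZeros": num_nonzeros}
```

For `[[1.0, 0.0], [3.0, 2.0]]` this gives count 2, mean `[2.0, 1.0]`, variance `[2.0, 2.0]`, and numNonZeros `[2, 1]`.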

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20421 @MLnick I forgot one fix: https://github.com/apache/spark/pull/18797 I'm not sure whether this fix should go under "behavior change". It affects the number of iterations for algos

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20457 [SPARK-23110][MINOR] Make linearRegressionModel constructor private ## What changes were proposed in this pull request? make linearRegressionModel constructor private[ml

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20457

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20421 ah, yes, it was backported to 2.2 😳

[GitHub] spark pull request #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20459#discussion_r165229102 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -93,7 +93,7 @@ private[feature] trait

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
GitHub user WeichenXu123 reopened a pull request: https://github.com/apache/spark/pull/20457 [SPARK-23110][MINOR] Make linearRegressionModel constructor private ## What changes were proposed in this pull request? make linearRegressionModel constructor private[ml

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20457

[GitHub] spark issue #20457: [SPARK-23110][MINOR] Make linearRegressionModel construc...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20457 It's covered in PR #20459, so let's discuss it there.

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165565121 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165573866 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165578020 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...

2018-02-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20164 Sorry, I don't understand where the issue is in the current master code. The models here should be `ClassificationModel`s, and they will always have a `rawPrediction` param and have default

[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...

2018-02-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20164 Oh, do you mean that if the input df includes a column named "rawPrediction", it will be overwritten when the df is transformed by OVSModel? Looks like
