[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114043558 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1037,7 +1051,10 @@ private[spark] object RandomForest extends

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Take a (training) sample of a continuous feature, say {x0, x1, x2, x3, ..., x100}. Spark currently selects quantiles as split points. Suppose 10-quantiles are used, x2 is the 1st quantile, and x10 is the 2nd
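
The PR title refers to using midpoints for split values. A minimal sketch of that idea, assuming a threshold is placed halfway between two adjacent distinct sample values instead of on a sample value itself. `midpoint_thresholds` is a hypothetical helper, not code from the PR, and it leaves out any weighting:

```python
def midpoint_thresholds(values):
    """Hypothetical helper: candidate split thresholds for one continuous feature.

    Each threshold is the midpoint of two adjacent distinct values,
    so it never coincides with an observed sample value.
    """
    distinct = sorted(set(values))
    # Pair each distinct value with its successor and average the two.
    return [(lo + hi) / 2.0 for lo, hi in zip(distinct, distinct[1:])]


print(midpoint_thresholds([1.0, 2.0, 2.0, 5.0]))  # [1.5, 3.5]
```

A feature with a single distinct value yields no thresholds, since there is nothing to separate.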

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 By the way, it's safe to use the mean value, as it matches the other libraries. If requested, I'd like to modify the PR. --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-30 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 OK, weight has been removed when calculating.

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use midpoints for split valu...

2017-05-01 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114238091 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use midpoints for split valu...

2017-05-01 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114239173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1009,10 +1009,17 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use midpoints for split valu...

2017-05-02 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114288834 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1037,7 +1044,10 @@ private[spark] object RandomForest extends

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use midpoints for split values.

2017-05-02 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 How about testing the PR, @SparkQA?

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use midpoints for split valu...

2017-05-03 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114493650 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1037,7 +1042,8 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-07-25 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18736 [SPARK-21481][ML] Add indexOf method for ml.feature.HashingTF ## What changes were proposed in this pull request? Add indexOf method for ml.feature.HashingTF. The PR is a hotfix by

[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should support setWeightCol

2017-07-25 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 ping @holdenk @yanboliang

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-26 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r129562189 --- Diff: python/pyspark/ml/classification.py --- @@ -1517,20 +1517,22 @@ class OneVsRest(Estimator, OneVsRestParams, MLReadable, MLWritable

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-26 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r129562237 --- Diff: python/pyspark/ml/tests.py --- @@ -1255,6 +1255,24 @@ def test_output_columns(self): output = model.transform(df

[GitHub] spark pull request #18763: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-28 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18763 [SPARK-21306][ML] OneVsRest should support setWeightCol for branch-2.1 The PR is related to #18554, and is modified for branch 2.1. ## What changes were proposed in this pull request

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-07-28 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18764 [SPARK-21306][ML] For branch 2.0, OneVsRest should support setWeightCol The PR is related to #18554, and is modified for branch 2.0. ## What changes were proposed in this pull request

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18764#discussion_r130200288 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -33,6 +33,7 @@ import

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18764#discussion_r130200379 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -143,6 +144,16 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch-2.1, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r130200461 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -157,6 +157,16 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r130202540 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -158,7 +158,7 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r130213337 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -158,7 +158,7 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-07-31 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Thanks, @yanboliang. Could you give a hand, @srowen?

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-01 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Jenkins, test this please.

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-03 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Tests fail in pyspark.ml.tests with Python 2.6, but I don't have that environment.

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-04 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Tests fail in pyspark.ml.tests with Python 2.6, but I don't have that environment.

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-04 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 @yanboliang Thanks, yanbo. I am not familiar with Python 2.6, which is too outdated.

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-08-05 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18764#discussion_r131529693 --- Diff: python/pyspark/ml/classification.py --- @@ -1344,7 +1346,19 @@ def _fit(self, dataset): numClasses = int(dataset.agg({labelCol

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-08-05 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r131529768 --- Diff: python/pyspark/ml/classification.py --- @@ -1423,7 +1425,18 @@ def _fit(self, dataset): numClasses = int(dataset.agg({labelCol

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-06 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 @SparkQA Please run the tests.

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-07 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Thanks, @yanboliang @gatorsmile

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-08-08 Thread facaiy
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/18764

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-08 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Sure, thanks, @yanboliang!

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-08-08 Thread facaiy
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/18763

[GitHub] spark issue #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest should suppo...

2017-08-08 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18763 Thanks, all.

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-09 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r132131171 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-10 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r132618802 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...

2017-08-13 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18736 @yanboliang Hi, yanbo. Could you help review the PR? Thanks.

[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...

2017-08-15 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18736 Sure, @yanboliang. Thanks for your suggestion. I'll work on it later, perhaps next week. Is that OK?

[GitHub] spark pull request #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-20 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18288#discussion_r122908140 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends

[GitHub] spark pull request #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-20 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18288#discussion_r122909919 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends

[GitHub] spark issue #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-22 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18288 In my opinion, `numFeatures` is vital for sparse data. Say our features are actually 1000-dimensional, while in a small training set the maximum index is 990. It is dangerous (or wrong) to train a 990
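
The concern about sparse data can be sketched as follows: when the feature count is inferred from the largest index seen in a sample, it can fall short of the real width. This is a plain-Python illustration under stated assumptions, not Spark's loader. `max_feature_index` is a hypothetical helper, and the numbers (a declared width of 10, a largest observed index of 8) are made up for the example:

```python
def max_feature_index(lines):
    """Hypothetical helper: largest 1-based feature index in LibSVM-style lines."""
    largest = 0
    for line in lines:
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is the label
            index = int(token.split(":", 1)[0])
            largest = max(largest, index)
    return largest


# Made-up sample: the real feature space has 10 dimensions,
# but no row in this small sample uses indices 9 or 10.
sample = ["1 2:0.5 8:1.0", "0 1:3.0 5:2.0"]
declared_width = 10
inferred_width = max_feature_index(sample)

print(inferred_width)                   # 8
print(inferred_width < declared_width)  # True
```

A model sized from the inferred width would not line up with later data that does use the missing indices, which is the case for stating the width explicitly.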

[GitHub] spark pull request #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-22 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18288#discussion_r123474003 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends

[GitHub] spark issue #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-22 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18288 You might be mistaken. The aim of the code here is to encourage users to specify `numFeatures` in any case, rather than to encourage them to use only one file.

[GitHub] spark issue #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-22 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18288 Yes. an example code: ```scala val df = spark.read.format("libsvm") .option("numFeatures", "780") .load("data/mllib/sample_libsvm_data.

[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-22 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18058 [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. ## What changes were proposed in this pull request? Expose numPartitions (expert) param of PySpark

[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-22 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 Thanks, @yanboliang. Do you have any suggestions about testing the parameter?

[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-22 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 Something seems to be wrong with CI. I'm seeing the same CI non-response/delay again as last month.

[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118416391 --- Diff: python/pyspark/ml/fpm.py --- @@ -49,6 +49,32 @@ def getMinSupport(self): return self.getOrDefault(self.minSupport

[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118416400 --- Diff: python/pyspark/ml/fpm.py --- @@ -49,6 +49,32 @@ def getMinSupport(self): return self.getOrDefault(self.minSupport

[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18058#discussion_r118416434 --- Diff: python/pyspark/ml/fpm.py --- @@ -49,6 +49,32 @@ def getMinSupport(self): return self.getOrDefault(self.minSupport

[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-24 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 Hi, I'm not familiar with PySpark. I wonder whether a unit test is needed for verification. If so, how should I check it? Thanks.

[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18058 Resolved. By the way, which one is preferable, rebase or merge?

[GitHub] spark pull request #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for...

2017-05-26 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18120 [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemble tree model in PySpark ## What changes were proposed in this pull request? add `getMaxDepth` method for ensemble tree models

[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

2017-05-26 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18120 @keypointt Hi, could you help check whether the PR is consistent with your #17207? Thanks.

[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

2017-05-27 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18120 Hi, @keypointt. It's a feature of Python: a doctest serves as both documentation and unit test.
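
The point about doctests can be illustrated with the standard-library `doctest` module. This is a minimal sketch, not the PR's code. `get_max_depth` and its dict argument are hypothetical stand-ins for a model getter:

```python
import doctest


def get_max_depth(params):
    """Return the configured maximum tree depth.

    The example below is documentation and, when run through doctest,
    also a test of the function.

    >>> get_max_depth({"maxDepth": 5})
    5
    """
    return params["maxDepth"]


if __name__ == "__main__":
    # Runs every docstring example in this module and reports mismatches.
    doctest.testmod()
```

If the function's behaviour drifts from what the docstring shows, `doctest.testmod()` reports the mismatch, so the documented example cannot silently go stale.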

[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

2017-05-30 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18120 Thanks, @BryanCutler. It seems that #17849 copies `Params` from `Estimator` to `Model` automatically, which is pretty useful. However, the getter methods are still missing and need to be added

[GitHub] spark pull request #18139: [SPARK-20787][PYTHON] PySpark can't handle dateti...

2017-05-30 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18139#discussion_r119263608 --- Diff: python/pyspark/sql/types.py --- @@ -187,8 +187,11 @@ def needConversion(self): def toInternal(self, dt): if dt is not

[GitHub] spark pull request #18139: [SPARK-20787][PYTHON] PySpark can't handle dateti...

2017-05-31 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18139#discussion_r119346663 --- Diff: python/pyspark/sql/types.py --- @@ -187,8 +187,11 @@ def needConversion(self): def toInternal(self, dt): if dt is not
