Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114043558
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1037,7 +1051,10 @@ private[spark] object RandomForest extends
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/17556
For a (training) sample of a continuous feature, say {x0, x1, x2, x3, ..., x100},
Spark now selects quantiles as split points.
Suppose 10-quantiles are used, x2 is the 1st quantile, and x10 is the 2nd
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/17556
By the way, it is safe to use the mean value, as that matches the other libraries.
If requested, I'd like to modify the PR.
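To make the discussion concrete, here is a minimal sketch (plain Python, not Spark's actual implementation) of choosing quantile-based split candidates for a continuous feature, taking the mean of the two values straddling each quantile boundary as the threshold, in the spirit of what other libraries do:

```python
def quantile_split_candidates(values, num_bins):
    """Pick split thresholds at quantile boundaries of a sorted sample.

    Each threshold is the mean of the two values straddling a quantile
    boundary, so a split never coincides with an observed value.
    """
    xs = sorted(values)
    n = len(xs)
    thresholds = []
    for q in range(1, num_bins):
        i = q * n // num_bins          # index of the quantile boundary
        if 0 < i < n and xs[i - 1] != xs[i]:
            thresholds.append((xs[i - 1] + xs[i]) / 2.0)
    return thresholds

# 10-quantiles over {0, 1, ..., 100} give 9 interior thresholds.
splits = quantile_split_candidates(range(101), 10)
```

The function names and binning rule are illustrative only; Spark's RandomForest computes approximate quantiles over a subsample rather than sorting the full data.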
---
If your project is set up for it, you can reply to this email and
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/17556
OK, the weight has been removed from the calculation.
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114238091
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114239173
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1009,10 +1009,17 @@ private[spark] object RandomForest extends
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114288834
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1037,7 +1044,10 @@ private[spark] object RandomForest extends
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/17556
How about testing the PR, @SparkQA?
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114493650
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1037,7 +1042,8 @@ private[spark] object RandomForest extends Logging
GitHub user facaiy opened a pull request:
https://github.com/apache/spark/pull/18736
[SPARK-21481][ML] Add indexOf method for ml.feature.HashingTF
## What changes were proposed in this pull request?
Add an `indexOf` method for ml.feature.HashingTF.
The PR is a hotfix by
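For context on what such a method does: the hashing trick maps a term to a column index as `hash(term) mod numFeatures`. The sketch below illustrates the idea in plain Python with a hypothetical `index_of` helper and CRC32 as a stable stand-in hash; Spark's HashingTF uses MurmurHash3, so the actual indices will differ:

```python
import zlib

def index_of(term, num_features=262144):
    """Hypothetical indexOf: map a term to its hashed feature index.

    Uses CRC32 as a stable stand-in hash; Spark's HashingTF applies
    MurmurHash3 to the term instead, so real indices differ.
    """
    h = zlib.crc32(str(term).encode("utf-8"))
    return h % num_features

idx = index_of("spark")
```

A real `indexOf` must reproduce HashingTF's exact hash function so that a looked-up index matches the position `transform` writes the term's count to; 262144 (2^18) is HashingTF's default `numFeatures`.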
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18554
ping @holdenk @yanboliang
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18554#discussion_r129562189
--- Diff: python/pyspark/ml/classification.py ---
@@ -1517,20 +1517,22 @@ class OneVsRest(Estimator, OneVsRestParams,
MLReadable, MLWritable
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18554#discussion_r129562237
--- Diff: python/pyspark/ml/tests.py ---
@@ -1255,6 +1255,24 @@ def test_output_columns(self):
output = model.transform(df
GitHub user facaiy opened a pull request:
https://github.com/apache/spark/pull/18763
[SPARK-21306][ML] OneVsRest should support setWeightCol for branch-2.1
The PR is related to #18554, and is modified for branch 2.1.
## What changes were proposed in this pull request
GitHub user facaiy opened a pull request:
https://github.com/apache/spark/pull/18764
[SPARK-21306][ML] For branch 2.0, OneVsRest should support setWeightCol
The PR is related to #18554, and is modified for branch 2.0.
## What changes were proposed in this pull request
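As background on what `setWeightCol` provides: one-vs-rest reduces a K-class problem to K weighted binary problems, forwarding each row's weight to the underlying binary classifier. A toy sketch in plain Python, with a hypothetical weighted-centroid base learner standing in for the real base classifier (this is not Spark's API):

```python
def fit_binary(X, b, w):
    """Toy weighted binary scorer: compare squared distance to the
    weighted centroid of the positives vs that of the negatives."""
    def centroid(label):
        rows = [(x, wi) for x, bi, wi in zip(X, b, w) if bi == label]
        total = sum(wi for _, wi in rows)
        dim = len(X[0])
        return [sum(x[j] * wi for x, wi in rows) / total
                for j in range(dim)]
    c_pos, c_neg = centroid(1), centroid(0)

    def score(x):
        d = lambda c: sum((a - v) ** 2 for a, v in zip(c, x))
        return d(c_neg) - d(c_pos)   # higher = more like the positives
    return score

def fit_one_vs_rest(X, y, w, num_classes):
    # One binary problem per class; the sample weights w pass straight
    # through to each binary fit, which is what weightCol support adds.
    scorers = [fit_binary(X, [1 if yi == k else 0 for yi in y], w)
               for k in range(num_classes)]

    def predict(x):
        scores = [s(x) for s in scorers]
        return scores.index(max(scores))
    return predict
```

The centroid scorer is only there to keep the sketch self-contained; the point is the relabeling loop and the unchanged weight vector handed to each binary fit.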
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18764#discussion_r130200288
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -33,6 +33,7 @@ import
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18764#discussion_r130200379
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -143,6 +144,16 @@ class OneVsRestSuite extends SparkFunSuite
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18763#discussion_r130200461
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -157,6 +157,16 @@ class OneVsRestSuite extends SparkFunSuite
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18763#discussion_r130202540
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -158,7 +158,7 @@ class OneVsRestSuite extends SparkFunSuite
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18763#discussion_r130213337
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -158,7 +158,7 @@ class OneVsRestSuite extends SparkFunSuite
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
Thanks, @yanboliang. Could you give a hand, @srowen?
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
Jenkins, test this please.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
Test failures occurred in pyspark.ml.tests with Python 2.6, but I don't have
that environment to reproduce them.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
@yanboliang Thanks, yanbo. I am not familiar with Python 2.6, which is too
outdated.
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18764#discussion_r131529693
--- Diff: python/pyspark/ml/classification.py ---
@@ -1344,7 +1346,19 @@ def _fit(self, dataset):
numClasses = int(dataset.agg({labelCol
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18763#discussion_r131529768
--- Diff: python/pyspark/ml/classification.py ---
@@ -1423,7 +1425,18 @@ def _fit(self, dataset):
numClasses = int(dataset.agg({labelCol
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
@SparkQA Take a test, please.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
Thanks, @yanboliang @gatorsmile
Github user facaiy closed the pull request at:
https://github.com/apache/spark/pull/18764
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18764
Sure, thanks, @yanboliang !
Github user facaiy closed the pull request at:
https://github.com/apache/spark/pull/18763
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18763
Thanks, all.
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18736#discussion_r132131171
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
---
@@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18736#discussion_r132618802
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
---
@@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Si
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18736
@yanboliang Hi, yanbo. Could you help review the PR? Thanks.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18736
Sure, @yanboliang. Thanks for your suggestion. I'll work on it later,
perhaps next week. Is that OK?
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18288#discussion_r122908140
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18288#discussion_r122909919
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18288
In my opinion, `numFeatures` is vital for sparse data.
Say our feature space is in fact 1000-dimensional, while in a small training
set the maximum index seen is only 990. It is dangerous (or wrong) to train a 990
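The hazard can be sketched in plain Python: inferring the dimension from the largest index present in the data silently under-sizes the vectors whenever the tail features happen to be absent from the sample. (`inferred_num_features` and the sample rows are illustrative, not Spark's API.)

```python
def inferred_num_features(indexed_rows):
    """Infer dimensionality from data: 1 + the largest feature index seen."""
    return 1 + max(i for row in indexed_rows for i, _ in row)

# Sparse rows as (index, value) pairs; suppose the true space is
# 1000-dimensional, but this small sample's largest index is only 989.
sample = [[(0, 1.0), (42, 2.0)], [(989, 3.0)]]
dim = inferred_num_features(sample)   # 990, not the true 1000

# Passing numFeatures explicitly (cf. .option("numFeatures", ...)) avoids
# training a model whose weight vector is too short for unseen indices.
```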
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18288#discussion_r123474003
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18288
You might be mistaken. The aim of the code here is to encourage users to
specify `numFeatures` in every case, not to encourage them to use only one file.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18288
Yes.
An example:
```scala
val df = spark.read.format("libsvm")
  .option("numFeatures", "780")
  .load("data/mllib/sample_libsvm_data.txt")
```
GitHub user facaiy opened a pull request:
https://github.com/apache/spark/pull/18058
[SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark
FPGrowth.
## What changes were proposed in this pull request?
Expose numPartitions (expert) param of PySpark
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18058
Thanks, @yanboliang.
Do you have any suggestions about testing the parameter?
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18058
Something seems to be wrong with CI; I have seen the same non-response/delay
from CI repeatedly since last month.
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18058#discussion_r118416391
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
return self.getOrDefault(self.minSupport
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18058#discussion_r118416400
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
return self.getOrDefault(self.minSupport
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18058#discussion_r118416434
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
return self.getOrDefault(self.minSupport
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18058
Hi, I'm not familiar with PySpark. I just wonder whether a unit test is
needed for verification. If so, how should I check it? Thanks.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18058
Resolved.
By the way, which is preferable, rebase or merge?
GitHub user facaiy opened a pull request:
https://github.com/apache/spark/pull/18120
[SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemble tree model in
PySpark
## What changes were proposed in this pull request?
add `getMaxDepth` method for ensemble tree models
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18120
@keypointt Hi, could you help check whether this PR is consistent with your
#17207? Thanks.
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18120
Hi, @keypointt. It's a feature of Python: a doctest serves as both
documentation and a unit test.
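A doctest embeds the example session in the docstring, and the stdlib `doctest` module runs it as a test, which is why the PySpark parameter docs double as unit tests. A minimal standalone illustration (the function is made up for the example):

```python
def term_frequency(terms):
    """Count occurrences of each term.

    >>> sorted(term_frequency(["a", "b", "a"]).items())
    [('a', 2), ('b', 1)]
    """
    counts = {}
    for t in terms:
        counts[t] = counts.get(t, 0) + 1
    return counts

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # fails loudly if the example above drifts
```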
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/18120
Thanks, @BryanCutler.
It seems that #17849 copies `Params` from `Estimator` to `Model`
automatically, which is pretty useful. However, the getter methods are still
missing and need to be added
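The pattern under discussion, copying an estimator's params onto the fitted model while the getter still has to be defined on the model, can be sketched in plain Python (hypothetical class and method names, not PySpark's internals):

```python
class ToyEstimator:
    def __init__(self, max_depth=5):
        self._params = {"maxDepth": max_depth}

    def fit(self, data):
        model = ToyModel()
        # The automatic part: the fitted model receives a copy of the
        # estimator's params...
        model._params = dict(self._params)
        return model

class ToyModel:
    def __init__(self):
        self._params = {}

    # ...but a getter must still be written explicitly on the model,
    # which is what exposing getMaxDepth amounts to.
    def get_max_depth(self):
        return self._params["maxDepth"]
```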
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18139#discussion_r119263608
--- Diff: python/pyspark/sql/types.py ---
@@ -187,8 +187,11 @@ def needConversion(self):
def toInternal(self, dt):
if dt is not
Github user facaiy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18139#discussion_r119346663
--- Diff: python/pyspark/sql/types.py ---
@@ -187,8 +187,11 @@ def needConversion(self):
def toInternal(self, dt):
if dt is not