Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20088
Currently I cannot construct a failing test for this issue, but the future
PR (changing `RoundRobinPartitioning`) by @jiangxb1987 will trigger this bug
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/20088
[SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorModel save implementation
## What changes were proposed in this pull request?
Currently, in `ChiSqSelectorModel`, save
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r158692700
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/20077
[SPARK-22899][ML][STREAM] Fix OneVsRestModel transform on streaming data
failed.
## What changes were proposed in this pull request?
Fix OneVsRestModel transform on streaming data
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19994
LGTM.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19979
@MrBago @jkbradley
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19950
@cloud-fan Does it work like this: if A and B are any registered classes,
then the type Tuple2[A, B] will be automatically registered for Kryo
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19950
Also, the added items cannot cover the case in `MultilayerPerceptron`. Look
at `FeedForwardTrainer.train`: the persisted stacked `trainData` has the format
`RDD[(Double, mllib.Vector)]`. The
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19950#discussion_r157922929
--- Diff:
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -187,14 +187,18 @@ class KryoSerializer(conf: SparkConf
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19156
Jenkins retest this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19988
I think we can discuss the following cases:
- When the gradient is non-zero and line search fails, will the model always be
meaningless?
- When the gradient is nearly zero and line search fails. I
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r157668450
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19988
@srowen Wait... @jkbradley seems to have more thoughts about this:
Question:
When line search fails, does it mean the model is always meaningless?
Maybe we need more discussion
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19350
The design changed. I will create a new PR for this later. The new design is here:
https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing
Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/19350
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19857
The design of this issue changed. @MrBago will take this over.
Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/19857
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@felixcheung Another failing test case is spark.mlp in SparkR: it also uses
`RFormula`, so it will also generate nondeterministic results; see class
`MultilayerPerceptronClassifierWrapper` line 78
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19988#discussion_r157441600
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -244,9 +244,9 @@ class LinearSVC @Since("
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157393913
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -140,10 +140,10 @@ final class Bucketizer @Since("1.4.0"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19988#discussion_r157393049
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -244,9 +244,9 @@ class LinearSVC @Since("
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19994#discussion_r157391859
--- Diff: python/pyspark/ml/regression.py ---
@@ -155,6 +183,14 @@ def intercept(self):
"""
return self._call_
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19994#discussion_r157391801
--- Diff: python/pyspark/ml/tests.py ---
@@ -1725,6 +1725,27 @@ def test_offset(self):
self.assertTrue(np.isclose(model.intercept
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19156
Jenkins retest this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19621#discussion_r157169491
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -79,20 +80,49 @@ private[feature] trait StringIndexerBase
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19350#discussion_r157118663
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -82,5 +86,49 @@ abstract class Estimator[M <: Model[M]] exte
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19904
I discussed this with @MrBago offline; here is a summary of my current thinking:
I gave 3 approaches which we can compare. After discussion I realized none
of them is ideal; we have to make a
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19979
[SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsuite add
StructuredStreaming test
## What changes were proposed in this pull request?
ML regression testsuite add
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19904
@MrBago
Your code https://github.com/apache/spark/pull/19904#discussion_r156751569
also works fine, I think, although it is more complicated.
@BryanCutler
>the unpersist
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19156#discussion_r156629043
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -197,14 +240,14 @@ private[ml] object SummaryBuilderImpl extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19350#discussion_r156603289
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -82,5 +86,49 @@ abstract class Estimator[M <: Model[M]] exte
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r156550777
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r156375242
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18390#discussion_r156292467
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
---
@@ -38,17 +38,39 @@ class
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18390#discussion_r156295173
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
---
@@ -80,17 +102,42 @@ class
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@felixcheung "iris" is a built-in dataset in R, used in many algorithm tests,
so is it appropriate to change it?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156268308
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156257397
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19927#discussion_r156249173
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -156,54 +153,22 @@ final class OneVsRestModel private[ml
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19715#discussion_r156010806
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -129,34 +156,102 @@ final class QuantileDiscretizer @Since
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19927#discussion_r155996272
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -156,54 +153,22 @@ final class OneVsRestModel private[ml
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19904
@sethah To verify the memory issue, you can add one line of test code against
current master here:
```
val modelFutures = ...
// Unpersist training data only when
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r155715665
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r155695280
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,31 +146,34 @@ class CrossValidator @Since("1.2.0"
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19843
Added a unit test for MLTest and changed to use PipelineModel.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155403743
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@felixcheung Yes, the spark.mlp test result changed because the indexer
order changed. That's because, in StringIndexer, when item frequencies are equal,
there's no definite rule for index orde
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19889
LGTM.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18581#discussion_r155141609
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18581#discussion_r155141551
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19904
@BryanCutler @MLnick @MrBago @hhbyyh
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19904
[SPARK-22707][ML] Optimize CrossValidator fitting memory occupation by
models
## What changes were proposed in this pull request?
Via some test I found CrossValidator still exists
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@felixcheung There is no breaking change, but we ran into some trouble
with nondeterministic behavior. When frequencies are equal, the indexer result is
nondeterministic. I already fixed those in
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r154820556
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18581#discussion_r154594735
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18581#discussion_r154598540
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18581#discussion_r154594381
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19627
@jkbradley I think it is better to review #19857 (fix Python model-specific
optimization) and merge it first, and then I rebase & update thi
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
Can anyone provide some suggestions for fixing the SparkR glm test failure
here?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19758#discussion_r154302624
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeSplitUtilsSuite.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19857
@MrBago @jkbradley I think this PR needs to be reviewed and merged first,
before reviewing #19627,
because this PR changes some critical code path
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19857
[SPARK-22667][ML] Fix model-specific optimization support for ML tuning:
Python API
## What changes were proposed in this pull request?
Python CrossValidator/TrainValidationSplit
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19843
@MrBago Thanks!
I updated the code; a new action class `CheckAnswerRowsByFunc` is now added. I did
not add a common trait, as both of them are simple and I don't want to break old
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19843
@MrBago @jkbradley
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19843
Jenkins retest this please.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19843
[SPARK-22644][ML][TEST][WIP] Make ML testsuite support StructuredStreaming
test
## What changes were proposed in this pull request?
We need to add some helper code to make testing ML
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19746
What about supporting multiple columns? VectorAssembler requires multiple
input columns, and they all need VectorSizeHint to transform them first. There's no need
to use multiple VectorSiz
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r153420330
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r153420429
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
I checked the failed tests in SparkR. There's some trouble in the failed
`glm` SparkR tests.
These tests compare SparkR glm and R-lib glm results on test data "iris",
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19758#discussion_r152749515
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeSplitUtilsSuite.scala ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@viirya @MLnick Code updated. Thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@MLnick Ah, I didn't express it precisely. For the first case, what I mean is:
sort by frequency, but if the frequencies are equal, sort by alp
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@MLnick How about this way:
For the case "frequencyAsc/Desc", sort first by frequency and then by alphabet.
For the case "alphabetAsc/Desc", sort by alphabet (and if alphab
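A minimal sketch of the ordering rule proposed in this comment, written in plain Python rather than against Spark. The function name `order_labels` and its signature are hypothetical; only the four option names mirror `StringIndexer`'s string order types. The sketch assumes ties in frequency are broken by ascending alphabetical order for both frequency options, which may not be what the PR finally does.

```python
from collections import Counter


def order_labels(values, order_type="frequencyDesc"):
    """Return distinct labels in a deterministic order.

    Hypothetical helper, not Spark's implementation. For the frequency
    options, ties are broken alphabetically (ascending) so the result
    does not depend on the order in which the values arrive.
    """
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Highest count first; equal counts fall back to the label itself.
        return sorted(counts, key=lambda label: (-counts[label], label))
    if order_type == "frequencyAsc":
        # Lowest count first; equal counts fall back to the label itself.
        return sorted(counts, key=lambda label: (counts[label], label))
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError("unknown order_type: " + order_type)


# "a" and "b" both occur twice, so only the tie-break decides their order.
print(order_labels(["b", "a", "c", "a", "b"]))
print(order_labels(["b", "a", "c", "a", "b"], "frequencyAsc"))
```

Because the sort key is a (count, label) pair, two inputs holding the same elements in a different order yield the same label list.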
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@MLnick Will the RDD "count by value" aggregation be deterministic? E.g., given 2
RDDs with the same elements, but with different element order and a different
number of partitions, will `rdd.co
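The question can be illustrated with a small pure-Python model, offered only as a sketch: it does not call Spark and makes no claim about how Spark implements its aggregation. The helper `count_by_value` and the slicing used to mimic partitions are assumptions made for illustration. The model shows that the value-to-count mapping is the same regardless of element order or partition count, while the order in which keys are first seen can differ.

```python
from collections import Counter


def count_by_value(elements, num_partitions):
    """Count elements per simulated partition, then merge the partial counts.

    Hypothetical stand-in for a distributed count-by-value; partitions are
    modelled as strided slices of a Python list.
    """
    partitions = [elements[i::num_partitions] for i in range(num_partitions)]
    total = Counter()
    for part in partitions:
        total.update(Counter(part))  # adds the partial counts into the total
    return total


a = count_by_value(["x", "y", "x", "z", "y"], 2)
b = count_by_value(["y", "z", "x", "y", "x"], 3)

print(a == b)              # same counts for every value
print(list(a) == list(b))  # key order follows first-seen order and can differ
```

In this model any consumer that relies on key iteration order, such as a sort by count alone, can see ties in a different order between the two runs, which is why an explicit tie-break matters.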
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
Jenkins retest this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r152454669
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19753
Thanks @holdenk
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19627
@holdenk Thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19621#discussion_r152252491
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -130,21 +160,49 @@ class StringIndexer @Since("
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19621#discussion_r152252553
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -217,69 +289,94 @@ class StringIndexerModel (
@Since
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19621#discussion_r152252129
--- Diff: project/MimaExcludes.scala ---
@@ -82,7 +82,15 @@ object MimaExcludes {
// [SPARK-21087] CrossValidator, TrainValidationSplit
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r152166365
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r152158938
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r151958073
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r151956715
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r151956112
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r151954037
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r151955287
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r151954590
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19627
@holdenk Found the reason. There is an empty file in the directory. :)
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19627
My local test passed. This test failure looks like a test system issue.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19753#discussion_r151311832
--- Diff: python/pyspark/ml/feature.py ---
@@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol,
HasOutputCol, JavaMLReadable, Ja
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19753#discussion_r151311569
--- Diff: python/pyspark/ml/feature.py ---
@@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol,
HasOutputCol, JavaMLReadable, Ja
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19753
@smurching The getter/setter is included in the superclass
`HasHandleInvalid`. I can add a test for it.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
I want to ask: for the option `StringIndexer.frequencyDesc`, when
two labels have the same frequency, which of them will be placed
first?
If this is not specified
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r151055221
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19753
[SPARK-22521][ML] VectorIndexerModel support handle unseen categories via
handleInvalid: Python API
## What changes were proposed in this pull request?
Add python api for
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19588
Python API jira created here:
https://issues.apache.org/jira/browse/SPARK-22521
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
@viirya @MLnick Thanks!