Github user actuaryzhang closed the pull request at:
https://github.com/apache/spark/pull/17161
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
This is what we get from the current doc:
![image](https://cloud.githubusercontent.com/assets/11082368/26430799/dd49fa4c-40a4-11e7-95c6-66def9a8f588.png)
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12414#discussion_r118392116
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -129,6 +129,23 @@ class RFormulaSuite extends SparkFunSuite
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12414#discussion_r118391881
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -163,12 +163,20 @@ class RFormula @Since("1.5.0") (@Si
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
![image](https://cloud.githubusercontent.com/assets/11082368/26429381/64dd117e-409b-11e7-9661-659b5fbe8206.png)
![image](https://cloud.githubusercontent.com/assets/11082368/26429388
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung I just made a new commit which I think has the cleanest
solution so far. In this one, I implemented grouping for all aggregate
functions for Column, except those that are also
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
I tried using ``, but in Scaladoc it is not correctly formatted. I tried a few
other options, but it seems HTML attributes are ignored in Scaladoc.
![image](https
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r117925930
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
Great point.
- For a method that is defined in one class and belongs in a group like
`cov`, we can document it in its own Rd, and add a link to it in the `SeeAlso`
section of the group
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r117909338
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17864
Ping folks for comments/review. Many thanks.
@viirya @MLnick @jkbradley @hhbyyh @yanboliang @BenFradet
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung I think we may want to distinguish a few cases:
1. For methods that are defined mainly in one class, e.g., most
function methods for Column, it makes sense to group
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18051
Maybe I'm missing something completely, but I still don't see why
we are removing the `xx-method` link, since we define methods as S4 using
`setMethod`. Lots of packages have
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r117892629
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@yanboliang I updated the example in the param doc. I hope it is clear now
that it is `alphabetDesc` that drops the same category as R. That is, RFormula
with `alphabetDesc` drops the first
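The interaction between ordering and dropping can be sketched outside Spark. This is an illustrative sketch, not Spark's implementation, and `dummy_columns` is a hypothetical helper: with dropLast-style one-hot encoding, the highest-indexed category is dropped, so ordering labels alphabetically descending drops the alphabetically first category, as R's default treatment coding does.

```python
def dummy_columns(labels, alphabet_desc):
    """Return the categories kept as dummy variables after dropLast.

    With dropLast-style one-hot encoding the highest-indexed category is
    dropped; alphabetDesc ordering therefore drops the alphabetically
    first category, matching R's default treatment coding.
    """
    ordered = sorted(set(labels), reverse=alphabet_desc)
    return ordered[:-1]  # dropLast removes the highest-indexed category
```

For labels a, b, c, descending order keeps c and b and drops a, matching R; ascending order would drop c instead.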
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r117873500
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@felixcheung Is the HTML tag `` supported? I tried it, but it failed to
compile...
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r117775336
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -37,6 +37,42 @@ import org.apache.spark.sql.types
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@yanboliang I understand your points. The issue is `OneHotEncoder` only
supports `dropLast`.
The ideal solution to match R exactly (both the category dropped and
ordering of feature
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@holdenk Thanks for the comment. Added the default value to the docstring.
@felixcheung Please let me know if there is anything else needed for this
PR.
Thanks everyone for the review
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@HyukjinKwon @felixcheung I confirm it works for Javadoc.
![image](https://cloud.githubusercontent.com/assets/11082368/26277962/21dbe70e-3d46-11e7-978f-e422b9122e87.png)
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung
I'm curious: what is the motivation for removing the style `avg,Column-method`?
This is the default and, I believe, the preferred way to reference S4
methods in R.
Suppose
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@felixcheung @HyukjinKwon Thanks much for pointing out the documentation
issues.
I still prefer to have a table to clearly illustrate what each option is
doing.
Made a new commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@yanboliang Thanks for the review and suggestion. It makes a lot of sense. I
made a new commit to address these.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@yanboliang Thanks for the question.
The alphabetically ascending order in R is very convenient for display
purposes. For example, when you do a summary of model results, the results
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16699
@yanboliang Thanks for the update.
This is a big change, although lots of them are on the test side.
Let me know if there is anything I can do to help make the review easier
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@felixcheung Would you help merge this? Thanks.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung Thanks for asking this. I should have been more clear.
`@name` is the "name" of the Rd object represented, which is unique. An Rd
object can have multip
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18025#discussion_r117566353
--- Diff: R/pkg/R/generics.R ---
@@ -396,67 +397,81 @@ setGeneric("agg", function (x, ...) {
standardGeneric("agg") })
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18025#discussion_r117565367
--- Diff: R/pkg/R/generics.R ---
@@ -385,6 +385,7 @@ setGeneric("value", function(bcast) {
standardGeneric("value"
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@viirya Great point. Added a comment to explain this in the doc.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung Thanks for your feedback.
- This does not affect discoverability: the name of the method is still on
the index list
- No problem with help either, e.g., one can use `?avg
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
This is what the `'column_aggregate_functions.Rd'` doc looks like:
![image](https://cloud.githubusercontent.com/assets/11082368/26190195/fd353224-3b5c-11e7-9a78-2607cc665f49.png
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18025
@felixcheung @HyukjinKwon
Per this
[suggestion](https://github.com/apache/spark/pull/18003#discussion-diff-116853922L57),
I'm creating more meaningful examples for the SQL functions
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/18025
[WIP][SparkR] Update doc and examples for sql functions
## What changes were proposed in this pull request?
Create better examples for sql functions.
You can merge this pull
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18003
Thanks for the update. I will work on better examples on these functions.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18003#discussion_r116888993
--- Diff: R/pkg/R/functions.R ---
@@ -54,7 +54,8 @@ setMethod("lit", signature("ANY"),
#' @name abs
#' @family normal_f
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18003
@felixcheung Thanks for your suggestion. I reverted the change to the
single-line example.
I agree that the examples in sql are not informative. I will put in better
examples in another
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18003#discussion_r116886110
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2814,7 +2815,7 @@ setMethod("except",
#' path <- "path/to/file.json"
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18003
@bdwyer2 Thanks for the suggestion. I ran a check on all the R scripts and
fixed the style issues.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18003
@felixcheung
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/18003
[SparkR] Fix bad examples in DataFrame methods
## What changes were proposed in this pull request?
Some examples in the DataFrame methods are syntactically wrong, even though
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@viirya @felixcheung Any additional changes needed for this one?
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@felixcheung Thanks so much for the review. I addressed most of the
comments, except auto-generating code for defining `stringOrderType`. This
parameter is not part of a shared trait on the Scala side
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17978#discussion_r116655031
--- Diff: python/pyspark/ml/feature.py ---
@@ -2111,26 +2112,45 @@ class StringIndexer(JavaEstimator, HasInputCol,
HasOutputCol, HasHandleInvalid
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17978#discussion_r116654554
--- Diff: python/pyspark/ml/feature.py ---
@@ -2082,8 +2082,9 @@ class StringIndexer(JavaEstimator, HasInputCol,
HasOutputCol, HasHandleInvalid
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17978#discussion_r116654542
--- Diff: python/pyspark/ml/feature.py ---
@@ -2139,6 +2159,20 @@ def setParams(self, inputCol=None, outputCol=None,
handleInvalid="
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@felixcheung Could you take a look? Thanks.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17978#discussion_r116411579
--- Diff: python/pyspark/ml/feature.py ---
@@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol,
HasOutputCol, HasHandleInvalid
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@viirya Thanks much for your review. I corrected the typo and added some
tests.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@viirya @MLnick @BryanCutler @yinxusen @brkyvz @HyukjinKwon @srowen
Ping for reviews or comments. Thanks much.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@felixcheung Once this PR gets in, I'll update the SparkR side and include
some test.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17978
@felixcheung
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17978
[SPARK-20736] PySpark StringIndexer supports StringOrderType
## What changes were proposed in this pull request?
PySpark StringIndexer supports StringOrderType added in #17879.
You can
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17967#discussion_r116389971
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -37,6 +37,29 @@ import org.apache.spark.sql.types
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@felixcheung Thanks for the review. I fixed some typos.
Below is an example to show the difference in model estimates due to
different string ordering between R and RFormula
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17084
@imatiach-msft Thanks for the PR. Added a couple of comments. Sorry for the
delayed review.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17084#discussion_r116364047
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
---
@@ -77,12 +87,16 @@ class
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17084#discussion_r116364179
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryConfusionMatrix.scala
---
@@ -22,22 +22,22 @@ package
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17084#discussion_r116364061
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
---
@@ -36,12 +36,18 @@ import
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17084#discussion_r116364140
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -41,13 +41,27 @@ import
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17084#discussion_r116364224
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -146,11 +160,13 @@ class
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@yanboliang @MLnick @HyukjinKwon @jkbradley @sethah
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
Questions I have:
1. StringIndexer and RFormula use duplicated code for defining
`stringOrderType`. Should this be moved to a shared trait?
2. When setting `stringOrderType = "alphab
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17967
@felixcheung @viirya @ericl @yinxusen
```
val original = Seq((1, "foo", 4), (2, "bar", 4), (3, "bar", 5), (4, "baz", 5))
```
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17967
[SPARK-14659] RFormula allows to drop the same category as R when handling
strings
## What changes were proposed in this pull request?
When handling strings, the category dropped
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
Thanks much!
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@felixcheung
It would be great if you could help merge this. I could address comments
(if any) in a future PR.
This seems like a pretty straightforward change that removes a big blocker. I
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@felixcheung Thanks much for your review.
@yanboliang @jkbradley Since there are two approvals, could you guys take a
look and merge if it's good? We really need this for a couple
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17864
@hhbyyh @sethah @MLnick
Could you take a look at the new commit? Thanks.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@felixcheung @gatorsmile @MLnick @jkbradley @holdenk @yanboliang @srowen
@sethah
Would you please take another look and let me know any additional
suggestions? Thanks much!
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17879#discussion_r115414436
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -59,6 +59,29 @@ private[feature] trait StringIndexerBase extends
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17879#discussion_r115413770
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -131,6 +167,12 @@ object StringIndexer extends
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
Thanks much @felixcheung and @viirya. I have addressed your comments.
- Update from 2.2 to 2.3.
- Change `freq_desc` to `frequency_desc`.
- Move `toLowerCase` to the getter method.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@felixcheung Thanks. I will update the annotation.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17879#discussion_r115402478
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -131,6 +163,12 @@ object StringIndexer extends
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
ping @yanboliang @felixcheung
This is needed for one-hot encoding to be consistent with R, therefore
enabling direct comparison of Spark results to R. Could you guys please take
a look
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17884#discussion_r115157036
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -405,7 +405,7 @@ result <- gapply(
head(arrange(result, "max_mpg", decr
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17884#discussion_r115157013
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -1079,19 +1079,19 @@ There are three main object classes in SparkR you
may be working
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17884
@felixcheung I ran a quick QA on the vignettes and fixed some additional
typos and styles.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17884
@HyukjinKwon Thanks for pointing this out. I will keep this in mind next
time.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17884
@felixcheung
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17884
[SparkR][Doc] fix typo in vignettes
## What changes were proposed in this pull request?
Fix typo in vignettes
You can merge this pull request into a Git repository by running:
$ git
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17879#discussion_r115124754
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -59,6 +59,28 @@ private[feature] trait StringIndexerBase extends
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17879#discussion_r115116222
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -107,11 +135,15 @@ class StringIndexer @Since("
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17879#discussion_r115116212
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -59,6 +59,28 @@ private[feature] trait StringIndexerBase extends
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@viirya Thanks much for your comments. Made a new commit to address them.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@yanboliang Since you have reported a few issues due to different encoding
between Spark and R (e.g., SPARK-14659 and SPARK-14657), perhaps you could
add some comments?
Github user actuaryzhang closed the pull request at:
https://github.com/apache/spark/pull/17005
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@holdenk The main motivation for this PR is that the behavior of
StringIndexer will affect OneHotEncoder, RFormula and models estimated based on
these transformers. There have been a few
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17879
@jkbradley @MLnick @holdenk @pnpritchard @yanboliang @sethah @imatiach-msft
@srowen
An example to illustrate the idea:
```
val data = Seq((0, "b"), (1, "b&
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17879
[SPARK-20619][ML]StringIndexer supports multiple ways of label ordering
## What changes were proposed in this pull request?
StringIndexer maps labels to numbers according
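The label-ordering idea behind this PR can be sketched in a few lines. This is a hypothetical sketch of the four option names only, not the Spark implementation; in particular, the alphabetical tie-breaking for the frequency orderings is an assumption made here to keep the result deterministic.

```python
from collections import Counter

def ordered_labels(labels, order_type):
    """Order distinct labels per the four stringOrderType options (sketch)."""
    counts = Counter(labels)
    if order_type == "frequencyDesc":
        # assumed tie-break: alphabetical, so the order is deterministic
        return sorted(counts, key=lambda l: (-counts[l], l))
    if order_type == "frequencyAsc":
        return sorted(counts, key=lambda l: (counts[l], l))
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError(f"unknown order type: {order_type}")

def index_labels(labels, order_type):
    """Map each label to its index under the chosen ordering."""
    return {l: i for i, l in enumerate(ordered_labels(labels, order_type))}
```

For labels b, b, a, c, `frequencyDesc` gives b index 0, while `alphabetAsc` indexes a, b, c as 0, 1, 2.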
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17864
@hhbyyh Thanks for the suggestion. I have made a new commit that always
casts the input to double and outputs the imputed column as double.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17864
@sethah Thanks for summarizing the previous discussions.
What are you suggesting for this PR? I think it makes sense to log a
warning when imputing integer types with mean. In addition
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17864
@yanboliang @srowen @MLnick @jkbradley
The example below shows failure of Imputer on integer data.
```
val df = spark.createDataFrame( Seq(
(0, 1.0, 1.0, 1.0
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17864
[SPARK-20604][ML] Allow imputer to handle numeric types
## What changes were proposed in this pull request?
Imputer currently requires input column to be Double or Float
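The cast-then-impute approach can be sketched as follows. This is a minimal sketch under the assumption that missing entries are represented as None or NaN after the cast; it is not Imputer's actual code.

```python
import math

def impute_mean(values):
    """Mean-impute missing entries, accepting any numeric type.

    Integer inputs are first cast to float (mirroring the PR's cast of the
    input column to double), with None and NaN treated as missing.
    """
    doubles = [math.nan if v is None else float(v) for v in values]
    observed = [v for v in doubles if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in doubles]
```

Casting up front means the same code path handles integer and floating-point columns alike.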
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17840
Thanks @yanboliang and @MLnick.
I have included the additional numeric types and updated the error msg.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17840
Weird failure message; the log shows all tests passed...
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/17840
@yinxusen @srowen @mengxr @jkbradley @VinceShieh @yanboliang
The example below shows failure of Bucketizer on integer data.
```
val splits = Array(-3.0, 0.0, 3.0)
val
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17840
[SPARK-20574][ML] Allow Bucketizer to handle non-Double column
## What changes were proposed in this pull request?
Bucketizer currently requires input column to be Double, but the logic
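The bucketing logic itself is type-agnostic once the value is cast. A sketch (not Spark's implementation; `bucketize` is a hypothetical name) using the splits from the earlier example:

```python
from bisect import bisect_right

def bucketize(value, splits):
    """Map a numeric value to a bucket index given sorted split points.

    Any numeric input is cast to float first, which is what allowing
    non-Double columns amounts to; the last bucket is closed on the right.
    """
    v = float(value)
    if not splits[0] <= v <= splits[-1]:
        raise ValueError(f"value {v} outside of splits range")
    return min(bisect_right(splits, v) - 1, len(splits) - 2)
```

With splits [-3.0, 0.0, 3.0], the integer inputs -1 and 2 land in buckets 0 and 1 respectively.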