[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-11-10 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-725187311 @felixcheung I think this PR was ready to merge, could you help to reopen? This is an automated message f

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727694593 @HyukjinKwon unfortunately I've since switched companies so I'm going from memory. But the basic idea AIRI is that `mutate` tries to auto-infer column names when not supp

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727768929 @HyukjinKwon test added, please have a look

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-16 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-728044077 collapse is an aggregation method. input is length>1 & collapse guarantees length-1 output. what's happening is separate has variable return length -- usually it

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-16 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-728182679 then what happens when users send an expression with width>500? not likely but it's safer to just collapse the result On Mon, Nov 16, 2020, 11:40 AM S Daniel
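
For readers following the width discussion, here is a minimal sketch of the behaviour in question, assuming the column name is derived with base R's `deparse()` (an assumption; the preview above is truncated and does not show the actual code). `deparse()` accepts a `width.cutoff` of at most 500 and may return several strings for a wide expression, whereas `paste(..., collapse = )` always returns exactly one. The expression and the variable names `expr`, `parts`, and `name` are hypothetical and chosen only for illustration.

```r
# Hypothetical wide expression; quote() keeps it unevaluated, so SparkR is not needed.
expr <- quote(
  over(rank(), orderBy(windowPartitionBy(column("cyl")), column("mpg"), column("disp"), column("hp")))
)

# With the default width.cutoff (60), deparse() may split wide input
# across several strings, so the result is not guaranteed to be length 1.
parts <- deparse(expr)
print(length(parts))

# Raising width.cutoff to its maximum (500) helps, but an even wider expression
# could still be split. Collapsing the pieces guarantees a single string.
name <- paste(deparse(expr, width.cutoff = 500L), collapse = " ")
print(length(name))
```

Collapsing unconditionally avoids having to reason about how wide a user-supplied expression can get, which appears to be the point being made here.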

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-16 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-728186184 e.g. in [`SPARK-31517`](https://issues.apache.org/jira/browse/SPARK-31517), the supplied width was 98: ``` over(rank(), orderBy(windowPartitionBy(column("cyl")),

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-16 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-728724666 > Ideally, we'd like both to behave the same way (possibly resolving the problem with mutate), that's however a breaking change. A reasonable request (though defini

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-16 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-728726963 > Adding magrittr as a dependency Best saved for a different thread, but TL;DR: yes, I think that's a good idea, `magrittr` will help writing `SparkR` code that's m

[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-05-03 Thread GitBox
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-623224648 Thanks @felixcheung. both backports are directly from `base` R: https://github.com/wch/r-source/blob/08ebf253e44e10bfb445f27b53b2a43bc7e6740d/src/library/base/R/uti