[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-29 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-809823145 Thanks all for reviewing and merging! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-25 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-806410924 retest this, please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-24 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-806332643 I took the diff on printed plans (expected, actual) in stack trace, and they seem to be identical. ``` TakeOrderedAndProject

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-24 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-806327246 > Test build #136482 has finished for PR 31355 at commit e8fb6f1. > > This patch fails Spark unit tests. > This patch merges cleanly. > This patch adds no public

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-23 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-805430634 Appreciate another round of reviewing, thanks in advance! @viirya @rdblue @sunchao @cloud-fan @aokolnychyi @dongjoon-hyun -- This is an automated message from the Apache

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-22 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-804598821 I just removed the handling of non specific distribution case. Please take a look again. Thanks! -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-16 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-800730995 I'll just remove the handling of non specific distribution case early next week and ask for review again unless I hear some voice before.

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-14 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-799000237 Kindly reminder @rdblue - looks like the static number of partitions for non specific distribution is the only thing we need to sort out, and it's not a one of initial

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-09 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-794970669 I'm OK to exclude it here and let someone come up with actual use case. As that was requested by @rdblue let's hear the voice before making change.

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-09 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-794441176 I don't have a real use case as I commented earlier on review comment, but you could imagine the case where external storage has limited bandwidth on concurrent writes.

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-05 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-791791990 Kindly reminder @viirya @rdblue @sunchao @cloud-fan @aokolnychyi @dongjoon-hyun This is an automated

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-01 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-788617728 I think now this PR only has a concern on how to deal with static number of partitions when there's no required distribution. I picked repartition instead of

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-02-15 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-779552011 Addressed removing the requirement on any specific distribution to require static number of partitions. Also renamed the method as it's no longer specific to distribution.

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-02-15 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-779436038 As I just commented on comment thread I'm OK to apply the parallelism regardless of requirement on distribution/sort. That sounds like a valid case, like the case the

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-02-14 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-778848458 Kindly reminder: it would be appreciated if I could get another round of review, and/or get some voices on the interface. ("keep it as it is" vs "have a separate interface

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-30 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-770332998 I agree it may not be a good idea to add the method directly in RequiresDistributionAndOrdering, if we are sure to expand the possibility of controlling partitions. I'm

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-27 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768866804 NOTE to reviewers: I've changed the location of required number of partitions from Distribution interfaces/implementations to RequiresDistributionAndOrdering, so that

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-27 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768802431 Yes thanks for confirming. I wanted to make sure my understanding about possible behavioral change is correct. I agree that is not related to this PR.

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-27 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768801016 > AQE won't kick in if users specify num partitions, e.g. df.repartition(5), I think the same applies here if the sink requires a certain num partitions. The case I

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-27 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768780252 Probably the discussion would be more constructive/productive if each idea may bring the explanation with the actual case, which storage is expected to get benefit on that,

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-27 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768580667 I'd like to see our preference on dealing with "possible (but just floated ideas now) improvements" in public API. I think this use case is clear, and the change

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-26 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768107101 Actually the proposal is more likely giving data source to force having static number of partitions regardless of output data. I see valid concerns about drawbacks

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-01-26 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768096504 cc. @cloud-fan @aokolnychyi Let's discuss in this PR, or please let me know if we prefer to have discussion in dev@ mailing list.