HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-809823145
Thanks all for reviewing and merging!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-806410924
retest this, please
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the speci
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-806332643
I took the diff on printed plans (expected, actual) in stack trace, and they
seem to be identical.
```
TakeOrderedAndProject
[i_item_id,i_item_desc,s_state,store_s
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-806327246
> Test build #136482 has finished for PR 31355 at commit e8fb6f1.
>
> This patch fails Spark unit tests.
> This patch merges cleanly.
> This patch adds no public
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-805430634
Appreciate another round of reviewing, thanks in advance! @viirya @rdblue
@sunchao @cloud-fan @aokolnychyi @dongjoon-hyun
--
This is an automated message from the Apache G
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-804598821
I just removed the handling of non specific distribution case. Please take a
look again. Thanks!
--
This is an automated message from the Apache Git Service.
To respond to
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-800730995
I'll just remove the handling of non specific distribution case early next
week and ask for review again unless I hear some voice before.
--
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-799000237
Kindly reminder @rdblue - looks like the static number of partitions for non
specific distribution is the only thing we need to sort out, and it's not a one
of initial goals
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-794970669
I'm OK to exclude it here and let someone come up with actual use case. As
that was requested by @rdblue let's hear the voice before making change.
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-794441176
I don't have a real use case as I commented earlier on review comment, but
you could imagine the case where external storage has limited bandwidth on
concurrent writes. Quit
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-791791990
Kindly reminder @viirya @rdblue @sunchao @cloud-fan @aokolnychyi
@dongjoon-hyun
This is an automated messa
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-788617728
I think now this PR only has a concern on how to deal with static number of
partitions when there's no required distribution.
I picked repartition instead of coalesce
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-779552011
Addressed removing the requirement on any specific distribution to require
static number of partitions. Also renamed the method as it's no longer specific
to distribution.
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-779436038
As I just commented on comment thread I'm OK to apply the parallelism
regardless of requirement on distribution/sort. That sounds like a valid case,
like the case the writer
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-778848458
Kindly reminder: it would be appreciated if I could get another round of
review, and/or get some voices on the interface. ("keep it as it is" vs "have a
separate interface e
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-770332998
I agree it may not be a good idea to add the method directly in
RequiresDistributionAndOrdering, if we are sure to expand the possibility of
controlling partitions. I'm open
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768866804
NOTE to reviewers: I've changed the location of required number of
partitions from Distribution interfaces/implementations to
RequiresDistributionAndOrdering, so that Distri
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768802431
Yes thanks for confirming. I wanted to make sure my understanding about
possible behavioral change is correct. I agree that is not related to this PR.
-
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768801016
> AQE won't kick in if users specify num partitions, e.g. df.repartition(5),
I think the same applies here if the sink requires a certain num partitions.
The case I im
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768780252
Probably the discussion would be more constructive/productive if each idea
may bring the explanation with the actual case, which storage is expected to
get benefit on that,
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768580667
I'd like to see our preference on dealing with "possible (but just floated
ideas now) improvements" in public API.
I think this use case is clear, and the change fulfi
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768107101
Actually the proposal is more likely giving data source to force having
static number of partitions regardless of output data.
I see valid concerns about drawbacks wh
HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-768096504
cc. @cloud-fan @aokolnychyi
Let's discuss in this PR, or please let me know if we prefer to have
discussion in dev@ mailing list.
--
23 matches
Mail list logo