[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-10-01 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-702516837 Thank you @maropu, @viirya, and @cloud-fan for all the discussion and help!

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-30 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701825592 Addressed all comments and updated the PR description to reflect the latest status. cc @maropu, thanks.

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-30 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701458412 @viirya - wondering if you have any other comments?

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-30 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701209877 > Could you drop `.internal()` accordingly if users need to care about the added option? @maropu - sure, I removed `internal()`.

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-29 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701158601 Disabled the config by default and addressed all the new formatting comments. Let me know if anything else needs to be addressed. Will look into how to work around the cached query problem i

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-29 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701151447 @viirya - thanks for pointing this out. With the query cache, e.g. when a DataFrame user calls `persist()`, we store the query data as the logical operator `InMemoryRelation`, and later on with phy

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-29 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701086045 @viirya - could you give an example where this feature can cause a regression? Note that this feature only disables bucketed scans; it never enables additional ones. If there's further improveme
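To make the "only disables, never enables" point concrete, here is a toy model in Python. Everything in it is hypothetical: the node classes and `disable_unnecessary_bucketed_scan` are illustrative stand-ins, not Spark's physical plan API. The rule is deliberately simplified to "keep a bucketed scan only if a join or aggregate sits above it with nothing but filters in between"; Spark's actual rule is more involved.

```python
from dataclasses import dataclass, replace
from typing import Union

# Hypothetical, simplified plan nodes; not Spark's actual SparkPlan classes.
@dataclass(frozen=True)
class Scan:
    table: str
    bucketed: bool  # True if the scan currently preserves bucket partitioning

@dataclass(frozen=True)
class Filter:
    child: "Plan"

@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"

@dataclass(frozen=True)
class Aggregate:
    child: "Plan"

Plan = Union[Scan, Filter, Join, Aggregate]

def disable_unnecessary_bucketed_scan(plan: Plan, needs_partitioning: bool = False) -> Plan:
    """Return a copy of `plan` in which bucketed scans that no operator
    above can benefit from are turned into non-bucketed scans."""
    if isinstance(plan, Scan):
        # The rule can only switch bucketing off, never on.
        return replace(plan, bucketed=plan.bucketed and needs_partitioning)
    if isinstance(plan, Filter):
        # A filter passes its child's partitioning through unchanged.
        return replace(plan, child=disable_unnecessary_bucketed_scan(plan.child, needs_partitioning))
    if isinstance(plan, Join):
        # Joins (in this toy model) benefit from bucketed inputs on both sides.
        return replace(plan,
                       left=disable_unnecessary_bucketed_scan(plan.left, True),
                       right=disable_unnecessary_bucketed_scan(plan.right, True))
    if isinstance(plan, Aggregate):
        return replace(plan, child=disable_unnecessary_bucketed_scan(plan.child, True))
    raise TypeError(f"unknown node: {plan!r}")
```

For example, `disable_unnecessary_bucketed_scan(Filter(Scan("t1", bucketed=True)))` yields `Filter(child=Scan(table='t1', bucketed=False))`, because nothing above the scan needs its partitioning, while a scan with `bucketed=False` stays non-bucketed under any parent.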

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-29 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-700855626 > Can we apply this feature in AQE? Seems we just need to add this rule to `AdaptiveSparkPlanExec.queryStagePreparationRules`. This can be done in a followup. @cloud-fan - thanks,

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-28 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-700401382 Let me know if anything else needs to be addressed before merging, thanks.

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-25 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-698266541 @cloud-fan and @maropu - the PR is ready for review again: I added more unit tests and addressed all comments. Thanks.

[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-18 Thread GitBox
c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-694767363 cc @cloud-fan and @sameeragarwal, if you have time to take a look, thanks.