[jira] [Created] (SPARK-40994) Add code example for JDBC data source with partitionColumn

2022-11-02 Thread Cheng Su (Jira)
Cheng Su created SPARK-40994: Summary: Add code example for JDBC data source with partitionColumn Key: SPARK-40994 URL: https://issues.apache.org/jira/browse/SPARK-40994 Project: Spark Issue Type

[jira] [Created] (SPARK-39849) Dataset.as(StructType) fills missing new columns with null value

2022-07-24 Thread Cheng Su (Jira)
Cheng Su created SPARK-39849: Summary: Dataset.as(StructType) fills missing new columns with null value Key: SPARK-39849 URL: https://issues.apache.org/jira/browse/SPARK-39849 Project: Spark Iss

[jira] [Commented] (SPARK-37333) Specify the required distribution at V1Write

2022-07-19 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568643#comment-17568643 ] Cheng Su commented on SPARK-37333: -- Just FYI, I am working on this in this week. The mo

[jira] [Updated] (SPARK-37333) Specify the required distribution at V1Write

2022-07-19 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37333: - Affects Version/s: 3.4.0 (was: 3.3.0) > Specify the required distribution at

[jira] [Created] (SPARK-39777) Remove Hive bucketing incompatibility doc

2022-07-14 Thread Cheng Su (Jira)
Cheng Su created SPARK-39777: Summary: Remove Hive bucketing incompatibility doc Key: SPARK-39777 URL: https://issues.apache.org/jira/browse/SPARK-39777 Project: Spark Issue Type: Documentation

[jira] [Created] (SPARK-39751) Better naming for hash aggregate key probing metric

2022-07-12 Thread Cheng Su (Jira)
Cheng Su created SPARK-39751: Summary: Better naming for hash aggregate key probing metric Key: SPARK-39751 URL: https://issues.apache.org/jira/browse/SPARK-39751 Project: Spark Issue Type: Impro

[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2022-03-10 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504500#comment-17504500 ] Cheng Su commented on SPARK-34960: -- Thanks [~tgraves] and [~ahussein] for commenting, a

[jira] [Created] (SPARK-38354) Add hash probes metrics for shuffled hash join

2022-02-28 Thread Cheng Su (Jira)
Cheng Su created SPARK-38354: Summary: Add hash probes metrics for shuffled hash join Key: SPARK-38354 URL: https://issues.apache.org/jira/browse/SPARK-38354 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-38018) Fix ColumnVectorUtils.populate to handle CalendarIntervalType correctly

2022-01-24 Thread Cheng Su (Jira)
Cheng Su created SPARK-38018: Summary: Fix ColumnVectorUtils.populate to handle CalendarIntervalType correctly Key: SPARK-38018 URL: https://issues.apache.org/jira/browse/SPARK-38018 Project: Spark

[jira] [Created] (SPARK-38015) Mark legacy file naming functions as deprecated in FileCommitProtocol

2022-01-24 Thread Cheng Su (Jira)
Cheng Su created SPARK-38015: Summary: Mark legacy file naming functions as deprecated in FileCommitProtocol Key: SPARK-38015 URL: https://issues.apache.org/jira/browse/SPARK-38015 Project: Spark

[jira] [Created] (SPARK-38006) Clean up duplicated planner logic for window operator

2022-01-24 Thread Cheng Su (Jira)
Cheng Su created SPARK-38006: Summary: Clean up duplicated planner logic for window operator Key: SPARK-38006 URL: https://issues.apache.org/jira/browse/SPARK-38006 Project: Spark Issue Type: Imp

[jira] [Commented] (SPARK-18591) Replace hash-based aggregates with sort-based ones if inputs already sorted

2022-01-21 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480300#comment-17480300 ] Cheng Su commented on SPARK-18591: -- Just FYI, the Jira should be fixed by https://issu

[jira] [Created] (SPARK-37983) Backout agg build time metrics from sort aggregate

2022-01-21 Thread Cheng Su (Jira)
Cheng Su created SPARK-37983: Summary: Backout agg build time metrics from sort aggregate Key: SPARK-37983 URL: https://issues.apache.org/jira/browse/SPARK-37983 Project: Spark Issue Type: Sub-ta

[jira] [Created] (SPARK-37813) ORC read benchmark should enable vectorization for nested column

2022-01-04 Thread Cheng Su (Jira)
Cheng Su created SPARK-37813: Summary: ORC read benchmark should enable vectorization for nested column Key: SPARK-37813 URL: https://issues.apache.org/jira/browse/SPARK-37813 Project: Spark Iss

[jira] [Created] (SPARK-37726) Add spill size metrics for sort merge join

2021-12-23 Thread Cheng Su (Jira)
Cheng Su created SPARK-37726: Summary: Add spill size metrics for sort merge join Key: SPARK-37726 URL: https://issues.apache.org/jira/browse/SPARK-37726 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-19256) Hive bucketing write support

2021-12-09 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-19256: - Affects Version/s: 3.3.0 > Hive bucketing write support

[jira] [Created] (SPARK-37564) Support sort aggregate code-gen without grouping keys

2021-12-06 Thread Cheng Su (Jira)
Cheng Su created SPARK-37564: Summary: Support sort aggregate code-gen without grouping keys Key: SPARK-37564 URL: https://issues.apache.org/jira/browse/SPARK-37564 Project: Spark Issue Type: Sub

[jira] [Created] (SPARK-37557) Replace object hash with sort aggregate if child is already sorted

2021-12-05 Thread Cheng Su (Jira)
Cheng Su created SPARK-37557: Summary: Replace object hash with sort aggregate if child is already sorted Key: SPARK-37557 URL: https://issues.apache.org/jira/browse/SPARK-37557 Project: Spark I

[jira] [Updated] (SPARK-34287) Spark Aggregate improvement

2021-12-03 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-34287: - Summary: Spark Aggregate improvement (was: Aggregation improvement) > Spark Aggregate improvement

[jira] [Updated] (SPARK-34287) Spark aggregate improvement

2021-12-03 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-34287: - Summary: Spark aggregate improvement (was: Spark Aggregate improvement) > Spark aggregate improvement

[jira] [Updated] (SPARK-34287) Aggregation improvement

2021-12-03 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-34287: - Description: Creating this umbrella Jira to track overall progress for Spark aggregate improvement. See

[jira] [Updated] (SPARK-34287) Aggregation improvement

2021-12-03 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-34287: - Summary: Aggregation improvement (was: Object hash and sort aggregation improvement) > Aggregation imp

[jira] [Created] (SPARK-37455) Replace hash with sort aggregate if child is already sorted

2021-11-24 Thread Cheng Su (Jira)
Cheng Su created SPARK-37455: Summary: Replace hash with sort aggregate if child is already sorted Key: SPARK-37455 URL: https://issues.apache.org/jira/browse/SPARK-37455 Project: Spark Issue Ty

[jira] [Updated] (SPARK-34287) Object hash and sort aggregation improvement

2021-11-24 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-34287: - Summary: Object hash and sort aggregation improvement (was: Object hash and sort aggregation execution

[jira] [Created] (SPARK-37370) Add SQL configs to control newly added join code-gen in 3.3

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37370: Summary: Add SQL configs to control newly added join code-gen in 3.3 Key: SPARK-37370 URL: https://issues.apache.org/jira/browse/SPARK-37370 Project: Spark Issue Ty

[jira] [Created] (SPARK-37366) Add benchmark for Z-order

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37366: Summary: Add benchmark for Z-order Key: SPARK-37366 URL: https://issues.apache.org/jira/browse/SPARK-37366 Project: Spark Issue Type: Sub-task Components:

[jira] [Updated] (SPARK-37361) Introduce Z-order for efficient data skipping

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37361: - Description: This is the umbrella Jira to track the progress of introducing Z-order in Spark. Z-order e

[jira] [Updated] (SPARK-37361) Introduce Z-order for efficient data skipping

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37361: - Description: This is the umbrella Jira to track the progress of introducing Z-order in Spark. Z-order e

[jira] [Commented] (SPARK-37361) Introduce Z-order for efficient data skipping

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445584#comment-17445584 ] Cheng Su commented on SPARK-37361: -- Just FYI, I am working on each sub-task now. Thanks

[jira] [Created] (SPARK-37365) Add ZORDER BY syntax and plan rule

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37365: Summary: Add ZORDER BY syntax and plan rule Key: SPARK-37365 URL: https://issues.apache.org/jira/browse/SPARK-37365 Project: Spark Issue Type: Sub-task Com

[jira] [Created] (SPARK-37364) Add code-gen evaluation for Z-order expression

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37364: Summary: Add code-gen evaluation for Z-order expression Key: SPARK-37364 URL: https://issues.apache.org/jira/browse/SPARK-37364 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-37361) Introduce Z-order for efficient data skipping

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37361: - Description: This is the umbrella Jira to track the progress of introducing Z-order in Spark. Z-order e

[jira] [Created] (SPARK-37363) Support string type in Z-order expression

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37363: Summary: Support string type in Z-order expression Key: SPARK-37363 URL: https://issues.apache.org/jira/browse/SPARK-37363 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-37362) Support float type in Z-order expression

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37362: Summary: Support float type in Z-order expression Key: SPARK-37362 URL: https://issues.apache.org/jira/browse/SPARK-37362 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-31585) Support Z-order curve expression

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-31585: - Summary: Support Z-order curve expression (was: Support Z-order curve) > Support Z-order curve expressi

[jira] [Updated] (SPARK-31585) Support Z-order curve expression

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-31585: - Parent: SPARK-37361 Issue Type: Sub-task (was: New Feature) > Support Z-order curve expression

[jira] [Updated] (SPARK-37361) Introduce Z-order for efficient data skipping

2021-11-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37361: - Description: This is the umbrella Jira to track the progress of introducing Z-order in Spark. Z-order e

[jira] [Created] (SPARK-37361) Introduce Z-order for efficient data skipping

2021-11-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-37361: Summary: Introduce Z-order for efficient data skipping Key: SPARK-37361 URL: https://issues.apache.org/jira/browse/SPARK-37361 Project: Spark Issue Type: Umbrella
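The SPARK-37361 entries describe Z-order for efficient data skipping, but the description is truncated in this archive. For readers unfamiliar with the idea, the Python sketch below shows the core of a Z-order (Morton) value for two non-negative integer columns: interleave their bits so that rows close in both columns tend to sort next to each other. When files are written in that order, each file's min/max statistics cover a smaller box in both columns, which is what makes skipping files effective. This is a generic illustration with hypothetical function names (`z_order_value`, `z_sorted`), not the expression proposed in SPARK-31585, which also has sub-tasks for float and string inputs.

```python
from typing import List, Tuple


def z_order_value(x: int, y: int, bits: int = 16) -> int:
    """Hypothetical helper: Morton (Z-order) value of two non-negative ints.

    Bit i of x lands at position 2*i and bit i of y at position 2*i + 1, so
    values that are close in both x and y tend to get close Z-values.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError(f"x and y must be in [0, {limit})")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_sorted(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort rows by Z-order value, as a clustering step before writing files."""
    return sorted(points, key=lambda p: z_order_value(p[0], p[1]))


grid = [(x, y) for x in range(4) for y in range(4)]
print(z_sorted(grid)[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The first four rows in Z-order form a 2x2 square rather than a single row or column, which is the locality property data skipping relies on. Real implementations must first map each column's values to unsigned integers that preserve ordering, which is why non-integer types need separate handling.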

[jira] [Created] (SPARK-37341) Avoid unnecessary buffer and copy in full outer sort merge join

2021-11-15 Thread Cheng Su (Jira)
Cheng Su created SPARK-37341: Summary: Avoid unnecessary buffer and copy in full outer sort merge join Key: SPARK-37341 URL: https://issues.apache.org/jira/browse/SPARK-37341 Project: Spark Issu

[jira] [Commented] (SPARK-37316) Add code-gen for existence sort merge join

2021-11-13 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443227#comment-17443227 ] Cheng Su commented on SPARK-37316: -- Will raise a PR soon after https://issues.apache.or

[jira] [Created] (SPARK-37316) Add code-gen for existence sort merge join

2021-11-13 Thread Cheng Su (Jira)
Cheng Su created SPARK-37316: Summary: Add code-gen for existence sort merge join Key: SPARK-37316 URL: https://issues.apache.org/jira/browse/SPARK-37316 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-35352) Add code-gen for full outer sort merge join

2021-11-13 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-35352: - Affects Version/s: 3.3.0 > Add code-gen for full outer sort merge join

[jira] [Updated] (SPARK-37223) Fix unit test check in JoinHintSuite

2021-11-05 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37223: - Issue Type: Improvement (was: Bug) > Fix unit test check in JoinHintSuite

[jira] [Updated] (SPARK-37223) Fix unit test check in JoinHintSuite

2021-11-05 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-37223: - Issue Type: Test (was: Improvement) > Fix unit test check in JoinHintSuite

[jira] [Created] (SPARK-37223) Fix unit test check in JoinHintSuite

2021-11-05 Thread Cheng Su (Jira)
Cheng Su created SPARK-37223: Summary: Fix unit test check in JoinHintSuite Key: SPARK-37223 URL: https://issues.apache.org/jira/browse/SPARK-37223 Project: Spark Issue Type: Bug Compon

[jira] [Created] (SPARK-37220) Do not split input file for Parquet reader with aggregate push down

2021-11-05 Thread Cheng Su (Jira)
Cheng Su created SPARK-37220: Summary: Do not split input file for Parquet reader with aggregate push down Key: SPARK-37220 URL: https://issues.apache.org/jira/browse/SPARK-37220 Project: Spark

[jira] [Commented] (SPARK-37167) Add benchmark for aggregate push down

2021-11-05 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439569#comment-17439569 ] Cheng Su commented on SPARK-37167: -- Just FYI, I am working on it. > Add benchmark for

[jira] [Created] (SPARK-37167) Add benchmark for aggregate push down

2021-10-29 Thread Cheng Su (Jira)
Cheng Su created SPARK-37167: Summary: Add benchmark for aggregate push down Key: SPARK-37167 URL: https://issues.apache.org/jira/browse/SPARK-37167 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-13 Thread Cheng Su (Jira)
Cheng Su created SPARK-37001: Summary: Disable two level of map for final hash aggregation by default Key: SPARK-37001 URL: https://issues.apache.org/jira/browse/SPARK-37001 Project: Spark Issue

[jira] [Updated] (SPARK-36794) Ignore duplicated join keys when building relation for SEMI/ANTI hash join

2021-09-17 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-36794: - Summary: Ignore duplicated join keys when building relation for SEMI/ANTI hash join (was: Ignore duplic

[jira] [Created] (SPARK-36794) Ignore duplicated join keys when building relation for LEFT/ANTI hash join

2021-09-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-36794: Summary: Ignore duplicated join keys when building relation for LEFT/ANTI hash join Key: SPARK-36794 URL: https://issues.apache.org/jira/browse/SPARK-36794 Project: Spark

[jira] [Updated] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-36652: - Affects Version/s: (was: 3.2.0) > AQE dynamic join selection should not apply to non-equi join

[jira] [Created] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Cheng Su (Jira)
Cheng Su created SPARK-36652: Summary: AQE dynamic join selection should not apply to non-equi join Key: SPARK-36652 URL: https://issues.apache.org/jira/browse/SPARK-36652 Project: Spark Issue T

[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join

2021-08-31 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407823#comment-17407823 ] Cheng Su commented on SPARK-36612: -- I agree some queries do fit in this scenario. We ca

[jira] [Created] (SPARK-36594) ORC vectorized reader should properly check maximal number of fields

2021-08-25 Thread Cheng Su (Jira)
Cheng Su created SPARK-36594: Summary: ORC vectorized reader should properly check maximal number of fields Key: SPARK-36594 URL: https://issues.apache.org/jira/browse/SPARK-36594 Project: Spark

[jira] [Created] (SPARK-36404) Support nested columns in ORC vectorized reader for data source v2

2021-08-03 Thread Cheng Su (Jira)
Cheng Su created SPARK-36404: Summary: Support nested columns in ORC vectorized reader for data source v2 Key: SPARK-36404 URL: https://issues.apache.org/jira/browse/SPARK-36404 Project: Spark I

[jira] [Created] (SPARK-36269) Fix only set data columns to Hive column names config

2021-07-22 Thread Cheng Su (Jira)
Cheng Su created SPARK-36269: Summary: Fix only set data columns to Hive column names config Key: SPARK-36269 URL: https://issues.apache.org/jira/browse/SPARK-36269 Project: Spark Issue Type: Imp

[jira] [Commented] (SPARK-24528) Add support to read multiple sorted bucket files for data source v1

2021-07-05 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375215#comment-17375215 ] Cheng Su commented on SPARK-24528: -- Hi [~rahij] - glad to hear that the PR is working f

[jira] [Created] (SPARK-35965) Add documentation for ORC nested column vectorized reader

2021-07-01 Thread Cheng Su (Jira)
Cheng Su created SPARK-35965: Summary: Add documentation for ORC nested column vectorized reader Key: SPARK-35965 URL: https://issues.apache.org/jira/browse/SPARK-35965 Project: Spark Issue Type:

[jira] [Updated] (SPARK-19256) Hive bucketing write support

2021-06-23 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-19256: - Affects Version/s: 3.2.0 > Hive bucketing write support

[jira] [Commented] (SPARK-19256) Hive bucketing write support

2021-06-23 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368522#comment-17368522 ] Cheng Su commented on SPARK-19256: -- [~pushkarcse] - we are currently working on https:

[jira] [Created] (SPARK-35794) Allow custom plugin for AQE cost evaluator

2021-06-17 Thread Cheng Su (Jira)
Cheng Su created SPARK-35794: Summary: Allow custom plugin for AQE cost evaluator Key: SPARK-35794 URL: https://issues.apache.org/jira/browse/SPARK-35794 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-35791) Release on-going map properly for NULL-aware ANTI join

2021-06-16 Thread Cheng Su (Jira)
Cheng Su created SPARK-35791: Summary: Release on-going map properly for NULL-aware ANTI join Key: SPARK-35791 URL: https://issues.apache.org/jira/browse/SPARK-35791 Project: Spark Issue Type: Im

[jira] [Created] (SPARK-35760) Fix the max rows check for broadcast exchange

2021-06-14 Thread Cheng Su (Jira)
Cheng Su created SPARK-35760: Summary: Fix the max rows check for broadcast exchange Key: SPARK-35760 URL: https://issues.apache.org/jira/browse/SPARK-35760 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-32709) Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)

2021-06-09 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360287#comment-17360287 ] Cheng Su commented on SPARK-32709: -- [~spedamallu] - yes I am still working on it. It's

[jira] [Created] (SPARK-35693) Add plan check for stream-stream join unit test

2021-06-09 Thread Cheng Su (Jira)
Cheng Su created SPARK-35693: Summary: Add plan check for stream-stream join unit test Key: SPARK-35693 URL: https://issues.apache.org/jira/browse/SPARK-35693 Project: Spark Issue Type: Test

[jira] [Updated] (SPARK-35690) Stream-stream join keys should be reordered properly

2021-06-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-35690: - Issue Type: Improvement (was: Documentation) > Stream-stream join keys should be reordered properly

[jira] [Created] (SPARK-35690) Stream-stream join keys should be reordered properly

2021-06-08 Thread Cheng Su (Jira)
Cheng Su created SPARK-35690: Summary: Stream-stream join keys should be reordered properly Key: SPARK-35690 URL: https://issues.apache.org/jira/browse/SPARK-35690 Project: Spark Issue Type: Docu

[jira] [Updated] (SPARK-35604) Fix condition check for FULL OUTER sort merge join

2021-06-02 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-35604: - Issue Type: Improvement (was: Documentation) > Fix condition check for FULL OUTER sort merge join

[jira] [Created] (SPARK-35604) Fix condition check for FULL OUTER sort merge join

2021-06-01 Thread Cheng Su (Jira)
Cheng Su created SPARK-35604: Summary: Fix condition check for FULL OUTER sort merge join Key: SPARK-35604 URL: https://issues.apache.org/jira/browse/SPARK-35604 Project: Spark Issue Type: Docume

[jira] [Created] (SPARK-35529) Add fallback metrics for hash aggregate

2021-05-25 Thread Cheng Su (Jira)
Cheng Su created SPARK-35529: Summary: Add fallback metrics for hash aggregate Key: SPARK-35529 URL: https://issues.apache.org/jira/browse/SPARK-35529 Project: Spark Issue Type: Documentation

[jira] [Created] (SPARK-35438) Minor documentation fix for window physical operator

2021-05-18 Thread Cheng Su (Jira)
Cheng Su created SPARK-35438: Summary: Minor documentation fix for window physical operator Key: SPARK-35438 URL: https://issues.apache.org/jira/browse/SPARK-35438 Project: Spark Issue Type: Docu

[jira] [Created] (SPARK-35363) Refactor sort merge join code-gen be agnostic to join type

2021-05-10 Thread Cheng Su (Jira)
Cheng Su created SPARK-35363: Summary: Refactor sort merge join code-gen be agnostic to join type Key: SPARK-35363 URL: https://issues.apache.org/jira/browse/SPARK-35363 Project: Spark Issue Type

[jira] [Created] (SPARK-35354) Minor cleanup to replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin

2021-05-08 Thread Cheng Su (Jira)
Cheng Su created SPARK-35354: Summary: Minor cleanup to replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin Key: SPARK-35354 URL: https://issues.apache.org/jira/browse/SPARK-35354 Project: Spa

[jira] [Commented] (SPARK-35352) Add code-gen for full outer sort merge join

2021-05-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341226#comment-17341226 ] Cheng Su commented on SPARK-35352: -- Will raise a PR soon. > Add code-gen for full oute

[jira] [Comment Edited] (SPARK-35351) Add code-gen for left anti sort merge join

2021-05-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341225#comment-17341225 ] Cheng Su edited comment on SPARK-35351 at 5/8/21, 7:28 AM: --- Wi

[jira] [Commented] (SPARK-35350) Add code-gen for left semi sort merge join

2021-05-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341224#comment-17341224 ] Cheng Su commented on SPARK-35350: -- Will raise a PR soon. > Add code-gen for left semi

[jira] [Created] (SPARK-35352) Add code-gen for full outer sort merge join

2021-05-08 Thread Cheng Su (Jira)
Cheng Su created SPARK-35352: Summary: Add code-gen for full outer sort merge join Key: SPARK-35352 URL: https://issues.apache.org/jira/browse/SPARK-35352 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-35351) Add code-gen for left anti sort merge join

2021-05-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341225#comment-17341225 ] Cheng Su commented on SPARK-35351: -- Will raise a PR soon. [|https://issues.apache.org/j

[jira] [Created] (SPARK-35351) Add code-gen for left anti sort merge join

2021-05-08 Thread Cheng Su (Jira)
Cheng Su created SPARK-35351: Summary: Add code-gen for left anti sort merge join Key: SPARK-35351 URL: https://issues.apache.org/jira/browse/SPARK-35351 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-35350) Add code-gen for left semi sort merge join

2021-05-08 Thread Cheng Su (Jira)
Cheng Su created SPARK-35350: Summary: Add code-gen for left semi sort merge join Key: SPARK-35350 URL: https://issues.apache.org/jira/browse/SPARK-35350 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-34705) Add code-gen for all join types of sort merge join

2021-05-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341219#comment-17341219 ] Cheng Su commented on SPARK-34705: -- Just FYI, we are working on each sub-task now. Tar

[jira] [Created] (SPARK-35349) Add code-gen for left/right outer sort merge join

2021-05-08 Thread Cheng Su (Jira)
Cheng Su created SPARK-35349: Summary: Add code-gen for left/right outer sort merge join Key: SPARK-35349 URL: https://issues.apache.org/jira/browse/SPARK-35349 Project: Spark Issue Type: Sub-tas

[jira] [Updated] (SPARK-34705) Add code-gen for all join types of sort merge join

2021-05-08 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-34705: - Issue Type: Umbrella (was: Improvement) > Add code-gen for all join types of sort merge join

[jira] [Commented] (SPARK-34705) Add code-gen for all join types of sort merge join

2021-04-28 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335022#comment-17335022 ] Cheng Su commented on SPARK-34705: -- [~advancedxy] - We saw ~10% CPU performance improve

[jira] [Created] (SPARK-35241) Investigate to prefer vectorized hash map in hash aggregate selectively

2021-04-27 Thread Cheng Su (Jira)
Cheng Su created SPARK-35241: Summary: Investigate to prefer vectorized hash map in hash aggregate selectively Key: SPARK-35241 URL: https://issues.apache.org/jira/browse/SPARK-35241 Project: Spark

[jira] [Created] (SPARK-35235) Add row-based fast hash map into aggregate benchmark

2021-04-26 Thread Cheng Su (Jira)
Cheng Su created SPARK-35235: Summary: Add row-based fast hash map into aggregate benchmark Key: SPARK-35235 URL: https://issues.apache.org/jira/browse/SPARK-35235 Project: Spark Issue Type: Test

[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE

2021-04-23 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331011#comment-17331011 ] Cheng Su commented on SPARK-35133: -- btw just to provide more context, I am running into

[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE

2021-04-23 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330982#comment-17330982 ] Cheng Su commented on SPARK-35133: -- Whenever developers/users want to debug generated

[jira] [Commented] (SPARK-35179) Introduce hybrid join for sort merge join and shuffled hash join in AQE

2021-04-21 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326991#comment-17326991 ] Cheng Su commented on SPARK-35179: -- Thanks [~cloud_fan] for the idea. Please commen

[jira] [Updated] (SPARK-32461) Shuffled hash join improvement

2021-04-21 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-32461: - Affects Version/s: 3.2.0 > Shuffled hash join improvement

[jira] [Created] (SPARK-35179) Introduce hybrid join for sort merge join and shuffled hash join in AQE

2021-04-21 Thread Cheng Su (Jira)
Cheng Su created SPARK-35179: Summary: Introduce hybrid join for sort merge join and shuffled hash join in AQE Key: SPARK-35179 URL: https://issues.apache.org/jira/browse/SPARK-35179 Project: Spark

[jira] [Created] (SPARK-35141) Support two level map for final hash aggregation

2021-04-19 Thread Cheng Su (Jira)
Cheng Su created SPARK-35141: Summary: Support two level map for final hash aggregation Key: SPARK-35141 URL: https://issues.apache.org/jira/browse/SPARK-35141 Project: Spark Issue Type: Improvem

[jira] [Updated] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE

2021-04-18 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-35133: - Description: `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the generated code for each

[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE

2021-04-18 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324727#comment-17324727 ] Cheng Su commented on SPARK-35133: -- I am trying to come up with a clean solution to fix

[jira] [Created] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE

2021-04-18 Thread Cheng Su (Jira)
Cheng Su created SPARK-35133: Summary: EXPLAIN CODEGEN does not work with AQE Key: SPARK-35133 URL: https://issues.apache.org/jira/browse/SPARK-35133 Project: Spark Issue Type: Bug Comp

[jira] [Updated] (SPARK-35109) Fix minor exception messages of HashedRelation and HashJoin

2021-04-16 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-35109: - Summary: Fix minor exception messages of HashedRelation and HashJoin (was: Fix minor exception messages

[jira] [Created] (SPARK-35109) Fix minor exception messages of HashedRelation and HashedJoin

2021-04-16 Thread Cheng Su (Jira)
Cheng Su created SPARK-35109: Summary: Fix minor exception messages of HashedRelation and HashedJoin Key: SPARK-35109 URL: https://issues.apache.org/jira/browse/SPARK-35109 Project: Spark Issue

[jira] [Commented] (SPARK-32634) Introduce sort-based fallback mechanism for shuffled hash join

2021-04-07 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316683#comment-17316683 ] Cheng Su commented on SPARK-32634: -- [~Thomas Liu] - Implement fallback mechanism for wh

[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2021-04-05 Thread Cheng Su (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315070#comment-17315070 ] Cheng Su commented on SPARK-34960: -- Just FYI we will start sending out code after [htt
