[GitHub] [spark] beliefer commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-05 Thread GitBox
beliefer commented on code in PR #36663: URL: https://github.com/apache/spark/pull/36663#discussion_r889863328 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -121,4 +124,34 @@ private[sql] object H2Dialect extends JdbcDialect { } super.clas

[GitHub] [spark] beliefer opened a new pull request, #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

2022-06-05 Thread GitBox
beliefer opened a new pull request, #36773: URL: https://github.com/apache/spark/pull/36773 ### What changes were proposed in this pull request? Spark supports a lot of linear regression aggregate functions now. Because `REGR_AVGX`, `REGR_AVGY`, `REGR_COUNT`, `REGR_SXX` and `REGR_SXY`
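The aggregates named in this PR summary follow the standard SQL definitions. A minimal plain-Python sketch of those definitions (not the Spark implementation; rows with a NULL on either side are excluded, per the SQL standard, and the input is assumed non-empty):

```python
def regr_stats(pairs):
    """Compute the SQL-standard linear-regression aggregates over (y, x) pairs."""
    # Exclude any pair where either side is NULL (None), per the SQL standard.
    clean = [(y, x) for (y, x) in pairs if y is not None and x is not None]
    n = len(clean)  # REGR_COUNT; assumed > 0 here
    avg_x = sum(x for _, x in clean) / n
    avg_y = sum(y for y, _ in clean) / n
    sxx = sum((x - avg_x) ** 2 for _, x in clean)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in clean)
    return {"REGR_COUNT": n, "REGR_AVGX": avg_x, "REGR_AVGY": avg_y,
            "REGR_SXX": sxx, "REGR_SXY": sxy}
```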

[GitHub] [spark] AngersZhuuuu commented on pull request #36723: [SPARK-39337][SQL] Refactor DescribeTableExec to remove duplicate filters

2022-06-05 Thread GitBox
AngersZh commented on PR #36723: URL: https://github.com/apache/spark/pull/36723#issuecomment-1147096805 > Gentle ping, @AngersZh . Can't find an efficient way to keep the order; will close this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] AngersZhuuuu closed pull request #36723: [SPARK-39337][SQL] Refactor DescribeTableExec to remove duplicate filters

2022-06-05 Thread GitBox
AngersZh closed pull request #36723: [SPARK-39337][SQL] Refactor DescribeTableExec to remove duplicate filters URL: https://github.com/apache/spark/pull/36723

[GitHub] [spark] cxzl25 opened a new pull request, #36772: [SPARK-39387][BUILD] Upgrade hive-storage-api to 2.7.3

2022-06-05 Thread GitBox
cxzl25 opened a new pull request, #36772: URL: https://github.com/apache/spark/pull/36772 ### What changes were proposed in this pull request? This PR aims to upgrade Apache Hive `hive-storage-api` library from 2.7.2 to 2.7.3. ### Why are the changes needed? [HIVE-25190](htt

[GitHub] [spark] Eugene-Mark commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-05 Thread GitBox
Eugene-Mark commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1147065740 We have "maximum scale" [defined in Spark](https://github.com/apache/spark/blob/bab70b1ef24a2461395b32f609a9274269cb000e/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalT

[GitHub] [spark] LuciferYang commented on pull request #36616: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordR

2022-06-05 Thread GitBox
LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1147061886 cc @dongjoon-hyun @sunchao

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35789: [SPARK-32268][SQL] Row-level Runtime Filtering

2022-06-05 Thread GitBox
dongjoon-hyun commented on code in PR #35789: URL: https://github.com/apache/spark/pull/35789#discussion_r889832491 ## sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala: ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WangGuangxin commented on pull request #36626: [SPARK-39249][SQL] Improve subexpression elimination for conditional expressions

2022-06-05 Thread GitBox
WangGuangxin commented on PR #36626: URL: https://github.com/apache/spark/pull/36626#issuecomment-1147034877 @viirya @cloud-fan Could you please help review this?

[GitHub] [spark] wangyum commented on pull request #36766: [SPARK-32184][SQL] Remove inferred predicate if it has InOrCorrelatedExistsSubquery

2022-06-05 Thread GitBox
wangyum commented on PR #36766: URL: https://github.com/apache/spark/pull/36766#issuecomment-1147023754 @cloud-fan @sigmod

[GitHub] [spark] wangyum commented on pull request #36764: [SPARK-39377][SQL][TESTS] Normalize expr ids in ListQuery and Exists expressions

2022-06-05 Thread GitBox
wangyum commented on PR #36764: URL: https://github.com/apache/spark/pull/36764#issuecomment-1147022373 Merged to master.

[GitHub] [spark] wangyum commented on pull request #36764: [SPARK-39377][SQL][TESTS] Normalize expr ids in ListQuery and Exists expressions

2022-06-05 Thread GitBox
wangyum commented on PR #36764: URL: https://github.com/apache/spark/pull/36764#issuecomment-1147022276 Thank you @MaxGekk

[GitHub] [spark] wangyum closed pull request #36764: [SPARK-39377][SQL][TESTS] Normalize expr ids in ListQuery and Exists expressions

2022-06-05 Thread GitBox
wangyum closed pull request #36764: [SPARK-39377][SQL][TESTS] Normalize expr ids in ListQuery and Exists expressions URL: https://github.com/apache/spark/pull/36764

[GitHub] [spark] cxzl25 commented on pull request #36740: [SPARK-39355][SQL] Avoid UnresolvedAttribute.apply throwing ParseException

2022-06-05 Thread GitBox
cxzl25 commented on PR #36740: URL: https://github.com/apache/spark/pull/36740#issuecomment-1147014470 @sarutak @cloud-fan @dongjoon-hyun After the introduction of SPARK-34636, some SQL will fail to parse.

[GitHub] [spark] beliefer commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-05 Thread GitBox
beliefer commented on code in PR #36663: URL: https://github.com/apache/spark/pull/36663#discussion_r889799255 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -259,6 +259,49 @@ class V2ExpressionBuilder( } else { Non

[GitHub] [spark] LuciferYang commented on pull request #36732: [SPARK-39345][CORE][SQL][DSTREAM][ML][MESOS][SS] Replace `filter(!condition)` with `filterNot(condition)`

2022-06-05 Thread GitBox
LuciferYang commented on PR #36732: URL: https://github.com/apache/spark/pull/36732#issuecomment-1146991307 > The change is OK; I don't think it adds any performance. The only hesitation here would be code churn and possible merge conflicts due to this. I'm kind of neutral on it, I think it

[GitHub] [spark] wangyum commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-06-05 Thread GitBox
wangyum commented on PR #36696: URL: https://github.com/apache/spark/pull/36696#issuecomment-1146989466 Please give me a few days.

[GitHub] [spark] LuciferYang commented on pull request #36573: [SPARK-38829][SQL][FOLLOWUP] Add `PARQUET_TIMESTAMP_NTZ_ENABLED` configuration for `ParquetWrite.prepareWrite`

2022-06-05 Thread GitBox
LuciferYang commented on PR #36573: URL: https://github.com/apache/spark/pull/36573#issuecomment-1146988756 Close this first; will re-open when we can write Parquet data through the V2 API.

[GitHub] [spark] LuciferYang closed pull request #36573: [SPARK-38829][SQL][FOLLOWUP] Add `PARQUET_TIMESTAMP_NTZ_ENABLED` configuration for `ParquetWrite.prepareWrite`

2022-06-05 Thread GitBox
LuciferYang closed pull request #36573: [SPARK-38829][SQL][FOLLOWUP] Add `PARQUET_TIMESTAMP_NTZ_ENABLED` configuration for `ParquetWrite.prepareWrite` URL: https://github.com/apache/spark/pull/36573

[GitHub] [spark] beliefer commented on a diff in pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function

2022-06-05 Thread GitBox
beliefer commented on code in PR #36662: URL: https://github.com/apache/spark/pull/36662#discussion_r889782405 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2504,9 +2504,10 @@ object Decode { usage = """ _FUNC_(b

[GitHub] [spark] beliefer commented on pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2

2022-06-05 Thread GitBox
beliefer commented on PR #36295: URL: https://github.com/apache/spark/pull/36295#issuecomment-1146967033 > Can we have some kind of performance numbers for "push down OFFSET could improves the performance."? For most JDBC data sources, pushing down OFFSET can improve performance.
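A minimal sketch of why pushing OFFSET to the data source can help, assuming a hypothetical query builder (the function name and the SQL-standard `OFFSET n ROWS` syntax are illustrative assumptions, not Spark's actual JDBC dialect API): with pushdown, the database discards the skipped rows before anything crosses the network; without it, Spark must fetch and discard them locally.

```python
def build_query(table, offset=None):
    """Build a SELECT, optionally pushing an OFFSET clause into the query.

    Hypothetical sketch: real dialects vary in OFFSET syntax, and a real
    pushdown must also account for LIMIT and the dialect's capabilities.
    """
    query = f"SELECT * FROM {table}"
    if offset is not None:
        # With pushdown, the database skips these rows server-side.
        query += f" OFFSET {offset} ROWS"
    return query
```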

[GitHub] [spark] github-actions[bot] commented on pull request #32397: [WIP][SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-1146915375 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35250: [SPARK-37961][SQL] Override maxRows/maxRowsPerPartition for some logical operators

2022-06-05 Thread GitBox
github-actions[bot] closed pull request #35250: [SPARK-37961][SQL] Override maxRows/maxRowsPerPartition for some logical operators URL: https://github.com/apache/spark/pull/35250

[GitHub] [spark] github-actions[bot] commented on pull request #35363: [SPARK-38066][SQL] evaluateEquality should ignore attribute without min/max ColumnStat

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #35363: URL: https://github.com/apache/spark/pull/35363#issuecomment-1146915363 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35612: [SPARK-38289][SQL] Refactor SQL CLI exit code to make it more clear

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #35612: URL: https://github.com/apache/spark/pull/35612#issuecomment-1146915346 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35417: [SPARK-38102][CORE] Support custom commitProtocolClass in saveAsNewAPIHadoopDataset

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #35417: URL: https://github.com/apache/spark/pull/35417#issuecomment-1146915359 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35460: [SPARK-38160][SQL] Shuffle by rand could lead to incorrect answers when ShuffleFetchFailed happend

2022-06-05 Thread GitBox
github-actions[bot] closed pull request #35460: [SPARK-38160][SQL] Shuffle by rand could lead to incorrect answers when ShuffleFetchFailed happend URL: https://github.com/apache/spark/pull/35460

[GitHub] [spark] github-actions[bot] closed pull request #35620: [SPArK-38294][SQL] DDLUtils.verifyNotReadPath should check target is subDir

2022-06-05 Thread GitBox
github-actions[bot] closed pull request #35620: [SPArK-38294][SQL] DDLUtils.verifyNotReadPath should check target is subDir URL: https://github.com/apache/spark/pull/35620

[GitHub] [spark] huaxingao commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-06-05 Thread GitBox
huaxingao commented on PR #36696: URL: https://github.com/apache/spark/pull/36696#issuecomment-1146911640 @wangyum is helping me test this because he has lots of `in/notIn` test cases. I will change this PR to draft for now.

[GitHub] [spark] dongjoon-hyun commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-06-05 Thread GitBox
dongjoon-hyun commented on PR #36696: URL: https://github.com/apache/spark/pull/36696#issuecomment-1146909697 cc @sunchao

[GitHub] [spark] dongjoon-hyun commented on pull request #36723: [SPARK-39337][SQL] Refactor DescribeTableExec to remove duplicate filters

2022-06-05 Thread GitBox
dongjoon-hyun commented on PR #36723: URL: https://github.com/apache/spark/pull/36723#issuecomment-1146909417 Gentle ping, @AngersZh .

[GitHub] [spark] dongjoon-hyun commented on pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-05 Thread GitBox
dongjoon-hyun commented on PR #36701: URL: https://github.com/apache/spark/pull/36701#issuecomment-1146909153 Thank you, @pralabhkumar and @HyukjinKwon . Merged to master for Apache Spark 3.4.

[GitHub] [spark] dongjoon-hyun closed pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-05 Thread GitBox
dongjoon-hyun closed pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py URL: https://github.com/apache/spark/pull/36701

[GitHub] [spark] AmplabJenkins commented on pull request #36771: [WIP][SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ADD COLUMNS to V2 data sources

2022-06-05 Thread GitBox
AmplabJenkins commented on PR #36771: URL: https://github.com/apache/spark/pull/36771#issuecomment-1146896095 Can one of the admins verify this patch?

[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

2022-06-05 Thread GitBox
sadikovi commented on code in PR #36726: URL: https://github.com/apache/spark/pull/36726#discussion_r889742470 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -1879,5 +1880,53 @@ class JDBCSuite extends QueryTest val fields = schema.fields a

[GitHub] [spark] dtenedor opened a new pull request, #36771: [WIP][SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ADD COLUMNS to V2 data sources

2022-06-05 Thread GitBox
dtenedor opened a new pull request, #36771: URL: https://github.com/apache/spark/pull/36771 ### What changes were proposed in this pull request? Extend DEFAULT column support in ALTER TABLE ADD COLUMNS commands to include V2 data sources. Example: ``` > create or repl

[GitHub] [spark] sadikovi commented on pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2

2022-06-05 Thread GitBox
sadikovi commented on PR #36295: URL: https://github.com/apache/spark/pull/36295#issuecomment-1146883974 Can we have some kind of performance numbers for "push down OFFSET could improves the performance."?

[GitHub] [spark] Kimahriman commented on pull request #36767: [SPARK-39363][K8S] Deprecate k8s memory overhead and make it optional

2022-06-05 Thread GitBox
Kimahriman commented on PR #36767: URL: https://github.com/apache/spark/pull/36767#issuecomment-1146853120 > Could you update the PR description with the actual logs (BEFORE and AFTER)? I'm not really sure how to trigger the old deprecation warning; I don't actually use k8s.

[GitHub] [spark] Kimahriman commented on a diff in pull request #36767: [SPARK-39363][K8S] Deprecate k8s memory overhead and make it optional

2022-06-05 Thread GitBox
Kimahriman commented on code in PR #36767: URL: https://github.com/apache/spark/pull/36767#discussion_r889715991 ## core/src/main/scala/org/apache/spark/SparkConf.scala: ## @@ -638,7 +638,9 @@ private[spark] object SparkConf extends Logging { DeprecatedConfig("spark.black

[GitHub] [spark] srowen commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-05 Thread GitBox
srowen commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1146838926 I see, so we should interpret this as "maximum scale" or something in Spark? that seems OK, and if we're only confident about Teradata, this seems OK. Let's add a note in the release notes

[GitHub] [spark] Eugene-Mark commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-05 Thread GitBox
Eugene-Mark commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1146822523 For NUMBER(*) on Teradata, the scale is not fixed but can adapt to different values; as they said, it's only constrained by the `system limit`. So the issue for Teradata is about how
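A sketch of the kind of mapping this thread discusses: fixed-form Teradata NUMBER(p, s) maps directly onto a Spark DecimalType, while the flexible NUMBER(*) form (whose scale is only bounded by the system limit) needs an explicit fallback. The concrete fallback values below are illustrative assumptions, not the values this PR adopts; only Spark's maximum precision of 38 is taken from Spark's DecimalType.

```python
SPARK_MAX_PRECISION = 38  # DecimalType's maximum precision in Spark

def teradata_number_to_decimal(precision=None, scale=None):
    """Map a Teradata NUMBER type to a (name, precision, scale) triple.

    precision=None models NUMBER(*); the chosen scale of 18 is an
    assumed default for illustration, not the PR's actual mapping.
    """
    if precision is None:
        return ("decimal", SPARK_MAX_PRECISION, 18)
    return ("decimal", precision, scale or 0)
```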

[GitHub] [spark] AmplabJenkins commented on pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

2022-06-05 Thread GitBox
AmplabJenkins commented on PR #36769: URL: https://github.com/apache/spark/pull/36769#issuecomment-1146784621 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36770: [SPARK-39382][WEBUI] UI show the duartion of the failed task when the executor lost

2022-06-05 Thread GitBox
AmplabJenkins commented on PR #36770: URL: https://github.com/apache/spark/pull/36770#issuecomment-1146784611 Can one of the admins verify this patch?

[GitHub] [spark] cxzl25 commented on pull request #36770: [SPARK-39382][WEBUI] UI show the duartion of the failed task when the executor lost

2022-06-05 Thread GitBox
cxzl25 commented on PR #36770: URL: https://github.com/apache/spark/pull/36770#issuecomment-1146770734 ## Current ![fail_task_current](https://user-images.githubusercontent.com/3898450/172043757-c9b8bf54-1c80-4a0e-b67e-a2a4d79e93b0.png) ## Fix ![fail_task_duration](https://

[GitHub] [spark] cxzl25 opened a new pull request, #36770: [SPARK-39382][WEBUI] UI show the duartion of the failed task when the executor lost

2022-06-05 Thread GitBox
cxzl25 opened a new pull request, #36770: URL: https://github.com/apache/spark/pull/36770 ### What changes were proposed in this pull request? When task status is failed and `executorRunTime` has no value, try to use `duration`. ### Why are the changes needed? When the execu
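The fallback described in the PR summary can be sketched as follows; the dict field names here are assumptions for illustration, not Spark's actual TaskData schema:

```python
def displayed_duration(task):
    """Pick the duration to show in the UI for a task.

    Prefer executorRunTime; when it is missing (or zero, as when the
    executor was lost before reporting metrics), fall back to duration.
    """
    run_time = task.get("executorRunTime")
    if run_time:
        return run_time
    return task.get("duration")
```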

[GitHub] [spark] MaxGekk commented on pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function

2022-06-05 Thread GitBox
MaxGekk commented on PR #36662: URL: https://github.com/apache/spark/pull/36662#issuecomment-1146765074 @beliefer @cloud-fan Please, take a look at this PR.

[GitHub] [spark] MaxGekk closed pull request #36760: [SPARK-39374][SQL] Improve error message for user specified column list

2022-06-05 Thread GitBox
MaxGekk closed pull request #36760: [SPARK-39374][SQL] Improve error message for user specified column list URL: https://github.com/apache/spark/pull/36760

[GitHub] [spark] MaxGekk commented on pull request #36760: [SPARK-39374][SQL] Improve error message for user specified column list

2022-06-05 Thread GitBox
MaxGekk commented on PR #36760: URL: https://github.com/apache/spark/pull/36760#issuecomment-1146764006 +1, LGTM. Merging to master. Thank you, @wangyum and @singhpk234 for review.

[GitHub] [spark] cxzl25 opened a new pull request, #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

2022-06-05 Thread GitBox
cxzl25 opened a new pull request, #36769: URL: https://github.com/apache/spark/pull/36769 ### What changes were proposed in this pull request? Introduce configuration items and set batch size when constructing orc writer. ### Why are the changes needed? Now vectorized columar or