[GitHub] [spark] manuzhang commented on pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-12 Thread GitBox
manuzhang commented on PR #36698: URL: https://github.com/apache/spark/pull/36698#issuecomment-1153543507 @cloud-fan do we plan to back-port it to branch-3.1?

[GitHub] [spark] dongjoon-hyun commented on pull request #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
dongjoon-hyun commented on PR #36852: URL: https://github.com/apache/spark/pull/36852#issuecomment-1153540713 Thank you for the updates.

[GitHub] [spark] MaxGekk commented on a diff in pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
MaxGekk commented on code in PR #36811: URL: https://github.com/apache/spark/pull/36811#discussion_r895359514 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -108,8 +108,9 @@ object Cast { case (TimestampType, TimestampNTZType) =>

[GitHub] [spark] MaxGekk commented on a diff in pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
MaxGekk commented on code in PR #36811: URL: https://github.com/apache/spark/pull/36811#discussion_r895359204 ## sql/core/src/test/resources/sql-tests/inputs/cast.sql: ## @@ -104,3 +104,15 @@ select cast('a' as timestamp_ntz); select cast(cast('inf' as double) as timestamp);

[GitHub] [spark] cloud-fan closed pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-12 Thread GitBox
cloud-fan closed pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic URL: https://github.com/apache/spark/pull/36698

[GitHub] [spark] cloud-fan commented on pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-12 Thread GitBox
cloud-fan commented on PR #36698: URL: https://github.com/apache/spark/pull/36698#issuecomment-1153504963 thanks, merging to master!

[GitHub] [spark] mridulm commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-06-12 Thread GitBox
mridulm commented on code in PR #36162: URL: https://github.com/apache/spark/pull/36162#discussion_r895335273 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1218,6 +1249,71 @@ private[spark] class TaskSetManager( def executorAdded(): Unit = {

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-12 Thread GitBox
HyukjinKwon commented on code in PR #36683: URL: https://github.com/apache/spark/pull/36683#discussion_r894062038 ## python/pyspark/sql/pandas/conversion.py: ## @@ -596,7 +596,7 @@ def _create_from_pandas_with_arrow( ] # Slice the DataFrame to be batched

[GitHub] [spark] HyukjinKwon commented on pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36683: URL: https://github.com/apache/spark/pull/36683#issuecomment-1153477826 Rebased

[GitHub] [spark] cxzl25 commented on pull request #36740: [SPARK-39355][SQL] Avoid UnresolvedAttribute.apply throwing ParseException

2022-06-12 Thread GitBox
cxzl25 commented on PR #36740: URL: https://github.com/apache/spark/pull/36740#issuecomment-1153476515 Gentle ping @sarutak @cloud-fan @dongjoon-hyun. This looks like a bug; a review would be appreciated.

[GitHub] [spark] pan3793 commented on pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-12 Thread GitBox
pan3793 commented on PR #36789: URL: https://github.com/apache/spark/pull/36789#issuecomment-1153466782 @HyukjinKwon would you please take another look if you have time?

[GitHub] [spark] mridulm commented on pull request #35683: [SPARK-30835][SPARK-39018][CORE][YARN] Add support for YARN decommissioning when ESS is disabled

2022-06-12 Thread GitBox
mridulm commented on PR #35683: URL: https://github.com/apache/spark/pull/35683#issuecomment-1153436041 Can you update the description to reflect the changes made to the PR, @abhishekd0907? Specifically, this is only related to decommissioning and we do not handle shuffle? The c

[GitHub] [spark] mcdull-zhang commented on pull request #36831: [SPARK-39126][SQL] After eliminating join to one side, that side should take advantage of LocalShuffleRead optimization

2022-06-12 Thread GitBox
mcdull-zhang commented on PR #36831: URL: https://github.com/apache/spark/pull/36831#issuecomment-1153431584 @cloud-fan @ulysses-you friendly ping

[GitHub] [spark] pan3793 commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

2022-06-12 Thread GitBox
pan3793 commented on code in PR #36832: URL: https://github.com/apache/spark/pull/36832#discussion_r895303204 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock:

[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-12 Thread GitBox
mridulm commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r895292260 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -317,22 +353,24 @@ public void applicationRemoved(String app

[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-12 Thread GitBox
mridulm commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r895282920 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -350,15 +415,27 @@ void closeAndDeletePartitionFilesIfNeeded

[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-12 Thread GitBox
mridulm commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r895282269 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -342,6 +380,33 @@ void closeAndDeletePartitionFilesIfNeeded(

[GitHub] [spark] weixiuli commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-06-12 Thread GitBox
weixiuli commented on code in PR #36162: URL: https://github.com/apache/spark/pull/36162#discussion_r895280301 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -863,6 +872,29 @@ private[spark] class TaskSchedulerImpl( executorUpdates) }

[GitHub] [spark] beliefer commented on a diff in pull request #36830: [SPARK-38761][SQL][FOLLOWUP] DS V2 supports push down misc non-aggregate functions

2022-06-12 Thread GitBox
beliefer commented on code in PR #36830: URL: https://github.com/apache/spark/pull/36830#discussion_r895270044 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -97,6 +97,10 @@ public String build(Expression expr) { r

[GitHub] [spark] dongjoon-hyun commented on pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

2022-06-12 Thread GitBox
dongjoon-hyun commented on PR #36847: URL: https://github.com/apache/spark/pull/36847#issuecomment-1153369445 I checked Apache Spark 3.3.0 RC6 and added `3.3.0` to the Affected Version of the JIRA, @wangyum. ``` scala> spark.version val res0: String = 3.3.0 scala> sql("set
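[Editor's note] The transcript above is cut off by the archive. For context, a minimal sketch of the rule-exclusion mechanism that SPARK-39448 hardens: `spark.sql.optimizer.excludedRules` is the standard config, and rules on the `nonExcludableRules` list keep running even if excluded. The fully-qualified rule name below is an assumption for illustration.

```scala
// Illustrative sketch only, not the truncated transcript above.
import org.apache.spark.sql.SparkSession

object ExcludedRulesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("excluded-rules-demo").master("local[*]").getOrCreate()
    // Attempt to exclude the rule at runtime; the exact class name here is an assumption.
    spark.sql(
      "SET spark.sql.optimizer.excludedRules=" +
        "org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition")
    // With SPARK-39448, the optimizer ignores this exclusion for that rule.
    spark.stop()
  }
}
```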

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36847: URL: https://github.com/apache/spark/pull/36847#discussion_r895266088 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala: ## @@ -87,7 +87,8 @@ class SparkOptimizer( GroupBasedRowLevelOperationScanPlan

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36847: URL: https://github.com/apache/spark/pull/36847#discussion_r895266012 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala: ## @@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36847: URL: https://github.com/apache/spark/pull/36847#discussion_r895265570 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala: ## @@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36830: [SPARK-38761][SQL][FOLLOWUP] DS V2 supports push down misc non-aggregate functions

2022-06-12 Thread GitBox
HyukjinKwon commented on code in PR #36830: URL: https://github.com/apache/spark/pull/36830#discussion_r895264380 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -97,6 +97,10 @@ public String build(Expression expr) {

[GitHub] [spark] weixiuli commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-06-12 Thread GitBox
weixiuli commented on PR #36162: URL: https://github.com/apache/spark/pull/36162#issuecomment-1153357410 ping @Ngone51 @mridulm Thanks.

[GitHub] [spark] dongjoon-hyun commented on pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

2022-06-12 Thread GitBox
dongjoon-hyun commented on PR #36832: URL: https://github.com/apache/spark/pull/36832#issuecomment-1153354775 Thank you for your update, @pan3793.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36832: URL: https://github.com/apache/spark/pull/36832#discussion_r895262820 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf,

[GitHub] [spark] gengliangwang commented on a diff in pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
gengliangwang commented on code in PR #36811: URL: https://github.com/apache/spark/pull/36811#discussion_r895262282 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -108,8 +108,9 @@ object Cast { case (TimestampType, TimestampNTZTyp

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36832: [SPARK-39439][SHS] Check final file if in-progress event log file does not exist

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36832: URL: https://github.com/apache/spark/pull/36832#discussion_r895261688 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -747,6 +747,15 @@ private[history] class FsHistoryProvider(conf: SparkConf,

[GitHub] [spark] gengliangwang commented on a diff in pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
gengliangwang commented on code in PR #36811: URL: https://github.com/apache/spark/pull/36811#discussion_r895261353 ## sql/core/src/test/resources/sql-tests/inputs/cast.sql: ## @@ -104,3 +104,15 @@ select cast('a' as timestamp_ntz); select cast(cast('inf' as double) as timest

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36852: URL: https://github.com/apache/spark/pull/36852#discussion_r895261028 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -28,14 +28,30 @@ import com.fasterxml.jackson.module.scala.DefaultScalaModule import org.apache.s

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36852: URL: https://github.com/apache/spark/pull/36852#discussion_r895260896 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -28,14 +28,30 @@ import com.fasterxml.jackson.module.scala.DefaultScalaModule import org.apache.s

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36852: URL: https://github.com/apache/spark/pull/36852#discussion_r895260558 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -28,14 +28,30 @@ import com.fasterxml.jackson.module.scala.DefaultScalaModule import org.apache.s

[GitHub] [spark] HyukjinKwon commented on pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36829: URL: https://github.com/apache/spark/pull/36829#issuecomment-1153343979 cc @peter-toth @allisonwang-db @maryannxue FYI

[GitHub] [spark] HyukjinKwon commented on pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36811: URL: https://github.com/apache/spark/pull/36811#issuecomment-1153343457 cc @gengliangwang FYI

[GitHub] [spark] HyukjinKwon commented on pull request #36841: [SPARK-39444][SQL] Add OptimizeSubqueries into nonExcludableRules list

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36841: URL: https://github.com/apache/spark/pull/36841#issuecomment-1153342860 cc @maryannxue and @allisonwang-db FYI

[GitHub] [spark] HyukjinKwon commented on pull request #36842: [SPARK-39445][SQL] Remove the window if windowExpressions is empty in column pruning

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36842: URL: https://github.com/apache/spark/pull/36842#issuecomment-1153340903 cc @hvanhovell FYI

[GitHub] [spark] HyukjinKwon commented on pull request #36845: [SPARK-39447][SQL] Only non-broadcast query stage can propagate empty relation

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36845: URL: https://github.com/apache/spark/pull/36845#issuecomment-1153340424 cc @maryannxue FYI

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36844: Update ExecutorClassLoader.scala

2022-06-12 Thread GitBox
HyukjinKwon commented on code in PR #36844: URL: https://github.com/apache/spark/pull/36844#discussion_r895257421 ## repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala: ## @@ -54,7 +54,7 @@ class ExecutorClassLoader( classUri: String, parent: ClassLoad

[GitHub] [spark] wangyum commented on a diff in pull request #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list

2022-06-12 Thread GitBox
wangyum commented on code in PR #36847: URL: https://github.com/apache/spark/pull/36847#discussion_r895254709 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -270,7 +270,8 @@ abstract class Optimizer(catalogManager: CatalogManager)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list

2022-06-12 Thread GitBox
HyukjinKwon commented on code in PR #36847: URL: https://github.com/apache/spark/pull/36847#discussion_r895254287 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -270,7 +270,8 @@ abstract class Optimizer(catalogManager: CatalogManage

[GitHub] [spark] github-actions[bot] commented on pull request #35256: [SPARK-37933][SQL] Limit push down for parquet vectorized reader

2022-06-12 Thread GitBox
github-actions[bot] commented on PR #35256: URL: https://github.com/apache/spark/pull/35256#issuecomment-1153332516 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-06-12 Thread GitBox
github-actions[bot] commented on PR #35719: URL: https://github.com/apache/spark/pull/35719#issuecomment-1153332508 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on pull request #36848: [SPARK-39449][SQL] Propagate empty relation through Window

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36848: URL: https://github.com/apache/spark/pull/36848#issuecomment-1153331778 cc @hvanhovell FYI

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36850: [SPARK-39069][SQL] Pushing EqualTo with Literal to other conditions

2022-06-12 Thread GitBox
HyukjinKwon commented on code in PR #36850: URL: https://github.com/apache/spark/pull/36850#discussion_r895252969 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1501,19 +1501,42 @@ object EliminateSorts extends Rule[LogicalPlan] {

[GitHub] [spark] HyukjinKwon closed pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
HyukjinKwon closed pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame URL: https://github.com/apache/spark/pull/36793

[GitHub] [spark] HyukjinKwon commented on pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36793: URL: https://github.com/apache/spark/pull/36793#issuecomment-1153328439 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
HyukjinKwon commented on code in PR #36793: URL: https://github.com/apache/spark/pull/36793#discussion_r895251697 ## python/pyspark/sql/session.py: ## @@ -952,12 +953,29 @@ def createDataFrame( # type: ignore[misc] schema = [x.encode("utf-8") if not isinstance(x, s

[GitHub] [spark] HyukjinKwon commented on pull request #36840: [SPARK-39443][PYTHON][DOC] Improve docstring of pyspark.sql.functions.col/first

2022-06-12 Thread GitBox
HyukjinKwon commented on PR #36840: URL: https://github.com/apache/spark/pull/36840#issuecomment-1153327788 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #36840: [SPARK-39443][PYTHON][DOC] Improve docstring of pyspark.sql.functions.col/first

2022-06-12 Thread GitBox
HyukjinKwon closed pull request #36840: [SPARK-39443][PYTHON][DOC] Improve docstring of pyspark.sql.functions.col/first URL: https://github.com/apache/spark/pull/36840

[GitHub] [spark] gengliangwang commented on a diff in pull request #36771: [SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ADD COLUMNS to V2 data sources

2022-06-12 Thread GitBox
gengliangwang commented on code in PR #36771: URL: https://github.com/apache/spark/pull/36771#discussion_r895248971 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -43,6 +43,8 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] gengliangwang commented on a diff in pull request #36771: [SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ADD COLUMNS to V2 data sources

2022-06-12 Thread GitBox
gengliangwang commented on code in PR #36771: URL: https://github.com/apache/spark/pull/36771#discussion_r895248853 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -43,6 +43,8 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] xinrong-databricks commented on pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
xinrong-databricks commented on PR #36793: URL: https://github.com/apache/spark/pull/36793#issuecomment-1153244595 Rebased for conflicts in `python/docs/source/getting_started/install.rst` only.

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
xinrong-databricks commented on code in PR #36793: URL: https://github.com/apache/spark/pull/36793#discussion_r895210608 ## python/pyspark/sql/session.py: ## @@ -952,12 +953,29 @@ def createDataFrame( # type: ignore[misc] schema = [x.encode("utf-8") if not isinstan

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
xinrong-databricks commented on code in PR #36793: URL: https://github.com/apache/spark/pull/36793#discussion_r895210329 ## python/pyspark/sql/session.py: ## @@ -952,12 +953,29 @@ def createDataFrame( # type: ignore[misc] schema = [x.encode("utf-8") if not isinstan

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
xinrong-databricks commented on code in PR #36793: URL: https://github.com/apache/spark/pull/36793#discussion_r895208448 ## python/pyspark/sql/session.py: ## @@ -952,12 +953,29 @@ def createDataFrame( # type: ignore[misc] schema = [x.encode("utf-8") if not isinstan

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36840: [SPARK-39443][PYTHON][DOC] Improve docstring of pyspark.sql.functions.col/first

2022-06-12 Thread GitBox
xinrong-databricks commented on code in PR #36840: URL: https://github.com/apache/spark/pull/36840#discussion_r895207134 ## python/pyspark/sql/functions.py: ## @@ -1240,6 +1241,17 @@ def first(col: "ColumnOrName", ignorenulls: bool = False) -> Column: - The functi

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-12 Thread GitBox
xinrong-databricks commented on code in PR #36793: URL: https://github.com/apache/spark/pull/36793#discussion_r895206894 ## python/pyspark/sql/session.py: ## @@ -952,12 +953,29 @@ def createDataFrame( # type: ignore[misc] schema = [x.encode("utf-8") if not isinstan

[GitHub] [spark] AmplabJenkins commented on pull request #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
AmplabJenkins commented on PR #36852: URL: https://github.com/apache/spark/pull/36852#issuecomment-1153230519 Can one of the admins verify this patch?

[GitHub] [spark] MaxGekk commented on pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
MaxGekk commented on PR #36811: URL: https://github.com/apache/spark/pull/36811#issuecomment-1153206595 > What about exact numeric for fractional seconds? I will implement this separately. This PR leverages existing functionality of non-ANSI mode.
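[Editor's note] For readers of the archive, a minimal sketch of the kind of query SPARK-39451 targets: casting an interval to an integral type with ANSI mode enabled. The literal and target type below are illustrative assumptions, and the query only succeeds on a build that includes this change.

```scala
// Illustrative sketch, assuming a Spark build that includes SPARK-39451.
import org.apache.spark.sql.SparkSession

object IntervalToIntegralDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("interval-cast-demo").master("local[*]").getOrCreate()
    spark.conf.set("spark.sql.ansi.enabled", "true")
    // Cast a day-time interval to an integral number of seconds.
    spark.sql("SELECT CAST(INTERVAL '10' SECOND AS BIGINT) AS secs").show()
    spark.stop()
  }
}
```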

[GitHub] [spark] panbingkun commented on a diff in pull request #36676: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
panbingkun commented on code in PR #36676: URL: https://github.com/apache/spark/pull/36676#discussion_r895178269 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -61,13 +77,25 @@ private[spark] object SparkThrowableHelper { queryContext: String = ""): String

[GitHub] [spark] panbingkun closed pull request #36676: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
panbingkun closed pull request #36676: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode URL: https://github.com/apache/spark/pull/36676

[GitHub] [spark] panbingkun commented on pull request #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
panbingkun commented on PR #36852: URL: https://github.com/apache/spark/pull/36852#issuecomment-1153175452 FYI: the old PR https://github.com/apache/spark/pull/36676 has been closed; the master-branch PR is https://github.com/apache/spark/pull/36350. ping @MaxGekk

[GitHub] [spark] panbingkun opened a new pull request, #36852: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
panbingkun opened a new pull request, #36852: URL: https://github.com/apache/spark/pull/36852 ### What changes were proposed in this pull request? Migrate the following errors in QueryExecutionErrors: * unsupportedSaveModeError -> UNSUPPORTED_SAVE_MODE ### Why are the change
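[Editor's note] The description above is cut off by the archive. As a rough illustration of what an error-class migration looks like in general, here is a hedged sketch; the exception type and helper below are hypothetical stand-ins, not Spark's actual APIs or the contents of this PR.

```scala
// Hypothetical sketch of the error-class idea: a named error class plus message
// parameters instead of a free-form message string. Not the actual Spark code.
final case class ErrorClassException(errorClass: String, params: Map[String, String])
  extends RuntimeException(
    s"[$errorClass] " + params.map { case (k, v) => s"$k=$v" }.mkString(", "))

object QueryExecutionErrorsSketch {
  // Shape of the migration described above: unsupportedSaveModeError -> UNSUPPORTED_SAVE_MODE.
  def unsupportedSaveModeError(saveMode: String): Throwable =
    ErrorClassException("UNSUPPORTED_SAVE_MODE", Map("saveMode" -> saveMode))
}
```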

[GitHub] [spark] wangyum commented on pull request #36851: [WIP] Make `SchemaPruning` only pruning if it contains nested column

2022-06-12 Thread GitBox
wangyum commented on PR #36851: URL: https://github.com/apache/spark/pull/36851#issuecomment-1153170221 cc @Yaohua628 @viirya

[GitHub] [spark] wangyum opened a new pull request, #36851: [WIP] Make `SchemaPruning` only pruning if it contains nested column

2022-06-12 Thread GitBox
wangyum opened a new pull request, #36851: URL: https://github.com/apache/spark/pull/36851 ### What changes were proposed in this pull request? This PR makes `SchemaPruning` only pruning when `HadoopFsRelation`'s schema contains nested column. ### Why are the changes needed?
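[Editor's note] A minimal, self-contained sketch of the scenario `SchemaPruning` addresses — reading only the nested field a query actually selects — which is the case the PR narrows the rule to. The data layout and output path below are illustrative assumptions, not taken from the PR.

```scala
// Illustrative sketch of nested-column pruning; path and schema are assumptions.
import org.apache.spark.sql.SparkSession

object SchemaPruningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("schema-pruning-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    spark.range(5)
      .selectExpr("id", "named_struct('a', id, 'b', id * 2) AS s")
      .write.mode("overwrite").parquet("/tmp/schema_pruning_demo")

    // When nested-schema pruning applies, ReadSchema in the physical plan should
    // contain only s.a rather than the whole struct.
    spark.read.parquet("/tmp/schema_pruning_demo").select($"s.a").explain()
    spark.stop()
  }
}
```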

[GitHub] [spark] srielau commented on pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
srielau commented on PR #36811: URL: https://github.com/apache/spark/pull/36811#issuecomment-1153161847 What about exact numeric for fractional seconds? We keep coming back to that... CAST(INTERVAL '12345.678' SECOND AS DECIMAL(8,3))

[GitHub] [spark] MaxGekk commented on pull request #36811: [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

2022-06-12 Thread GitBox
MaxGekk commented on PR #36811: URL: https://github.com/apache/spark/pull/36811#issuecomment-1153120819 cc @srielau

[GitHub] [spark] MaxGekk commented on a diff in pull request #36676: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-06-12 Thread GitBox
MaxGekk commented on code in PR #36676: URL: https://github.com/apache/spark/pull/36676#discussion_r895141657 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -61,13 +77,25 @@ private[spark] object SparkThrowableHelper { queryContext: String = ""): String =

[GitHub] [spark] MaxGekk closed pull request #36818: [SPARK-39259][SQL][3.1] Evaluate timestamps consistently in subqueries

2022-06-12 Thread GitBox
MaxGekk closed pull request #36818: [SPARK-39259][SQL][3.1] Evaluate timestamps consistently in subqueries URL: https://github.com/apache/spark/pull/36818

[GitHub] [spark] MaxGekk commented on pull request #36818: [SPARK-39259][SQL][3.1] Evaluate timestamps consistently in subqueries

2022-06-12 Thread GitBox
MaxGekk commented on PR #36818: URL: https://github.com/apache/spark/pull/36818#issuecomment-1153109133 +1, LGTM. Merging to 3.1. Thank you, @olaky.

[GitHub] [spark] wangyum commented on pull request #36709: [SPARK-39325][CORE]Improve MapOutputTracker convertMapStatuses performance

2022-06-12 Thread GitBox
wangyum commented on PR #36709: URL: https://github.com/apache/spark/pull/36709#issuecomment-1153101611 @dongjoon-hyun I don't think this is a regression since all these changes are for push-based shuffles.

[GitHub] [spark] wangyum opened a new pull request, #36850: [SPARK-39069][SQL] Pushing EqualTo with Literal to other conditions

2022-06-12 Thread GitBox
wangyum opened a new pull request, #36850: URL: https://github.com/apache/spark/pull/36850 ### What changes were proposed in this pull request? This PR enhances `PruneFilters` to push `EqualTo` with `Literal` to other conditions. For example: ```sql CREATE TABLE t1 ( id D
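[Editor's note] The example in the description is truncated above. As a hedged sketch of the general kind of rewrite the title describes — substituting a literal known from an equality predicate into other conditions — consider the following; the table definitions and query are illustrative assumptions, not the PR's exact example or semantics.

```scala
// Illustrative sketch only; not the PR's actual rule or test case.
import org.apache.spark.sql.SparkSession

object EqualToLiteralDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("equalto-literal-demo").master("local[*]").getOrCreate()
    spark.sql("CREATE TABLE IF NOT EXISTS t1 (id INT, name STRING) USING parquet")
    spark.sql("CREATE TABLE IF NOT EXISTS t2 (id INT, name STRING) USING parquet")

    // Conceptually, t1.id = 1 combined with the join condition t1.id = t2.id lets the
    // optimizer also derive t2.id = 1, so the literal predicate can be applied to the
    // other side of the join (and pushed down) as well.
    spark.sql(
      "SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.id = 1").explain(true)
    spark.stop()
  }
}
```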

[GitHub] [spark] cxzl25 commented on a diff in pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

2022-06-12 Thread GitBox
cxzl25 commented on code in PR #36769: URL: https://github.com/apache/spark/pull/36769#discussion_r895128848 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala: ## @@ -844,6 +844,21 @@ abstract class OrcQuerySuite extends OrcQueryTest w

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

2022-06-12 Thread GitBox
dongjoon-hyun commented on code in PR #36769: URL: https://github.com/apache/spark/pull/36769#discussion_r895128188 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala: ## @@ -844,6 +844,21 @@ abstract class OrcQuerySuite extends OrcQuer

[GitHub] [spark] cxzl25 commented on a diff in pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

2022-06-12 Thread GitBox
cxzl25 commented on code in PR #36769: URL: https://github.com/apache/spark/pull/36769#discussion_r895125836 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala: ## @@ -844,6 +844,21 @@ abstract class OrcQuerySuite extends OrcQueryTest w