[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-08 Thread GitBox
xinrong-databricks commented on code in PR #36793: URL: https://github.com/apache/spark/pull/36793#discussion_r892863327 ## python/pyspark/sql/tests/test_session.py: ## @@ -379,6 +381,54 @@ def test_use_custom_class_for_extensions(self): ) +class

[GitHub] [spark] Borjianamin98 commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread GitBox
Borjianamin98 commented on PR #36781: URL: https://github.com/apache/spark/pull/36781#issuecomment-1150401390 > @Borjianamin98 Do you have a jira account? I tried to assign the jira to you but can't find you. My username in jira is `borjianamin` like what I created for issue

[GitHub] [spark] HyukjinKwon commented on pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
HyukjinKwon commented on PR #36803: URL: https://github.com/apache/spark/pull/36803#issuecomment-1150399434 I read e.g., https://lists.apache.org/thread/tcjh5wlthg21j519tl7o25cdo81792vr vs. https://github.com/apache/spark/pull/25607#issuecomment-525745116 Using other branches is

[GitHub] [spark] huaxingao commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread GitBox
huaxingao commented on PR #36781: URL: https://github.com/apache/spark/pull/36781#issuecomment-1150390322 @Borjianamin98 Do you have a jira account? I tried to assign the jira to you but can't find you. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] huaxingao commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread GitBox
huaxingao commented on PR #36781: URL: https://github.com/apache/spark/pull/36781#issuecomment-1150387353 Merged to master/3.3/3.2/3.1. Thanks @Borjianamin98 for your first contribution and welcome to Spark community! Also thanks @LuciferYang @dcoliversun for reviewing!

[GitHub] [spark] huaxingao closed pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread GitBox
huaxingao closed pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types URL: https://github.com/apache/spark/pull/36781

[GitHub] [spark] huaxingao commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox
huaxingao commented on PR #36810: URL: https://github.com/apache/spark/pull/36810#issuecomment-1150378282 LGTM. Pending test results

[GitHub] [spark] mridulm commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-06-08 Thread GitBox
mridulm commented on code in PR #36162: URL: https://github.com/apache/spark/pull/36162#discussion_r892802648 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1068,25 +1086,61 @@ private[spark] class TaskSetManager( * Check if the task

[GitHub] [spark] MaxGekk opened a new pull request, #36811: [WIP][SQL] Support casting intervals to integrals in ANSI mode

2022-06-08 Thread GitBox
MaxGekk opened a new pull request, #36811: URL: https://github.com/apache/spark/pull/36811 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] mridulm commented on pull request #36709: [SPARK-39325][CORE]Improve MapOutputTracker convertMapStatuses performance

2022-06-08 Thread GitBox
mridulm commented on PR #36709: URL: https://github.com/apache/spark/pull/36709#issuecomment-1150310283 +CC @Ngone51 as well, since you had reviewed the original change.

[GitHub] [spark] dongjoon-hyun commented on pull request #36807: [WIP][SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox
dongjoon-hyun commented on PR #36807: URL: https://github.com/apache/spark/pull/36807#issuecomment-1150278895 Thank you always for your proactive contribution, @LuciferYang !

[GitHub] [spark] MaxGekk commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-08 Thread GitBox
MaxGekk commented on PR #36753: URL: https://github.com/apache/spark/pull/36753#issuecomment-1150249981 > 3.1 already does not have any transform with subqueries function ... Could you list required PRs, please. Is it possible to extract only needed functions from them?

[GitHub] [spark] MaxGekk closed pull request #36804: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

2022-06-08 Thread GitBox
MaxGekk closed pull request #36804: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors URL: https://github.com/apache/spark/pull/36804

[GitHub] [spark] MaxGekk commented on pull request #36804: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

2022-06-08 Thread GitBox
MaxGekk commented on PR #36804: URL: https://github.com/apache/spark/pull/36804#issuecomment-1150243540 Merging to master/3.3. Thank you, @HeartSaVioR @cloud-fan for review.

[GitHub] [spark] olaky commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-08 Thread GitBox
olaky commented on PR #36753: URL: https://github.com/apache/spark/pull/36753#issuecomment-1150242302 @MaxGekk 3.1 already does not have any transform with subqueries function, so I would have to backport this as well. I personally feel that this could be a risky endeavour not worth doing

[GitHub] [spark] srielau commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-08 Thread GitBox
srielau commented on code in PR #36693: URL: https://github.com/apache/spark/pull/36693#discussion_r892641206 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -73,18 +73,20 @@ private[spark] object SparkThrowableHelper { def getMessage( errorClass:

[GitHub] [spark] cloud-fan commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892631764 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] singhpk234 commented on pull request #36810: SPARK-39417: Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox
singhpk234 commented on PR #36810: URL: https://github.com/apache/spark/pull/36810#issuecomment-1150161063 cc @srowen @HyukjinKwon

[GitHub] [spark] singhpk234 opened a new pull request, #36810: SPARK-39417: Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox
singhpk234 opened a new pull request, #36810: URL: https://github.com/apache/spark/pull/36810 ### What changes were proposed in this pull request? We should not try casting everything returned by `removeLeadingZerosFromNumberTypePartition` to string, as it returns null value

[GitHub] [spark] cloud-fan opened a new pull request, #36809: [WIP] Simplify the handling of TempResolvedColumn

2022-06-08 Thread GitBox
cloud-fan opened a new pull request, #36809: URL: https://github.com/apache/spark/pull/36809 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892098440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892593552 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -736,6 +737,24 @@ abstract class TypeCoercionBase { } } +

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892575789 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r892566204 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -461,7 +467,8 @@ class SparkContext(config: SparkConf) extends Logging {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r892558108 ## core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala: ## @@ -155,9 +159,9 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf,

[GitHub] [spark] cloud-fan commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36693: URL: https://github.com/apache/spark/pull/36693#discussion_r892541582 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -73,236 +101,340 @@ private[spark] case class ExecutorDeadException(message: String) */

[GitHub] [spark] cloud-fan commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36693: URL: https://github.com/apache/spark/pull/36693#discussion_r892535272 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -73,18 +73,20 @@ private[spark] object SparkThrowableHelper { def getMessage( errorClass:

[GitHub] [spark] AmplabJenkins commented on pull request #36806: [SPARK-39398][GRAPHX]message checkpointer support storage level

2022-06-08 Thread GitBox
AmplabJenkins commented on PR #36806: URL: https://github.com/apache/spark/pull/36806#issuecomment-1150061039 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r892521784 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2588,6 +2588,14 @@ private[spark] class DAGScheduler( runningStages -= stage }

[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r892518640 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2588,6 +2588,14 @@ private[spark] class DAGScheduler( runningStages -= stage }

[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r892511172 ## core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala: ## @@ -76,6 +76,8 @@ object SparkHadoopMapRedUtil extends Logging { if

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36786: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

2022-06-08 Thread GitBox
AngersZhuuuu commented on code in PR #36786: URL: https://github.com/apache/spark/pull/36786#discussion_r892496313 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala: ## @@ -140,7 +143,10 @@ private[hive] object

[GitHub] [spark] wangyum commented on a diff in pull request #36786: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

2022-06-08 Thread GitBox
wangyum commented on code in PR #36786: URL: https://github.com/apache/spark/pull/36786#discussion_r892493780 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala: ## @@ -140,7 +143,10 @@ private[hive] object SparkSQLCLIDriver

[GitHub] [spark] huaxingao closed pull request #36805: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread GitBox
huaxingao closed pull request #36805: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite URL: https://github.com/apache/spark/pull/36805

[GitHub] [spark] huaxingao commented on pull request #36805: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread GitBox
huaxingao commented on PR #36805: URL: https://github.com/apache/spark/pull/36805#issuecomment-1150004547 Merged to master. Thanks @beliefer

[GitHub] [spark] cloud-fan commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892459971 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -736,6 +737,24 @@ abstract class TypeCoercionBase { } } +

[GitHub] [spark] cloud-fan commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
cloud-fan commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892455937 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] cloud-fan commented on pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
cloud-fan commented on PR #36803: URL: https://github.com/apache/spark/pull/36803#issuecomment-1149985883 > We always use master branch to release, no? No, we use branch-3.3 to release 3.3.x

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892098440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892428863 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] MaxGekk commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-08 Thread GitBox
MaxGekk commented on PR #36753: URL: https://github.com/apache/spark/pull/36753#issuecomment-1149949544 @olaky This PR has been merged to branch-3.2 already, see https://github.com/apache/spark/commit/d611d1f66761bd39fee850ca3f435027f9fc1e3c Please, open separate PRs against

[GitHub] [spark] cxzl25 opened a new pull request, #36808: [SPARK-39415][CORE] Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread GitBox
cxzl25 opened a new pull request, #36808: URL: https://github.com/apache/spark/pull/36808 ### What changes were proposed in this pull request? Start `HadoopDelegationTokenManager` when `LocalSchedulerBackend` starts. The behavior is similar to `CoarseGrainedSchedulerBackend` startup,

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892376424 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -736,6 +737,24 @@ abstract class TypeCoercionBase { } } +

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892374424 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -736,6 +737,24 @@ abstract class TypeCoercionBase { } } +

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892338094 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -736,6 +737,24 @@ abstract class TypeCoercionBase { } } +

[GitHub] [spark] Ngone51 commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local blocks in case of IO related errors

2022-06-08 Thread GitBox
Ngone51 commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r892317333 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,46 +935,56 @@ private[spark] class BlockManager( }) Some(new

[GitHub] [spark] Ngone51 commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local blocks in case of IO related errors

2022-06-08 Thread GitBox
Ngone51 commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r892314812 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,46 +935,56 @@ private[spark] class BlockManager( }) Some(new

[GitHub] [spark] Ngone51 commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local blocks in case of IO related errors

2022-06-08 Thread GitBox
Ngone51 commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r892313257 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,46 +935,56 @@ private[spark] class BlockManager( }) Some(new

[GitHub] [spark] LuciferYang commented on pull request #36807: [WIP][SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox
LuciferYang commented on PR #36807: URL: https://github.com/apache/spark/pull/36807#issuecomment-1149862059 wait silencer upgrade

[GitHub] [spark] olaky commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-08 Thread GitBox
olaky commented on PR #36753: URL: https://github.com/apache/spark/pull/36753#issuecomment-1149853529 @MaxGekk since you closed this, should I still work on propagating this to 3.1 and 3.0? And how should we deal with the test failures happening on branch-3.2?

[GitHub] [spark] LuciferYang commented on pull request #36807: [WIP][SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox
LuciferYang commented on PR #36807: URL: https://github.com/apache/spark/pull/36807#issuecomment-1149850412 For test

[GitHub] [spark] LuciferYang opened a new pull request, #36807: [SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox
LuciferYang opened a new pull request, #36807: URL: https://github.com/apache/spark/pull/36807 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] MaxGekk commented on a diff in pull request #35715: [SPARK-37753][SQL] Fine tune logic to demote Broadcast hash join in DynamicJoinSelection

2022-06-08 Thread GitBox
MaxGekk commented on code in PR #35715: URL: https://github.com/apache/spark/pull/35715#discussion_r892247619 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -683,6 +683,41 @@ class AdaptiveQueryExecSuite } }

[GitHub] [spark] wayneguow commented on pull request #36775: [SPARK-39389]Filesystem closed should not be considered as corrupt files

2022-06-08 Thread GitBox
wayneguow commented on PR #36775: URL: https://github.com/apache/spark/pull/36775#issuecomment-1149679199 +1 to @JoshRosen 's issue. We also encountered this problem in our scenario for ETL. When reading abnormally from filesystem(such as read timeout exception, which may succeeded

[GitHub] [spark] HyukjinKwon commented on pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
HyukjinKwon commented on PR #36803: URL: https://github.com/apache/spark/pull/36803#issuecomment-1149651712 let me backport just for sure in any event.

[GitHub] [spark] HyukjinKwon commented on pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
HyukjinKwon commented on PR #36803: URL: https://github.com/apache/spark/pull/36803#issuecomment-1149650782 We always use master branch to release, no?

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892101500 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -806,6 +825,7 @@ abstract class TypeCoercionBase { object TypeCoercion

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-08 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r892098440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1227,6 +1227,49 @@ case class Pivot( override

[GitHub] [spark] wwli05 opened a new pull request, #36806: [SPARK-39398][GRAPHX]message checkpointer support storage level

2022-06-08 Thread GitBox
wwli05 opened a new pull request, #36806: URL: https://github.com/apache/spark/pull/36806 ### What changes were proposed in this pull request? 1. in spark-30502, it already support PeriodicRDDCheckpointer set the checkpoint storage level , now in pregel, messageCheckpointer also

[GitHub] [spark] HeartSaVioR closed pull request #36801: [SPARK-39404][SS] Minor fix for querying `_metadata` in streaming

2022-06-08 Thread GitBox
HeartSaVioR closed pull request #36801: [SPARK-39404][SS] Minor fix for querying `_metadata` in streaming URL: https://github.com/apache/spark/pull/36801

[GitHub] [spark] HeartSaVioR commented on pull request #36801: [SPARK-39404][SS] Minor fix for querying `_metadata` in streaming

2022-06-08 Thread GitBox
HeartSaVioR commented on PR #36801: URL: https://github.com/apache/spark/pull/36801#issuecomment-1149630507 Thanks! Merging to master.

[GitHub] [spark] cloud-fan commented on pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
cloud-fan commented on PR #36803: URL: https://github.com/apache/spark/pull/36803#issuecomment-1149629636 shall we merge to 3.3 as well?

[GitHub] [spark] HyukjinKwon closed pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
HyukjinKwon closed pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py URL: https://github.com/apache/spark/pull/36803

[GitHub] [spark] HyukjinKwon commented on pull request #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
HyukjinKwon commented on PR #36803: URL: https://github.com/apache/spark/pull/36803#issuecomment-1149604323 Merged to master. The tests don't run this code path.

[GitHub] [spark] beliefer commented on pull request #36805: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread GitBox
beliefer commented on PR #36805: URL: https://github.com/apache/spark/pull/36805#issuecomment-1149603804 ping @huaxingao cc @cloud-fan

[GitHub] [spark] HyukjinKwon closed pull request #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

2022-06-08 Thread GitBox
HyukjinKwon closed pull request #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2' URL: https://github.com/apache/spark/pull/36802

[GitHub] [spark] HyukjinKwon commented on pull request #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

2022-06-08 Thread GitBox
HyukjinKwon commented on PR #36802: URL: https://github.com/apache/spark/pull/36802#issuecomment-1149603537 Merged to master.

[GitHub] [spark] beliefer opened a new pull request, #36805: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread GitBox
beliefer opened a new pull request, #36805: URL: https://github.com/apache/spark/pull/36805 ### What changes were proposed in this pull request? `JDBCV2Suite` exists some test case which uses sql keywords are not capitalized. This PR will capitalize sql keywords in `JDBCV2Suite`.

[GitHub] [spark] MaxGekk opened a new pull request, #36804: [WIP][SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

2022-06-08 Thread GitBox
MaxGekk opened a new pull request, #36804: URL: https://github.com/apache/spark/pull/36804 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-08 Thread GitBox
HeartSaVioR commented on code in PR #36704: URL: https://github.com/apache/spark/pull/36704#discussion_r891998689 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala: ## @@ -666,9 +667,10 @@ abstract class

[GitHub] [spark] HyukjinKwon closed pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-08 Thread GitBox
HyukjinKwon closed pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties URL: https://github.com/apache/spark/pull/36799

[GitHub] [spark] HyukjinKwon commented on pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-08 Thread GitBox
HyukjinKwon commented on PR #36799: URL: https://github.com/apache/spark/pull/36799#issuecomment-1149535232 Merged to master.

[GitHub] [spark] HyukjinKwon opened a new pull request, #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-08 Thread GitBox
HyukjinKwon opened a new pull request, #36803: URL: https://github.com/apache/spark/pull/36803 ### What changes were proposed in this pull request? This PR proposes to address type hints `__version__: str` correctly in each release. The type hint was added from Spark 3.3.0 at

[GitHub] [spark] huaxingao commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread GitBox
huaxingao commented on PR #36781: URL: https://github.com/apache/spark/pull/36781#issuecomment-1149525380 I think over. I think it's better to have a separate PR to fix the explain problem.

[GitHub] [spark] MaxGekk closed pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-08 Thread GitBox
MaxGekk closed pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints URL: https://github.com/apache/spark/pull/36792

[GitHub] [spark] MaxGekk commented on pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-08 Thread GitBox
MaxGekk commented on PR #36792: URL: https://github.com/apache/spark/pull/36792#issuecomment-1149521109 +1, LGTM. Merging to 3.3. Thank you, @vli-databricks and @HyukjinKwon for review.

[GitHub] [spark] huaxingao commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread GitBox
huaxingao commented on PR #36781: URL: https://github.com/apache/spark/pull/36781#issuecomment-1149518253 The fix looks good but the explain result bothers me. Here is what I got from the explain result: ``` spark.read.parquet(dir.getCanonicalPath).filter("isnotnull(f)").explain(true)

[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

2022-06-08 Thread GitBox
cxzl25 commented on code in PR #36787: URL: https://github.com/apache/spark/pull/36787#discussion_r891950939 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala: ## @@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest
