[GitHub] [spark] LuciferYang commented on pull request #38075: [WIP][SPARK-40633][BUILD] Upgrade janino to 3.1.8

2022-11-20 Thread GitBox
LuciferYang commented on PR #38075: URL: https://github.com/apache/spark/pull/38075#issuecomment-1321605889 will check 3.1.9 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-20 Thread GitBox
cloud-fan commented on code in PR #38722: URL: https://github.com/apache/spark/pull/38722#discussion_r1027662257 ## core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java: ## @@ -812,9 +812,7 @@ public boolean append(Object kbase, long koff, int klen, Object vbase,

[GitHub] [spark] mridulm commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-20 Thread GitBox
mridulm commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1321576092 Thanks for the PR to fix this bug ! I will need to think more about this issue, but I am leaning towards a variant of the solution proposed. Namely: * `stageAttemptToNumSpecul

[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-20 Thread GitBox
cloud-fan commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1027656667 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala: ## @@ -894,12 +895,14 @@ class InsertSuite extends QueryTest with TestHiveSingleton with Befo

[GitHub] [spark] cloud-fan commented on a diff in pull request #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-20 Thread GitBox
cloud-fan commented on code in PR #38703: URL: https://github.com/apache/spark/pull/38703#discussion_r1027651034 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -355,7 +355,7 @@ case class ListQuery( plan.canonicalized,

[GitHub] [spark] cloud-fan commented on pull request #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-20 Thread GitBox
cloud-fan commented on PR #38687: URL: https://github.com/apache/spark/pull/38687#issuecomment-1321552064 Oh wait a minute. Due to the spark catalog extension (set via `spark.sql.catalog.spark_catalog`), we can have tables supporting time travel in the v1 catalog as well. I think `SessionCa

[GitHub] [spark] mridulm commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-20 Thread GitBox
mridulm commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321551358 @yabola, there is quite a lot of nontrivial overlap between this PR and what @wankunde's PR is trying to do. Would be great if you both can coordinate on this - I would love to get this fun

[GitHub] [spark] cloud-fan commented on pull request #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-20 Thread GitBox
cloud-fan commented on PR #38687: URL: https://github.com/apache/spark/pull/38687#issuecomment-1321550782 thanks, merging to master/3.3!

[GitHub] [spark] mridulm commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-20 Thread GitBox
mridulm commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321548579 > One things that I know need to be addressed are: > Some merge data infos are not saved on the driver because they are too small ( controlled by spark.shuffle.push.minShuffleSizeToWait

[GitHub] [spark] gengliangwang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-20 Thread GitBox
gengliangwang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1027640551 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -769,7 +772,14 @@ private[spark] object AppStatusStore { def createLiveStore(

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-11-20 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1027637860 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -393,6 +394,35 @@ public void applicationRemoved(String app

[GitHub] [spark] LuciferYang opened a new pull request, #38737: [SPARK-41174][SQL] Propagate an error class to users for invalid `format` of `to_binary()`

2022-11-20 Thread GitBox
LuciferYang opened a new pull request, #38737: URL: https://github.com/apache/spark/pull/38737 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] erenavsarogullari commented on pull request #38736: [WIP][SPARK-41214][SQL] - SubPlan metrics are missed when AQE is enabled under InMemoryRelation

2022-11-20 Thread GitBox
erenavsarogullari commented on PR #38736: URL: https://github.com/apache/spark/pull/38736#issuecomment-1321545696 cc @cloud-fan @Ngone51

[GitHub] [spark] erenavsarogullari opened a new pull request, #38736: [WIP][SPARK-41214][SQL] - SubPlan metrics under InMemoryRelation are missed when …

2022-11-20 Thread GitBox
erenavsarogullari opened a new pull request, #38736: URL: https://github.com/apache/spark/pull/38736 ### What changes were proposed in this pull request? `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` enables AQE optimizations under `InMemoryRelation`(IMR) nodes b

[GitHub] [spark] cloud-fan commented on a diff in pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-20 Thread GitBox
cloud-fan commented on code in PR #38713: URL: https://github.com/apache/spark/pull/38713#discussion_r1027630891 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -1263,60 +1263,71 @@ class AstBuilder extends SqlBaseParserBaseVisitor[An

[GitHub] [spark] dongjoon-hyun commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox
dongjoon-hyun commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1321531047 Thank you, @mridulm !

[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox
mridulm commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1321529913 I was of two minds whether to fix this in 3.3 as well ... Yes, 3.3 is affected by it.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-20 Thread GitBox
LuciferYang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1027625306 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def createKVSt

[GitHub] [spark] mridulm commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-20 Thread GitBox
mridulm commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1027618122 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -769,7 +772,14 @@ private[spark] object AppStatusStore { def createLiveStore( conf:

[GitHub] [spark] dongjoon-hyun commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox
dongjoon-hyun commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1321516325 Thank you, @gaoyajun02 , @mridulm , @otterc . - Do we need to backport this to branch-3.3? - According to the previous failure description, what happens in branch-3.3 in case o

[GitHub] [spark] liuzqt commented on a diff in pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollec

2022-11-20 Thread GitBox
liuzqt commented on code in PR #38704: URL: https://github.com/apache/spark/pull/38704#discussion_r1027614714 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2251,7 +2251,11 @@ class DatasetLargeResultCollectingSuite extends QueryTest with SharedSpa

[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox
mridulm commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1321511612 Merged to master. Thanks for fixing this @gaoyajun02 ! Thanks for the review @otterc :-)

[GitHub] [spark] asfgit closed pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox
asfgit closed pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size URL: https://github.com/apache/spark/pull/38333

[GitHub] [spark] mridulm commented on pull request #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work

2022-11-20 Thread GitBox
mridulm commented on PR #38467: URL: https://github.com/apache/spark/pull/38467#issuecomment-1321508245 Agree with @Ngone51, there are two issues here. a) When we have locked for read/write, we expect it to be unlocked and exceptions to be handled gracefully. In this case, `removeB

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
dongjoon-hyun commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027605394 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
HeartSaVioR commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027594911 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends Q

[GitHub] [spark] cloud-fan closed pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-20 Thread GitBox
cloud-fan closed pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching URL: https://github.com/apache/spark/pull/38692

[GitHub] [spark] cloud-fan commented on pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-20 Thread GitBox
cloud-fan commented on PR #38692: URL: https://github.com/apache/spark/pull/38692#issuecomment-1321487076 thanks for the review, merging to master!

[GitHub] [spark] sadikovi commented on pull request #38731: [SPARK-41209][PYSPARK] Improve PySpark type inference in _merge_type method

2022-11-20 Thread GitBox
sadikovi commented on PR #38731: URL: https://github.com/apache/spark/pull/38731#issuecomment-1321483028 @HyukjinKwon I noticed that NullType in PySpark is on the list of atomic types which it is not, in fact, it is mentioned in the type's doc string. However, I tried to remove it but encou

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-20 Thread GitBox
zhengruifeng commented on code in PR #38735: URL: https://github.com/apache/spark/pull/38735#discussion_r1027583933 ## python/pyspark/sql/connect/dataframe.py: ## @@ -115,6 +115,9 @@ def __init__( self._cache: Dict[str, Any] = {} self._session: "RemoteSparkSess

[GitHub] [spark] zhengruifeng opened a new pull request, #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-20 Thread GitBox
zhengruifeng opened a new pull request, #38735: URL: https://github.com/apache/spark/pull/38735 ### What changes were proposed in this pull request? Implement `DataFrame.__repr__` and `DataFrame.dtypes` ### Why are the changes needed? For api coverage ### Does this

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
dongjoon-hyun commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027573603 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends

[GitHub] [spark] pan3793 commented on a diff in pull request #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-20 Thread GitBox
pan3793 commented on code in PR #38732: URL: https://github.com/apache/spark/pull/38732#discussion_r1027559033 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -119,6 +126,12 @@ class ExecutorPodsAlloca

[GitHub] [spark] zhengruifeng opened a new pull request, #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-20 Thread GitBox
zhengruifeng opened a new pull request, #38734: URL: https://github.com/apache/spark/pull/38734 ### What changes were proposed in this pull request? Implement `DataFrame.isEmpty` ### Why are the changes needed? API Coverage ### Does this PR introduce _any_ user-fac

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
HeartSaVioR commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027545869 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends Q

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
dongjoon-hyun commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027524453 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends

[GitHub] [spark] pan3793 commented on a diff in pull request #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-20 Thread GitBox
pan3793 commented on code in PR #38732: URL: https://github.com/apache/spark/pull/38732#discussion_r1027519725 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -723,6 +723,25 @@ private[spark] object Config extends Logging {

[GitHub] [spark] yaooqinn opened a new pull request, #38733: [SPARK-41211][Core] Upgrade ZooKeeper from 3.6.2 to 3.6.3

2022-11-20 Thread GitBox
yaooqinn opened a new pull request, #38733: URL: https://github.com/apache/spark/pull/38733 ### What changes were proposed in this pull request? Upgrade ZooKeeper to 3.6.3 ### Why are the changes needed? ZooKeeper 3.6.3 contains many bugfixes, such as

[GitHub] [spark] zhengruifeng commented on pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-20 Thread GitBox
zhengruifeng commented on PR #38686: URL: https://github.com/apache/spark/pull/38686#issuecomment-1321403447 I will update this PR after the arrow-based collect is fixed https://github.com/apache/spark/pull/38706 , otherwise, some e2e tests will fail

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
HeartSaVioR commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027509435 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -600,7 +600,7 @@ class FileMetadataStructSuite extends Qu

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-20 Thread GitBox
zhengruifeng commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1027508681 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -203,6 +204,19 @@ message Sort { } } + +// Drop specified columns. +message Drop

[GitHub] [spark] pan3793 opened a new pull request, #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-20 Thread GitBox
pan3793 opened a new pull request, #38732: URL: https://github.com/apache/spark/pull/38732 ### What changes were proposed in this pull request? Fail Spark Application when executor failures reach threshold. ### Why are the changes needed? Sometimes, executor c

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-20 Thread GitBox
zhengruifeng commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1027507821 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -523,6 +524,19 @@ class SparkConnectPlanner(session: Spar

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-20 Thread GitBox
zhengruifeng commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1027506855 ## python/pyspark/sql/connect/dataframe.py: ## @@ -255,10 +255,21 @@ def distinct(self) -> "DataFrame": ) def drop(self, *cols: "ColumnOrString") -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-20 Thread GitBox
zhengruifeng commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1027505888 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -148,6 +148,23 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-20 Thread GitBox
zhengruifeng commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1027505120 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -203,6 +204,19 @@ message Sort { } } + +// Drop specified columns. +message Drop

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
HeartSaVioR commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027504882 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends Q

[GitHub] [spark] wankunde commented on pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-11-20 Thread GitBox
wankunde commented on PR #37922: URL: https://github.com/apache/spark/pull/37922#issuecomment-1321392369 Hi, @mridulm , I've been working on some other issues recently. If @yabola can do all or part of this task in https://github.com/apache/spark/pull/38560, please go ahead.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
dongjoon-hyun commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027504573 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -275,8 +275,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
dongjoon-hyun commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027503203 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -464,11 +464,13 @@ object FileSourceMetadataAttribute {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-20 Thread GitBox
dongjoon-hyun commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1027500899 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends

[GitHub] [spark] sadikovi commented on pull request #38731: [SPARK-41209] Improve PySpark type inference in _merge_type method

2022-11-20 Thread GitBox
sadikovi commented on PR #38731: URL: https://github.com/apache/spark/pull/38731#issuecomment-1321374183 @HyukjinKwon @xinrong-meng Could you review this PR? Thanks.

[GitHub] [spark] sadikovi opened a new pull request, #38731: [SPARK-41209] Improve PySpark type inference in _merge_type method

2022-11-20 Thread GitBox
sadikovi opened a new pull request, #38731: URL: https://github.com/apache/spark/pull/38731 ### What changes were proposed in this pull request? This PR updates `_merge_type` method to allow upcast from any `AtomicType` to `StringType` similar to Cast.scala (https://githu

[GitHub] [spark] panbingkun opened a new pull request, #38730: [SPARK-41181][SQL] Migrate the map options errors onto error classes

2022-11-20 Thread GitBox
panbingkun opened a new pull request, #38730: URL: https://github.com/apache/spark/pull/38730 ### What changes were proposed in this pull request? The pr aims to migrate the map options errors onto error classes. ### Why are the changes needed? The changes improve the error frame

[GitHub] [spark] mcdull-zhang commented on pull request #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-20 Thread GitBox
mcdull-zhang commented on PR #38703: URL: https://github.com/apache/spark/pull/38703#issuecomment-1321371779 ping @cloud-fan

[GitHub] [spark] zhengruifeng opened a new pull request, #38729: [CONNECT][INFRA] Update protobuf versions in CI

2022-11-20 Thread GitBox
zhengruifeng opened a new pull request, #38729: URL: https://github.com/apache/spark/pull/38729 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

[GitHub] [spark] LuciferYang commented on a diff in pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultC

2022-11-20 Thread GitBox
LuciferYang commented on code in PR #38704: URL: https://github.com/apache/spark/pull/38704#discussion_r1027465227 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2251,7 +2251,11 @@ class DatasetLargeResultCollectingSuite extends QueryTest with Shar

[GitHub] [spark] zhengruifeng commented on pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-20 Thread GitBox
zhengruifeng commented on PR #38718: URL: https://github.com/apache/spark/pull/38718#issuecomment-1321369830 it seems that the versions in `build_and_test` are not updated?

[GitHub] [spark] itholic commented on a diff in pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-20 Thread GitBox
itholic commented on code in PR #38644: URL: https://github.com/apache/spark/pull/38644#discussion_r1027410164 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala: ## @@ -244,7 +244,7 @@ class CastWithAnsiOnSuite extends CastSuiteBa

[GitHub] [spark] itholic commented on a diff in pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-20 Thread GitBox
itholic commented on code in PR #38728: URL: https://github.com/apache/spark/pull/38728#discussion_r1027408511 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -45,12 +46,18 @@ import org.apache.spark.util.Utils final cas

[GitHub] [spark] itholic commented on a diff in pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-20 Thread GitBox
itholic commented on code in PR #38728: URL: https://github.com/apache/spark/pull/38728#discussion_r1027408367 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -45,12 +46,18 @@ import org.apache.spark.util.Utils final cas

[GitHub] [spark] zhengruifeng commented on pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-20 Thread GitBox
zhengruifeng commented on PR #38718: URL: https://github.com/apache/spark/pull/38718#issuecomment-1321307266 late lgtm

[GitHub] [spark] panbingkun commented on a diff in pull request #38725: [SPARK-41182][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1102

2022-11-20 Thread GitBox
panbingkun commented on code in PR #38725: URL: https://github.com/apache/spark/pull/38725#discussion_r1027386994 ## core/src/main/resources/error/error-classes.json: ## @@ -1326,6 +1326,11 @@ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup"

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-20 Thread GitBox
HyukjinKwon commented on code in PR #38728: URL: https://github.com/apache/spark/pull/38728#discussion_r1027385407 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -45,12 +46,18 @@ import org.apache.spark.util.Utils final

[GitHub] [spark] HyukjinKwon closed pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client.

2022-11-20 Thread GitBox
HyukjinKwon closed pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client. URL: https://github.com/apache/spark/pull/38726

[GitHub] [spark] HyukjinKwon commented on pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client.

2022-11-20 Thread GitBox
HyukjinKwon commented on PR #38726: URL: https://github.com/apache/spark/pull/38726#issuecomment-1321296180 Merged to master.

[GitHub] [spark] github-actions[bot] closed pull request #36070: [SPARK-31675][CORE] Fix rename and delete files with different filesystem

2022-11-20 Thread GitBox
github-actions[bot] closed pull request #36070: [SPARK-31675][CORE] Fix rename and delete files with different filesystem URL: https://github.com/apache/spark/pull/36070

[GitHub] [spark] github-actions[bot] commented on pull request #36443: [POC][WIP][SPARK-39088][CORE] Add a "live" driver link to the UI for history server when serving in-progress applications.

2022-11-20 Thread GitBox
github-actions[bot] commented on PR #36443: URL: https://github.com/apache/spark/pull/36443#issuecomment-1321290211 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36908: [SPARK-39510][SQL][WIP] Leverage the natural partitioning and ordering of MonotonicallyIncreasingID

2022-11-20 Thread GitBox
github-actions[bot] closed pull request #36908: [SPARK-39510][SQL][WIP] Leverage the natural partitioning and ordering of MonotonicallyIncreasingID URL: https://github.com/apache/spark/pull/36908 -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [spark] huaxingao commented on pull request #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-20 Thread GitBox
huaxingao commented on PR #38687: URL: https://github.com/apache/spark/pull/38687#issuecomment-1321287120 LGTM. Thanks for fixing this! @ulysses-you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HuwCampbell commented on a diff in pull request #36441: [SPARK-39091][SQL] Updating specific SQL Expression traits that don't compose when multiple are extended due to nodePatterns be

2022-11-20 Thread GitBox
HuwCampbell commented on code in PR #36441: URL: https://github.com/apache/spark/pull/36441#discussion_r973814236 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala: ## @@ -839,6 +841,52 @@ class HigherOrderFunctionsSuite ext

[GitHub] [spark] AmplabJenkins commented on pull request #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-20 Thread GitBox
AmplabJenkins commented on PR #38714: URL: https://github.com/apache/spark/pull/38714#issuecomment-1321252021 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HeartSaVioR closed pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-20 Thread GitBox
HeartSaVioR closed pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source URL: https://github.com/apache/spark/pull/38717 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HeartSaVioR commented on pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-20 Thread GitBox
HeartSaVioR commented on PR #38717: URL: https://github.com/apache/spark/pull/38717#issuecomment-1321250486 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HeartSaVioR commented on pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-20 Thread GitBox
HeartSaVioR commented on PR #38717: URL: https://github.com/apache/spark/pull/38717#issuecomment-1321250199 Thanks for understanding. Let's go with no risk fix for now, and have more time to think about the holistic fix. -- This is an automated message from the Apache Git Service. To resp

[GitHub] [spark] srielau commented on pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-20 Thread GitBox
srielau commented on PR #38685: URL: https://github.com/apache/spark/pull/38685#issuecomment-1321215985 > > But is COLUMN_ALREADY_EXISTS the best choice for CREATE TABLE or WITH cte(c1, c1) AS? > > How about AS T(c1, c1) > > @srielau I assumed that we will provide a query context w

[GitHub] [spark] MaxGekk commented on pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-20 Thread GitBox
MaxGekk commented on PR #38685: URL: https://github.com/apache/spark/pull/38685#issuecomment-1321196320 > But is COLUMN_ALREADY_EXISTS the best choice for CREATE TABLE or WITH cte(c1, c1) AS? > How about AS T(c1, c1) @srielau I assumed that we will provide a query context which sho

[GitHub] [spark] srielau commented on pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-20 Thread GitBox
srielau commented on PR #38685: URL: https://github.com/apache/spark/pull/38685#issuecomment-1321188144 I've some doubts about COLUMN_ALREADY_EXISTS when a column is duplicated within a new list. I.e. it makes a lot of sense for ALTER TABLE ADD COLUMN. But is COLUMN_ALREADY_EXISTS the

[GitHub] [spark] AmplabJenkins commented on pull request #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-20 Thread GitBox
AmplabJenkins commented on PR #38722: URL: https://github.com/apache/spark/pull/38722#issuecomment-1321183990 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-20 Thread GitBox
AmplabJenkins commented on PR #38723: URL: https://github.com/apache/spark/pull/38723#issuecomment-1321183964 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38725: [SPARK-41182][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1102

2022-11-20 Thread GitBox
AmplabJenkins commented on PR #38725: URL: https://github.com/apache/spark/pull/38725#issuecomment-1321183950 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client.

2022-11-20 Thread GitBox
AmplabJenkins commented on PR #38726: URL: https://github.com/apache/spark/pull/38726#issuecomment-1321183941 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] MaxGekk commented on a diff in pull request #38725: [SPARK-41182][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1102

2022-11-20 Thread GitBox
MaxGekk commented on code in PR #38725: URL: https://github.com/apache/spark/pull/38725#discussion_r1027319785 ## core/src/main/resources/error/error-classes.json: ## @@ -1326,6 +1326,11 @@ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup" ]

[GitHub] [spark] MaxGekk commented on pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-20 Thread GitBox
MaxGekk commented on PR #38685: URL: https://github.com/apache/spark/pull/38685#issuecomment-1321178694 @srielau @LuciferYang @panbingkun @itholic @cloud-fan Could you review this PR, please. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [spark] AmplabJenkins commented on pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-20 Thread GitBox
AmplabJenkins commented on PR #38728: URL: https://github.com/apache/spark/pull/38728#issuecomment-1321155240 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] grundprinzip opened a new pull request, #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-20 Thread GitBox
grundprinzip opened a new pull request, #38728: URL: https://github.com/apache/spark/pull/38728 ### What changes were proposed in this pull request? Migrate existing custom exceptions in Spark Connect to use the proper Spark exceptions. ### Why are the changes needed? Consistenc

[GitHub] [spark] grundprinzip commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-20 Thread GitBox
grundprinzip commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1027288685 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class SQLExp

[GitHub] [spark] grundprinzip commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-20 Thread GitBox
grundprinzip commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1027288544 ## python/pyspark/sql/connect/dataframe.py: ## @@ -124,6 +125,29 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat def sele

[GitHub] [spark] grundprinzip commented on a diff in pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client.

2022-11-20 Thread GitBox
grundprinzip commented on code in PR #38726: URL: https://github.com/apache/spark/pull/38726#discussion_r1027276398 ## python/pyspark/sql/connect/dataframe.py: ## @@ -756,6 +757,62 @@ def schema(self) -> StructType: else: return self._schema +def tran

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-20 Thread GitBox
HeartSaVioR commented on code in PR #38717: URL: https://github.com/apache/spark/pull/38717#discussion_r1027251716 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -341,7 +355,13 @@ trait ProgressReporter extends Logging {

[GitHub] [spark] HyukjinKwon closed pull request #38708: [SPARK-41194][PROTOBUF][TESTS] Add `log4j2.properties` configuration file for `protobuf` module testing

2022-11-20 Thread GitBox
HyukjinKwon closed pull request #38708: [SPARK-41194][PROTOBUF][TESTS] Add `log4j2.properties` configuration file for `protobuf` module testing URL: https://github.com/apache/spark/pull/38708 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] HyukjinKwon commented on pull request #38708: [SPARK-41194][PROTOBUF][TESTS] Add `log4j2.properties` configuration file for `protobuf` module testing

2022-11-20 Thread GitBox
HyukjinKwon commented on PR #38708: URL: https://github.com/apache/spark/pull/38708#issuecomment-1321075055 Thanks @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[GitHub] [spark] HyukjinKwon commented on pull request #38708: [SPARK-41194][PROTOBUF][TESTS] Add `log4j2.properties` configuration file for `protobuf` module testing

2022-11-20 Thread GitBox
HyukjinKwon commented on PR #38708: URL: https://github.com/apache/spark/pull/38708#issuecomment-1321075019 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-20 Thread GitBox
HeartSaVioR commented on code in PR #38717: URL: https://github.com/apache/spark/pull/38717#discussion_r1027251960 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -341,7 +355,13 @@ trait ProgressReporter extends Logging {

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client.

2022-11-20 Thread GitBox
HyukjinKwon commented on code in PR #38726: URL: https://github.com/apache/spark/pull/38726#discussion_r1027251817 ## python/pyspark/sql/connect/dataframe.py: ## @@ -756,6 +757,62 @@ def schema(self) -> StructType: else: return self._schema +def trans
