[GitHub] [spark] zsxwing commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-07 Thread GitBox
zsxwing commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1043035473 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java: ## @@ -93,4 +93,18 @@ default Transform[]

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1043030519 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,12 @@ private[spark] class ExecutorAllocationManager( // Should be

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1043028737 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-07 Thread GitBox
Ngone51 commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1043027754 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -637,9 +637,11 @@ private[spark] class BlockManager( def reregister(): Unit = { //

[GitHub] [spark] zhouyifan279 opened a new pull request, #38978: [SPARK-39948] Exclude hive-vector-code-gen dependency to solve CVE-2020-13936

2022-12-07 Thread GitBox
zhouyifan279 opened a new pull request, #38978: URL: https://github.com/apache/spark/pull/38978 ### What changes were proposed in this pull request? Remove hive-vector-code-gen and its dependent jars from spark distribution ### Why are the changes needed? hive-vector-code-gen is

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-07 Thread GitBox
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1043022539 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java: ## @@ -93,4 +93,18 @@ default Transform[]

[GitHub] [spark] gengliangwang commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-07 Thread GitBox
gengliangwang commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1342212090 @yabola ok, can you try adding a new test case for it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] wankunde commented on a diff in pull request #38649: [SPARK-41132][SQL] Convert LikeAny and NotLikeAny to InSet if no pattern contains wildcards

2022-12-07 Thread GitBox
wankunde commented on code in PR #38649: URL: https://github.com/apache/spark/pull/38649#discussion_r1043018066 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -780,6 +780,13 @@ object LikeSimplification extends Rule[LogicalPlan]

[GitHub] [spark] mridulm commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
mridulm commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1043015456 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-07 Thread GitBox
HyukjinKwon commented on code in PR #38883: URL: https://github.com/apache/spark/pull/38883#discussion_r1043011644 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -21,6 +21,7 @@ import grpc # type: ignore +from pyspark.sql.connect.column import Column
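
For context on the compatibility goal in the title, the sketch below shows the classic PySpark `groupBy().agg()` forms (dict and Column arguments) that the Connect client is being aligned with; it is plain PySpark usage under an assumed local SparkSession, not the test code from this PR.

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 5)], ["key", "value"])

# Dict form: column name -> aggregate function name.
df.groupBy("key").agg({"value": "sum"}).show()

# Column form: explicit aggregate expressions.
df.groupBy("key").agg(F.sum("value").alias("total")).show()
```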

[GitHub] [spark] dongjoon-hyun commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342193784 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342193584 > BTW, I plan to revisit all these stuff to clean up all - we rushed a bit in Spark Connect Python client. For now, LGTM. I am going to merge this. Ack. No problem at all,

[GitHub] [spark] HyukjinKwon commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
HyukjinKwon commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342193439 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
HyukjinKwon closed pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available URL: https://github.com/apache/spark/pull/38976 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
HyukjinKwon commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342193041 BTW, I plan to revisit all these stuff to clean up all - we rushed a bit in Spark Connect Python client. For now, LGTM. I am going to merge this. -- This is an automated message

[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-07 Thread GitBox
mridulm commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1043005698 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -637,9 +637,11 @@ private[spark] class BlockManager( def reregister(): Unit = { //

[GitHub] [spark] MaxGekk commented on pull request #38864: [SPARK-41271][SQL] Support parameterized SQL queries by `sql()`

2022-12-07 Thread GitBox
MaxGekk commented on PR #38864: URL: https://github.com/apache/spark/pull/38864#issuecomment-1342185477 @xkrogen After offline discussion with @cloud-fan @srielau, we decided to change the parameter marker to `:` in the PR. -- This is an automated message from the Apache Git Service. To
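
A hypothetical sketch of what a parameterized query with the `:` named marker discussed above might look like; the `args` keyword and the way values are bound were still under review in this PR, so treat both as assumptions rather than the final API.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Named parameter marker ':threshold' in the SQL text, bound from a dict.
spark.sql(
    "SELECT * FROM range(10) WHERE id < :threshold",
    args={"threshold": "3"},
).show()
```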

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r104358 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1042997408 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] dongjoon-hyun commented on pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1342175409 Could you make a backporting PR, @pan3793 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] mridulm commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
mridulm commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1042993786 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] Ngone51 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
Ngone51 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1042993398 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -733,7 +735,7 @@ private[spark] class ExecutorAllocationManager( // If this

[GitHub] [spark] dongjoon-hyun commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342167092 Thank you, @grundprinzip . Could you review this, @viirya ? This will recover one of Apache Spark Apple Silicon CI on `Scaleway`. -- This is an automated message from the

[GitHub] [spark] Ngone51 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
Ngone51 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1042987205 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,12 @@ private[spark] class ExecutorAllocationManager( // Should be 0

[GitHub] [spark] HeartSaVioR closed pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-07 Thread GitBox
HeartSaVioR closed pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit URL: https://github.com/apache/spark/pull/38880 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] Ngone51 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-07 Thread GitBox
Ngone51 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1042987003 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,12 @@ private[spark] class ExecutorAllocationManager( // Should be 0

[GitHub] [spark] ulysses-you commented on a diff in pull request #38939: [SPARK-41407][SQL] Pull out v1 write to WriteFiles

2022-12-07 Thread GitBox
ulysses-you commented on code in PR #38939: URL: https://github.com/apache/spark/pull/38939#discussion_r1042986658 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -145,6 +145,12 @@ case class

[GitHub] [spark] HeartSaVioR commented on pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-07 Thread GitBox
HeartSaVioR commented on PR #38880: URL: https://github.com/apache/spark/pull/38880#issuecomment-1342158178 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] ulysses-you commented on a diff in pull request #38939: [SPARK-41407][SQL] Pull out v1 write to WriteFiles

2022-12-07 Thread GitBox
ulysses-you commented on code in PR #38939: URL: https://github.com/apache/spark/pull/38939#discussion_r1042983373 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -785,7 +785,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] amaliujia commented on pull request #38977: [SPARK-41445][CONNECT] Implement DataFrameReader.parquet

2022-12-07 Thread GitBox
amaliujia commented on PR #38977: URL: https://github.com/apache/spark/pull/38977#issuecomment-1342148963 LGTM! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HyukjinKwon opened a new pull request, #38977: [SPARK-41445][CONNECT] Implement DataFrameReader.parquet

2022-12-07 Thread GitBox
HyukjinKwon opened a new pull request, #38977: URL: https://github.com/apache/spark/pull/38977 ### What changes were proposed in this pull request? This PR implements `DataFrameReader.parquet` alias in Spark Connect. ### Why are the changes needed? For API feature
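
As a usage reminder, the alias being added mirrors the classic PySpark reader API shown below; the path is a placeholder and a plain local SparkSession is assumed, not a Connect remote session.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/tmp/example_parquet")  # placeholder path
df.printSchema()
```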

[GitHub] [spark] zhengruifeng commented on pull request #38975: [SPARK-41444][CONNECT] Support read.json()

2022-12-07 Thread GitBox
zhengruifeng commented on PR #38975: URL: https://github.com/apache/spark/pull/38975#issuecomment-1342129557 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] wecharyu commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-07 Thread GitBox
wecharyu commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1342127570 @HeartSaVioR yes you are right, the actual problem is that we may fetch empty partitions unexpectedly in one batch, and in the next batch we fetch the real partitions again. The new

[GitHub] [spark] dongjoon-hyun commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342119648 All tests passed. ![Screenshot 2022-12-07 at 10 15 48 PM](https://user-images.githubusercontent.com/9700541/206371936-d2e0eacd-d67c-4989-8afd-520800e7bf5f.png) -- This is

[GitHub] [spark] HyukjinKwon closed pull request #38975: [SPARK-41284][CONNECT] Support read.json()

2022-12-07 Thread GitBox
HyukjinKwon closed pull request #38975: [SPARK-41284][CONNECT] Support read.json() URL: https://github.com/apache/spark/pull/38975 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on pull request #38975: [SPARK-41284][CONNECT] Support read.json()

2022-12-07 Thread GitBox
HyukjinKwon commented on PR #38975: URL: https://github.com/apache/spark/pull/38975#issuecomment-1342110895 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38975: [SPARK-41284][CONNECT] Support read.json()

2022-12-07 Thread GitBox
HyukjinKwon commented on code in PR #38975: URL: https://github.com/apache/spark/pull/38975#discussion_r1042962157 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -121,6 +121,23 @@ def test_simple_read(self): # Check that the limit is applied

[GitHub] [spark] gengliangwang commented on pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-07 Thread GitBox
gengliangwang commented on PR #38776: URL: https://github.com/apache/spark/pull/38776#issuecomment-1342094831 LGTM except minor comments. Great work, @anchovYu -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] LuciferYang commented on a diff in pull request #38940: [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `INVALID_FUNCTION_ARGS`

2022-12-07 Thread GitBox
LuciferYang commented on code in PR #38940: URL: https://github.com/apache/spark/pull/38940#discussion_r1042955098 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -649,10 +649,10 @@ private[sql] object QueryCompilationErrors

[GitHub] [spark] viirya commented on pull request #38969: [SPARK-41442][SQL] Only update SQLMetric value if merging with valid metric

2022-12-07 Thread GitBox
viirya commented on PR #38969: URL: https://github.com/apache/spark/pull/38969#issuecomment-1342094246 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun commented on pull request #38969: [SPARK-41442][SQL] Only update SQLMetric value if merging with valid metric

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38969: URL: https://github.com/apache/spark/pull/38969#issuecomment-1342093390 All tests passed. Merged to master. Thank you, @viirya and all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun closed pull request #38969: [SPARK-41442][SQL] Only update SQLMetric value if merging with valid metric

2022-12-07 Thread GitBox
dongjoon-hyun closed pull request #38969: [SPARK-41442][SQL] Only update SQLMetric value if merging with valid metric URL: https://github.com/apache/spark/pull/38969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] gengliangwang commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-07 Thread GitBox
gengliangwang commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1042953641 ## sql/core/src/test/scala/org/apache/spark/sql/LateralColumnAliasSuite.scala: ## @@ -0,0 +1,287 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] grundprinzip commented on a diff in pull request #38970: [SPARK-41412][CONNECT] Implement `Column.cast`

2022-12-07 Thread GitBox
grundprinzip commented on code in PR #38970: URL: https://github.com/apache/spark/pull/38970#discussion_r1042952381 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -518,9 +518,16 @@ class
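
For readers skimming the diff, the user-facing operation being wired into the Connect planner is `Column.cast`; the snippet below is ordinary PySpark usage (DDL type string and DataType forms) under an assumed SparkSession, not the planner code under review.

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.range(3)

# Cast via a DDL type string or via a DataType instance.
df.select(
    F.col("id").cast("string").alias("id_str"),
    F.col("id").cast(StringType()).alias("id_str2"),
).show()
```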

[GitHub] [spark] gengliangwang commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-07 Thread GitBox
gengliangwang commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1042951934 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache

[GitHub] [spark] gengliangwang commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-07 Thread GitBox
gengliangwang commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1042951497 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache

[GitHub] [spark] grundprinzip commented on a diff in pull request #38970: [SPARK-41412][CONNECT] Implement `Column.cast`

2022-12-07 Thread GitBox
grundprinzip commented on code in PR #38970: URL: https://github.com/apache/spark/pull/38970#discussion_r1042950939 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -43,7 +43,11 @@ message Expression { Expression expr = 1; //

[GitHub] [spark] LuciferYang commented on pull request #38974: [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT

2022-12-07 Thread GitBox
LuciferYang commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1342081928 https://github.com/LuciferYang/make-distribution.sh/actions/runs/3645502265/jobs/6155706406

[GitHub] [spark] LuciferYang commented on a diff in pull request #38960: [SPARK-41435][SQL] Change to call `invalidFunctionArgumentsError` for `curdate()` when `expressions` is not empty

2022-12-07 Thread GitBox
LuciferYang commented on code in PR #38960: URL: https://github.com/apache/spark/pull/38960#discussion_r1042948569 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala: ## @@ -172,7 +172,7 @@ object CurDateExpressionBuilder extends

[GitHub] [spark] amaliujia commented on a diff in pull request #38970: [SPARK-41412][CONNECT] Implement `Column.cast`

2022-12-07 Thread GitBox
amaliujia commented on code in PR #38970: URL: https://github.com/apache/spark/pull/38970#discussion_r1042945624 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -43,7 +43,11 @@ message Expression { Expression expr = 1; //

[GitHub] [spark] LuciferYang commented on pull request #38936: [SPARK-41408][BUILD] Upgrade scala-maven-plugin to 4.8.0

2022-12-07 Thread GitBox
LuciferYang commented on PR #38936: URL: https://github.com/apache/spark/pull/38936#issuecomment-1342060398 @steveloughran the maven build of Spark with hadoop 3.4.0-SNAPSHOT fails for a reason not related to `scala-maven-plugin`; after adding the following 2 test dependencies to `sql/core` as

[GitHub] [spark] LuciferYang commented on pull request #38974: [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT

2022-12-07 Thread GitBox
LuciferYang commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1342053776 Downgrading scala-maven-plugin will reach 4.7.2, and the local maven build will still pass. I will update pr and test the GA compilation ``` [INFO] Reactor Summary for Spark

[GitHub] [spark] sharkdtu commented on a diff in pull request #38518: [SPARK-33349][K8S] Reset the executor pods watcher when we receive a version changed from k8s

2022-12-07 Thread GitBox
sharkdtu commented on code in PR #38518: URL: https://github.com/apache/spark/pull/38518#discussion_r1042926171 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala: ## @@ -78,6 +80,15 @@ class

[GitHub] [spark] amaliujia commented on pull request #38975: [SPARK-41284][CONNECT] Support read.json()

2022-12-07 Thread GitBox
amaliujia commented on PR #38975: URL: https://github.com/apache/spark/pull/38975#issuecomment-1342036106 @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] zsxwing commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-07 Thread GitBox
zsxwing commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1042918249 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java: ## @@ -93,4 +93,18 @@ default Transform[]

[GitHub] [spark] dongjoon-hyun commented on pull request #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import `Column` if pandas is available

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38976: URL: https://github.com/apache/spark/pull/38976#issuecomment-1342008681 cc @grundprinzip , @hvanhovell , @HyukjinKwon , @cloud-fan , @zhengruifeng , @amaliujia -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] HeartSaVioR commented on pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-07 Thread GitBox
HeartSaVioR commented on PR #38911: URL: https://github.com/apache/spark/pull/38911#issuecomment-1342007657 cc. @zsxwing @xuanyuanking @viirya Friendly reminder. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HeartSaVioR commented on pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-07 Thread GitBox
HeartSaVioR commented on PR #38880: URL: https://github.com/apache/spark/pull/38880#issuecomment-1342007531 cc. @zsxwing @xuanyuanking @viirya Friendly reminder. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-07 Thread GitBox
dongjoon-hyun commented on code in PR #38883: URL: https://github.com/apache/spark/pull/38883#discussion_r1042916333 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -21,6 +21,7 @@ import grpc # type: ignore +from pyspark.sql.connect.column import Column

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-07 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1042916237 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38976: [SPARK-41366][CONNECT][FOLLOWUP] Import Column if pandas is available

2022-12-07 Thread GitBox
dongjoon-hyun opened a new pull request, #38976: URL: https://github.com/apache/spark/pull/38976 ### What changes were proposed in this pull request? This is a follow-up to move the `Column` import statement in order to fix a test issue ### Why are the changes needed? `Column`
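
A hypothetical illustration of the pandas-guarded import pattern the follow-up describes; the `have_pandas` flag name is an assumption here, not necessarily the identifier used in pyspark's test utilities.

```
# Import Column only when pandas (a dependency of the Connect client tests)
# is available; `have_pandas` is an illustrative name, not pyspark's own.
try:
    import pandas  # noqa: F401
    have_pandas = True
except ImportError:
    have_pandas = False

if have_pandas:
    from pyspark.sql.connect.column import Column  # noqa: F401
```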

[GitHub] [spark] LuciferYang commented on pull request #38936: [SPARK-41408][BUILD] Upgrade scala-maven-plugin to 4.8.0

2022-12-07 Thread GitBox
LuciferYang commented on PR #38936: URL: https://github.com/apache/spark/pull/38936#issuecomment-1341995325 @HyukjinKwon @srowen @steveloughran In order to not block this pr, I open another one to investigate the issue mentioned in the dev list : maven build spark master with hadoop

[GitHub] [spark] LuciferYang commented on pull request #38974: [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT

2022-12-07 Thread GitBox
LuciferYang commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1341987449 Run ``` build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -DskipTests

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-07 Thread GitBox
HeartSaVioR commented on code in PR #38898: URL: https://github.com/apache/spark/pull/38898#discussion_r1042911844 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala: ## @@ -627,6 +627,45 @@ abstract class

[GitHub] [spark] dongjoon-hyun commented on pull request #38862: [SPARK-41350][SQL] Allow simple name access of join hidden columns after subquery alias

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38862: URL: https://github.com/apache/spark/pull/38862#issuecomment-1341980674 Thank you for backporting this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-07 Thread GitBox
HeartSaVioR commented on code in PR #38898: URL: https://github.com/apache/spark/pull/38898#discussion_r1042910210 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala: ## @@ -627,6 +627,45 @@ abstract class

[GitHub] [spark] HeartSaVioR commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-07 Thread GitBox
HeartSaVioR commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1341979338 I'm trying to understand the case - if my understanding is correct, the new test is just to trigger the same behavior rather than reproducing actual problem, right? In the new test,

[GitHub] [spark] cloud-fan commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-07 Thread GitBox
cloud-fan commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1042909362 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] beliefer commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-07 Thread GitBox
beliefer commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1042908731 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimit.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
amaliujia commented on PR #38973: URL: https://github.com/apache/spark/pull/38973#issuecomment-1341969449 Looks pretty good! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] amaliujia commented on a diff in pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
amaliujia commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1042905152 ## python/pyspark/sql/connect/dataframe.py: ## @@ -824,6 +824,39 @@ def withColumn(self, colName: str, col: Column) -> "DataFrame":

[GitHub] [spark] HeartSaVioR commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-07 Thread GitBox
HeartSaVioR commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1341968071 Could you please try to summarize the description of JIRA to PR template, especially the part of "Root Cause"? Also, is it "known" issue for Kafka consumer? Also please note

[GitHub] [spark] amaliujia commented on a diff in pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
amaliujia commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1042904415 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -309,6 +310,28 @@ class SparkConnectPlanner(session:

[GitHub] [spark] amaliujia commented on a diff in pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
amaliujia commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1042904146 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -570,3 +571,21 @@ message Hint { // (Optional) Hint parameters. repeated

[GitHub] [spark] amaliujia opened a new pull request, #38975: [SPARK-41284][CONNECT] Support read.json()

2022-12-07 Thread GitBox
amaliujia opened a new pull request, #38975: URL: https://github.com/apache/spark/pull/38975 ### What changes were proposed in this pull request? This PR supports the `json()` API in DataFrameReader. This API is built on top of the core API of the reader (schema, load,
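
As a usage reminder, the `json()` entry point sits on top of the core reader API (schema/format/load) as shown below; standard PySpark usage with a placeholder path and an assumed local SparkSession.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = (
    spark.read
    .schema("name STRING, age INT")  # optional explicit schema
    .json("/tmp/people.json")        # placeholder path
)
df.show()
```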

[GitHub] [spark] ulysses-you commented on a diff in pull request #38939: [SPARK-41407][SQL] Pull out v1 write to WriteFiles

2022-12-07 Thread GitBox
ulysses-you commented on code in PR #38939: URL: https://github.com/apache/spark/pull/38939#discussion_r1042862929 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala: ## @@ -223,6 +224,19 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with

[GitHub] [spark] LuciferYang commented on pull request #38974: [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT

2022-12-07 Thread GitBox
LuciferYang commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1341954845 https://github.com/LuciferYang/make-distribution.sh/blob/master/.github/workflows/blank.yml https://github.com/LuciferYang/make-distribution.sh/actions/runs/3645098350 --

[GitHub] [spark] LuciferYang opened a new pull request, #38974: [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT

2022-12-07 Thread GitBox
LuciferYang opened a new pull request, #38974: URL: https://github.com/apache/spark/pull/38974 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] pan3793 commented on pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-07 Thread GitBox
pan3793 commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1341948947 Thanks @dongjoon-hyun for checking, I think you are right, it should be ported to 3.2/3.3 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] dongjoon-hyun commented on pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-07 Thread GitBox
dongjoon-hyun commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1341947314 Thank you all. If this is caused by SPARK-27991 at Spark 3.2.0, do we need to backport to branch-3.3 and 3.2? -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] wecharyu commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-07 Thread GitBox
wecharyu commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1341944936 @jerrypeng the empty offset will be stored in `committedOffsets`, when we run next batch, the following code will record an empty map startOffset in `newBatchesPlan`:

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-07 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1042892403 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala: ## @@ -234,6 +235,108 @@ abstract class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38960: [SPARK-41435][SQL] Make `curdate()` throw `WRONG_NUM_ARGS ` when args is not empty

2022-12-07 Thread GitBox
LuciferYang commented on code in PR #38960: URL: https://github.com/apache/spark/pull/38960#discussion_r1042889334 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala: ## @@ -172,7 +172,7 @@ object CurDateExpressionBuilder extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38969: [SPARK-41442][SQL] Only update SQLMetric value if merging with valid metric

2022-12-07 Thread GitBox
dongjoon-hyun commented on code in PR #38969: URL: https://github.com/apache/spark/pull/38969#discussion_r1042885426 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2156,8 +2156,15 @@ class AdaptiveQueryExecSuite

[GitHub] [spark] LuciferYang commented on a diff in pull request #38940: [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `INVALID_FUNCTION_ARGS`

2022-12-07 Thread GitBox
LuciferYang commented on code in PR #38940: URL: https://github.com/apache/spark/pull/38940#discussion_r1042879224 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -649,10 +649,10 @@ private[sql] object QueryCompilationErrors

[GitHub] [spark] beliefer commented on a diff in pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
beliefer commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1042865399 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -570,3 +571,21 @@ message Hint { // (Optional) Hint parameters. repeated

[GitHub] [spark] LuciferYang commented on pull request #38960: [SPARK-41435][SQL] Make `curdate()` throw `WRONG_NUM_ARGS ` when args is not empty

2022-12-07 Thread GitBox
LuciferYang commented on PR #38960: URL: https://github.com/apache/spark/pull/38960#issuecomment-1341907393 There are two questions: 1. When `validParametersCount.length == 0`, should an internal exception be thrown? I think `validParametersCount.length == 0` should only be because it

[GitHub] [spark] beliefer commented on a diff in pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
beliefer commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1042866959 ## python/pyspark/sql/connect/dataframe.py: ## @@ -824,6 +824,39 @@ def withColumn(self, colName: str, col: Column) -> "DataFrame":

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
zhengruifeng commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1042865704 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -570,3 +571,21 @@ message Hint { // (Optional) Hint parameters. repeated

[GitHub] [spark] zhengruifeng commented on pull request #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
zhengruifeng commented on PR #38973: URL: https://github.com/apache/spark/pull/38973#issuecomment-1341900821 awesome! @beliefer thank you so much -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] itholic closed pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-12-07 Thread GitBox
itholic closed pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042` URL: https://github.com/apache/spark/pull/38664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] itholic commented on pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-12-07 Thread GitBox
itholic commented on PR #38664: URL: https://github.com/apache/spark/pull/38664#issuecomment-1341896519 Closing since it's duplicated to #38707 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] itholic commented on pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-12-07 Thread GitBox
itholic commented on PR #38664: URL: https://github.com/apache/spark/pull/38664#issuecomment-1341896379 Closing since it's duplicated to #38707 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] beliefer opened a new pull request, #38973: [WIP][SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt`

2022-12-07 Thread GitBox
beliefer opened a new pull request, #38973: URL: https://github.com/apache/spark/pull/38973 ### What changes were proposed in this pull request? Implement `DataFrame.melt` with a proto message 1. Implement `DataFrame.melt` for scala API 2. Implement `DataFrame.melt`for python
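
A hedged sketch of the end-user API the PR targets, following the classic PySpark `DataFrame.melt` (unpivot) signature; the Connect implementation under review may differ in detail.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 11, 1.1), (2, 12, 1.2)], ["id", "int", "double"])

# Keep 'id', turn the remaining columns into (variable, value) rows.
df.melt(
    ids=["id"],
    values=["int", "double"],
    variableColumnName="variable",
    valueColumnName="value",
).show()
```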

[GitHub] [spark] panbingkun opened a new pull request, #38972: [SPARK-41443][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1061

2022-12-07 Thread GitBox
panbingkun opened a new pull request, #38972: URL: https://github.com/apache/spark/pull/38972 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-07 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1042860224 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class
