[GitHub] [spark] gengliangwang opened a new pull request, #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ

2022-12-08 Thread GitBox
gengliangwang opened a new pull request, #38997: URL: https://github.com/apache/spark/pull/38997 ### What changes were proposed in this pull request? Handle TimestampNTZ in `Cast.canUpCast`: * Date and timestamp type can up cast to TimestampNTZ. * TimestampNTZ can up cast
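The entry above extends `Cast.canUpCast` so that date and timestamp types may up-cast to TimestampNTZ. A minimal stand-in sketch of that rule (hypothetical simplified type names as strings, not Spark's actual `DataType` hierarchy or `Cast` implementation):

```python
# Hypothetical, simplified model of the up-cast rule from SPARK-41462:
# DATE and TIMESTAMP may safely up-cast to TIMESTAMP_NTZ. Illustration only.
UP_CASTS = {
    ("date", "timestamp_ntz"),
    ("timestamp", "timestamp_ntz"),
}

def can_up_cast(from_type: str, to_type: str) -> bool:
    """Return True if from_type can be up-cast (losslessly) to to_type."""
    # An identity cast is always an up-cast; otherwise consult the table.
    return from_type == to_type or (from_type, to_type) in UP_CASTS
```

The PR also adds the reverse-direction entries (what TimestampNTZ can up-cast to), which are elided here since the snippet is truncated.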

[GitHub] [spark] ahmed-mahran opened a new pull request, #38996: [SPARK-41008][MLLIB] Follow-up isotonic regression features deduplica…

2022-12-08 Thread GitBox
ahmed-mahran opened a new pull request, #38996: URL: https://github.com/apache/spark/pull/38996 ### What changes were proposed in this pull request? A follow-up on https://github.com/apache/spark/pull/38966 to update relevant documentation and remove redundant sort key.

[GitHub] [spark] ulysses-you commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-12-08 Thread GitBox
ulysses-you commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1044175147 ## core/pom.xml: ## @@ -616,6 +621,48 @@ + +org.apache.maven.plugins +maven-shade-plugin Review Comment: @ge

[GitHub] [spark] Ngone51 commented on pull request #38995: [SPARK-41460][CORE] Introduce IsolatedThreadSafeRpcEndpoint to extend IsolatedRpcEndpoint

2022-12-08 Thread GitBox
Ngone51 commented on PR #38995: URL: https://github.com/apache/spark/pull/38995#issuecomment-1343967442 > What about renaming IsolatedRpcEndpoint to IsolatedThreadSafeRpcEndpoint simply? I think this would breach the original design that the author argued for at https://github.com/apache/s

[GitHub] [spark] HyukjinKwon commented on pull request #38967: [SPARK-41369][CONNECT][FOLLOWUP] Remove unneeded connect server deps

2022-12-08 Thread GitBox
HyukjinKwon commented on PR #38967: URL: https://github.com/apache/spark/pull/38967#issuecomment-1343959972 Oops, this is apparently used. When I run the commands below: ```bash ./build/mvn -Phive -DskipTests clean package ./python/run-tests --module pyspark-connect -p 1 ```

[GitHub] [spark] LuciferYang commented on a diff in pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-08 Thread GitBox
LuciferYang commented on code in PR #38944: URL: https://github.com/apache/spark/pull/38944#discussion_r1044163985 ## connector/connect/common/pom.xml: ## @@ -0,0 +1,225 @@ + + + +http://maven.apache.org/POM/4.0.0"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; +

[GitHub] [spark] shuyouZZ commented on a diff in pull request #38983: [SPARK-41447][CORE] Clean up expired event log files that don't exist in listing db

2022-12-08 Thread GitBox
shuyouZZ commented on code in PR #38983: URL: https://github.com/apache/spark/pull/38983#discussion_r1044159282 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -996,6 +996,21 @@ private[history] class FsHistoryProvider(conf: SparkConf, cloc

[GitHub] [spark] LuciferYang commented on a diff in pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-08 Thread GitBox
LuciferYang commented on code in PR #38944: URL: https://github.com/apache/spark/pull/38944#discussion_r1044157198 ## connector/connect/common/pom.xml: ## @@ -0,0 +1,225 @@ + + + +http://maven.apache.org/POM/4.0.0"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; +

[GitHub] [spark] HyukjinKwon commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-08 Thread GitBox
HyukjinKwon commented on PR #38994: URL: https://github.com/apache/spark/pull/38994#issuecomment-1343943663 I prefer to merge https://github.com/apache/spark/pull/38991 first but please don't bother. I don't mind resolving conflicts 👍 -- This is an automated message from the Apache Git S

[GitHub] [spark] Ngone51 opened a new pull request, #38995: [SPARK-41460][CORE] Introduce IsolatedThreadSafeRpcEndpoint to extend IsolatedRpcEndpoint

2022-12-08 Thread GitBox
Ngone51 opened a new pull request, #38995: URL: https://github.com/apache/spark/pull/38995 ### What changes were proposed in this pull request? This PR introduces a new layer `IsolatedThreadSafeRpcEndpoint` to extend `IsolatedRpcEndpoint` and changes all the endpoints whic

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-08 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1044148832 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1178,8 +1178,13 @@ private[spark] class DAGScheduler( listenerBus.post(SparkListene

[GitHub] [spark] dongjoon-hyun commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-08 Thread GitBox
dongjoon-hyun commented on PR #38982: URL: https://github.com/apache/spark/pull/38982#issuecomment-1343935891 Thank you, @Yikun . It seems that your test works, but linter job failed at SparkR issue again. - https://github.com/Yikun/spark/actions/runs/3655117402/jobs/6176152494 -- This

[GitHub] [spark] HyukjinKwon commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-08 Thread GitBox
HyukjinKwon commented on PR #38994: URL: https://github.com/apache/spark/pull/38994#issuecomment-1343931366 cc @amaliujia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

[GitHub] [spark] HyukjinKwon opened a new pull request, #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-08 Thread GitBox
HyukjinKwon opened a new pull request, #38994: URL: https://github.com/apache/spark/pull/38994 ### What changes were proposed in this pull request? This PR proposes to resolve the circular imports workarounds ### Why are the changes needed? For better readability and

[GitHub] [spark] idealspark opened a new pull request, #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-08 Thread GitBox
idealspark opened a new pull request, #38993: URL: https://github.com/apache/spark/pull/38993 ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Gi

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-08 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1044143098 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,92 @@ case class ArrayExcept(left: Expressio

[GitHub] [spark] HyukjinKwon commented on pull request #38992: [MINOR][CONNECT][DOCS] Document parallelism=1 in Spark Connect testing

2022-12-08 Thread GitBox
HyukjinKwon commented on PR #38992: URL: https://github.com/apache/spark/pull/38992#issuecomment-1343924920 cc @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon opened a new pull request, #38992: [MINOR][CONNECT][DOCS] Document parallelism=1 in Spark Connect testing

2022-12-08 Thread GitBox
HyukjinKwon opened a new pull request, #38992: URL: https://github.com/apache/spark/pull/38992 ### What changes were proposed in this pull request? This PR proposes to document the correct way of running Spark Connect tests with `--parallelism 1` option in `./python/run-tests` script.

[GitHub] [spark] Yikun commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-08 Thread GitBox
Yikun commented on PR #38982: URL: https://github.com/apache/spark/pull/38982#issuecomment-1343923114 For the pyspark failure, let's see whether https://github.com/Yikun/spark/pull/193/commits/3840beb42877335efd3bc6089c99bce5287b3079 works: https://github.com/Yikun/spark/actions/runs/365511

[GitHub] [spark] dongjoon-hyun commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-08 Thread GitBox
dongjoon-hyun commented on PR #38982: URL: https://github.com/apache/spark/pull/38982#issuecomment-1343920950 As I mentioned [here](https://github.com/apache/spark/pull/38982#issuecomment-1343437210), it's irrelevant to this PR and a known issue. You can ignore that, @pan3793 . -- This i

[GitHub] [spark] dongjoon-hyun commented on pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests

2022-12-08 Thread GitBox
dongjoon-hyun commented on PR #38991: URL: https://github.com/apache/spark/pull/38991#issuecomment-1343920074 Thank you, @HyukjinKwon ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] HyukjinKwon commented on pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor pandas, pyarrow and grpc check in tests

2022-12-08 Thread GitBox
HyukjinKwon commented on PR #38991: URL: https://github.com/apache/spark/pull/38991#issuecomment-1343919107 cc @grundprinzip @hvanhovell @dongjoon-hyun @amaliujia @zhengruifeng @xinrong-meng FYI -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] HyukjinKwon opened a new pull request, #38991: [SPARK-41457][PYTHON][TESTS] Refactor pandas, pyarrow and grpc check in tests

2022-12-08 Thread GitBox
HyukjinKwon opened a new pull request, #38991: URL: https://github.com/apache/spark/pull/38991 ### What changes were proposed in this pull request? This PR proposes to: - Print out the correct error message when dependencies are not installed for `pyspark.sql.connect` - Igno

[GitHub] [spark] jiaoqingbo opened a new pull request, #38990: [MINOR][DOC] Fix typo in SqlBaseLexer.g4

2022-12-08 Thread GitBox
jiaoqingbo opened a new pull request, #38990: URL: https://github.com/apache/spark/pull/38990 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-08 Thread GitBox
mridulm commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1044131362 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -637,9 +637,11 @@ private[spark] class BlockManager( def reregister(): Unit = { // TOD

[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-08 Thread GitBox
mridulm commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1044130961 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -637,9 +637,11 @@ private[spark] class BlockManager( def reregister(): Unit = { // TOD

[GitHub] [spark] shenjiayu17 commented on pull request #38534: [SPARK-38505][SQL] Make partial aggregation adaptive

2022-12-08 Thread GitBox
shenjiayu17 commented on PR #38534: URL: https://github.com/apache/spark/pull/38534#issuecomment-1343908559 Hi @wangyum. I'm very interested in this optimization on partial aggregation. But why does it need these child node limits? Do they have any influence on functionality or performance? `

[GitHub] [spark] dengziming commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-08 Thread GitBox
dengziming commented on code in PR #38984: URL: https://github.com/apache/spark/pull/38984#discussion_r1044127806 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -305,7 +305,11 @@ class SparkConnectPlanner(session:

[GitHub] [spark] amaliujia commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-08 Thread GitBox
amaliujia commented on code in PR #38984: URL: https://github.com/apache/spark/pull/38984#discussion_r1044125168 ## python/pyspark/sql/connect/proto_converter.py: ## @@ -0,0 +1,62 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] amaliujia commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-08 Thread GitBox
amaliujia commented on code in PR #38984: URL: https://github.com/apache/spark/pull/38984#discussion_r1044124554 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -305,7 +305,11 @@ class SparkConnectPlanner(session:

[GitHub] [spark] boneanxs commented on pull request #38980: [SPARK-41448] Make consistent MR job IDs in FileBatchWriter and FileFormatWriter

2022-12-08 Thread GitBox
boneanxs commented on PR #38980: URL: https://github.com/apache/spark/pull/38980#issuecomment-1343903807 @cloud-fan @rdblue @dongjoon-hyun @steveloughran could you please take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] amaliujia commented on a diff in pull request #38979: [SPARK-41446][CONNECT][PYTHON] Make `createDataFrame` support schema and more input dataset types

2022-12-08 Thread GitBox
amaliujia commented on code in PR #38979: URL: https://github.com/apache/spark/pull/38979#discussion_r1044122515 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -304,6 +305,24 @@ message LocalRelation { // Local collection data serialized in

[GitHub] [spark] pan3793 commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-08 Thread GitBox
pan3793 commented on PR #38982: URL: https://github.com/apache/spark/pull/38982#issuecomment-1343879880 @dongjoon-hyun the pyspark and lint jobs fail consistently, and I see they also failed on previous commits. Sorry, I'm not familiar w/ Python, cc @zhengruifeng @Yikun would you please

[GitHub] [spark] pan3793 commented on pull request #38989: [SPARK-41458][BUILD][YARN][SHUFFLE] Correctly transform the SPI services for Yarn Shuffle Service

2022-12-08 Thread GitBox
pan3793 commented on PR #38989: URL: https://github.com/apache/spark/pull/38989#issuecomment-1343878019 cc @srowen @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[GitHub] [spark] pan3793 opened a new pull request, #38989: [SPARK-41458][BUILD][YARN][SHUFFLE] Correctly transform the SPI services for Yarn Shuffle Service

2022-12-08 Thread GitBox
pan3793 opened a new pull request, #38989: URL: https://github.com/apache/spark/pull/38989 ### What changes were proposed in this pull request? Correctly transform the SPI services for Yarn Shuffle Service by configuring `ServicesResourceTransformer`. ### Why are the ch

[GitHub] [spark] gengliangwang commented on a diff in pull request #38988: [SPARK-41456][SQL] Improve the performance of try_cast

2022-12-08 Thread GitBox
gengliangwang commented on code in PR #38988: URL: https://github.com/apache/spark/pull/38988#discussion_r1044100198 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1280,6 +1280,16 @@ case class Cast( } } + // Whether Spark

[GitHub] [spark] zhengruifeng commented on pull request #38979: [SPARK-41446][CONNECT][PYTHON] Make `createDataFrame` support schema and more input dataset types

2022-12-08 Thread GitBox
zhengruifeng commented on PR #38979: URL: https://github.com/apache/spark/pull/38979#issuecomment-1343866751 @HyukjinKwon @cloud-fan @amaliujia @grundprinzip @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] gengliangwang opened a new pull request, #38988: [SPARK-41456][SQL] Improve the performance of try_cast

2022-12-08 Thread GitBox
gengliangwang opened a new pull request, #38988: URL: https://github.com/apache/spark/pull/38988 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[GitHub] [spark] LuciferYang commented on pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-08 Thread GitBox
LuciferYang commented on PR #38954: URL: https://github.com/apache/spark/pull/38954#issuecomment-1343860910 friendly ping @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044083377 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] yabola commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-08 Thread GitBox
yabola commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1343836630 @gengliangwang I added a new UT and changed the decoding to carefully decode each parameter. I think it aligns with the previous behavior and is more accurate ( I reuse [decodeURLParameter](https://

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044079922 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( va

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-08 Thread GitBox
HeartSaVioR commented on code in PR #38517: URL: https://github.com/apache/spark/pull/38517#discussion_r1044068977 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala: ## @@ -0,0 +1,282 @@ +/* + * Licensed to the Apa

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-08 Thread GitBox
HeartSaVioR commented on code in PR #38517: URL: https://github.com/apache/spark/pull/38517#discussion_r1044075012 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecutionSuite.scala: ## @@ -0,0 +1,1865 @@ +/* + * Licensed to t

[GitHub] [spark] Ngone51 commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-08 Thread GitBox
Ngone51 commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1044071110 ## core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala: ## @@ -1849,6 +1849,68 @@ abstract class AppStatusListenerSuite extends SparkFunSuite with

[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-08 Thread GitBox
beliefer commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1343813646 OK. Another way. `ArrayCompact` can reuse `ArrayFilter` and implement `RuntimeReplaceable`. ``` > SELECT filter(array(1, 2, 3, null), x -> x IS NOT NULL); [1,2,3]
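The equivalence beliefer points out can be illustrated outside Spark: `array_compact` is exactly a null-dropping filter, which is why it can reuse `ArrayFilter` via `RuntimeReplaceable`. A plain-Python stand-in for the SQL rewrite (illustrative, not Spark's implementation):

```python
# array_compact(arr) removes null (None) entries; it is equivalent to the
# SQL rewrite shown in the comment:
#   SELECT filter(array(1, 2, 3, null), x -> x IS NOT NULL)
def array_compact(arr):
    # Keep only non-null elements, preserving order.
    return [x for x in arr if x is not None]
```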

[GitHub] [spark] pan3793 commented on a diff in pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
pan3793 commented on code in PR #38985: URL: https://github.com/apache/spark/pull/38985#discussion_r1044069338 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -241,14 +241,14 @@ class KubernetesConfSuite extends Sp

[GitHub] [spark] cloud-fan commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-08 Thread GitBox
cloud-fan commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1044068904 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1761,6 +1763,114 @@ class Analyzer(override val catalogManager: CatalogM

[GitHub] [spark] Ngone51 commented on pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-08 Thread GitBox
Ngone51 commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1343811023 > Btw, do you also want to remove the `if (event.taskInfo == null) {` check at the beginning of `onTaskEnd`? @mridulm Since the latest PR fix doesn't involve the metrics, I think we can

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044066697 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044064987 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044065935 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044065444 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] Ngone51 commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-08 Thread GitBox
Ngone51 commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1044064944 ## core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala: ## @@ -1849,6 +1849,68 @@ abstract class AppStatusListenerSuite extends SparkFunSuite with

[GitHub] [spark] Ngone51 commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-08 Thread GitBox
Ngone51 commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1044064601 ## core/src/main/scala/org/apache/spark/status/AppStatusListener.scala: ## @@ -689,7 +689,15 @@ private[spark] class AppStatusListener( if (metricsDelta != null)

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38979: [SPARK-41446][CONNECT][PYTHON] Make `createDataFrame` support schema and more input dataset types

2022-12-08 Thread GitBox
zhengruifeng commented on code in PR #38979: URL: https://github.com/apache/spark/pull/38979#discussion_r1044063886 ## python/pyspark/sql/connect/plan.py: ## @@ -167,21 +169,38 @@ def _repr_html_(self) -> str: class LocalRelation(LogicalPlan): -"""Creates a LocalRelatio

[GitHub] [spark] zhengruifeng commented on pull request #38979: [SPARK-41446][CONNECT][PYTHON] Make `createDataFrame` support schema and more input dataset types

2022-12-08 Thread GitBox
zhengruifeng commented on PR #38979: URL: https://github.com/apache/spark/pull/38979#issuecomment-1343806000 Difference in casting: this PR leverages `Dataset.to(schema)` to cast datatypes, which is very different from PySpark's approach, which relies on [the `_acceptable_types` list]

[GitHub] [spark] Ngone51 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-08 Thread GitBox
Ngone51 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1044062231 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1178,8 +1178,13 @@ private[spark] class DAGScheduler( listenerBus.post(SparkListenerTa

[GitHub] [spark] Ngone51 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-08 Thread GitBox
Ngone51 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1044060202 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -383,8 +383,8 @@ private[spark] class DAGScheduler( /** * Called by the TaskSetManage

[GitHub] [spark] pan3793 commented on a diff in pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
pan3793 commented on code in PR #38985: URL: https://github.com/apache/spark/pull/38985#discussion_r1044056418 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -241,14 +241,14 @@ class KubernetesConfSuite extends Sp

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044056150 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( va

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-08 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1044051578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-08 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1044051385 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-08 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1044051062 ## sql/core/src/test/resources/sql-tests/inputs/string-functions.sql: ## @@ -58,6 +58,69 @@ SELECT substring('Spark SQL' from 5); SELECT substring('Spark SQL' from -3)

[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-08 Thread GitBox
sandeep-katta commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1343778634 > Basically, the implementation looks good. But we can use `RuntimeReplaceable` to simplify this PR. > > `ArrayCompact` can reuse `ArrayRemove` and implement `RuntimeReplaceab

[GitHub] [spark] LuciferYang commented on a diff in pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
LuciferYang commented on code in PR #38985: URL: https://github.com/apache/spark/pull/38985#discussion_r1044047853 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -241,14 +241,14 @@ class KubernetesConfSuite extend

[GitHub] [spark] pan3793 commented on pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
pan3793 commented on PR #38985: URL: https://github.com/apache/spark/pull/38985#issuecomment-1343774719 > It seems this is to fix a bad case caused by the way users use it? That's right. > Is it necessary for Spark to do fault tolerance? The change is small, I think it's valuable.

[GitHub] [spark] shuyouZZ commented on a diff in pull request #38983: [SPARK-41447][CORE] Clean up expired event log files that don't exist in listing db

2022-12-08 Thread GitBox
shuyouZZ commented on code in PR #38983: URL: https://github.com/apache/spark/pull/38983#discussion_r1044044735 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -996,6 +996,21 @@ private[history] class FsHistoryProvider(conf: SparkConf, cloc

[GitHub] [spark] srielau commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-12-08 Thread GitBox
srielau commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1044041728 ## core/src/main/resources/error/error-classes.json: ## @@ -1443,6 +1443,11 @@ "A correlated outer name reference within a subquery expression body was not

[GitHub] [spark] wankunde commented on a diff in pull request #38649: [SPARK-41132][SQL] Convert LikeAny and NotLikeAny to InSet if no pattern contains wildcards

2022-12-08 Thread GitBox
wankunde commented on code in PR #38649: URL: https://github.com/apache/spark/pull/38649#discussion_r1044038097 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -762,10 +762,40 @@ object LikeSimplification extends Rule[LogicalPlan]
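The optimization in this PR rests on a simple equivalence: a SQL `LIKE` pattern with no `%` or `_` wildcard matches only the literal string, so `col LIKE ANY (p1, p2, ...)` degenerates to set membership. A hedged Python model of both the rewrite condition and the equivalence (helper names are illustrative, not Spark's):

```python
import re


def can_convert_to_inset(patterns):
    # Safe to rewrite LIKE ANY as IN only when no pattern
    # contains the SQL LIKE wildcards '%' or '_'.
    return all("%" not in p and "_" not in p for p in patterns)


def like_any(value, patterns):
    def to_regex(p):
        # Translate SQL LIKE wildcards to a regex, escaping everything else.
        return "".join(
            ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in p
        )
    return any(re.fullmatch(to_regex(p), value) is not None for p in patterns)


patterns = ["foo", "bar"]
assert can_convert_to_inset(patterns)
# With no wildcards, LIKE ANY agrees with plain set membership:
assert like_any("foo", patterns) == ("foo" in set(patterns))
```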

[GitHub] [spark] LuciferYang commented on pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
LuciferYang commented on PR #38985: URL: https://github.com/apache/spark/pull/38985#issuecomment-1343768447 It seems this is to fix a bad case caused by the way users use it? The current `lang3` version used by Spark does not trigger this issue, right? I don't know how many similar bad cases will b
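The concern in this thread is that calling `StringUtils.abbreviate` with an empty abbreviation marker is a corner case whose behavior has not been stable across commons-lang3 versions. A version-independent alternative for capping a name at a maximum length is plain truncation with no marker at all; a Python sketch of that idea (the actual Spark fix is in Scala against commons-lang3):

```python
def truncate(s: str, max_len: int) -> str:
    """Cap a string at max_len characters with no ellipsis marker.

    Unlike abbreviate(s, marker="", max_len), plain slicing behaves the
    same in every library version: it never raises and never reserves
    room for a marker."""
    return s if len(s) <= max_len else s[:max_len]


assert truncate("spark-executor-name", 8) == "spark-ex"
assert truncate("short", 8) == "short"
```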

[GitHub] [spark] Yikun commented on a diff in pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
Yikun commented on code in PR #38985: URL: https://github.com/apache/spark/pull/38985#discussion_r1044003100 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -241,14 +241,14 @@ class KubernetesConfSuite extends Spar

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-08 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1343752740 > Basically, the implementation looks good. But we can use `RuntimeReplaceable` to simplify this PR. > > `ArrayCompact` can reuse `ArrayRemove` and implement `RuntimeReplaceable`.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-08 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1044002198 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,57 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044001778 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( va

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-08 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1044000749 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,92 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] LuciferYang commented on pull request #38974: [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT

2022-12-08 Thread GitBox
LuciferYang commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1343747647 This PR can fix the compile issue reported on the dev mailing list by @steveloughran, but should we wait until Hadoop 3.4 is upgraded? What do you think? @HyukjinKwon @dongjoon-hyun @s

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38973: [SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt` and `DataFrame.unpivot`

2022-12-08 Thread GitBox
zhengruifeng commented on code in PR #38973: URL: https://github.com/apache/spark/pull/38973#discussion_r1043999011 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -309,6 +310,24 @@ class SparkConnectPlanner(sessio
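`DataFrame.melt`/`DataFrame.unpivot` turn value columns into rows: the id columns are kept, and each input row produces one output row per value column, with the column name and its value landing in two new columns. A tiny pure-Python model of that shape over lists of dicts; this hypothetical helper only illustrates the semantics, not the Connect planner API:

```python
def unpivot(rows, ids, values, var_col="variable", val_col="value"):
    """Model of DataFrame.unpivot: keep the `ids` columns and emit one
    output row per (input row, value column) pair."""
    out = []
    for row in rows:
        for v in values:
            rec = {k: row[k] for k in ids}
            rec[var_col] = v
            rec[val_col] = row[v]
            out.append(rec)
    return out


rows = [{"id": 1, "a": 10, "b": 20}]
print(unpivot(rows, ids=["id"], values=["a", "b"]))
# [{'id': 1, 'variable': 'a', 'value': 10},
#  {'id': 1, 'variable': 'b', 'value': 20}]
```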

[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-08 Thread GitBox
beliefer commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1343734796 > @beliefer would you mind also help reviewing? Thanks Thank you for your ping. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] pan3793 commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-08 Thread GitBox
pan3793 commented on PR #38982: URL: https://github.com/apache/spark/pull/38982#issuecomment-1343734626 Re-triggered. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-08 Thread GitBox
beliefer commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1343731612 Basically, the implementation looks good. But we can use `RuntimeReplaceable` to simplify this PR. `ArrayCompact` can reuse `ArrayRemove` and implement `RuntimeReplaceable`.
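The `RuntimeReplaceable` suggestion is about expressing `array_compact` in terms of an existing expression instead of generating new evaluation code. Semantically, `array_compact` simply drops `NULL` entries while preserving order and duplicates. A minimal Python model of that contract (illustrative only, not Spark's catalyst expression):

```python
def array_compact(arr):
    """Model of SQL array_compact: remove NULL (None) elements,
    preserving order and duplicates; a NULL array yields NULL."""
    if arr is None:
        return None
    return [x for x in arr if x is not None]


print(array_compact([1, None, 2, None, 2]))  # [1, 2, 2]
```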

[GitHub] [spark] pan3793 commented on a diff in pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
pan3793 commented on code in PR #38985: URL: https://github.com/apache/spark/pull/38985#discussion_r1043983985 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -275,7 +275,7 @@ private[spark] object KubernetesConf {

[GitHub] [spark] pan3793 commented on a diff in pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-08 Thread GitBox
pan3793 commented on code in PR #38985: URL: https://github.com/apache/spark/pull/38985#discussion_r1043983005 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -241,14 +241,14 @@ class KubernetesConfSuite extends Sp

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1043982706 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-08 Thread GitBox
beliefer commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1043982018 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,57 @@ case class ArrayExcept(left: Expression,

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1043915051 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parse

[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-08 Thread GitBox
beliefer commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1343723426 @sandeep-katta Could you update the PR description and add the info contains syntax, arguments, examples and the mainstream database supports array_append ? Please refer https://github.c

[GitHub] [spark] beliefer commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-08 Thread GitBox
beliefer commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1043977011 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,92 @@ case class ArrayExcept(left: Expression,
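For reference, `array_append(arr, elem)` adds the element at the end of the array; a `NULL` element is still appended, while a `NULL` array yields `NULL`. A small Python model of those semantics (illustrative only, not the catalyst implementation under review):

```python
def array_append(arr, elem):
    """Model of SQL array_append: append elem (including None/NULL)
    to the end of arr; a NULL array yields NULL."""
    if arr is None:
        return None
    return list(arr) + [elem]


print(array_append(["b", "d", "c"], "a"))  # ['b', 'd', 'c', 'a']
```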

[GitHub] [spark] AmplabJenkins commented on pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions

2022-12-08 Thread GitBox
AmplabJenkins commented on PR #38946: URL: https://github.com/apache/spark/pull/38946#issuecomment-1343716380 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38947: [SPARK-41233][SQL] Adds an array_prepend function to catalyst

2022-12-08 Thread GitBox
AmplabJenkins commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1343716355 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] wForget commented on pull request #38871: [SPARK-41344][SQL] Make error clearer when table not found in SupportsCatalogOptions catalog

2022-12-08 Thread GitBox
wForget commented on PR #38871: URL: https://github.com/apache/spark/pull/38871#issuecomment-1343712073 Thanks @planga82, cc @HyukjinKwon @dongjoon-hyun Could you please take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to
