[PR] [SPARK-46718][BUILD] Test arrow 15 [spark]

2024-01-19 Thread via GitHub
LuciferYang opened a new pull request, #44797: URL: https://github.com/apache/spark/pull/44797 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

2024-01-19 Thread via GitHub
itholic commented on code in PR #44778: URL: https://github.com/apache/spark/pull/44778#discussion_r1458550594 ## dev/sparktestsupport/modules.py: ## @@ -542,6 +542,10 @@ def __hash__(self): "pyspark.testing.utils", "pyspark.testing.pandasutils", ], +e

Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

2024-01-19 Thread via GitHub
LantaoJin commented on code in PR #44211: URL: https://github.com/apache/spark/pull/44211#discussion_r1458501384 ## connector/kinesis-asl/src/main/java/org/apache/spark/streaming/kinesis/KinesisInitialPositions.java: ## @@ -16,7 +16,8 @@ */ package org.apache.spark.streaming.

Re: [PR] [SPARK-46718][BUILD] Test arrow 15 [spark]

2024-01-19 Thread via GitHub
LuciferYang commented on code in PR #44797: URL: https://github.com/apache/spark/pull/44797#discussion_r1458555146 ## project/SparkBuild.scala: ## @@ -756,10 +757,12 @@ object SparkConnect { // `netty-*.jar` and `unused-1.0.0.jar` from assembly. (assembly / assemblyExc

Re: [PR] [SPARK-46718][BUILD] Test arrow 15 [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on code in PR #44797: URL: https://github.com/apache/spark/pull/44797#discussion_r1458556733 ## project/SparkBuild.scala: ## @@ -756,10 +757,12 @@ object SparkConnect { // `netty-*.jar` and `unused-1.0.0.jar` from assembly. (assembly / assemblyE

Re: [PR] [SPARK-46770][K8S][TESTS] Remove legacy `docker-for-desktop` logic [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on PR #44796: URL: https://github.com/apache/spark/pull/44796#issuecomment-1899959573 Thank you! I'll merge this because I verified this manually~

Re: [PR] [SPARK-46770][K8S][TESTS] Remove legacy `docker-for-desktop` logic [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun closed pull request #44796: [SPARK-46770][K8S][TESTS] Remove legacy `docker-for-desktop` logic URL: https://github.com/apache/spark/pull/44796

Re: [PR] [SPARK-46769][SQL] Fix type inferring for timestamps without time zone in JSON/CSV [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44789: URL: https://github.com/apache/spark/pull/44789#discussion_r1458636982 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -202,11 +202,8 @@ class CSVInferSchema(val options: CSVOptions) extends

Re: [PR] [SPARK-46768][BUILD] Upgrade Guava used by the connect module to 33.0-jre [spark]

2024-01-19 Thread via GitHub
yaooqinn closed pull request #44795: [SPARK-46768][BUILD] Upgrade Guava used by the connect module to 33.0-jre URL: https://github.com/apache/spark/pull/44795

Re: [PR] [SPARK-46768][BUILD] Upgrade Guava used by the connect module to 33.0-jre [spark]

2024-01-19 Thread via GitHub
yaooqinn commented on PR #44795: URL: https://github.com/apache/spark/pull/44795#issuecomment-1900042833 Thanks @LuciferYang @dongjoon-hyun, merged to master

Re: [PR] [SPARK-46769][SQL] Fix type inferring for timestamps without time zone in JSON/CSV [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44789: URL: https://github.com/apache/spark/pull/44789#discussion_r1458648993 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchemaSuite.scala: ## @@ -267,7 +267,9 @@ class CSVInferSchemaSuite extends SparkFunSuite wit

Re: [PR] [SPARK-46769][SQL] Fix type inferring for timestamps without time zone in JSON/CSV [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44789: URL: https://github.com/apache/spark/pull/44789#discussion_r1458649995 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -202,11 +202,8 @@ class CSVInferSchema(val options: CSVOptions) extends

Re: [PR] [SPARK-46759][SQL][AVRO] Codec xz and zstandard support compression level for avro files [spark]

2024-01-19 Thread via GitHub
beliefer commented on code in PR #44786: URL: https://github.com/apache/spark/pull/44786#discussion_r1458650779 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -110,10 +110,12 @@ private[sql] object AvroUtils extends Logging { case co
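
For context, the thread is about letting the xz and zstandard Avro codecs honor a compression level, the way the existing deflate codec already does. A minimal sketch of how a user would exercise such options from a spark-shell session with the spark-avro module on the classpath; only `spark.sql.avro.compression.codec` and `spark.sql.avro.deflate.level` are long-standing configs, and the zstandard level key below is an assumed name, not necessarily the one the PR adds:

```scala
// Existing config: choose the Avro codec.
spark.conf.set("spark.sql.avro.compression.codec", "zstandard")
// Assumed key for the new per-codec level knob; check the merged PR for the real name.
spark.conf.set("spark.sql.avro.zstandard.level", "3")

spark.range(1000).toDF("id")
  .write
  .format("avro")
  .mode("overwrite")
  .save("/tmp/avro-zstd-level-demo")
```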

Re: [PR] [SPARK-46766][SQL][AVRO] ZSTD Buffer Pool Support For AVRO datasource [spark]

2024-01-19 Thread via GitHub
yaooqinn commented on PR #44792: URL: https://github.com/apache/spark/pull/44792#issuecomment-1900053763 Hi @dongjoon-hyun, I haven't found any benchmark for this part in the Avro repo or website. I think we can have one of our own.

Re: [PR] [SPARK-46698][CORE][FOLLOWUP] Replace Timer with single thread scheduled executor [spark]

2024-01-19 Thread via GitHub
beliefer commented on PR #44718: URL: https://github.com/apache/spark/pull/44718#issuecomment-1900054169 cc @dongjoon-hyun @LuciferYang .
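
For readers following the follow-up: the change continues replacing `java.util.Timer` with a single-thread scheduled executor in `BarrierCoordinator`, `BarrierTaskContext`, and `LauncherServer`. A minimal sketch of the general pattern (illustrative names, not the actual Spark fields):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// A single-thread scheduled executor replaces java.util.Timer: it is easier to
// shut down cleanly and does not kill its thread silently when a task throws.
val scheduler = Executors.newSingleThreadScheduledExecutor()

// The body that used to live in a TimerTask.
val task: Runnable = () => println("periodic check")

val handle = scheduler.scheduleAtFixedRate(task, 0L, 60L, TimeUnit.SECONDS)

// On shutdown, cancel the task and stop the executor thread.
handle.cancel(false)
scheduler.shutdownNow()
```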

Re: [PR] [SPARK-46766][SQL][AVRO] ZSTD Buffer Pool Support For AVRO datasource [spark]

2024-01-19 Thread via GitHub
yaooqinn closed pull request #44792: [SPARK-46766][SQL][AVRO] ZSTD Buffer Pool Support For AVRO datasource URL: https://github.com/apache/spark/pull/44792

Re: [PR] [SPARK-46766][SQL][AVRO] ZSTD Buffer Pool Support For AVRO datasource [spark]

2024-01-19 Thread via GitHub
yaooqinn commented on PR #44792: URL: https://github.com/apache/spark/pull/44792#issuecomment-1900061255 https://issues.apache.org/jira/browse/SPARK-46772 is created to do the benchmark thing. Thanks @dongjoon-hyun, merged this to master.
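
The merged change itself is about letting the Avro zstd writer reuse buffers through a buffer pool rather than allocating per block. A rough sketch of how a user might opt in; the flag name below is an assumption for illustration, and the real key is whatever SPARK-46766 defines:

```scala
// Assumed/illustrative config name -- see the merged PR for the actual key.
spark.conf.set("spark.sql.avro.zstandard.bufferPool.enabled", "true")

// Then write Avro output as usual (see the compression-level sketch above).
spark.range(1000).toDF("id").write.format("avro").mode("overwrite").save("/tmp/avro-zstd-pool-demo")
```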

[PR] [SPARK-46773][BUILD][CONNECT] Change to use whitelist to `generate assemblyExcludedJars` for the connect server module [spark]

2024-01-19 Thread via GitHub
LuciferYang opened a new pull request, #44798: URL: https://github.com/apache/spark/pull/44798 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-39910][SQL] Delegate path qualification to filesystem during DataSource file path globbing [spark]

2024-01-19 Thread via GitHub
tigrulya-exe commented on PR #43463: URL: https://github.com/apache/spark/pull/43463#issuecomment-1900078190 @cloud-fan Hi! I've rebased on master and fixed conflicts. Could you please take a look?

Re: [PR] [SPARK-46766][SQL][AVRO] ZSTD Buffer Pool Support For AVRO datasource [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on PR #44792: URL: https://github.com/apache/spark/pull/44792#issuecomment-1900082156 Thank you, @yaooqinn !

Re: [PR] [SPARK-46590][SQL] Fix coalesce failed with unexpected partition indices [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44661: URL: https://github.com/apache/spark/pull/44661#discussion_r1458683923 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala: ## @@ -47,9 +47,7 @@ object ShufflePartitionsUtil extends Logging {

Re: [PR] [SPARK-46773][BUILD][CONNECT] Change to use include-list to `generate assemblyExcludedJars` for the connect server module [spark]

2024-01-19 Thread via GitHub
LuciferYang commented on PR #44798: URL: https://github.com/apache/spark/pull/44798#issuecomment-1900095560 Test first

Re: [PR] [SPARK-46768][BUILD] Upgrade Guava used by the connect module to 33.0-jre [spark]

2024-01-19 Thread via GitHub
LuciferYang commented on PR #44795: URL: https://github.com/apache/spark/pull/44795#issuecomment-1900096537 Thanks @yaooqinn @dongjoon-hyun

Re: [PR] [SPARK-46718][BUILD] Test arrow 15 [spark]

2024-01-19 Thread via GitHub
LuciferYang commented on code in PR #44797: URL: https://github.com/apache/spark/pull/44797#discussion_r1458698911 ## project/SparkBuild.scala: ## @@ -756,10 +757,12 @@ object SparkConnect { // `netty-*.jar` and `unused-1.0.0.jar` from assembly. (assembly / assemblyExc

[PR] [SPARK-46774][SQL][AVRO] Use mapreduce.output.fileoutputformat.compress instead of deprecated mapred.output.compress in Avro write jobs [spark]

2024-01-19 Thread via GitHub
yaooqinn opened a new pull request, #44799: URL: https://github.com/apache/spark/pull/44799 ### What changes were proposed in this pull request? According to [DeprecatedProperties](https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-common/DeprecatedProper
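
Per Hadoop's deprecated-properties table referenced in the description, `mapred.output.compress` is the old MRv1 name for `mapreduce.output.fileoutputformat.compress`. A minimal sketch of the substitution on a Hadoop `Configuration` (illustrative; the PR applies it inside the Avro write path):

```scala
import org.apache.hadoop.conf.Configuration

val hadoopConf = new Configuration()

// Old MRv1 key, still translated by Hadoop's deprecation table:
//   hadoopConf.setBoolean("mapred.output.compress", true)
// Current MRv2 key, used after this change:
hadoopConf.setBoolean("mapreduce.output.fileoutputformat.compress", true)
```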

Re: [PR] [SPARK-46590][SQL] Fix coalesce failed with unexpected partition indices [spark]

2024-01-19 Thread via GitHub
jackylee-ch commented on code in PR #44661: URL: https://github.com/apache/spark/pull/44661#discussion_r1458785050 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala: ## @@ -47,9 +47,7 @@ object ShufflePartitionsUtil extends Logging {

Re: [PR] [SPARK-46590][SQL] Fix coalesce failed with unexpected partition indices [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44661: URL: https://github.com/apache/spark/pull/44661#discussion_r1458802466 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala: ## @@ -47,9 +47,7 @@ object ShufflePartitionsUtil extends Logging {

Re: [PR] [SPARK-46774][SQL][AVRO] Use mapreduce.output.fileoutputformat.compress instead of deprecated mapred.output.compress in Avro write jobs [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on code in PR #44799: URL: https://github.com/apache/spark/pull/44799#discussion_r1458816320 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -29,14 +29,14 @@ import org.apache.avro.mapreduce.AvroJob import org.apache.had

[PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan opened a new pull request, #44800: URL: https://github.com/apache/spark/pull/44800 ### What changes were proposed in this pull request? This is a refinement of https://github.com/apache/spark/pull/43243 . This PR enforces one thing: we only infer TIMESTAMP NTZ type u
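
The description is cut off, but the gist is tightening when the CSV/JSON readers infer TIMESTAMP_NTZ as opposed to the session's default timestamp type. A small sketch of the behavior under discussion, assuming a spark-shell session; `spark.sql.timestampType` is the existing config that selects the preferred timestamp type, and the exact inference conditions are what this PR refines:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Write a tiny CSV with timestamps that carry no time-zone offset.
val dir = Files.createTempDirectory("ntz-demo").toString
Files.write(Paths.get(dir, "data.csv"),
  "2024-01-19T10:00:00\n2024-01-19T11:30:00\n".getBytes(StandardCharsets.UTF_8))

// Prefer TIMESTAMP_NTZ, then let schema inference run.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
val df = spark.read.option("inferSchema", "true").csv(dir)
df.printSchema() // expected: the column is inferred as timestamp_ntz
```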

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on PR #44800: URL: https://github.com/apache/spark/pull/44800#issuecomment-1900225154 cc @gengliangwang @MaxGekk @Hisoka-X

Re: [PR] [SPARK-46774][SQL][AVRO] Use mapreduce.output.fileoutputformat.compress instead of deprecated mapred.output.compress in Avro write jobs [spark]

2024-01-19 Thread via GitHub
yaooqinn commented on code in PR #44799: URL: https://github.com/apache/spark/pull/44799#discussion_r1458830805 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -29,14 +29,14 @@ import org.apache.avro.mapreduce.AvroJob import org.apache.hadoop.c

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
Hisoka-X commented on PR #44800: URL: https://github.com/apache/spark/pull/44800#issuecomment-1900245476 The change seems like it cannot pass `CSVLegacyTimeParserSuite.SPARK-37326: Timestamp type inference for a column with TIMESTAMP_NTZ`. (Maybe because I used an old version of the code.)

Re: [PR] [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh` [spark]

2024-01-19 Thread via GitHub
zhengruifeng commented on code in PR #44794: URL: https://github.com/apache/spark/pull/44794#discussion_r1458879184 ## python/pyspark/sql/functions/builtin.py: ## @@ -1488,31 +1533,66 @@ def acos(col: "ColumnOrName") -> Column: Parameters -- col : :class:`

Re: [PR] [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh` [spark]

2024-01-19 Thread via GitHub
zhengruifeng commented on code in PR #44794: URL: https://github.com/apache/spark/pull/44794#discussion_r1458881539 ## python/pyspark/sql/functions/builtin.py: ## @@ -1488,31 +1533,66 @@ def acos(col: "ColumnOrName") -> Column: Parameters -- col : :class:`

Re: [PR] [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh` [spark]

2024-01-19 Thread via GitHub
zhengruifeng commented on code in PR #44794: URL: https://github.com/apache/spark/pull/44794#discussion_r1458893184 ## python/pyspark/sql/functions/builtin.py: ## @@ -1488,31 +1533,66 @@ def acos(col: "ColumnOrName") -> Column: Parameters -- col : :class:`

Re: [PR] [SPARK-46698][CORE][FOLLOWUP] Replace Timer with single thread scheduled executor [spark]

2024-01-19 Thread via GitHub
srowen commented on code in PR #44718: URL: https://github.com/apache/spark/pull/44718#discussion_r1458907016 ## core/src/main/scala/org/apache/spark/BarrierTaskContext.scala: ## @@ -71,7 +72,7 @@ class BarrierTaskContext private[spark] ( } } // Log the update o

Re: [PR] [SPARK-46698][CORE][FOLLOWUP] Replace Timer with single thread scheduled executor [spark]

2024-01-19 Thread via GitHub
beliefer commented on code in PR #44718: URL: https://github.com/apache/spark/pull/44718#discussion_r1459024605 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -51,7 +52,8 @@ private[spark] class BarrierCoordinator( // TODO SPARK-25030 Create a Timer

Re: [PR] [SPARK-46698][CORE][FOLLOWUP] Replace Timer with single thread scheduled executor [spark]

2024-01-19 Thread via GitHub
beliefer commented on code in PR #44718: URL: https://github.com/apache/spark/pull/44718#discussion_r1459024605 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -51,7 +52,8 @@ private[spark] class BarrierCoordinator( // TODO SPARK-25030 Create a Timer

[PR] [SPARK-45593][FOLLOWUP] Correct relocation connect guava dependency. [spark]

2024-01-19 Thread via GitHub
Yikf opened a new pull request, #44801: URL: https://github.com/apache/spark/pull/44801 ### What changes were proposed in this pull request? This PR aims to correct the relocation of the connect Guava dependency and remove the duplicate connect-common from the SBT build jars. **Item 1:**
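
For readers unfamiliar with the mechanism, Guava relocation in the SBT build is expressed with sbt-assembly shade rules. A simplified sketch of what such a rule looks like; the relocation prefix and the settings wiring are illustrative, not the exact lines this PR corrects:

```scala
import sbt._
import sbtassembly._
import sbtassembly.AssemblyPlugin.autoImport._

// Rewrite Guava's packages so the shaded copy bundled with the connect server
// cannot clash with whatever Guava version sits on the user's classpath.
lazy val connectGuavaShadeSettings: Seq[Setting[_]] = Seq(
  assembly / assemblyShadeRules := Seq(
    ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.guava.@1").inAll
  )
)
```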

Re: [PR] [SPARK-45593][FOLLOWUP] Correct relocation connect guava dependency. [spark]

2024-01-19 Thread via GitHub
Yikf commented on PR #44801: URL: https://github.com/apache/spark/pull/44801#issuecomment-1900474206 @LuciferYang Please take a look

Re: [PR] [SPARK-46698][CORE][FOLLOWUP] Replace Timer with single thread scheduled executor [spark]

2024-01-19 Thread via GitHub
LuciferYang commented on code in PR #44718: URL: https://github.com/apache/spark/pull/44718#discussion_r1459073865 ## launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java: ## @@ -128,7 +125,8 @@ private LauncherServer() throws IOException { this.threadIds

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for function evaluation [spark]

2024-01-19 Thread via GitHub
nickstanishadb commented on PR #44678: URL: https://github.com/apache/spark/pull/44678#issuecomment-1900503295 @dtenedor I just had another thought about making something like this more usable for python users. It would be awesome to also provide a utility for python users to inspect their

Re: [PR] [SPARK-46764][DOCS] Reorganize script to build API docs [spark]

2024-01-19 Thread via GitHub
nchammas commented on code in PR #44791: URL: https://github.com/apache/spark/pull/44791#discussion_r1459233153 ## docs/_plugins/build_api_docs.rb: ## @@ -0,0 +1,205 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. S

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1459232548 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -199,14 +201,13 @@ class CSVInferSchema(val options: CSVOptions) extends

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1459234868 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -1105,10 +1105,12 @@ abstract class CSVSuite test("SPARK-37326: T

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1459237229 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -2874,13 +2885,12 @@ abstract class CSVSuite test("SPARK-40474: I

Re: [PR] [SPARK-46764][DOCS] Reorganize script to build API docs [spark]

2024-01-19 Thread via GitHub
nchammas commented on code in PR #44791: URL: https://github.com/apache/spark/pull/44791#discussion_r1459289329 ## docs/_plugins/build_api_docs.rb: ## @@ -0,0 +1,205 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. S

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for function evaluation [spark]

2024-01-19 Thread via GitHub
dtenedor commented on PR #44678: URL: https://github.com/apache/spark/pull/44678#issuecomment-1900726776 @nickstanishadb how would this be different from calling Python's `resource.getrusage` within the `analyze` method, and then adding the resulting memory number into a subclass of `Analyz

[PR] [SPARK-46775][SS] Fix formatting of Kinesis docs [spark]

2024-01-19 Thread via GitHub
nchammas opened a new pull request, #44802: URL: https://github.com/apache/spark/pull/44802 ### What changes were proposed in this pull request? - Convert the mixed indentation styles to spaces only. - Add syntax highlighting to the code blocks. - Fix a couple of broken links to

Re: [PR] [SPARK-46775][SS] Fix formatting of Kinesis docs [spark]

2024-01-19 Thread via GitHub
nchammas commented on code in PR #44802: URL: https://github.com/apache/spark/pull/44802#discussion_r1459338158 ## docs/streaming-kinesis-integration.md: ## @@ -32,201 +32,216 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m 1. **Linking:**

[PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

2024-01-19 Thread via GitHub
johanl-db opened a new pull request, #44803: URL: https://github.com/apache/spark/pull/44803 ### What changes were proposed in this pull request? This is a follow-up from https://github.com/apache/spark/pull/44368 and https://github.com/apache/spark/pull/44513, implementing an additional
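
Type widening here means a Parquet file written with a narrower integer type can be read back under a wider requested schema. A hedged sketch of the kind of read the change enables, assuming a spark-shell session with the promotion in place:

```scala
import org.apache.spark.sql.types._

val path = "/tmp/widening-demo"

// Write a Parquet file whose column is a plain 32-bit integer.
spark.range(5).selectExpr("CAST(id AS INT) AS id")
  .write.mode("overwrite").parquet(path)

// Read it back asking for a wider decimal type; with integer-to-decimal
// promotion, the vectorized reader up-casts instead of failing.
val widened = spark.read
  .schema(StructType(Seq(StructField("id", DecimalType(10, 0)))))
  .parquet(path)
widened.printSchema()
widened.show()
```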

[PR] [MINOR][DOCS] Remove mention of Jenkins from "Building Spark" docs [spark]

2024-01-19 Thread via GitHub
nchammas opened a new pull request, #44804: URL: https://github.com/apache/spark/pull/44804 ### What changes were proposed in this pull request? - Remove mention of Jenkins from the "Building Spark" docs as we do not use Jenkins anymore. - Add syntax highlighting to some of the cod

Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2024-01-19 Thread via GitHub
adriennn commented on PR #44022: URL: https://github.com/apache/spark/pull/44022#issuecomment-1900814892 @shujingyang-db shared to you by email.

Re: [PR] [SPARK-46773][BUILD][CONNECT] Change to use include-list to `generate assemblyExcludedJars` for the connect server module [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun closed pull request #44798: [SPARK-46773][BUILD][CONNECT] Change to use include-list to `generate assemblyExcludedJars` for the connect server module URL: https://github.com/apache/spark/pull/44798

Re: [PR] [SPARK-46773][BUILD][CONNECT] Change to use include-list to `generate assemblyExcludedJars` for the connect server module [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on PR #44798: URL: https://github.com/apache/spark/pull/44798#issuecomment-1900891360 Merged to master.

Re: [PR] [SPARK-46774][SQL][AVRO] Use mapreduce.output.fileoutputformat.compress instead of deprecated mapred.output.compress in Avro write jobs [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun closed pull request #44799: [SPARK-46774][SQL][AVRO] Use mapreduce.output.fileoutputformat.compress instead of deprecated mapred.output.compress in Avro write jobs URL: https://github.com/apache/spark/pull/44799

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
gengliangwang commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1459496728 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala: ## @@ -159,16 +160,21 @@ class JsonInferSchema(options: JSONOptions) exte

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
gengliangwang commented on PR #44800: URL: https://github.com/apache/spark/pull/44800#issuecomment-1900912818 LGTM except for one comment

Re: [PR] [MINOR][DOCS] Remove mention of Jenkins from "Building Spark" docs [spark]

2024-01-19 Thread via GitHub
bjornjorgensen commented on PR #44804: URL: https://github.com/apache/spark/pull/44804#issuecomment-1900987231 "Remove mention of Jenkins from the "Building Spark" docs as we do not use Jenkins anymore." have a look at https://github.com/apache/spark/pull/40178

Re: [PR] [SPARK-46759][SQL][AVRO] Codec xz and zstandard support compression level for avro files [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on code in PR #44786: URL: https://github.com/apache/spark/pull/44786#discussion_r1459618369 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -110,10 +110,12 @@ private[sql] object AvroUtils extends Logging { ca

Re: [PR] [MINOR][DOCS] Remove mention of Jenkins from "Building Spark" docs [spark]

2024-01-19 Thread via GitHub
nchammas commented on PR #44804: URL: https://github.com/apache/spark/pull/44804#issuecomment-1901035166 Oh, interesting. Thanks for the reference. As we have both been confused by this same issue, perhaps I should update that text to reference the Jenkins infrastructure that Scaleway is ma

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-01-19 Thread via GitHub
krymitch commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-1901136777 @pkotikalapudi please share new voting thread here or in old thread. A few of us over at Adobe would like to add our vote, since this work will support a few projects we are currently us

[PR] [SPARK-46780][K8S][TESTS] Support skipping R image build step [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #44805: URL: https://github.com/apache/spark/pull/44805 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[PR] [SPARK-46779][SQL] `InMemoryRelation` instances of the same cached plan should be semantically equivalent [spark]

2024-01-19 Thread via GitHub
bersprockets opened a new pull request, #44806: URL: https://github.com/apache/spark/pull/44806 ### What changes were proposed in this pull request? When canonicalizing `output` in `InMemoryRelation`, use `output` itself as the schema for determining the ordinals, rather than `cachedP
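
As a rough illustration of the scenario (not the PR's actual test case): caching a DataFrame and referencing it twice yields two `InMemoryRelation` instances over the same cached plan, and the fix makes their canonicalized forms compare equal so reuse-style optimizations can treat them as interchangeable:

```scala
// Two references to one cached plan -> two InMemoryRelation instances.
val cached = spark.range(10).selectExpr("id AS a", "id * 2 AS b").cache()
val selfJoin = cached.join(cached.select("a"), "a")

// With the fix, the two relations canonicalize identically (sameResult holds),
// which is what "semantically equivalent" in the title refers to.
selfJoin.explain()
```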

Re: [PR] [SPARK-46780][K8S][TESTS] Support skipping R image build step in SBT [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun closed pull request #44805: [SPARK-46780][K8S][TESTS] Support skipping R image build step in SBT URL: https://github.com/apache/spark/pull/44805

[PR] [SPARK-46780][K8S][TESTS] Support skipping R image build step [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #44807: URL: https://github.com/apache/spark/pull/44807 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on PR #44807: URL: https://github.com/apache/spark/pull/44807#issuecomment-1901182370 Could you review this K8s IT test PR, @viirya ?

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
viirya commented on code in PR #44807: URL: https://github.com/apache/spark/pull/44807#discussion_r1459938694 ## project/SparkBuild.scala: ## @@ -995,8 +995,12 @@ object KubernetesIntegrationTests { s"$sparkHome/resource-managers/kubernetes/docker/src/main/dockerfi

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
viirya commented on code in PR #44807: URL: https://github.com/apache/spark/pull/44807#discussion_r1459938694 ## project/SparkBuild.scala: ## @@ -995,8 +995,12 @@ object KubernetesIntegrationTests { s"$sparkHome/resource-managers/kubernetes/docker/src/main/dockerfi

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on code in PR #44807: URL: https://github.com/apache/spark/pull/44807#discussion_r1459944154 ## project/SparkBuild.scala: ## @@ -995,8 +995,12 @@ object KubernetesIntegrationTests { s"$sparkHome/resource-managers/kubernetes/docker/src/main/d

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on code in PR #44807: URL: https://github.com/apache/spark/pull/44807#discussion_r1459944518 ## project/SparkBuild.scala: ## @@ -995,8 +995,12 @@ object KubernetesIntegrationTests { s"$sparkHome/resource-managers/kubernetes/docker/src/main/d

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on code in PR #44807: URL: https://github.com/apache/spark/pull/44807#discussion_r1459946330 ## project/SparkBuild.scala: ## @@ -995,8 +995,12 @@ object KubernetesIntegrationTests { s"$sparkHome/resource-managers/kubernetes/docker/src/main/d

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on PR #44807: URL: https://github.com/apache/spark/pull/44807#issuecomment-1901264618 Thank you, @viirya .

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun closed pull request #44807: [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed URL: https://github.com/apache/spark/pull/44807

Re: [PR] [SPARK-46780][K8S][TESTS] Improve SBT K8s IT to skip R image build step if not needed [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun commented on PR #44807: URL: https://github.com/apache/spark/pull/44807#issuecomment-1901265445 Merged to master.

Re: [PR] [SPARK-46731][SS] Manage state store provider instance by state data source - reader [spark]

2024-01-19 Thread via GitHub
HeartSaVioR commented on PR #44751: URL: https://github.com/apache/spark/pull/44751#issuecomment-1901275017 Thanks for reviewing! Merging to master.

Re: [PR] [SPARK-46731][SS] Manage state store provider instance by state data source - reader [spark]

2024-01-19 Thread via GitHub
HeartSaVioR closed pull request #44751: [SPARK-46731][SS] Manage state store provider instance by state data source - reader URL: https://github.com/apache/spark/pull/44751

Re: [PR] [WIP][SPARK-46467][PS][TESTS] Improve and test exceptions of TimedeltaIndex [spark]

2024-01-19 Thread via GitHub
xinrong-meng closed pull request #44430: [WIP][SPARK-46467][PS][TESTS] Improve and test exceptions of TimedeltaIndex URL: https://github.com/apache/spark/pull/44430

Re: [PR] [WIP][SPARK-46467][PS][TESTS] Improve and test exceptions of TimedeltaIndex [spark]

2024-01-19 Thread via GitHub
xinrong-meng commented on PR #44430: URL: https://github.com/apache/spark/pull/44430#issuecomment-1901324816 I'll close the PR unless we want to migrate Pandas on Spark to PySpark error framework.

Re: [PR] SPARK-45200: All log4j2 configuration file override [spark]

2024-01-19 Thread via GitHub
github-actions[bot] closed pull request #43294: SPARK-45200: All log4j2 configuration file override URL: https://github.com/apache/spark/pull/43294

Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2024-01-19 Thread via GitHub
github-actions[bot] closed pull request #43280: [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId URL: https://github.com/apache/spark/pull/43280

Re: [PR] [SPARK-44033][PYTHON] Added support for binary ops for list like objects [spark]

2024-01-19 Thread via GitHub
github-actions[bot] closed pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects URL: https://github.com/apache/spark/pull/42962

Re: [PR] [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor [spark]

2024-01-19 Thread via GitHub
github-actions[bot] closed pull request #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor URL: https://github.com/apache/spark/pull/38852

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1460131124 ## common/utils/src/main/resources/error/README.md: ## @@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from: |HZ320|HZ |RDA-specific condition

[PR] [SPARK-46783][K8S][TESTS] Use `built-in` storage classes in PVTestsSuite [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #44809: URL: https://github.com/apache/spark/pull/44809 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[PR] [SPARK-46784][K8S][TESTS] Create and use a K8s test tag for `PersistentVolume` [spark]

2024-01-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #44810: URL: https://github.com/apache/spark/pull/44810 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1460150835 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala: ## @@ -159,16 +161,28 @@ class JsonInferSchema(options: JSONOptions) extends

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1460152048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala: ## @@ -159,16 +161,28 @@ class JsonInferSchema(options: JSONOptions) extends

Re: [PR] [SPARK-45827] Disallow partitioning on Variant column [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on PR #44742: URL: https://github.com/apache/spark/pull/44742#issuecomment-1901658912 the failure is unrelated, merging to master!
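
For illustration, the newly disallowed pattern looks roughly like this (a sketch assuming the in-progress variant support, including `parse_json`, is available in your build):

```scala
// parse_json produces a VARIANT column; using it as a partition column is
// expected to be rejected at analysis time after this change.
val df = spark.sql("""SELECT id, parse_json('{"a": 1}') AS v FROM range(3)""")
df.write.partitionBy("v").mode("overwrite").parquet("/tmp/variant-partition-demo") // should fail
```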

Re: [PR] [SPARK-45827] Disallow partitioning on Variant column [spark]

2024-01-19 Thread via GitHub
cloud-fan closed pull request #44742: [SPARK-45827] Disallow partitioning on Variant column URL: https://github.com/apache/spark/pull/44742

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
gengliangwang commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1460158163 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala: ## @@ -159,16 +161,28 @@ class JsonInferSchema(options: JSONOptions) exte

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1460158951 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala: ## @@ -159,16 +161,28 @@ class JsonInferSchema(options: JSONOptions) extends

Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

2024-01-19 Thread via GitHub
cloud-fan commented on code in PR #44800: URL: https://github.com/apache/spark/pull/44800#discussion_r1460159595 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala: ## @@ -159,16 +161,28 @@ class JsonInferSchema(options: JSONOptions) extends

Re: [PR] [SPARK-46512][CORE] Optimize shuffle reading when both sort and combine are used. [spark]

2024-01-19 Thread via GitHub
mridulm commented on code in PR #44512: URL: https://github.com/apache/spark/pull/44512#discussion_r1460207780 ## core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala: ## @@ -111,31 +111,50 @@ private[spark] class BlockStoreShuffleReader[K, C]( // An i

Re: [PR] [SPARK-46512][CORE] Optimize shuffle reading when both sort and combine are used. [spark]

2024-01-19 Thread via GitHub
mridulm commented on code in PR #44512: URL: https://github.com/apache/spark/pull/44512#discussion_r1460207780 ## core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala: ## @@ -111,31 +111,50 @@ private[spark] class BlockStoreShuffleReader[K, C]( // An i

Re: [PR] [SPARK-46512][CORE] Optimize shuffle reading when both sort and combine are used. [spark]

2024-01-19 Thread via GitHub
mridulm commented on code in PR #44512: URL: https://github.com/apache/spark/pull/44512#discussion_r1460207780 ## core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala: ## @@ -111,31 +111,50 @@ private[spark] class BlockStoreShuffleReader[K, C]( // An i

Re: [PR] [SPARK-46759][SQL][AVRO] Codec xz and zstandard support compression level for avro files [spark]

2024-01-19 Thread via GitHub
beliefer commented on code in PR #44786: URL: https://github.com/apache/spark/pull/44786#discussion_r1460278740 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -110,10 +110,12 @@ private[sql] object AvroUtils extends Logging { case co

Re: [PR] [SPARK-46759][SQL][AVRO] Codec xz and zstandard support compression level for avro files [spark]

2024-01-19 Thread via GitHub
beliefer commented on code in PR #44786: URL: https://github.com/apache/spark/pull/44786#discussion_r1460278740 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -110,10 +110,12 @@ private[sql] object AvroUtils extends Logging { case co

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2024-01-19 Thread via GitHub
beliefer commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1460279707 ## common/utils/src/main/resources/error/README.md: ## @@ -1309,6 +1309,7 @@ The following SQLSTATEs are collated from: |HZ320|HZ |RDA-specific condition