Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
beliefer commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754437003 > Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in `avro` module? Yes. I will do it.

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351528465 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -208,24 +208,63 @@ private[deploy] object JsonProtocol { * `completeddrivers` a

Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub
LuciferYang commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754435400 Thank you for your confirmation @dongjoon-hyun ~

Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754434466 Yes, `re-trigger` will help. I saw the same situation before after upgrading the recent Oracle image, but didn't check the root cause yet, @LuciferYang .

Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub
LuciferYang commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754433043 > @LuciferYang the docker integration tests are consistently failing with > > ``` > [info] *** 2 SUITES ABORTED *** > [error] Error during tests: > [error]

Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub
viirya commented on code in PR #43302: URL: https://github.com/apache/spark/pull/43302#discussion_r1351517603 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala: ## @@ -963,4 +964,18 @@ class MathExpressionsSuite extends

Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub
itholic commented on code in PR #42762: URL: https://github.com/apache/spark/pull/42762#discussion_r1351516059 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -110,10 +110,6 @@ object TableOutputResolver {

Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754419226 Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in `avro` module?

Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub
HyukjinKwon closed pull request #43257: [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) URL: https://github.com/apache/spark/pull/43257

Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub
HyukjinKwon commented on PR #43257: URL: https://github.com/apache/spark/pull/43257#issuecomment-1754413902 Merged to master.

Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub
MaxGekk commented on code in PR #43302: URL: https://github.com/apache/spark/pull/43302#discussion_r1351508855 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala: ## @@ -963,4 +964,18 @@ class MathExpressionsSuite extends

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754410308 Thank you, @viirya !

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351506716 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -208,24 +208,63 @@ private[deploy] object JsonProtocol { *

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351506716 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -208,24 +208,63 @@ private[deploy] object JsonProtocol { *

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351506716 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -208,24 +208,63 @@ private[deploy] object JsonProtocol { *

Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub
viirya commented on PR #43302: URL: https://github.com/apache/spark/pull/43302#issuecomment-1754393924 Thanks @dongjoon-hyun . I've fixed `ExpressionTypeCheckingSuite`.

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351472659 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -208,24 +208,63 @@ private[deploy] object JsonProtocol { * `completeddrivers` a

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754380347 Thank you for review and approval, @yaooqinn !

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754367501 Thank you again. The arguments are switched and the default value is given, @beliefer .

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351440730 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351432373 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status of the

Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]

2023-10-09 Thread via GitHub
yorksity commented on PR #43270: URL: https://github.com/apache/spark/pull/43270#issuecomment-1754356006 ![image](https://github.com/apache/spark/assets/38931534/a28c65ff-fc71-4eff-948f-977efb505c4e) All tests passed.

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754340435 Thank you for review. The document is updated too, @yaooqinn .

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351374027 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status of the

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351373013 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351371713 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status of the

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369480 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369382 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369271 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status of the

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369115 ## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala: ## @@ -210,22 +210,59 @@ private[deploy] object JsonProtocol { * `status` status of the

Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754322080 Just a question, is this required for SPARK-44114? Or, simply want to remove the magic strings?
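
To make the "magic strings" point concrete, here is a minimal Scala sketch (an illustration under assumptions, not the PR's actual diff — it uses the `org.apache.orc.CompressionKind` enum, whereas the PR concerns Hive's ORC compression kind, which is where the compile-dependency concern later in this thread comes from):

```scala
import org.apache.orc.CompressionKind

object CompressionKindSketch {
  def main(args: Array[String]): Unit = {
    val pasted  = "ZLIB"                      // magic string: easy to typo and to drift from the library
    val derived = CompressionKind.ZLIB.name() // derived from the ORC enum, so it cannot drift

    assert(pasted == derived)
  }
}
```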

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754308194 Thank you for review, @beliefer !

Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub
hasnain-db commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754299341 @LuciferYang the docker integration tests are consistently failing with ``` [info] *** 2 SUITES ABORTED *** [error] Error during tests: [error]

Re: [PR] [SPARK-45466][ML] `VectorAssembler` should validate the vector elements [spark]

2023-10-09 Thread via GitHub
srowen commented on PR #43288: URL: https://github.com/apache/spark/pull/43288#issuecomment-1754292488 Is that really an error? It's the only way to represent a missing value

Re: [PR] [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action. [spark]

2023-10-09 Thread via GitHub
HyukjinKwon commented on PR #39976: URL: https://github.com/apache/spark/pull/39976#issuecomment-1754286405 Thanks. Made a followup: https://github.com/apache/spark/pull/43304

[PR] [SPARK-45475][SQL] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils [spark]

2023-10-09 Thread via GitHub
HyukjinKwon opened a new pull request, #43304: URL: https://github.com/apache/spark/pull/43304 ### What changes were proposed in this pull request? This PR is kind of a followup for https://github.com/apache/spark/pull/39976 that addresses
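
As a hedged sketch of the API swap named in the title (illustrative only; the real change is the `JdbcUtils` diff in the PR): going through `df.rdd` leaves the Dataset execution path, so `QueryExecutionListener`/`Observation` hooks attached to the query may never fire, while `Dataset.foreachPartition` keeps the write on that path.

```scala
import org.apache.spark.sql.{DataFrame, Row}

object ForeachPartitionSketch {
  // `savePartition` stands in for the per-partition JDBC write logic (assumed name).
  def write(df: DataFrame, savePartition: Iterator[Row] => Unit): Unit = {
    // Before: RDD-based action, which bypasses DataFrame-level listeners and observed metrics.
    // df.rdd.foreachPartition(savePartition)

    // After: Dataset-based action, which runs as a regular DataFrame query.
    df.foreachPartition((rows: Iterator[Row]) => savePartition(rows))
  }
}
```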

Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub
HyukjinKwon commented on code in PR #42762: URL: https://github.com/apache/spark/pull/42762#discussion_r1351289594 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -110,10 +110,6 @@ object TableOutputResolver {

Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754272348 Could you review this when you have some time, @viirya ?

Re: [PR] [SPARK-45417][PYTHON] Make InheritableThread inherit the active session [spark]

2023-10-09 Thread via GitHub
HyukjinKwon commented on PR #43231: URL: https://github.com/apache/spark/pull/43231#issuecomment-1754268341 Mind taking a look at the linter failure? https://github.com/clee704/spark/actions/runs/6462049944/job/17543011013

[PR] [SPARK-45474][CORE] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun opened a new pull request, #43303: URL: https://github.com/apache/spark/pull/43303 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###
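
The PR template fields are still empty here; purely as a hypothetical illustration of what "top-level filtering" of a JSON payload can look like (the names and shape below are assumptions, not the actual `JsonProtocol` change under review):

```scala
import org.json4s._

object TopLevelFilterSketch {
  // Keep only the requested top-level field of a JSON object; return other values unchanged.
  def filterTopLevel(json: JValue, field: String): JValue = json match {
    case JObject(fields) => JObject(fields.filter { case (name, _) => name == field })
    case other           => other
  }
}
```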

Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub
HyukjinKwon commented on PR #43257: URL: https://github.com/apache/spark/pull/43257#issuecomment-1754265070 Yeah, I think we should add this to a linter ..

Re: [PR] [SPARK-45466][ML] `VectorAssembler` should validate the vector values [spark]

2023-10-09 Thread via GitHub
zhengruifeng commented on PR #43288: URL: https://github.com/apache/spark/pull/43288#issuecomment-1754259142 > Does this change behavior or just refactor code? I'm actually surprised that NaN is treated as an error but looks like that is existing behavior? It is a behavior change.
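
For readers following the thread, a small usage sketch of the existing `handleInvalid` knob on `VectorAssembler` (illustrative only; how NaN inside vector-typed input columns is validated is exactly what SPARK-45466 proposes to change):

```scala
import org.apache.spark.ml.feature.VectorAssembler

object VectorAssemblerSketch {
  // handleInvalid accepts "error" (the default), "skip", or "keep".
  val assembler: VectorAssembler = new VectorAssembler()
    .setInputCols(Array("x", "y"))
    .setOutputCol("features")
    .setHandleInvalid("keep")
}
```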

Re: [PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]

2023-10-09 Thread via GitHub
HeartSaVioR closed pull request #43299: [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence URL: https://github.com/apache/spark/pull/43299

Re: [PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]

2023-10-09 Thread via GitHub
HeartSaVioR commented on PR #43299: URL: https://github.com/apache/spark/pull/43299#issuecomment-1754222412 Thanks! Merging to master.

Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance [spark]

2023-10-09 Thread via GitHub
HeartSaVioR commented on PR #43174: URL: https://github.com/apache/spark/pull/43174#issuecomment-1754205290 Just ported the fix back to 3.5 as well.

Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance [spark]

2023-10-09 Thread via GitHub
HeartSaVioR closed pull request #43174: [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance URL: https://github.com/apache/spark/pull/43174

Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1351195382 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2580,7 +2580,7 @@ private[spark] class DAGScheduler( // if the cluster manager

Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance [spark]

2023-10-09 Thread via GitHub
HeartSaVioR commented on PR #43174: URL: https://github.com/apache/spark/pull/43174#issuecomment-1754204031 Thanks! Merging to master.

Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43039: URL: https://github.com/apache/spark/pull/43039#discussion_r1351192772 ## python/pyspark/sql/dataframe.py: ## @@ -2646,67 +2647,147 @@ def join( Examples -The following performs a full outer join
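
The docstring hunk above introduces a full outer join example; a Scala counterpart for readers of this thread (the PR itself only edits the Python docstring):

```scala
import org.apache.spark.sql.SparkSession

object FullOuterJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-example").getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

    // A full outer join keeps unmatched rows from both sides, padding the missing side with nulls.
    left.join(right, Seq("id"), "full_outer").show()

    spark.stop()
  }
}
```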

Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #42763: URL: https://github.com/apache/spark/pull/42763#discussion_r1351177061 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -455,6 +457,19 @@ object TableOutputResolver { } } +

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1351167115 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub
panbingkun commented on PR #43286: URL: https://github.com/apache/spark/pull/43286#issuecomment-1754193939 > Thank you @panbingkun! Can we regenerate the docs and add the canonical links to the released docs HTML? Regarding the published document HTML, the following PR is in

Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
beliefer commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754189027 > Ur, very sorry, but I'd not do this ORC change when we do the same thing still for Parquet, @beliefer . This only increases the compile dependency for

Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]

2023-10-09 Thread via GitHub
Hisoka-X commented on code in PR #42763: URL: https://github.com/apache/spark/pull/42763#discussion_r1351138719 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -455,6 +457,19 @@ object TableOutputResolver { } } +

Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43296: URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala: ## @@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends

Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #43296: URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala: ## @@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends

Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]

2023-10-09 Thread via GitHub
beliefer commented on code in PR #42763: URL: https://github.com/apache/spark/pull/42763#discussion_r1351135259 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -455,6 +457,19 @@ object TableOutputResolver { } } +

Re: [PR] [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat [spark]

2023-10-09 Thread via GitHub
Hisoka-X commented on code in PR #43243: URL: https://github.com/apache/spark/pull/43243#discussion_r1351132582 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends
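
For context, a hedged usage sketch of the scenario named in the PR title (the path and format below are hypothetical; the truncated hunk above is the actual `CSVInferSchema` change): with `inferSchema` enabled and an explicit `timestampFormat`, the fix targets columns whose values do not match that format, so that they are not inferred as timestamp columns.

```scala
import org.apache.spark.sql.SparkSession

object CsvInferSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("csv-infer").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss") // values not matching this format are the case at issue
      .csv("/path/to/data.csv")                           // hypothetical path

    df.printSchema()
    spark.stop()
  }
}
```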

Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub
Hisoka-X commented on PR #42762: URL: https://github.com/apache/spark/pull/42762#issuecomment-1754157169 cc @HyukjinKwon @MaxGekk

Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub
itholic commented on code in PR #42762: URL: https://github.com/apache/spark/pull/42762#discussion_r1351085843 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -110,10 +110,6 @@ object TableOutputResolver {

Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub
holdenk commented on PR #43257: URL: https://github.com/apache/spark/pull/43257#issuecomment-1754124296 Also could we add automated checking for the import order change so that we don't have to do this again?

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-10-09 Thread via GitHub
holdenk commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1351071287 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-10-09 Thread via GitHub
holdenk commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1351070023 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]

2023-10-09 Thread via GitHub
holdenk commented on code in PR #43039: URL: https://github.com/apache/spark/pull/43039#discussion_r1351066467 ## python/pyspark/sql/dataframe.py: ## @@ -2646,67 +2647,147 @@ def join( Examples -The following performs a full outer join

Re: [PR] [SPARK-44837][SQL] Improve ALTER TABLE ALTER PARTITION column error message [spark]

2023-10-09 Thread via GitHub
holdenk commented on code in PR #42524: URL: https://github.com/apache/spark/pull/42524#discussion_r1351065858 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -97,6 +97,12 @@ "The method can not be called on streaming Dataset/DataFrame." ] },

Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub
holdenk commented on PR #43302: URL: https://github.com/apache/spark/pull/43302#issuecomment-1754111612 LGTM pending CI

Re: [PR] [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor [spark]

2023-10-09 Thread via GitHub
holdenk commented on PR #38852: URL: https://github.com/apache/spark/pull/38852#issuecomment-1754108618 Maybe the `NettyBlockTransferServiceSuite` would be enough so we can assert that we are keeping track of active streams and also once done that the count is zero (so we know this is not

Re: [PR] [SPARK-42716][SQL] DataSourceV2 supports reporting key-grouped partitioning without HasPartitionKey [spark]

2023-10-09 Thread via GitHub
github-actions[bot] closed pull request #40334: [SPARK-42716][SQL] DataSourceV2 supports reporting key-grouped partitioning without HasPartitionKey URL: https://github.com/apache/spark/pull/40334

Re: [PR] [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client [spark]

2023-10-09 Thread via GitHub
HyukjinKwon closed pull request #42859: [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client URL: https://github.com/apache/spark/pull/42859

Re: [PR] [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client [spark]

2023-10-09 Thread via GitHub
HyukjinKwon commented on PR #42859: URL: https://github.com/apache/spark/pull/42859#issuecomment-1754064070 Merged to master.

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub
ueshin commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350999129 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala: ## @@ -284,6 +285,16 @@ object UserDefinedPythonTableFunction {

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350991334 ## python/pyspark/worker.py: ## @@ -693,6 +698,21 @@ def read_udtf(pickleSer, infile, eval_type): f"The return type of a UDTF must be a struct type, but

[PR] Minor: Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub
viirya opened a new pull request, #43302: URL: https://github.com/apache/spark/pull/43302 ### What changes were proposed in this pull request? This minor patch fixes incorrect error message of `RoundBase`. ### Why are the changes needed? Fix incorrect

Re: [PR] [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub
allisonwang-db commented on PR #42425: URL: https://github.com/apache/spark/pull/42425#issuecomment-1753943908 @panbingkun thank you so much for working on this. It will be super helpful in improving the ranking for the PySpark documentations.

Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub
allisonwang-db commented on PR #43286: URL: https://github.com/apache/spark/pull/43286#issuecomment-1753939826 Thank you @panbingkun! Can we regenerate the docs and add the canonical links to the released docs HTML?

Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub
mridulm commented on PR #43280: URL: https://github.com/apache/spark/pull/43280#issuecomment-1753930980 Btw, it is not very clear to me how this functionality is going to be leveraged - unless you are relying on resource profiles to tag which executors should leverage reliable shuffle and

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub
ueshin commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350811414 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -241,12 +248,17 @@ case class UnresolvedPolymorphicPythonUDTF( *

Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub
mridulm commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1350810420 ## core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java: ## @@ -66,8 +66,19 @@ default void removeShuffle(int shuffleId, boolean blocking) {}

Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub
mridulm commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1350809807 ## core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java: ## @@ -66,8 +66,19 @@ default void removeShuffle(int shuffleId, boolean blocking) {}

[PR] [SPARK-45221][PYTHON][DOCS] Refine docstring of DataFrameReader.parquet [spark]

2023-10-09 Thread via GitHub
allisonwang-db opened a new pull request, #43301: URL: https://github.com/apache/spark/pull/43301 ### What changes were proposed in this pull request? This PR refines the docstring of DataFrameReader.parquet by adding more examples. ### Why are the changes needed?

Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub
mridulm commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1350787954 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -661,7 +661,7 @@ class SparkContext(config: SparkConf) extends Logging { Some(new

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub
ueshin commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350775454 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala: ## @@ -284,6 +285,16 @@ object UserDefinedPythonTableFunction {

Re: [PR] [SPARK-45456][BUILD] Upgrade maven to 3.9.5 [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun closed pull request #43267: [SPARK-45456][BUILD] Upgrade maven to 3.9.5 URL: https://github.com/apache/spark/pull/43267

Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun closed pull request #43286: [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page URL: https://github.com/apache/spark/pull/43286

Re: [PR] [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun closed pull request #43285: [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs page URL: https://github.com/apache/spark/pull/43285

Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]

2023-10-09 Thread via GitHub
allisonwang-db commented on PR #43039: URL: https://github.com/apache/spark/pull/43039#issuecomment-1753756796 cc @cloud-fan @HyukjinKwon the test failure seems unrelated.

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350762771 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -2309,6 +2309,55 @@ def terminate(self): + [Row(partition_col=42, count=3, total=3, last=None)],

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1348084412 ## python/pyspark/sql/udtf.py: ## @@ -107,12 +107,20 @@ class AnalyzeResult: If non-empty, this is a sequence of columns that the UDTF is specifying for

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on PR #43204: URL: https://github.com/apache/spark/pull/43204#issuecomment-1753741579 Hi @allisonwang-db @ueshin thanks for your reviews, these were good comments, please look again! I think the new API is better now.

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350763173 ## python/pyspark/worker.py: ## @@ -786,6 +787,24 @@ def _remove_partition_by_exprs(self, arg: Any) -> Any: else: return arg +#

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350763041 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -167,22 +169,26 @@ abstract class UnevaluableGenerator extends

Re: [PR] [SPARK-45452][SQL][FOLLOWUP] Simplify path check logic [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on PR #43283: URL: https://github.com/apache/spark/pull/43283#issuecomment-1753566223 Merged to master for Apache Spark 4.0.0. Thank you, @viirya .

Re: [PR] [SPARK-45452][SQL][FOLLOWUP] Simplify path check logic [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun closed pull request #43283: [SPARK-45452][SQL][FOLLOWUP] Simplify path check logic URL: https://github.com/apache/spark/pull/43283

Re: [PR] [SPARK-45452][SQL][FOLLOWUP] Simplify path check logic [spark]

2023-10-09 Thread via GitHub
dongjoon-hyun commented on code in PR #43283: URL: https://github.com/apache/spark/pull/43283#discussion_r1350719313 ## core/src/test/scala/org/apache/spark/util/HadoopFSUtilsSuite.scala: ## @@ -30,4 +30,36 @@ class HadoopFSUtilsSuite extends SparkFunSuite {

Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub
MaxGekk commented on code in PR #43296: URL: https://github.com/apache/spark/pull/43296#discussion_r1350717818 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala: ## @@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends

Re: [PR] [SPARK-45383][SQL] Fix error message for time travel with non-existing table [spark]

2023-10-09 Thread via GitHub
MaxGekk closed pull request #43298: [SPARK-45383][SQL] Fix error message for time travel with non-existing table URL: https://github.com/apache/spark/pull/43298

Re: [PR] [SPARK-45383][SQL] Fix error message for time travel with non-existing table [spark]

2023-10-09 Thread via GitHub
MaxGekk commented on PR #43298: URL: https://github.com/apache/spark/pull/43298#issuecomment-1753544355 +1, LGTM. Merging to master/3.5. Thank you, @cloud-fan and @viirya for review.

Re: [PR] [SPARK-44919][AVRO] Avro connector: convert a union of a single primitive type to a StructType [spark]

2023-10-09 Thread via GitHub
tianhanhu-db commented on code in PR #42618: URL: https://github.com/apache/spark/pull/42618#discussion_r1350692984 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala: ## @@ -142,18 +143,30 @@ object SchemaConverters { if

[PR] Test XMLSuite on the edge parser [spark]

2023-10-09 Thread via GitHub
shujingyang-db opened a new pull request, #43300: URL: https://github.com/apache/spark/pull/43300 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]

2023-10-09 Thread via GitHub
siying opened a new pull request, #43299: URL: https://github.com/apache/spark/pull/43299 ### What changes were proposed in this pull request? In RocksDBFileManager, we add a variable to indicate that root path is already checked and created if not existing, so that we don't need to
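
A minimal sketch of the optimization described above, under assumed names (the real change lives in `RocksDBFileManager`): remember that the checkpoint root has already been checked and created, so later operations skip the repeated filesystem round trip.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

class CheckpointRootGuard(root: Path, hadoopConf: Configuration) {
  private val fs: FileSystem = root.getFileSystem(hadoopConf)
  @volatile private var rootChecked = false // flipped once the root is known to exist

  // Create the root on first use only; subsequent calls are a cheap flag check.
  def ensureRootExists(): Unit = {
    if (!rootChecked) {
      if (!fs.exists(root)) {
        fs.mkdirs(root)
      }
      rootChecked = true
    }
  }
}
```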
