Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
ted-jenks commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1982836579 @dongjoon-hyun > It sounds like you have other systems to read Spark's data. Correct. The issue was that from 3.2 to 3.3 there was a behavior change in the base64 encodings used i
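Some context for the PR title: an RFC 2045 (MIME) base64 encoder wraps its output into lines of at most 76 characters, while the RFC 4648 basic encoding emits one unbroken string. A reader that expects the second form can therefore misread the first. The minimal sketch below uses Python's standard `base64` module as a stand-in to show the difference. It is not Spark's JVM code path, and the 100-byte payload is an arbitrary example.

```python
import base64

# Arbitrary example payload: 100 bytes encode to 136 base64 characters,
# which is more than one 76-character MIME line.
payload = b"x" * 100

# RFC 4648 basic encoding: a single line with no separators.
rfc4648 = base64.b64encode(payload)

# RFC 2045 (MIME) style: a newline after every 76 output characters,
# plus a trailing newline (this is Python's encodebytes behavior).
rfc2045 = base64.encodebytes(payload)

print(len(rfc4648))          # 136
print(rfc2045.count(b"\n"))  # 2
# Removing the line breaks gives back the RFC 4648 output.
print(rfc2045.replace(b"\n", b"") == rfc4648)  # True
```

On the JVM, `java.util.Base64.getMimeEncoder()` produces the wrapped form with `\r\n` separators, and `Base64.getEncoder()` produces the unwrapped form.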

[PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn opened a new pull request, #45418: URL: https://github.com/apache/spark/pull/45418 ### What changes were proposed in this pull request? For Postgres, TimestampNTZ works well for plain TimestampNTZ types but not for nested ones, typically for now: array. This PR

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1515746618 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all the

Re: [PR] [SPARK-47315][SQL][TEST] Clean up tempView for `createTempView` UT [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45417: [SPARK-47315][SQL][TEST] Clean up tempView for `createTempView` UT URL: https://github.com/apache/spark/pull/45417 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] [SPARK-47315][SQL][TEST] Clean up tempView for `createTempView` UT [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45417: URL: https://github.com/apache/spark/pull/45417#issuecomment-1983000945 Merged to master. Thank you @wForget and @HyukjinKwon.

Re: [PR] [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45397: URL: https://github.com/apache/spark/pull/45397#discussion_r1515785048 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ConvertCommandResultToLocalRelation.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45350: URL: https://github.com/apache/spark/pull/45350#issuecomment-1983026959 thanks for the review, merging to master/3.5!

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45350: [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator URL: https://github.com/apache/spark/pull/45350

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45401: URL: https://github.com/apache/spark/pull/45401#discussion_r1515795172 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala: ## @@ -129,16 +129,6 @@ class StringUtilsSuite extends SparkFunSuite with S

[PR] [SPARK-36691][PYTHON] PythonRunner failed should pass error message to ApplicationMaster too [spark]

2024-03-07 Thread via GitHub
AngersZh opened a new pull request, #33934: URL: https://github.com/apache/spark/pull/33934 ### What changes were proposed in this pull request? In current PySpark, stderr and stdout are printed together; if the Python script exits, PythonRunner will only throw a `SparkUserAppsException` wit
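The problem described here is that the exception raised when the Python process exits carries no error text. The sketch below is one way to picture the general idea: capture stderr separately from stdout and attach its tail to the raised error. It is written in plain Python, not in Spark's Scala `PythonRunner`. `run_python_app`, `PythonAppError` and the five-line tail are hypothetical choices made for illustration and are not Spark APIs.

```python
import subprocess
import sys


class PythonAppError(Exception):
    """Hypothetical error type that carries the child's stderr tail."""


def run_python_app(code: str, tail_lines: int = 5) -> str:
    # Capture stdout and stderr separately instead of interleaving them,
    # so the error output can be reported on its own.
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        tail = "\n".join(proc.stderr.splitlines()[-tail_lines:])
        raise PythonAppError(
            f"Python app exited with code {proc.returncode}:\n{tail}"
        )
    return proc.stdout


try:
    # sys.exit with a string writes it to stderr and exits with code 1.
    run_python_app("import sys; print('working'); sys.exit('boom')")
except PythonAppError as exc:
    print(exc)  # the message includes the exit code and "boom"
```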

Re: [PR] [SPARK-36691][PYTHON] PythonRunner failed should pass error message to ApplicationMaster too [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #33934: URL: https://github.com/apache/spark/pull/33934#discussion_r1515813583 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -3281,6 +3282,80 @@ private[spark] class RedirectThread( } } +private[spark] class SparkProcess(

Re: [PR] [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45403: [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite URL: https://github.com/apache/spark/pull/45403

Re: [PR] [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45403: URL: https://github.com/apache/spark/pull/45403#issuecomment-1983075720 Merged to master. Thank you, @panbingkun & @dongjoon-hyun

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45418: URL: https://github.com/apache/spark/pull/45418#issuecomment-1983087260 cc @cloud-fan

Re: [PR] [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-07 Thread via GitHub
wForget closed pull request #45397: [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule URL: https://github.com/apache/spark/pull/45397

Re: [PR] [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-07 Thread via GitHub
wForget commented on PR #45397: URL: https://github.com/apache/spark/pull/45397#issuecomment-1983129621 Close with comment: https://github.com/apache/spark/pull/45397#discussion_r1515557219

[PR] [DOCS][PYTHON] Fix documentation typo in takeSample method [spark]

2024-03-07 Thread via GitHub
kimborowicz opened a new pull request, #45419: URL: https://github.com/apache/spark/pull/45419 ### What changes were proposed in this pull request? Fixed an error in the docstring documentation for the parameter `withReplacement` of `takeSample` method in `pyspark.RDD`, should be

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-07 Thread via GitHub
jwang0306 commented on PR #45348: URL: https://github.com/apache/spark/pull/45348#issuecomment-1983243153 Thank you!

Re: [PR] [SPARK-46992]Fix cache consistence [spark]

2024-03-07 Thread via GitHub
dtarima commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1983260338 All children have to be considered for changes of their persistence state. Currently it only checks the first found child. For clarity there is a test which fails: https://github.com/do
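The point that every child must be checked for a change in persistence state, not only the first one, can be shown in the abstract. In this sketch, `Plan`, `snapshot` and both `changed_*` functions are hypothetical stand-ins. They are not Spark's caching code, which lives in the JVM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    """Hypothetical plan node; `persisted` stands in for its cache state."""
    name: str
    persisted: bool = False
    children: List["Plan"] = field(default_factory=list)


def snapshot(plan: Plan) -> Dict[str, bool]:
    # Record the persistence state of each direct child.
    return {c.name: c.persisted for c in plan.children}


def changed_first_only(plan: Plan, before: Dict[str, bool]) -> bool:
    # Looks at the first child only, so changes in later children are missed.
    if not plan.children:
        return False
    first = plan.children[0]
    return first.persisted != before[first.name]


def changed_any(plan: Plan, before: Dict[str, bool]) -> bool:
    # Looks at every child.
    return any(c.persisted != before[c.name] for c in plan.children)


left, right = Plan("left"), Plan("right")
join = Plan("join", children=[left, right])
before = snapshot(join)
right.persisted = True  # only the second child changes state

print(changed_first_only(join, before))  # False: the change is missed
print(changed_any(join, before))         # True
```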

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45401: [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits URL: https://github.com/apache/spark/pull/45401

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45401: URL: https://github.com/apache/spark/pull/45401#issuecomment-1983276998 Merged to master. Thank you @cloud-fan @dongjoon-hyun @HyukjinKwon @MaxGekk

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1515983493 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ## @@ -218,6 +218,6 @@ class DataTypeAstBuilder extends SqlBaseParserBaseVisi

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1515984970 ## python/pyspark/sql/tests/test_types.py: ## @@ -862,15 +862,13 @@ def test_parse_datatype_string(self): if k != "varchar" and k != "char":

Re: [PR] [SPARK-47298][BUILD] Upgrade `mysql-connector-j` to `8.3.0` and `mariadb-java-client` to `2.7.12` [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45399: [SPARK-47298][BUILD] Upgrade `mysql-connector-j` to `8.3.0` and `mariadb-java-client` to `2.7.12` URL: https://github.com/apache/spark/pull/45399

Re: [PR] [SPARK-47298][BUILD] Upgrade `mysql-connector-j` to `8.3.0` and `mariadb-java-client` to `2.7.12` [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45399: URL: https://github.com/apache/spark/pull/45399#issuecomment-1983290627 Merged to master. Thank you @panbingkun @dongjoon-hyun

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45365: URL: https://github.com/apache/spark/pull/45365#issuecomment-1983294203 Merged to master. Thank you @LuciferYang @dongjoon-hyun

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45365: [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 URL: https://github.com/apache/spark/pull/45365

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45365: URL: https://github.com/apache/spark/pull/45365#issuecomment-1983298155 Thanks @yaooqinn @dongjoon-hyun

Re: [PR] [MINOR][DOCS][PYTHON] Fix documentation typo in takeSample method [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45419: URL: https://github.com/apache/spark/pull/45419#issuecomment-1983320337 Merged to master. Thank you @kimborowicz @zhengruifeng

Re: [PR] [MINOR][DOCS][PYTHON] Fix documentation typo in takeSample method [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45419: [MINOR][DOCS][PYTHON] Fix documentation typo in takeSample method URL: https://github.com/apache/spark/pull/45419

Re: [PR] [SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9][WIP] [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45407: URL: https://github.com/apache/spark/pull/45407#discussion_r1515965942 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala: ## @@ -455,19 +455,6 @@ class DDLParserSuite extends AnalysisTest with SharedSp

Re: [PR] [SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9][WIP] [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on PR #45407: URL: https://github.com/apache/spark/pull/45407#issuecomment-1983367613 @stefanbuk-db If you are still working on the PR, please move the `[WIP]` tag to the beginning of the PR's title (this is a convention).

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516062859 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE collationN

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516070311 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -117,7 +117,7 @@ object DataType { private val FIXED_DECIMAL = """decimal\(\s*(\d+)\s*,\s*

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1516087243 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

[PR] [SPARK-46761][SQL] Quoted strings in a JSON path should support ? characters [spark]

2024-03-07 Thread via GitHub
planga82 opened a new pull request, #45420: URL: https://github.com/apache/spark/pull/45420 ### What changes were proposed in this pull request? If there is a JSON with a ? character in the key like ``` {"?":"QUESTION"} ``` This PR allows adding this character in

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1516106051 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all t

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cashmand commented on code in PR #45409: URL: https://github.com/apache/spark/pull/45409#discussion_r1516216789 ## sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala: ## @@ -175,6 +175,25 @@ trait CreatableRelationProvider { mode: SaveMode, param

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516357243 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE colla

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45382: URL: https://github.com/apache/spark/pull/45382#issuecomment-1983758947 The GA jobs all passed: https://github.com/uros-db/spark/actions/runs/8186876833/job/22395549669 merging to master, thanks!

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45382: [SPARK-47248][SQL][COLLATION] Improved string function support: contains URL: https://github.com/apache/spark/pull/45382

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516373801 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE collationN

Re: [PR] [SQL] Bind JDBC dialect to JDBCRDD at construction [spark]

2024-03-07 Thread via GitHub
johnnywalker commented on code in PR #45410: URL: https://github.com/apache/spark/pull/45410#discussion_r1516375276 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -153,12 +153,12 @@ object JDBCRDD extends Logging { */ class JDB

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
tgravescs commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1516375405 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributo

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516378011 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE collationN

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516379232 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -31,6 +32,8 @@ import com.esotericsoftware.kryo.io.Input; import com.esotericsoftw

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516380909 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { } pu

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516381847 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { } pu

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516389365 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { } pu

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516391257 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -31,6 +32,8 @@ import com.esotericsoftware.kryo.io.Input; import com.esotericsoftw

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516396458 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE colla

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1516398411 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516415830 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE collationN

[PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db opened a new pull request, #45422: URL: https://github.com/apache/spark/pull/45422 ### What changes were proposed in this pull request? ### Why are the changes needed? Currently, all `StringType` arguments passed to built-in string functions in Spark SQL get treated

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45409: URL: https://github.com/apache/spark/pull/45409#issuecomment-1983973892 thanks, merging to master!

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45409: [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider URL: https://github.com/apache/spark/pull/45409

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-07 Thread via GitHub
jchen5 commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1516503722 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import org.apache.spark.sql.catalyst.trees.Tree

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1516510742 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (

[PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
miland-db opened a new pull request, #45423: URL: https://github.com/apache/spark/pull/45423 ### What changes were proposed in this pull request? In the PR, I propose to assign the proper names to the legacy error classes _LEGACY_ERROR_TEMP_324[7-9], and modify tests in testing suites to

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
sahnib commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1516471532 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class DataSour

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
miland-db closed pull request #45423: Miland db/miland legacy error class URL: https://github.com/apache/spark/pull/45423

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516535896 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -117,7 +117,7 @@ object DataType { private val FIXED_DECIMAL = """decimal\(\s*(\d+)\s

[PR] [SPARK-47319][SQL] Fix missingInput calculation [spark]

2024-03-07 Thread via GitHub
peter-toth opened a new pull request, #45424: URL: https://github.com/apache/spark/pull/45424 ### What changes were proposed in this pull request? This PR speeds up `QueryPlan.missingInput()` calculation. ### Why are the changes needed? This seems to be the root cause of `Ded

Re: [PR] [SPARK-37932][SQL]Wait to resolve missing attributes before applying DeduplicateRelations [spark]

2024-03-07 Thread via GitHub
peter-toth commented on PR #35684: URL: https://github.com/apache/spark/pull/35684#issuecomment-1984107426 @martinf-moodys, [SPARK-47319](https://issues.apache.org/jira/browse/SPARK-47319) / https://github.com/apache/spark/pull/45424 might help, especially if you have many `Union` nodes

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1516609093 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1984150861 LGTM. I talked to @peter-toth offline, and the improvement comes from not calculating the `inputSet` at all when `references` is empty.
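The optimization described here is a short-circuit. If a node references no attributes, none of them can be missing, so there is no need to build the input set (the union of the children's outputs). The sketch below shows that idea in Python. `Node`, `input_set`, `missing_input` and the call counter are hypothetical. They are loosely modeled on the description and are not Spark's Scala `QueryPlan`/`AttributeSet` code.

```python
from typing import FrozenSet, Iterable, Sequence


class Node:
    """Hypothetical plan node, loosely modeled on the description above."""

    def __init__(self, references: Iterable[str], output: Iterable[str],
                 children: Sequence["Node"] = ()):
        self.references = frozenset(references)  # attributes this node uses
        self.output = frozenset(output)          # attributes it produces
        self.children = list(children)
        self.input_set_calls = 0                 # counts the costly step

    def input_set(self) -> FrozenSet[str]:
        # Union of every child's output; this is the step worth skipping.
        self.input_set_calls += 1
        result: set = set()
        for child in self.children:
            result |= child.output
        return frozenset(result)

    def missing_input(self) -> FrozenSet[str]:
        # Nothing referenced means nothing can be missing, so return
        # early without building the input set at all.
        if not self.references:
            return frozenset()
        return self.references - self.input_set()


scan = Node(references=[], output=["a", "b"])
project = Node(references=["a", "c"], output=["a"], children=[scan])
literal = Node(references=[], output=["x"], children=[scan])

print(sorted(project.missing_input()))  # ['c']
print(literal.missing_input())          # frozenset()
print(literal.input_set_calls)          # 0
```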

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
peter-toth commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1984153122 @cloud-fan can you please take a look?

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1516651884 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val baseSe

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1516669562 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val baseSe

[PR] [SPARK-47318][Security] Adds HKDF round to AuthEngine key derivation [spark]

2024-03-07 Thread via GitHub
sweisdb opened a new pull request, #45425: URL: https://github.com/apache/spark/pull/45425 ### What changes were proposed in this pull request? This change adds an additional pass through a key derivation function (KDF) to the key exchange protocol in `AuthEngine`. Currently, it uses
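HKDF (RFC 5869) is an extract-then-expand key derivation function. First, an HMAC over the input keying material, keyed with a salt, produces a pseudorandom key. That key is then expanded into as many output bytes as needed. Below is a minimal Python sketch that uses only `hmac` and `hashlib`. It is a generic illustration of HKDF, not the Java code in `AuthEngine`. The SHA-256 choice and the placeholder values in the usage lines are assumptions.

```python
import hashlib
import hmac

HASH_NAME = "sha256"  # assumed hash; RFC 5869 works with any HMAC hash


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # Extract: PRK = HMAC-Hash(salt, IKM). RFC 5869 treats an absent salt
    # as HashLen zero bytes.
    if not salt:
        salt = b"\x00" * hashlib.new(HASH_NAME).digest_size
    return hmac.new(salt, ikm, HASH_NAME).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i) for i = 1, 2, ...
    # The output is the first `length` bytes of T(1) | T(2) | ...
    hash_len = hashlib.new(HASH_NAME).digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF output length too large")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH_NAME).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


# Hypothetical usage: turn a shared secret from a key exchange into a
# 32-byte key bound to a context label. Both values are placeholders.
shared_secret = bytes(32)
session_key = hkdf(shared_secret, salt=b"", info=b"example-context", length=32)
print(len(session_key))  # 32
```

For production use, a vetted implementation such as the `HKDF` class in the Python `cryptography` package is generally a better choice than a hand-rolled one.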

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
ueshin commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1516750441 ## python/pyspark/sql/profiler.py: ## @@ -224,6 +224,54 @@ def dump(id: int) -> None: for id in sorted(code_map.keys()): dump(id) +de

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1516752307 ## python/pyspark/sql/tests/test_session.py: ## @@ -531,6 +531,33 @@ def test_dump_invalid_type(self): }, ) +def test_clear_memory_type

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-07 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1516770647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import org.apache.spark.sql.catalyst.trees.T

Re: [PR] [SPARK-47311][SQL][PYTHON] Suppress Python exceptions where PySpark is not in the Python path [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on PR #45414: URL: https://github.com/apache/spark/pull/45414#issuecomment-1984375769 Looks nice, thank you!

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984433848 Thank you for the confirmation, @ted-jenks. Well, in this case, it's too late to change the behavior again. Apache Spark 3.3 has already been EOL since last year and I don't t

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng closed pull request #45378: [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling URL: https://github.com/apache/spark/pull/45378

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on PR #45378: URL: https://github.com/apache/spark/pull/45378#issuecomment-1984523232 Merged to master, thank you all!

[PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
shujingyang-db opened a new pull request, #45426: URL: https://github.com/apache/spark/pull/45426 ### What changes were proposed in this pull request? This PR fixes XML schema inference issues: 1. when there's an empty tag 2. when merging schema for NullType

Re: [PR] [SPARK-46071][SQL] Optimize CaseWhen toJSON content [spark]

2024-03-07 Thread via GitHub
github-actions[bot] commented on PR #43979: URL: https://github.com/apache/spark/pull/43979#issuecomment-1984826451 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46034][CORE] SparkContext add file should also copy file to local root path [spark]

2024-03-07 Thread via GitHub
github-actions[bot] commented on PR #43936: URL: https://github.com/apache/spark/pull/43936#issuecomment-1984826472 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2024-03-07 Thread via GitHub
github-actions[bot] closed pull request #42398: [SPARK-42746][SQL] Add the LISTAGG() aggregate function URL: https://github.com/apache/spark/pull/42398

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1984834978 Mind filing a JIRA and linking it to the PR title please?

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1984835207 See also https://spark.apache.org/contributing.html

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517022572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45269: URL: https://github.com/apache/spark/pull/45269#issuecomment-1984848824 Merged to master.

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45269: [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers URL: https://github.com/apache/spark/pull/45269

Re: [PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45426: URL: https://github.com/apache/spark/pull/45426#issuecomment-1984850009 Merged to master.

Re: [PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45426: [SPARK-47309][SQL][XML] Fix schema inference issues in XML URL: https://github.com/apache/spark/pull/45426

Re: [PR] [SPARK-46992]Fix cache consistence [spark]

2024-03-07 Thread via GitHub
doki23 commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1984850287 > All children have to be considered for changes of their persistence state. Currently it only checks the first found child. For clarity there is a test which fails: [doki23#1](https://gith

[PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45427: URL: https://github.com/apache/spark/pull/45427 ### What changes were proposed in this pull request? This PR makes the y/n message and condition consistent within the merge script. ### Why are the changes needed? For consist

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
zwangsheng commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1517066704 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all th

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45427: URL: https://github.com/apache/spark/pull/45427#issuecomment-1984911418 Merged to master.

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45427: [MINOR][INFRA] Make "y/n" consistent within merge script URL: https://github.com/apache/spark/pull/45427

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45427: URL: https://github.com/apache/spark/pull/45427#issuecomment-1984918192 Late +1

Re: [PR] [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45415: [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method URL: https://github.com/apache/spark/pull/45415

Re: [PR] [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45415: URL: https://github.com/apache/spark/pull/45415#issuecomment-1984919818 Thanks @zwangsheng, merged to master

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984926315 Thank you @dongjoon-hyun. In such circumstances, I guess we can add a configuration for base64 classes to avoid breaking things again. AFAIK, Apache Hive also uses the JDK version
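The disagreement in this thread comes down to the two base64 flavors the JDK offers. A minimal Java sketch, assuming the behavior change at issue is the RFC 2045 (MIME) line wrapping: `java.util.Base64.getMimeEncoder()` splits its output into 76-character lines separated by CRLF, while `getEncoder()` emits a single unbroken RFC 4648 string. The class name and the `chunkedMime` flag are hypothetical; the flag only stands in for the kind of configuration suggested in the comment and is not a real Spark setting.

```java
import java.util.Base64;

class Base64Modes {
    // chunkedMime is a hypothetical switch, not an actual Spark configuration.
    static String encode(byte[] input, boolean chunkedMime) {
        Base64.Encoder encoder = chunkedMime
                ? Base64.getMimeEncoder() // RFC 2045: 76-char lines joined by CRLF
                : Base64.getEncoder();    // RFC 4648: no line separators
        return encoder.encodeToString(input);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 bytes -> 136 base64 characters
        String mime = encode(data, true);
        String basic = encode(data, false);

        System.out.println(basic.length()); // 136
        System.out.println(mime.length());  // 138 (76 chars + CRLF + 60 chars)

        // A strict RFC 4648 decoder, as another system might use, rejects the CRLF.
        try {
            Base64.getDecoder().decode(mime);
            System.out.println("strict decoder accepted MIME output");
        } catch (IllegalArgumentException e) {
            System.out.println("strict decoder rejected MIME output");
        }

        // The MIME decoder ignores the line separators and round-trips fine.
        System.out.println(Base64.getMimeDecoder().decode(mime).length); // 100
    }
}
```

For inputs of 57 bytes or fewer the two encoders produce identical strings, so the difference only surfaces on longer values, which is one reason such a change can go unnoticed by downstream readers for a while.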
