Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604437739 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware UTF8String

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-16 Thread via GitHub
nikolamand-db commented on PR #46180: URL: https://github.com/apache/spark/pull/46180#issuecomment-2116855540 @mkaravel @dbatomic please review again, thanks.

Re: [PR] [SPARK-48303][CORE] Reorganize `LogKeys` [spark]

2024-05-16 Thread via GitHub
gengliangwang closed pull request #46612: [SPARK-48303][CORE] Reorganize `LogKeys` URL: https://github.com/apache/spark/pull/46612

Re: [PR] [SPARK-48303][CORE] Reorganize `LogKeys` [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46612: URL: https://github.com/apache/spark/pull/46612#issuecomment-2116827152 Thanks for the improvement! Merging to master

Re: [PR] [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46632: [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` URL: https://github.com/apache/spark/pull/46632

Re: [PR] [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46632: URL: https://github.com/apache/spark/pull/46632#issuecomment-2116752726 Merged to master.

[PR] [SPARK-48319][PYTHON][CONNECT][TESTS] Test `assert_true` and `raise_error` with the same error class as Spark Classic [spark]

2024-05-16 Thread via GitHub
zhengruifeng opened a new pull request, #46633: URL: https://github.com/apache/spark/pull/46633 ### What changes were proposed in this pull request? Test `assert_true` and `raise_error` with the same error class as Spark Classic ### Why are the changes needed? https://githu
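
For context, a minimal Scala example of the two functions under test; the PR itself adds Spark Connect Python tests, so the session setup and values below are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{assert_true, col, lit, raise_error}

val spark = SparkSession.builder().master("local[1]").appName("assert-raise").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("id")

// assert_true evaluates to NULL while the predicate holds and fails the query otherwise.
df.select(assert_true(col("id") > 0)).collect()

// raise_error always fails with the given message; the new tests check that Spark Classic
// and Spark Connect surface the same error class for such failures.
// df.select(raise_error(lit("boom"))).collect()

spark.stop()
```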

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604352238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-48312][SQL] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46622: URL: https://github.com/apache/spark/pull/46622#discussion_r1604351546 ## sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala: ## @@ -49,6 +49,9 @@ sealed class Metadata private[types] (private[types] val map: Map[String, A

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604346112 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46631: [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` URL: https://github.com/apache/spark/pull/46631

Re: [PR] [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46631: URL: https://github.com/apache/spark/pull/46631#issuecomment-2116583644 Merged to master.

Re: [PR] [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str` [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on PR #46630: URL: https://github.com/apache/spark/pull/46630#issuecomment-2116579906 thanks, merged to master

Re: [PR] [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str` [spark]

2024-05-16 Thread via GitHub
zhengruifeng closed pull request #46630: [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str` URL: https://github.com/apache/spark/pull/46630

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-05-16 Thread via GitHub
TakawaAkirayo commented on PR #46182: URL: https://github.com/apache/spark/pull/46182#issuecomment-2116573625 Gently ping @grundprinzip if anything else needs to be provided from my side :)

Re: [PR] [SPARK-48238][BUILD][YARN] Replace AmIpFilter with re-implemented YarnAMIpFilter [spark]

2024-05-16 Thread via GitHub
pan3793 commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2116553658 > ... supposedly the workaround will be removed when the Yarn side upgrades their J2EE? I suppose not. According to the discussion in https://github.com/apache/spark/pull/31642 `o

Re: [PR] [SPARK-48238][BUILD][YARN] Replace AmIpFilter with re-implemented YarnAMIpFilter [spark]

2024-05-16 Thread via GitHub
HiuKwok commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2116550519 The patch makes sense to me, and supposedly the workaround will be removed when the Yarn side upgrades their J2EE?

Re: [PR] [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46629: [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition URL: https://github.com/apache/spark/pull/46629

Re: [PR] [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46629: URL: https://github.com/apache/spark/pull/46629#issuecomment-2116548415 Merged to master.

Re: [PR] [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1604289643 ## python/pyspark/sql/connect/dataframe.py: ## @@ -104,6 +107,8 @@ class DataFrame(ParentDataFrame): +_release_thread_pool: Optional[ThreadPool] = ThreadPo

Re: [PR] [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1604267324 ## python/pyspark/sql/connect/dataframe.py: ## @@ -104,6 +107,8 @@ class DataFrame(ParentDataFrame): +_release_thread_pool: Optional[ThreadPool] = ThreadP

[PR] [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46632: URL: https://github.com/apache/spark/pull/46632 ### What changes were proposed in this pull request? This PR proposes to enable the tests `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file`. ### Why are t

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-16 Thread via GitHub
zml1206 commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2116500596 `with` is a good idea, thank you very much @cloud-fan. Close it.

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-16 Thread via GitHub
zml1206 closed pull request #46499: [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit URL: https://github.com/apache/spark/pull/46499

Re: [PR] [SPARK-48306][SQL] Improve UDT in error message [spark]

2024-05-16 Thread via GitHub
yaooqinn commented on PR #46616: URL: https://github.com/apache/spark/pull/46616#issuecomment-2116498446 Merged to master. Thank you @HyukjinKwon

Re: [PR] [SPARK-48306][SQL] Improve UDT in error message [spark]

2024-05-16 Thread via GitHub
yaooqinn closed pull request #46616: [SPARK-48306][SQL] Improve UDT in error message URL: https://github.com/apache/spark/pull/46616

Re: [PR] [SPARK-48113][CONNECT] Allow Plugins to integrate with Spark Connect [spark]

2024-05-16 Thread via GitHub
nchammas commented on PR #46364: URL: https://github.com/apache/spark/pull/46364#issuecomment-2116496398 Looks like the docs I am looking for are in #45340.

[PR] [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46631: URL: https://github.com/apache/spark/pull/46631 ### What changes were proposed in this pull request? This PR fixes the test `test_apply_schema_to_row` to call `test_apply_schema_to_row` instead of `test_apply_schema_to_dict_and_rows`. It

Re: [PR] [SPARK-48301][SQL][FOLLOWUP] Update the error message [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on PR #46628: URL: https://github.com/apache/spark/pull/46628#issuecomment-2116489481 merged to master

Re: [PR] [SPARK-48301][SQL][FOLLOWUP] Update the error message [spark]

2024-05-16 Thread via GitHub
zhengruifeng closed pull request #46628: [SPARK-48301][SQL][FOLLOWUP] Update the error message URL: https://github.com/apache/spark/pull/46628

[PR] [SPARK-41625][PYTHON][CONNECT][TESTS] Enable `DataFrameObservationParityTests.test_observe_str` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46630: URL: https://github.com/apache/spark/pull/46630 ### What changes were proposed in this pull request? This PR proposes to enable `DataFrameObservationParityTests.test_observe_str`. ### Why are the changes needed? To make sure

Re: [PR] [SPARK-48316][PS][CONNECT][TESTS] Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46629: URL: https://github.com/apache/spark/pull/46629#issuecomment-2116480337 cc @zhengruifeng

[PR] [SPARK-48316][PS][CONNECT][TESTS] Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46629: URL: https://github.com/apache/spark/pull/46629 ### What changes were proposed in this pull request? This PR proposes to enable `SparkFrameMethodsParityTests.test_coalesce` and `SparkFrameMethodsParityTests.test_repartition` in Spark Conn
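
For context, the two operations these parity tests cover, shown with the plain DataFrame API; the PR targets the pandas-on-Spark wrappers under Spark Connect, so the snippet below only illustrates the underlying semantics:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[4]").appName("partitions").getOrCreate()

// 100 rows spread over 8 partitions to start from.
val df = spark.range(0, 100, 1, numPartitions = 8)

// coalesce only reduces the partition count and avoids a full shuffle.
val narrowed = df.coalesce(2)

// repartition can grow or shrink the count and always shuffles.
val reshuffled = df.repartition(16)

println(narrowed.rdd.getNumPartitions)   // 2
println(reshuffled.rdd.getNumPartitions) // 16
```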

Re: [PR] [SPARK-48251][BUILD] Disable `maven local cache` on GA's step `MIMA test` [spark]

2024-05-16 Thread via GitHub
panbingkun commented on PR #46551: URL: https://github.com/apache/spark/pull/46551#issuecomment-2116479316 1.with `maven local cache` https://github.com/panbingkun/spark/actions/runs/9109019204 https://github.com/apache/spark/assets/15246973/439b6397-21bb-427d-a740-481755b54a02

Re: [PR] [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46621: [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies URL: https://github.com/apache/spark/pull/46621

Re: [PR] [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46621: URL: https://github.com/apache/spark/pull/46621#issuecomment-2116465949 Merged to master.

Re: [PR] [SQL][SPARK-48312] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46622: URL: https://github.com/apache/spark/pull/46622#discussion_r1604233127 ## sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala: ## @@ -49,6 +49,9 @@ sealed class Metadata private[types] (private[types] val map: Map[String,

Re: [PR] [SPARK-43815][SQL] Wrap NPE with AnalysisException in CSV options [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46626: URL: https://github.com/apache/spark/pull/46626#discussion_r1604232424 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -149,7 +149,12 @@ class CSVOptions( parameters.getOrElse(DateTimeUtils

Re: [PR] [SPARK-43815][SQL] Wrap NPE with AnalysisException in CSV options [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46626: URL: https://github.com/apache/spark/pull/46626#discussion_r1604232182 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -149,7 +149,12 @@ class CSVOptions( parameters.getOrElse(DateTimeUtils

Re: [PR] [SPARK-48303][CORE] Reorganize `LogKeys` [spark]

2024-05-16 Thread via GitHub
panbingkun commented on PR #46612: URL: https://github.com/apache/spark/pull/46612#issuecomment-2116460422 cc @gengliangwang

Re: [PR] [SPARK-48314] Don't double cache files for FileStreamSource using Trigger.AvailableNow [spark]

2024-05-16 Thread via GitHub
Kimahriman commented on PR #46627: URL: https://github.com/apache/spark/pull/46627#issuecomment-2116427914 @HeartSaVioR since you added the file caching originally back in the day

[PR] [SPARK-48314] Don't double cache files for FileStreamSource using Trigger.AvailableNow [spark]

2024-05-16 Thread via GitHub
Kimahriman opened a new pull request, #46627: URL: https://github.com/apache/spark/pull/46627 ### What changes were proposed in this pull request? Files don't need to be cached for reuse in `FileStreamSource` when using `Trigger.AvailableNow` because all files are already cach
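
A rough sketch of the idea described in the PR summary; all names below are placeholders rather than Spark's actual `FileStreamSource` fields:

```scala
// Illustrative placeholder logic only, not Spark's FileStreamSource internals.
// The idea from the PR summary: when Trigger.AvailableNow has already snapshotted
// the complete file listing up front, keeping a second per-batch cache of "unread"
// files would just hold the same entries in memory twice.
def shouldCacheUnreadFiles(
    maxFilesPerTrigger: Option[Int],
    isTriggerAvailableNow: Boolean): Boolean = {
  // Cache only when a rate limit means later micro-batches may reuse the listing
  // and the full listing is not already held for AvailableNow processing.
  maxFilesPerTrigger.isDefined && !isTriggerAvailableNow
}
```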

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on code in PR #46608: URL: https://github.com/apache/spark/pull/46608#discussion_r1604205064 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2675,9 +2675,9 @@ "ANALYZE TABLE(S) ... COMPUTE STATISTICS ... must be either N

Re: [PR] [WIP][Spark 44646] Reduce usage of log4j core [spark]

2024-05-16 Thread via GitHub
github-actions[bot] closed pull request #45001: [WIP][Spark 44646] Reduce usage of log4j core URL: https://github.com/apache/spark/pull/45001

Re: [PR] [SPARK-42789][SQL] Rewrite multiple GetJsonObject that consumes same JSON to single JsonTuple [spark]

2024-05-16 Thread via GitHub
github-actions[bot] closed pull request #45020: [SPARK-42789][SQL] Rewrite multiple GetJsonObject that consumes same JSON to single JsonTuple URL: https://github.com/apache/spark/pull/45020

Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46571: [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir URL: https://github.com/apache/spark/pull/46571

Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46571: URL: https://github.com/apache/spark/pull/46571#issuecomment-2116375998 Merged to master.

Re: [PR] [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46614: [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test URL: https://github.com/apache/spark/pull/46614

Re: [PR] [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46614: URL: https://github.com/apache/spark/pull/46614#issuecomment-2116375642 Merged to master.

Re: [PR] Bump rexml from 3.2.6 to 3.2.8 in /docs [spark]

2024-05-16 Thread via GitHub
dependabot[bot] commented on PR #46625: URL: https://github.com/apache/spark/pull/46625#issuecomment-2116374205 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] Bump rexml from 3.2.6 to 3.2.8 in /docs [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46625: Bump rexml from 3.2.6 to 3.2.8 in /docs URL: https://github.com/apache/spark/pull/46625

[PR] [SPARK-43815] Wrap NPE with AnalysisException in csv options [spark]

2024-05-16 Thread via GitHub
michaelzhan-db opened a new pull request, #46626: URL: https://github.com/apache/spark/pull/46626 ### What changes were proposed in this pull request? When user sets `locale` to be `null`, a NPE is raised. Instead, replace the NPE with a more understandable user facing error m
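
A minimal standalone sketch of the guard described in the PR; the helper, error type, and message are illustrative and not Spark's actual `CSVOptions` code:

```scala
import java.util.Locale

// Hypothetical helper: reject a null or empty "locale" value with a descriptive error
// instead of letting Locale.forLanguageTag(null) escape as a NullPointerException.
def resolveCsvLocale(options: Map[String, String]): Locale =
  options.get("locale") match {
    case Some(tag) if tag != null && tag.nonEmpty => Locale.forLanguageTag(tag)
    case Some(_) =>
      throw new IllegalArgumentException(
        "CSV option 'locale' must be a valid BCP 47 language tag, but was null or empty.")
    case None => Locale.US // Spark's documented default locale for CSV parsing
  }
```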

Re: [PR] [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46623: URL: https://github.com/apache/spark/pull/46623#issuecomment-2116274470 @michaelzhan-db there are merge conflicts against branch-3.5. Please create a new PR for the backport.

Re: [PR] [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
gengliangwang closed pull request #46623: [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError URL: https://github.com/apache/spark/pull/46623

Re: [PR] [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46623: URL: https://github.com/apache/spark/pull/46623#issuecomment-2116272913 Thanks, merging to master and branch 3.5

[PR] Bump rexml from 3.2.6 to 3.2.8 in /docs [spark]

2024-05-16 Thread via GitHub
dependabot[bot] opened a new pull request, #46625: URL: https://github.com/apache/spark/pull/46625 Bumps [rexml](https://github.com/ruby/rexml) from 3.2.6 to 3.2.8. Release notes: sourced from rexml's releases (https://github.com/ruby/rexml/releases). REXML 3.2.8 - 2024-05-16

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-16 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1603979413 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -59,8 +59,17 @@ Start by creating a new subclass of :class:`DataSource`. Define the source na

Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1603972872 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1317,6 +1317,16 @@ package object config { s" be less than or equal to ${ByteA
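
For reference, entries in `config/package.scala` are declared with `ConfigBuilder`; a hedged sketch of what such a config could look like follows (key name, doc text, and version are placeholders, not necessarily what the PR adds):

```scala
// This only compiles inside the Spark code base (ConfigBuilder is private[spark]);
// the key, doc, and version below are placeholders used purely for illustration.
package org.apache.spark.internal.config.example

import org.apache.spark.internal.config.ConfigBuilder

object ExampleConfigs {
  val CHECKPOINT_DIR =
    ConfigBuilder("spark.checkpoint.dir")
      .doc("Default directory used by SparkContext.setCheckpointDir when none is set explicitly.")
      .version("4.0.0")
      .stringConf
      .createOptional
}
```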

Re: [PR] [SPARK-48238][BUILD][YARN] Replace AmIpFilter with re-implemented YarnAMIpFilter [spark]

2024-05-16 Thread via GitHub
mridulm commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2116089785 +CC @tgravescs

Re: [PR] Python ds preview [spark]

2024-05-16 Thread via GitHub
chaoqin-li1123 closed pull request #46624: Python ds preview URL: https://github.com/apache/spark/pull/46624

[PR] Python ds preview [spark]

2024-05-16 Thread via GitHub
chaoqin-li1123 opened a new pull request, #46624: URL: https://github.com/apache/spark/pull/46624 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
srielau commented on code in PR #46608: URL: https://github.com/apache/spark/pull/46608#discussion_r1603902920 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2675,9 +2675,9 @@ "ANALYZE TABLE(S) ... COMPUTE STATISTICS ... must be either NOSCAN

Re: [PR] [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite* [spark]

2024-05-16 Thread via GitHub
gengliangwang closed pull request #46615: [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite* URL: https://github.com/apache/spark/pull/46615

Re: [PR] [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite* [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46615: URL: https://github.com/apache/spark/pull/46615#issuecomment-2115976061 Thanks, merging to master

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-16 Thread via GitHub
allisonwang-db commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1602361195 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -33,9 +33,15 @@ To create a custom Python data source, you'll need to subclass the :class:`Da

Re: [PR] [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node [spark]

2024-05-16 Thread via GitHub
amaliujia commented on code in PR #46617: URL: https://github.com/apache/spark/pull/46617#discussion_r1603865663 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InlineCTE.scala: ## @@ -74,34 +70,33 @@ case class InlineCTE(alwaysInline: Boolean = false) ext

[PR] [SPARK-48294] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
michaelzhan-db opened a new pull request, #46623: URL: https://github.com/apache/spark/pull/46623 ### What changes were proposed in this pull request? Handle lowercase values inside of nestedTypeMissingElementTypeError to prevent match errors. ### Why are the changes need

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46589: URL: https://github.com/apache/spark/pull/46589#discussion_r1603752983 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -278,47 +431,29 @@ public static UTF8String lowercaseSubStringIn

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1603716669 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware UTF8Strin

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603733508 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala: ## @@ -712,4 +714,181 @@ class DataTypeSuite extends SparkFunSuite { assert(r

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603732173 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -208,22 +206,35 @@ object DataType { } // NOTE: Map fields must be sorted in alp

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
allisonwang-db commented on code in PR #46608: URL: https://github.com/apache/spark/pull/46608#discussion_r1603731266 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2675,9 +2675,9 @@ "ANALYZE TABLE(S) ... COMPUTE STATISTICS ... must be either

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603727528 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -36,11 +36,45 @@ * Provides functionality to the UTF8String object

Re: [PR] [SQL][SPARK-48312] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
agubichev commented on PR #46622: URL: https://github.com/apache/spark/pull/46622#issuecomment-2115745170 @cloud-fan PTAL

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603651070 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -63,7 +66,61 @@ case class StructField( ("name" -> name) ~ ("type" -> dat

Re: [PR] [SQL][SPARK-48312] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
vladimirg-db commented on PR #46622: URL: https://github.com/apache/spark/pull/46622#issuecomment-2115620509 @agubichev, hi! Here's the improvement fix for the ultra-wide views

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub
nebojsa-db commented on PR #46618: URL: https://github.com/apache/spark/pull/46618#issuecomment-2115535153 @cloud-fan Please review :)

[PR] [SPARK-48309][YARN]Stop am retry, in situations where some errors and… [spark]

2024-05-16 Thread via GitHub
guixiaowen opened a new pull request, #46620: URL: https://github.com/apache/spark/pull/46620 … retries may not be successful ### What changes were proposed in this pull request? In yarn cluster mode, spark.yarn.maxAppAttempts will be configured. In our production environme
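
For context, `spark.yarn.maxAppAttempts` bounds how many times YARN relaunches the application master; a minimal way to set it is shown below (the value is illustrative, and the PR's actual retry-skipping logic is not shown):

```scala
import org.apache.spark.SparkConf

// Illustrative only: cap YARN application-master attempts at 2 for a cluster-mode job.
// The PR's change is about not consuming further attempts when the failure is one
// that a retry cannot fix.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.yarn.maxAppAttempts", "2")
```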

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-16 Thread via GitHub
PetarVasiljevic-DB commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1603513988 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -125,6 +126,29 @@ class OracleIntegrati

Re: [PR] [SPARK-48308][Core] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-05-16 Thread via GitHub
cloud-fan closed pull request #46619: [SPARK-48308][Core] Unify getting data schema without partition columns in FileSourceStrategy URL: https://github.com/apache/spark/pull/46619

Re: [PR] [SPARK-48308][Core] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on PR #46619: URL: https://github.com/apache/spark/pull/46619#issuecomment-2115430404 thanks, merging to master!

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1603428926 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -125,6 +126,29 @@ class OracleIntegrationSuite e

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1603426755 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala: ## @@ -102,4 +105,7 @@ class DB2IntegrationSuite extends

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603420522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603406813 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala: ## @@ -712,4 +714,181 @@ class DataTypeSuite extends SparkFunSuite { assert(resu

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603405105 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -63,7 +66,61 @@ case class StructField( ("name" -> name) ~ ("type" -> dataTy

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603404296 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -63,7 +66,61 @@ case class StructField( ("name" -> name) ~ ("type" -> dataTy

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603399289 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -208,22 +206,35 @@ object DataType { } // NOTE: Map fields must be sorted in alphab

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603393035 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -208,22 +206,35 @@ object DataType { } // NOTE: Map fields must be sorted in alphab

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603384345 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -110,6 +158,8 @@ public Collation( // No Collation can simultane

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603382702 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -36,11 +36,45 @@ * Provides functionality to the UTF8String object wh

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603381135 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1051,6 +1052,153 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPl

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603377917 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603376505 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603374449 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -397,7 +398,11 @@ trait JoinSelectionHelper extends Logging { protected

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603367540 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationKey.scala: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (AS
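
Conceptually, the rewrite relies on a collation key: a binary value that is identical for strings that compare equal under the collation, so hash-based operators can use it directly. A toy sketch for a lowercase-style collation (this is not Spark's `CollationKey` implementation):

```scala
import java.nio.charset.StandardCharsets
import java.util.Locale

// Toy illustration only: map strings that are equal under a lowercase-style collation
// to identical byte keys, so hash joins can operate on the keys instead of the strings.
def lcaseCollationKey(s: String): Array[Byte] =
  s.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8)

// Strings differing only by case produce the same key.
assert(lcaseCollationKey("SparkSQL").sameElements(lcaseCollationKey("sparksql")))
```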

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2115258124 I think https://github.com/apache/spark/pull/45802#issuecomment-2101762336 is a better idea.

Re: [PR] [SPARK-48305][SQL] Add collation support for CurrentLike expressions [spark]

2024-05-16 Thread via GitHub
uros-db commented on PR #46613: URL: https://github.com/apache/spark/pull/46613#issuecomment-2115241390 @cloud-fan ready for review

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on PR #46608: URL: https://github.com/apache/spark/pull/46608#issuecomment-2115184716 merged to master

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
zhengruifeng closed pull request #46608: [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` URL: https://github.com/apache/spark/pull/46608
