Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604437739 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware UTF8String

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-16 Thread via GitHub
nikolamand-db commented on PR #46180: URL: https://github.com/apache/spark/pull/46180#issuecomment-2116855540 @mkaravel @dbatomic please review again, thanks.

Re: [PR] [SPARK-48303][CORE] Reorganize `LogKeys` [spark]

2024-05-16 Thread via GitHub
gengliangwang closed pull request #46612: [SPARK-48303][CORE] Reorganize `LogKeys` URL: https://github.com/apache/spark/pull/46612

Re: [PR] [SPARK-48303][CORE] Reorganize `LogKeys` [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46612: URL: https://github.com/apache/spark/pull/46612#issuecomment-2116827152 Thanks for the improvement! Merging to master

Re: [PR] [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46632: [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` URL: https://github.com/apache/spark/pull/46632

Re: [PR] [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46632: URL: https://github.com/apache/spark/pull/46632#issuecomment-2116752726 Merged to master.

[PR] [SPARK-48319][PYTHON][CONNECT][TESTS] Test `assert_true` and `raise_error` with the same error class as Spark Classic [spark]

2024-05-16 Thread via GitHub
zhengruifeng opened a new pull request, #46633: URL: https://github.com/apache/spark/pull/46633 ### What changes were proposed in this pull request? Test `assert_true` and `raise_error` with the same error class as Spark Classic ### Why are the changes needed? https://githu
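
For context, a minimal Scala example of the two functions under test; the PR itself adds Spark Connect Python tests, so the session setup and values below are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{assert_true, col, lit, raise_error}

val spark = SparkSession.builder().master("local[1]").appName("assert-raise").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("id")

// assert_true evaluates to NULL while the predicate holds and fails the query otherwise.
df.select(assert_true(col("id") > 0)).collect()

// raise_error always fails with the given message; the new tests check that Spark Classic
// and Spark Connect surface the same error class for such failures.
// df.select(raise_error(lit("boom"))).collect()

spark.stop()
```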

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604352238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-48312][SQL] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46622: URL: https://github.com/apache/spark/pull/46622#discussion_r1604351546 ## sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala: ## @@ -49,6 +49,9 @@ sealed class Metadata private[types] (private[types] val map: Map[String, A

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604346112 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46631: [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` URL: https://github.com/apache/spark/pull/46631

Re: [PR] [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46631: URL: https://github.com/apache/spark/pull/46631#issuecomment-2116583644 Merged to master.

Re: [PR] [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str` [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on PR #46630: URL: https://github.com/apache/spark/pull/46630#issuecomment-2116579906 thanks, merged to master

Re: [PR] [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str` [spark]

2024-05-16 Thread via GitHub
zhengruifeng closed pull request #46630: [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str` URL: https://github.com/apache/spark/pull/46630

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-05-16 Thread via GitHub
TakawaAkirayo commented on PR #46182: URL: https://github.com/apache/spark/pull/46182#issuecomment-2116573625 Gently ping @grundprinzip if anything else needs to be provided from my side :)

Re: [PR] [SPARK-48238][BUILD][YARN] Replace AmIpFilter with re-implemented YarnAMIpFilter [spark]

2024-05-16 Thread via GitHub
pan3793 commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2116553658 > ... supposedly the workaround will be removed when the Yarn side upgrades their J2EE? I suppose not. According to the discussion in https://github.com/apache/spark/pull/31642 `o

Re: [PR] [SPARK-48238][BUILD][YARN] Replace AmIpFilter with re-implemented YarnAMIpFilter [spark]

2024-05-16 Thread via GitHub
HiuKwok commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2116550519 The patch makes sense to me, and supposedly the workaround will be removed when the Yarn side upgrades their J2EE?

Re: [PR] [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46629: [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition URL: https://github.com/apache/spark/pull/46629

Re: [PR] [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46629: URL: https://github.com/apache/spark/pull/46629#issuecomment-2116548415 Merged to master.

Re: [PR] [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1604289643 ## python/pyspark/sql/connect/dataframe.py: ## @@ -104,6 +107,8 @@ class DataFrame(ParentDataFrame): +_release_thread_pool: Optional[ThreadPool] = ThreadPo

Re: [PR] [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1604267324 ## python/pyspark/sql/connect/dataframe.py: ## @@ -104,6 +107,8 @@ class DataFrame(ParentDataFrame): +_release_thread_pool: Optional[ThreadPool] = ThreadP

[PR] [SPARK-48317][PYTHON][CONNECT][TESTS] Enable `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46632: URL: https://github.com/apache/spark/pull/46632 ### What changes were proposed in this pull request? This PR proposes to enable the tests `test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file`. ### Why are t

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-16 Thread via GitHub
zml1206 commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2116500596 `with` is a good idea, thank you very much @cloud-fan. Close it.

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-16 Thread via GitHub
zml1206 closed pull request #46499: [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit URL: https://github.com/apache/spark/pull/46499

Re: [PR] [SPARK-48306][SQL] Improve UDT in error message [spark]

2024-05-16 Thread via GitHub
yaooqinn commented on PR #46616: URL: https://github.com/apache/spark/pull/46616#issuecomment-2116498446 Merged to master. Thank you @HyukjinKwon

Re: [PR] [SPARK-48306][SQL] Improve UDT in error message [spark]

2024-05-16 Thread via GitHub
yaooqinn closed pull request #46616: [SPARK-48306][SQL] Improve UDT in error message URL: https://github.com/apache/spark/pull/46616

Re: [PR] [SPARK-48113][CONNECT] Allow Plugins to integrate with Spark Connect [spark]

2024-05-16 Thread via GitHub
nchammas commented on PR #46364: URL: https://github.com/apache/spark/pull/46364#issuecomment-2116496398 Looks like the docs I am looking for are in #45340.

[PR] [MINOR][PYTHON][TESTS] Call `test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46631: URL: https://github.com/apache/spark/pull/46631 ### What changes were proposed in this pull request? This PR fixes the test `test_apply_schema_to_row` to call `test_apply_schema_to_row` instead of `test_apply_schema_to_dict_and_rows`. It

Re: [PR] [SPARK-48301][SQL][FOLLOWUP] Update the error message [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on PR #46628: URL: https://github.com/apache/spark/pull/46628#issuecomment-2116489481 merged to master

Re: [PR] [SPARK-48301][SQL][FOLLOWUP] Update the error message [spark]

2024-05-16 Thread via GitHub
zhengruifeng closed pull request #46628: [SPARK-48301][SQL][FOLLOWUP] Update the error message URL: https://github.com/apache/spark/pull/46628

[PR] [SPARK-41625][PYTHON][CONNECT][TESTS] Enable `DataFrameObservationParityTests.test_observe_str` [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46630: URL: https://github.com/apache/spark/pull/46630 ### What changes were proposed in this pull request? This PR proposes to enable `DataFrameObservationParityTests.test_observe_str`. ### Why are the changes needed? To make sure

Re: [PR] [SPARK-48316][PS][CONNECT][TESTS] Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46629: URL: https://github.com/apache/spark/pull/46629#issuecomment-2116480337 cc @zhengruifeng

[PR] [SPARK-48316][PS][CONNECT][TESTS] Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition [spark]

2024-05-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46629: URL: https://github.com/apache/spark/pull/46629 ### What changes were proposed in this pull request? This PR proposes to enable `SparkFrameMethodsParityTests.test_coalesce` and `SparkFrameMethodsParityTests.test_repartition` in Spark Conn
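
For context, the two operations these parity tests cover, shown with the plain DataFrame API; the PR targets the pandas-on-Spark wrappers under Spark Connect, so the snippet below only illustrates the underlying semantics:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[4]").appName("partitions").getOrCreate()

// 100 rows spread over 8 partitions to start from.
val df = spark.range(0, 100, 1, numPartitions = 8)

// coalesce only reduces the partition count and avoids a full shuffle.
val narrowed = df.coalesce(2)

// repartition can grow or shrink the count and always shuffles.
val reshuffled = df.repartition(16)

println(narrowed.rdd.getNumPartitions)   // 2
println(reshuffled.rdd.getNumPartitions) // 16
```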

Re: [PR] [SPARK-48251][BUILD] Disable `maven local cache` on GA's step `MIMA test` [spark]

2024-05-16 Thread via GitHub
panbingkun commented on PR #46551: URL: https://github.com/apache/spark/pull/46551#issuecomment-2116479316 1.with `maven local cache` https://github.com/panbingkun/spark/actions/runs/9109019204 https://github.com/apache/spark/assets/15246973/439b6397-21bb-427d-a740-481755b54a02

Re: [PR] [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46621: [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies URL: https://github.com/apache/spark/pull/46621

Re: [PR] [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46621: URL: https://github.com/apache/spark/pull/46621#issuecomment-2116465949 Merged to master.

Re: [PR] [SQL][SPARK-48312] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46622: URL: https://github.com/apache/spark/pull/46622#discussion_r1604233127 ## sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala: ## @@ -49,6 +49,9 @@ sealed class Metadata private[types] (private[types] val map: Map[String,

Re: [PR] [SPARK-43815][SQL] Wrap NPE with AnalysisException in CSV options [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46626: URL: https://github.com/apache/spark/pull/46626#discussion_r1604232424 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -149,7 +149,12 @@ class CSVOptions( parameters.getOrElse(DateTimeUtils

Re: [PR] [SPARK-43815][SQL] Wrap NPE with AnalysisException in CSV options [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #46626: URL: https://github.com/apache/spark/pull/46626#discussion_r1604232182 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -149,7 +149,12 @@ class CSVOptions( parameters.getOrElse(DateTimeUtils

Re: [PR] [SPARK-48303][CORE] Reorganize `LogKeys` [spark]

2024-05-16 Thread via GitHub
panbingkun commented on PR #46612: URL: https://github.com/apache/spark/pull/46612#issuecomment-2116460422 cc @gengliangwang

Re: [PR] [SPARK-48314] Don't double cache files for FileStreamSource using Trigger.AvailableNow [spark]

2024-05-16 Thread via GitHub
Kimahriman commented on PR #46627: URL: https://github.com/apache/spark/pull/46627#issuecomment-2116427914 @HeartSaVioR since you added the file caching originally back in the day

[PR] [SPARK-48314] Don't double cache files for FileStreamSource using Trigger.AvailableNow [spark]

2024-05-16 Thread via GitHub
Kimahriman opened a new pull request, #46627: URL: https://github.com/apache/spark/pull/46627 ### What changes were proposed in this pull request? Files don't need to be cached for reuse in `FileStreamSource` when using `Trigger.AvailableNow` because all files are already cach
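
A rough sketch of the idea described in the PR summary; all names below are placeholders rather than Spark's actual `FileStreamSource` fields:

```scala
// Illustrative placeholder logic only, not Spark's FileStreamSource internals.
// The idea from the PR summary: when Trigger.AvailableNow has already snapshotted
// the complete file listing up front, keeping a second per-batch cache of "unread"
// files would just hold the same entries in memory twice.
def shouldCacheUnreadFiles(
    maxFilesPerTrigger: Option[Int],
    isTriggerAvailableNow: Boolean): Boolean = {
  // Cache only when a rate limit means later micro-batches may reuse the listing
  // and the full listing is not already held for AvailableNow processing.
  maxFilesPerTrigger.isDefined && !isTriggerAvailableNow
}
```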

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on code in PR #46608: URL: https://github.com/apache/spark/pull/46608#discussion_r1604205064 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2675,9 +2675,9 @@ "ANALYZE TABLE(S) ... COMPUTE STATISTICS ... must be either N

Re: [PR] [WIP][Spark 44646] Reduce usage of log4j core [spark]

2024-05-16 Thread via GitHub
github-actions[bot] closed pull request #45001: [WIP][Spark 44646] Reduce usage of log4j core URL: https://github.com/apache/spark/pull/45001

Re: [PR] [SPARK-42789][SQL] Rewrite multiple GetJsonObject that consumes same JSON to single JsonTuple [spark]

2024-05-16 Thread via GitHub
github-actions[bot] closed pull request #45020: [SPARK-42789][SQL] Rewrite multiple GetJsonObject that consumes same JSON to single JsonTuple URL: https://github.com/apache/spark/pull/45020

Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46571: [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir URL: https://github.com/apache/spark/pull/46571

Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46571: URL: https://github.com/apache/spark/pull/46571#issuecomment-2116375998 Merged to master.

Re: [PR] [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46614: [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test URL: https://github.com/apache/spark/pull/46614

Re: [PR] [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test [spark]

2024-05-16 Thread via GitHub
HyukjinKwon commented on PR #46614: URL: https://github.com/apache/spark/pull/46614#issuecomment-2116375642 Merged to master.

Re: [PR] Bump rexml from 3.2.6 to 3.2.8 in /docs [spark]

2024-05-16 Thread via GitHub
dependabot[bot] commented on PR #46625: URL: https://github.com/apache/spark/pull/46625#issuecomment-2116374205 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] Bump rexml from 3.2.6 to 3.2.8 in /docs [spark]

2024-05-16 Thread via GitHub
HyukjinKwon closed pull request #46625: Bump rexml from 3.2.6 to 3.2.8 in /docs URL: https://github.com/apache/spark/pull/46625

[PR] [SPARK-43815] Wrap NPE with AnalysisException in csv options [spark]

2024-05-16 Thread via GitHub
michaelzhan-db opened a new pull request, #46626: URL: https://github.com/apache/spark/pull/46626 ### What changes were proposed in this pull request? When user sets `locale` to be `null`, a NPE is raised. Instead, replace the NPE with a more understandable user facing error m
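
A minimal standalone sketch of the guard described in the PR; the helper, error type, and message are illustrative and not Spark's actual `CSVOptions` code:

```scala
import java.util.Locale

// Hypothetical helper: reject a null or empty "locale" value with a descriptive error
// instead of letting Locale.forLanguageTag(null) escape as a NullPointerException.
def resolveCsvLocale(options: Map[String, String]): Locale =
  options.get("locale") match {
    case Some(tag) if tag != null && tag.nonEmpty => Locale.forLanguageTag(tag)
    case Some(_) =>
      throw new IllegalArgumentException(
        "CSV option 'locale' must be a valid BCP 47 language tag, but was null or empty.")
    case None => Locale.US // Spark's documented default locale for CSV parsing
  }
```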

Re: [PR] [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46623: URL: https://github.com/apache/spark/pull/46623#issuecomment-2116274470 @michaelzhan-db there are merge conflicts against branch-3.5. Please create a new PR for the backport.

Re: [PR] [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
gengliangwang closed pull request #46623: [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError URL: https://github.com/apache/spark/pull/46623

Re: [PR] [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46623: URL: https://github.com/apache/spark/pull/46623#issuecomment-2116272913 Thanks, merging to master and branch 3.5

[PR] Bump rexml from 3.2.6 to 3.2.8 in /docs [spark]

2024-05-16 Thread via GitHub
dependabot[bot] opened a new pull request, #46625: URL: https://github.com/apache/spark/pull/46625 Bumps [rexml](https://github.com/ruby/rexml) from 3.2.6 to 3.2.8. Release notes: sourced from rexml's releases (https://github.com/ruby/rexml/releases). REXML 3.2.8 - 2024-05-16

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-16 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1603979413 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -59,8 +59,17 @@ Start by creating a new subclass of :class:`DataSource`. Define the source na

Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1603972872 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1317,6 +1317,16 @@ package object config { s" be less than or equal to ${ByteA
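
For reference, entries in `config/package.scala` are declared with `ConfigBuilder`; a hedged sketch of what such a config could look like follows (key name, doc text, and version are placeholders, not necessarily what the PR adds):

```scala
// This only compiles inside the Spark code base (ConfigBuilder is private[spark]);
// the key, doc, and version below are placeholders used purely for illustration.
package org.apache.spark.internal.config.example

import org.apache.spark.internal.config.ConfigBuilder

object ExampleConfigs {
  val CHECKPOINT_DIR =
    ConfigBuilder("spark.checkpoint.dir")
      .doc("Default directory used by SparkContext.setCheckpointDir when none is set explicitly.")
      .version("4.0.0")
      .stringConf
      .createOptional
}
```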

Re: [PR] [SPARK-48238][BUILD][YARN] Replace AmIpFilter with re-implemented YarnAMIpFilter [spark]

2024-05-16 Thread via GitHub
mridulm commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2116089785 +CC @tgravescs

Re: [PR] Python ds preview [spark]

2024-05-16 Thread via GitHub
chaoqin-li1123 closed pull request #46624: Python ds preview URL: https://github.com/apache/spark/pull/46624

[PR] Python ds preview [spark]

2024-05-16 Thread via GitHub
chaoqin-li1123 opened a new pull request, #46624: URL: https://github.com/apache/spark/pull/46624 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
srielau commented on code in PR #46608: URL: https://github.com/apache/spark/pull/46608#discussion_r1603902920 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2675,9 +2675,9 @@ "ANALYZE TABLE(S) ... COMPUTE STATISTICS ... must be either NOSCAN

Re: [PR] [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite* [spark]

2024-05-16 Thread via GitHub
gengliangwang closed pull request #46615: [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite* URL: https://github.com/apache/spark/pull/46615

Re: [PR] [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite* [spark]

2024-05-16 Thread via GitHub
gengliangwang commented on PR #46615: URL: https://github.com/apache/spark/pull/46615#issuecomment-2115976061 Thanks, merging to master

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-16 Thread via GitHub
allisonwang-db commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1602361195 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -33,9 +33,15 @@ To create a custom Python data source, you'll need to subclass the :class:`Da

Re: [PR] [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node [spark]

2024-05-16 Thread via GitHub
amaliujia commented on code in PR #46617: URL: https://github.com/apache/spark/pull/46617#discussion_r1603865663 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InlineCTE.scala: ## @@ -74,34 +70,33 @@ case class InlineCTE(alwaysInline: Boolean = false) ext

[PR] [SPARK-48294] Handle lowercase in nestedTypeMissingElementTypeError [spark]

2024-05-16 Thread via GitHub
michaelzhan-db opened a new pull request, #46623: URL: https://github.com/apache/spark/pull/46623 ### What changes were proposed in this pull request? Handle lowercase values inside of nestedTypeMissingElementTypeError to prevent match errors. ### Why are the changes need

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46589: URL: https://github.com/apache/spark/pull/46589#discussion_r1603752983 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -278,47 +431,29 @@ public static UTF8String lowercaseSubStringIn

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1603716669 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware UTF8Strin

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603733508 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala: ## @@ -712,4 +714,181 @@ class DataTypeSuite extends SparkFunSuite { assert(r

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603732173 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -208,22 +206,35 @@ object DataType { } // NOTE: Map fields must be sorted in alp

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
allisonwang-db commented on code in PR #46608: URL: https://github.com/apache/spark/pull/46608#discussion_r1603731266 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2675,9 +2675,9 @@ "ANALYZE TABLE(S) ... COMPUTE STATISTICS ... must be either

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603727528 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -36,11 +36,45 @@ * Provides functionality to the UTF8String object

Re: [PR] [SQL][SPARK-48312] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
agubichev commented on PR #46622: URL: https://github.com/apache/spark/pull/46622#issuecomment-2115745170 @cloud-fan PTAL

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
stefankandic commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603651070 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -63,7 +66,61 @@ case class StructField( ("name" -> name) ~ ("type" -> dat

Re: [PR] [SQL][SPARK-48312] Improve Alias.removeNonInheritableMetadata performance [spark]

2024-05-16 Thread via GitHub
vladimirg-db commented on PR #46622: URL: https://github.com/apache/spark/pull/46622#issuecomment-2115620509 @agubichev, hi! Here's the improvement fix for the ultra-wide views

Re: [PR] [SPARK-48159][SQL] Extending support for collated strings on datetime expressions [spark]

2024-05-16 Thread via GitHub
nebojsa-db commented on PR #46618: URL: https://github.com/apache/spark/pull/46618#issuecomment-2115535153 @cloud-fan Please review :)

[PR] [SPARK-48309][YARN]Stop am retry, in situations where some errors and… [spark]

2024-05-16 Thread via GitHub
guixiaowen opened a new pull request, #46620: URL: https://github.com/apache/spark/pull/46620 … retries may not be successful ### What changes were proposed in this pull request? In yarn cluster mode, spark.yarn.maxAppAttempts will be configured. In our production environme
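
For context, `spark.yarn.maxAppAttempts` bounds how many times YARN relaunches the application master; a minimal way to set it is shown below (the value is illustrative, and the PR's actual retry-skipping logic is not shown):

```scala
import org.apache.spark.SparkConf

// Illustrative only: cap YARN application-master attempts at 2 for a cluster-mode job.
// The PR's change is about not consuming further attempts when the failure is one
// that a retry cannot fix.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.yarn.maxAppAttempts", "2")
```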

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-16 Thread via GitHub
PetarVasiljevic-DB commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1603513988 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -125,6 +126,29 @@ class OracleIntegrati

Re: [PR] [SPARK-48308][Core] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-05-16 Thread via GitHub
cloud-fan closed pull request #46619: [SPARK-48308][Core] Unify getting data schema without partition columns in FileSourceStrategy URL: https://github.com/apache/spark/pull/46619

Re: [PR] [SPARK-48308][Core] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on PR #46619: URL: https://github.com/apache/spark/pull/46619#issuecomment-2115430404 thanks, merging to master!

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1603428926 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -125,6 +126,29 @@ class OracleIntegrationSuite e

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1603426755 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala: ## @@ -102,4 +105,7 @@ class DB2IntegrationSuite extends

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603420522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603406813 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala: ## @@ -712,4 +714,181 @@ class DataTypeSuite extends SparkFunSuite { assert(resu

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603405105 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -63,7 +66,61 @@ case class StructField( ("name" -> name) ~ ("type" -> dataTy

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603404296 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -63,7 +66,61 @@ case class StructField( ("name" -> name) ~ ("type" -> dataTy

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603399289 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -208,22 +206,35 @@ object DataType { } // NOTE: Map fields must be sorted in alphab

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603393035 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -208,22 +206,35 @@ object DataType { } // NOTE: Map fields must be sorted in alphab

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603384345 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -110,6 +158,8 @@ public Collation( // No Collation can simultane

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1603382702 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -36,11 +36,45 @@ * Provides functionality to the UTF8String object wh

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603381135 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1051,6 +1052,153 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPl

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603377917 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603376505 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603374449 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -397,7 +398,11 @@ trait JoinSelectionHelper extends Logging { protected

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603367540 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationKey.scala: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (AS
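
Conceptually, the rewrite relies on a collation key: a binary value that is identical for strings that compare equal under the collation, so hash-based operators can use it directly. A toy sketch for a lowercase-style collation (this is not Spark's `CollationKey` implementation):

```scala
import java.nio.charset.StandardCharsets
import java.util.Locale

// Toy illustration only: map strings that are equal under a lowercase-style collation
// to identical byte keys, so hash joins can operate on the keys instead of the strings.
def lcaseCollationKey(s: String): Array[Byte] =
  s.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8)

// Strings differing only by case produce the same key.
assert(lcaseCollationKey("SparkSQL").sameElements(lcaseCollationKey("sparksql")))
```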

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2115258124 I think https://github.com/apache/spark/pull/45802#issuecomment-2101762336 is a better idea.

Re: [PR] [SPARK-48305][SQL] Add collation support for CurrentLike expressions [spark]

2024-05-16 Thread via GitHub
uros-db commented on PR #46613: URL: https://github.com/apache/spark/pull/46613#issuecomment-2115241390 @cloud-fan ready for review

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
zhengruifeng commented on PR #46608: URL: https://github.com/apache/spark/pull/46608#issuecomment-2115184716 merged to master

Re: [PR] [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` [spark]

2024-05-16 Thread via GitHub
zhengruifeng closed pull request #46608: [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE` URL: https://github.com/apache/spark/pull/46608
