[PR] [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2 [spark]

2024-05-20 Thread via GitHub
yaooqinn opened a new pull request, #46677: URL: https://github.com/apache/spark/pull/46677 ### What changes were proposed in this pull request? In this PR, we document the mapping rules for Spark SQL Data Types to DB2 ones ### Why are the changes needed?
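
For orientation, the mapping being documented is exercised whenever a DataFrame is written to DB2 over JDBC. A minimal sketch under assumed placeholders (the DB2 URL, credentials, and table name below are illustrative, not taken from the PR):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A few distinct Spark SQL types; each is converted to a DB2 column type
# by the DB2 JDBC dialect when the table is created on write.
df = spark.range(10).selectExpr(
    "id",                           # LongType
    "CAST(id AS STRING) AS name",   # StringType
    "id % 2 = 0 AS flag",           # BooleanType
)

(df.write.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/sampledb")  # placeholder host/db
    .option("dbtable", "spark_type_mapping_demo")         # placeholder table
    .option("user", "db2inst1")                           # placeholder credentials
    .option("password", "secret")
    .mode("overwrite")
    .save())
```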

[PR] [SPARK-48366][CORE][SQL][CONNECT] Simplify code statements related to `filter` [spark]

2024-05-20 Thread via GitHub
LuciferYang opened a new pull request, #46676: URL: https://github.com/apache/spark/pull/46676 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-48220][PYTHON] Allow passing PyArrow Table to createDataFrame() [spark]

2024-05-20 Thread via GitHub
zhengruifeng commented on code in PR #46529: URL: https://github.com/apache/spark/pull/46529#discussion_r1607672232 ## python/pyspark/sql/context.py: ## @@ -46,6 +46,7 @@ if TYPE_CHECKING: from py4j.java_gateway import JavaObject +import pyarrow as pa Review Comment
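
For context, the change under review lets `createDataFrame()` accept a `pyarrow.Table` directly. A minimal usage sketch, assuming the feature lands as the PR title describes:

```python
import pyarrow as pa
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build a small Arrow table in memory and pass it straight to createDataFrame,
# skipping the pandas round trip that was needed before this change.
table = pa.table({"id": [1, 2, 3], "label": ["a", "b", "c"]})
df = spark.createDataFrame(table)
df.show()
```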

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
szehon-ho commented on PR #46673: URL: https://github.com/apache/spark/pull/46673#issuecomment-2121805760 Thanks! @dongjoon-hyun @sunchao can you take another look?

Re: [PR] [SPARK-19426][SQL] Custom coalescer for Dataset [spark]

2024-05-20 Thread via GitHub
SubhamSinghal commented on PR #46541: URL: https://github.com/apache/spark/pull/46541#issuecomment-2121803074 @hvanhovell will you be able to add review here or tag relevant folks here?

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` [spark]

2024-05-20 Thread via GitHub
yaooqinn commented on PR #46609: URL: https://github.com/apache/spark/pull/46609#issuecomment-2121794324 Merged to master. Thank you @panbingkun

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` [spark]

2024-05-20 Thread via GitHub
yaooqinn closed pull request #46609: [SPARK-48300][SQL] Codegen Support for `from_xml` URL: https://github.com/apache/spark/pull/46609

Re: [PR] [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) [spark]

2024-05-20 Thread via GitHub
yaooqinn closed pull request #46663: [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) URL: https://github.com/apache/spark/pull/46663

Re: [PR] [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) [spark]

2024-05-20 Thread via GitHub
yaooqinn commented on PR #46663: URL: https://github.com/apache/spark/pull/46663#issuecomment-2121790028 Thank you @Justontheway @HyukjinKwon, merged to master

Re: [PR] [MINOR][TESTS] Rename test_union to test_eqnullsafe at ColumnTestsMixin [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on PR #46675: URL: https://github.com/apache/spark/pull/46675#issuecomment-2121771596 Merged to master.

Re: [PR] [MINOR][TESTS] Rename test_union to test_eqnullsafe at ColumnTestsMixin [spark]

2024-05-20 Thread via GitHub
HyukjinKwon closed pull request #46675: [MINOR][TESTS] Rename test_union to test_eqnullsafe at ColumnTestsMixin URL: https://github.com/apache/spark/pull/46675

Re: [PR] [SPARK-48359][SQL] Built-in functions for Zstd compression and decompression [spark]

2024-05-20 Thread via GitHub
yaooqinn commented on PR #46672: URL: https://github.com/apache/spark/pull/46672#issuecomment-2121735788 Instead of adding (de)compression functions for different codecs, how about adding the `compression` and `decompression` directly, like, - https://dev.mysql.com/doc/refman/8.0/en/encr

Re: [PR] [CONNECT] Allow plugins to use QueryTest in their tests [spark]

2024-05-20 Thread via GitHub
zhengruifeng commented on PR #46667: URL: https://github.com/apache/spark/pull/46667#issuecomment-2121734784 the failed tests seem related: ``` [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/CatalogSuite.scala:29:28: illegal inh

[PR] [MINOR][TESTS] Rename test_union to test_eqnullsafe at ColumnTestsMixin [spark]

2024-05-20 Thread via GitHub
HyukjinKwon opened a new pull request, #46675: URL: https://github.com/apache/spark/pull/46675 ### What changes were proposed in this pull request? This PR proposes to rename `test_union` to `test_eqnullsafe` at `ColumnTestsMixin`. ### Why are the changes needed? To avoi

Re: [PR] [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2024-05-20 Thread via GitHub
AngersZh commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r1607575927 ## core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala: ## @@ -155,9 +158,9 @@ private[spark] class OutputCommitCoordinator(conf: SparkCo

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
superdiaodiao commented on PR #46673: URL: https://github.com/apache/spark/pull/46673#issuecomment-2121646393 > Thanks @dongjoon-hyun, sorry, the PR was not ready. I was trying to integrate the changes from @superdiaodiao, who I saw also made a PR for the same, so we can be co-authors. Revert

Re: [PR] [SPARK-48337][SQL] Fix precision loss for JDBC TIME values [spark]

2024-05-20 Thread via GitHub
yaooqinn commented on PR #46662: URL: https://github.com/apache/spark/pull/46662#issuecomment-2121637292 Merged to master. Thank you @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-48337][SQL] Fix precision loss for JDBC TIME values [spark]

2024-05-20 Thread via GitHub
yaooqinn closed pull request #46662: [SPARK-48337][SQL] Fix precision loss for JDBC TIME values URL: https://github.com/apache/spark/pull/46662

Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the doc [spark]

2024-05-20 Thread via GitHub
panbingkun commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1607534427 ## common/utils/src/main/scala/org/apache/spark/internal/README.md: ## @@ -45,3 +45,29 @@ logger.error("Failed to abort the writer after failing to write map output

Re: [PR] [SPARK-48337][SQL] Fix precision loss for JDBC TIME values [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on PR #46662: URL: https://github.com/apache/spark/pull/46662#issuecomment-2121629155 Thank you, @yaooqinn.

Re: [PR] [SPARK-48337][SQL] Fix precision loss for JDBC TIME values [spark]

2024-05-20 Thread via GitHub
yaooqinn commented on PR #46662: URL: https://github.com/apache/spark/pull/46662#issuecomment-2121627559 cc @cloud-fan @LuciferYang @dongjoon-hyun thanks

Re: [PR] [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on code in PR #46037: URL: https://github.com/apache/spark/pull/46037#discussion_r1607514868 ## python/pyspark/sql/tests/connect/streaming/test_parity_listener.py: ## @@ -65,8 +66,140 @@ def onQueryTerminated(self, event): df.write.mode("append").s

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` [spark]

2024-05-20 Thread via GitHub
yaooqinn commented on code in PR #46609: URL: https://github.com/apache/spark/pull/46609#discussion_r1607505328 ## sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala: ## @@ -40,6 +40,17 @@ class XmlFunctionsSuite extends QueryTest with SharedSparkSession {
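
For orientation, this is the kind of call `XmlFunctionsSuite` exercises and that SPARK-48300 moves onto the codegen path. A minimal sketch, assuming `from_xml` is exposed in `pyspark.sql.functions` as part of Spark 4.0's XML support:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_xml

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("<person><name>Alice</name><age>1</age></person>",)], ["value"]
)

# Parse the XML string column into a struct; with codegen support this
# expression can be compiled by whole-stage codegen instead of always
# falling back to interpreted evaluation.
parsed = df.select(from_xml("value", "name STRING, age INT").alias("p"))
parsed.select("p.name", "p.age").show()
```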

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` [spark]

2024-05-20 Thread via GitHub
panbingkun commented on PR #46609: URL: https://github.com/apache/spark/pull/46609#issuecomment-2121582579 It has been rebased on master. At present, this PR is only for `codegen support for from_xml`. @sandip-db @HyukjinKwon @yaooqinn @cloud-fan

Re: [PR] [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` [spark]

2024-05-20 Thread via GitHub
HyukjinKwon closed pull request #46674: [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` URL: https://github.com/apache/spark/pull/46674

Re: [PR] [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on PR #46674: URL: https://github.com/apache/spark/pull/46674#issuecomment-2121561163 Merged to master.

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-20 Thread via GitHub
mkaravel commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1607446081 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -288,13 +338,24 @@ private static int collationNameToId(String collatio

Re: [PR] [SPARK-43815][SQL] Wrap NPE with AnalysisException in CSV, XML, and JSON options [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on code in PR #46626: URL: https://github.com/apache/spark/pull/46626#discussion_r1607447215 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -149,7 +149,13 @@ class CSVOptions( parameters.getOrElse(DateTimeUtils

Re: [PR] [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on PR #46663: URL: https://github.com/apache/spark/pull/46663#issuecomment-2121501005 Mind taking a look at https://github.com/apache/spark/pull/46663/checks?check_run_id=25172009176?

Re: [PR] [CONNECT] Allow plugins to use QueryTest in their tests [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on PR #46667: URL: https://github.com/apache/spark/pull/46667#issuecomment-2121499966 Can we file a JIRA ticket please?

Re: [PR] [SPARK-48340][PYTHON] Support TimestampNTZ infer schema miss prefer_timestamp_ntz [spark]

2024-05-20 Thread via GitHub
HyukjinKwon closed pull request #4: [SPARK-48340][PYTHON] Support TimestampNTZ infer schema miss prefer_timestamp_ntz URL: https://github.com/apache/spark/pull/4

Re: [PR] [SPARK-48340][PYTHON] Support TimestampNTZ infer schema miss prefer_timestamp_ntz [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on PR #4: URL: https://github.com/apache/spark/pull/4#issuecomment-2121500193 Merged to master.
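
For context on what the fix targets, schema inference for naive Python datetimes should honor the NTZ preference. A minimal sketch of that behavior, assuming the standard `spark.sql.timestampType` setting is how the preference is expressed:

```python
import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ask schema inference to prefer TIMESTAMP_NTZ for naive Python datetimes.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

df = spark.createDataFrame([(datetime.datetime(2024, 5, 20, 12, 0),)], ["ts"])
df.printSchema()  # 'ts' should be inferred as timestamp_ntz under this setting
```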

Re: [PR] [SPARK-48329][SQL] Turn on `spark.sql.sources.v2.bucketing.pushPartValues.enabled` by default [spark]

2024-05-20 Thread via GitHub
superdiaodiao closed pull request #46650: [SPARK-48329][SQL] Turn on `spark.sql.sources.v2.bucketing.pushPartValues.enabled` by default URL: https://github.com/apache/spark/pull/46650

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
superdiaodiao commented on PR #46673: URL: https://github.com/apache/spark/pull/46673#issuecomment-2121484021 > Thanks @dongjoon-hyun, sorry, the PR was not ready. I was trying to integrate the changes from @superdiaodiao, who I saw also made a PR for the same, so we can be co-authors. Reve

Re: [PR] [SPARK-45579][CORE] Catch errors for FallbackStorage.copy [spark]

2024-05-20 Thread via GitHub
github-actions[bot] closed pull request #43409: [SPARK-45579][CORE] Catch errors for FallbackStorage.copy URL: https://github.com/apache/spark/pull/43409

Re: [PR] [SPARK-45744][CORE] Switch `spark.history.store.serializer` to use `PROTOBUF` by default [spark]

2024-05-20 Thread via GitHub
github-actions[bot] commented on PR #43609: URL: https://github.com/apache/spark/pull/43609#issuecomment-2121473646 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-20 Thread via GitHub
HyukjinKwon closed pull request #46570: [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect URL: https://github.com/apache/spark/pull/46570

Re: [PR] [SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-20 Thread via GitHub
HyukjinKwon commented on PR #46570: URL: https://github.com/apache/spark/pull/46570#issuecomment-2121464274 Merged to master. I will follow up on the discussion if there is more to address, since we're releasing the preview soon.
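
For reference, the API being brought to Spark Connect mirrors the classic PySpark one. A minimal sketch using local checkpointing, which needs no checkpoint directory:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1000).selectExpr("id", "id * 2 AS doubled")

# Eagerly materialize the current result and truncate the logical plan,
# so later actions do not re-evaluate the full lineage.
checkpointed = df.localCheckpoint(eager=True)
print(checkpointed.count())
```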

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` & remove some redundant codes [spark]

2024-05-20 Thread via GitHub
panbingkun commented on PR #46609: URL: https://github.com/apache/spark/pull/46609#issuecomment-2121447497 After the PR above is merged, I will rebase this PR again, just for `codegen`. Thanks.

Re: [PR] [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` [spark]

2024-05-20 Thread via GitHub
panbingkun commented on PR #46674: URL: https://github.com/apache/spark/pull/46674#issuecomment-2121446653 cc @sandip-db @HyukjinKwon @cloud-fan @yaooqinn

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` & remove some redundant codes [spark]

2024-05-20 Thread via GitHub
panbingkun commented on PR #46609: URL: https://github.com/apache/spark/pull/46609#issuecomment-2121446402 > @panbingkun Thanks for submitting the PR. Can you please separate the `codegen` support and the cleanup in separate PRs? Sure, a new separate PR for `cleanup`: https://github.co

[PR] [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` [spark]

2024-05-20 Thread via GitHub
panbingkun opened a new pull request, #46674: URL: https://github.com/apache/spark/pull/46674 ### What changes were proposed in this pull request? The PR aims to cleanup some redundant codes (support for `ArrayType` & `MapType`) ### Why are the changes needed? As discussed below

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-20 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1607401423 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -84,9 +109,157 @@ Define the reader logic to generate synthetic data. Use the `faker` library

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-20 Thread via GitHub
allisonwang-db commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1607398362 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -84,9 +101,158 @@ Define the reader logic to generate synthetic data. Use the `faker` library
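
For orientation, the guide under review documents the Python data source API. A minimal batch-reader sketch of that API (the streaming variant described in the guide additionally implements `streamReader()` with offset bookkeeping); the `counter` name and schema below are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader

class CounterDataSource(DataSource):
    """A tiny batch data source that returns the numbers 0..4."""

    @classmethod
    def name(cls):
        return "counter"

    def schema(self):
        return "value INT"

    def reader(self, schema):
        return CounterReader()

class CounterReader(DataSourceReader):
    def read(self, partition):
        # Yield plain tuples matching the declared schema.
        for i in range(5):
            yield (i,)

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(CounterDataSource)
spark.read.format("counter").load().show()
```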

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-20 Thread via GitHub
mkaravel commented on PR #46589: URL: https://github.com/apache/spark/pull/46589#issuecomment-2121389075 Please update the PR description.

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-05-20 Thread via GitHub
urosstan-db commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2121385385 @cloud-fan Sure, sorry, I did not see that this failed

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-20 Thread via GitHub
mkaravel commented on PR #46511: URL: https://github.com/apache/spark/pull/46511#issuecomment-2121384473 Please fill in the PR description.

Re: [PR] [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node [spark]

2024-05-20 Thread via GitHub
cloud-fan commented on PR #46617: URL: https://github.com/apache/spark/pull/46617#issuecomment-2121381949 > Repartition? Do you mean Relation? It's Repartition, because we rely on shuffle reuse to reuse CTE relations.

Re: [PR] [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1) [spark]

2024-05-20 Thread via GitHub
cloud-fan commented on PR #46637: URL: https://github.com/apache/spark/pull/46637#issuecomment-2121379656 late LGTM

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-20 Thread via GitHub
cloud-fan commented on PR #46280: URL: https://github.com/apache/spark/pull/46280#issuecomment-2121377372 I think so. String type with collation should be normal string type in the Hive table schema, so that other engines can still read it. We only keep the collation info in the Spark-speci

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
szehon-ho commented on PR #46673: URL: https://github.com/apache/spark/pull/46673#issuecomment-2121361991 Thanks @dongjoon-hyun, sorry, the PR was not ready. I was trying to integrate the changes from @superdiaodiao, who I saw also made a PR for the same, so we can be co-authors. Reverted t

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-05-20 Thread via GitHub
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2121361107 @urosstan-db can you re-trigger the test?

Re: [PR] [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval [spark]

2024-05-20 Thread via GitHub
HeartSaVioR closed pull request #46651: [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval URL: https://github.com/apache/spark/pull/46651

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
szehon-ho commented on code in PR #46673: URL: https://github.com/apache/spark/pull/46673#discussion_r1607370785 ## docs/sql-migration-guide.md: ## @@ -55,6 +55,7 @@ license: | - Since Spark 4.0, The default value for `spark.sql.legacy.timeParserPolicy` has been changed from `

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-20 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2121332325 @uros-db I agree that we should avoid auxiliary structures. And I don't see a good way to move the changes to implementation of `merge` and `update` without keeping an auxiliary

Re: [PR] [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval [spark]

2024-05-20 Thread via GitHub
HeartSaVioR commented on PR #46651: URL: https://github.com/apache/spark/pull/46651#issuecomment-2121324280 Thanks! Merging to master.

Re: [PR] [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval [spark]

2024-05-20 Thread via GitHub
HeartSaVioR commented on PR #46651: URL: https://github.com/apache/spark/pull/46651#issuecomment-2121324009 Let's do post-review if there are remaining comments. Looks like the change is right and unavoidable.

Re: [PR] [SPARK-48351] JDBC Connectors - Add cast suite and fix found issue [spark]

2024-05-20 Thread via GitHub
urosstan-db closed pull request #46669: [SPARK-48351] JDBC Connectors - Add cast suite and fix found issue URL: https://github.com/apache/spark/pull/46669

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1607348639 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/BatchParserSuite.scala: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1607347956 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -116,6 +116,78 @@ class AstBuilder extends DataTypeAstBuilder with SQLCon

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1607339764 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/BatchLangLogicalOperators.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1607338991 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/BatchParserSuite.scala: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1607338855 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -116,6 +116,78 @@ class AstBuilder extends DataTypeAstBuilder with SQLCon

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1607338131 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,25 @@ options { tokenVocab = SqlBaseLexer; } public boolean doubl

Re: [PR] [WIP] Propagate column family information from executors to driver via Accumulator [spark]

2024-05-20 Thread via GitHub
ericm-db closed pull request #46644: [WIP] Propagate column family information from executors to driver via Accumulator URL: https://github.com/apache/spark/pull/46644

Re: [PR] [SPARK-44838][SQL][FOLLOW-UP] Fix the test for raise_error by using default type for strings [spark]

2024-05-20 Thread via GitHub
uros-db commented on PR #46649: URL: https://github.com/apache/spark/pull/46649#issuecomment-2121183739 fix should be ready https://github.com/apache/spark/pull/46661 please review @cloud-fan @HyukjinKwon @dongjoon-hyun

Re: [PR] [SPARK-48031] Decompose viewSchemaMode config, add SHOW CREATE TABLE support [spark]

2024-05-20 Thread via GitHub
gengliangwang commented on code in PR #46652: URL: https://github.com/apache/spark/pull/46652#discussion_r1607218983 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1700,15 +1700,21 @@ object SQLConf { .booleanConf .createWithDefault(

Re: [PR] [SPARK-48031] Decompose viewSchemaMode config, add SHOW CREATE TABLE support [spark]

2024-05-20 Thread via GitHub
gengliangwang commented on code in PR #46652: URL: https://github.com/apache/spark/pull/46652#discussion_r1607218705 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1700,15 +1700,21 @@ object SQLConf { .booleanConf .createWithDefault(

Re: [PR] [SPARK-43815][SQL] Wrap NPE with AnalysisException in CSV, XML, and JSON options [spark]

2024-05-20 Thread via GitHub
gengliangwang commented on code in PR #46626: URL: https://github.com/apache/spark/pull/46626#discussion_r1607213012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -107,7 +108,13 @@ class JSONOptions( val writeNullIfWithDefaultValue

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on PR #46673: URL: https://github.com/apache/spark/pull/46673#issuecomment-2121080890 cc @viirya, @sunchao

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on code in PR #46673: URL: https://github.com/apache/spark/pull/46673#discussion_r1607189545 ## sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala: ## @@ -1024,93 +1024,92 @@ class EnsureRequirementsSuite extends

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on code in PR #46673: URL: https://github.com/apache/spark/pull/46673#discussion_r1607189098 ## sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala: ## @@ -1169,7 +1169,6 @@ class KeyGroupedPartitioningSuite extends D

Re: [PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on code in PR #46673: URL: https://github.com/apache/spark/pull/46673#discussion_r1607188807 ## docs/sql-migration-guide.md: ## @@ -55,6 +55,7 @@ license: | - Since Spark 4.0, The default value for `spark.sql.legacy.timeParserPolicy` has been changed fr

Re: [PR] [SPARK-48300][SQL] Codegen Support for `from_xml` & remove some redundant codes [spark]

2024-05-20 Thread via GitHub
sandip-db commented on PR #46609: URL: https://github.com/apache/spark/pull/46609#issuecomment-2121041342 @panbingkun Thanks for submitting the PR. Can you please separate the `codegen` support and the cleanup in separate PRs?

Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the doc [spark]

2024-05-20 Thread via GitHub
mridulm commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1607157027 ## common/utils/src/main/scala/org/apache/spark/internal/README.md: ## @@ -45,3 +45,29 @@ logger.error("Failed to abort the writer after failing to write map output.",

Re: [PR] [SPARK-44838][SQL][FOLLOW-UP] Fix the test for raise_error by using default type for strings [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on PR #46649: URL: https://github.com/apache/spark/pull/46649#issuecomment-2121013158 Gentle ping, @uros-db. The CI is still broken.

[PR] [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [spark]

2024-05-20 Thread via GitHub
szehon-ho opened a new pull request, #46673: URL: https://github.com/apache/spark/pull/46673 ### What changes were proposed in this pull request? Change 'spark.sql.sources.v2.bucketing.pushPartValues' to true for Spark 4.0 release ### Why are the changes needed? This flag ha
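
The flag is one of several governing storage-partitioned join (SPJ). A minimal sketch of setting it explicitly, which is what this PR makes the default, shown together with the base v2 bucketing flag:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Storage-partitioned join builds on DataSource V2 bucketing; pushing partition
# values down lets SPJ proceed even when the two sides do not report exactly
# matching partition values.
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")
spark.conf.set("spark.sql.sources.v2.bucketing.pushPartValues.enabled", "true")
```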

Re: [PR] [SPARK-48328][BUILD] Upgrade `Arrow` to 16.1.0 [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun closed pull request #46646: [SPARK-48328][BUILD] Upgrade `Arrow` to 16.1.0 URL: https://github.com/apache/spark/pull/46646

Re: [PR] [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval [spark]

2024-05-20 Thread via GitHub
chaoqin-li1123 commented on code in PR #46651: URL: https://github.com/apache/spark/pull/46651#discussion_r1607079327 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonStreamingSinkCommitRunner.scala: ## @@ -39,78 +35,22 @@ import org.apache.s

Re: [PR] [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval [spark]

2024-05-20 Thread via GitHub
chaoqin-li1123 commented on code in PR #46651: URL: https://github.com/apache/spark/pull/46651#discussion_r1607078379 ## python/pyspark/sql/worker/python_streaming_sink_runner.py: ## @@ -34,22 +33,32 @@ _parse_datatype_json_string, StructType, ) -from pyspark.util imp

Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the doc [spark]

2024-05-20 Thread via GitHub
gengliangwang commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1607076786 ## common/utils/src/main/scala/org/apache/spark/internal/README.md: ## @@ -45,3 +45,29 @@ logger.error("Failed to abort the writer after failing to write map out

Re: [PR] [SPARK-48017] Add Spark application submission worker for operator [spark-kubernetes-operator]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on PR #10: URL: https://github.com/apache/spark-kubernetes-operator/pull/10#issuecomment-2120919240 I wrote the current status summary here. - https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2120918277

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2120918277 For the record, the `submission` module was finally merged today with SPARK-48326 (TODO). - https://github.com/apache/spark-kubernetes-operator/pull/10 From Spark r

Re: [PR] [SPARK-48330][SS][PYTHON] Fix the python streaming data source timeout issue for large trigger interval [spark]

2024-05-20 Thread via GitHub
allisonwang-db commented on code in PR #46651: URL: https://github.com/apache/spark/pull/46651#discussion_r1607067727 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonStreamingSinkCommitRunner.scala: ## @@ -39,78 +35,22 @@ import org.apache.s

Re: [PR] [SPARK-48017] Add Spark application submission worker for operator [spark-kubernetes-operator]

2024-05-20 Thread via GitHub
dongjoon-hyun closed pull request #10: [SPARK-48017] Add Spark application submission worker for operator URL: https://github.com/apache/spark-kubernetes-operator/pull/10

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-20 Thread via GitHub
nikolamand-db commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1606959835 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -118,76 +119,433 @@ public Collation( } /** - * Con

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-20 Thread via GitHub
nikolamand-db commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1606956472 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -118,76 +119,433 @@ public Collation( } /** - * Con

Re: [PR] [SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation [spark]

2024-05-20 Thread via GitHub
dongjoon-hyun commented on PR #46611: URL: https://github.com/apache/spark/pull/46611#issuecomment-2120728328 Thank you, @pan3793 and all!

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-20 Thread via GitHub
nikolamand-db commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1606952858 ## common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala: ## @@ -152,4 +219,218 @@ class CollationFactorySuite extends AnyFunSuit
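
For orientation, the PR extends the collation specifiers accepted in SQL. A minimal sketch, assuming a case-insensitive ICU-backed collation name such as `UNICODE_CI` is among those added:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Under a case-insensitive collation the two literals compare equal, while the
# default UTF8_BINARY collation keeps them distinct.
spark.sql("SELECT 'spark' COLLATE UNICODE_CI = 'SPARK' AS ci_equal").show()
```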

Re: [PR] [SPARK-48352][SQL]set max file counter through spark conf [spark]

2024-05-20 Thread via GitHub
guixiaowen commented on PR #46668: URL: https://github.com/apache/spark/pull/46668#issuecomment-2120702247 @dongjoon-hyun Do you have time? Can you help me review this PR?

Re: [PR] [SPARK-48175][SQL][PYTHON] Store collation information in metadata and not in type for SER/DE [spark]

2024-05-20 Thread via GitHub
stefankandic commented on PR #46280: URL: https://github.com/apache/spark/pull/46280#issuecomment-2120702050 @cloud-fan I looked into HMS a bit, and it seems that we can't save column metadata there, so I guess we will still have to keep converting schema with collation to schema without wh

Re: [PR] [SPARK-48354][SQL] JDBC Connectors predicate pushdown testing [spark]

2024-05-20 Thread via GitHub
stefanbuk-db commented on code in PR #46642: URL: https://github.com/apache/spark/pull/46642#discussion_r1606936510 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerPushdownIntegrationSuite.scala: ## @@ -0,0 +1,92 @@ +/* + * Licensed t

Re: [PR] [SPARK-48354][SQL] JDBC Connectors predicate pushdown testing [spark]

2024-05-20 Thread via GitHub
stefanbuk-db commented on code in PR #46642: URL: https://github.com/apache/spark/pull/46642#discussion_r1606935984 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCPushdownTest.scala: ## @@ -0,0 +1,387 @@ +/* + * Licensed to the Apache Sof

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
dbatomic commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1606932943 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/BatchParserSuite.scala: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
dbatomic commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1606931359 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/BatchParserSuite.scala: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-48342] SQL Batch Lang Parser [spark]

2024-05-20 Thread via GitHub
dbatomic commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1606930293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserInterface.scala: ## @@ -62,4 +62,10 @@ trait ParserInterface extends DataTypeParserInterface

Re: [PR] [SPARK-48351] JDBC Connectors - Add cast suite and fix found issue [spark]

2024-05-20 Thread via GitHub
urosstan-db commented on code in PR #46669: URL: https://github.com/apache/spark/pull/46669#discussion_r1606929242 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -52,6 +52,27 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHe

Re: [PR] [SPARK-48354][SQL] JDBC Connectors predicate pushdown testing [spark]

2024-05-20 Thread via GitHub
stefanbuk-db commented on code in PR #46642: URL: https://github.com/apache/spark/pull/46642#discussion_r1606926117 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -141,6 +160,9 @@ private case class MsSqlServerDialect() extends JdbcDialect

Re: [PR] [SPARK-48354][SQL] JDBC Connectors predicate pushdown testing [spark]

2024-05-20 Thread via GitHub
stefanbuk-db commented on code in PR #46642: URL: https://github.com/apache/spark/pull/46642#discussion_r1606922807 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala: ## @@ -155,7 +162,8 @@ private case class PostgresDialect() extends JdbcDialect with S

Re: [PR] [SPARK-48354][SQL] JDBC Connectors predicate pushdown testing [spark]

2024-05-20 Thread via GitHub
stefanbuk-db commented on code in PR #46642: URL: https://github.com/apache/spark/pull/46642#discussion_r1606917652 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySqlPushdownIntegrationSuite.scala: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the
