Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568326394 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -354,9 +355,13 @@ private[sql] class HDFSBackedSt

Re: [PR] [SHUFFLE] [WIP] Prototype: store shuffle file on external storage like S3 [spark]

2024-04-17 Thread via GitHub
pspoerri commented on PR #34864: URL: https://github.com/apache/spark/pull/34864#issuecomment-2060527524 @steveloughran How do I call the Hue APIs from Spark? Can you point me to a package? I agree with you that using the Hadoop APIs is not ideal performance-wise, but they are great fro

[PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic opened a new pull request, #46096: URL: https://github.com/apache/spark/pull/46096 ### What changes were proposed in this pull request? This PR proposes to enhance the "Installation" page to cover all installable options for PySpark pip installation. ### Why are the cha

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on PR #46096: URL: https://github.com/apache/spark/pull/46096#issuecomment-2060538992 cc @HyukjinKwon @ueshin @zhengruifeng @xinrong-meng @allisonwang-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568344903 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,92 @@ To install PySpark from source, refer to |building_spark|_. Dependencies --

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568352067 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala: ## @@ -274,7 +275,8 @@ class FileSystemBasedCheckpointFileManager(pa

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568353748 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils { ordering.

[PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db opened a new pull request, #46097: URL: https://github.com/apache/spark/pull/46097 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568360558 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -63,83 +68,122 @@ object LogKey extends Enumeration { val CSV_SCHEMA_FIELD_NAME = Val

Re: [PR] [SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on PR #46057: URL: https://github.com/apache/spark/pull/46057#issuecomment-2060575614 cc @gengliangwang

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on PR #45946: URL: https://github.com/apache/spark/pull/45946#issuecomment-2060579504 Yeah forgot to block it. Will create a followup to add that.

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568305287 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -100,6 +100,90 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46071: URL: https://github.com/apache/spark/pull/46071#discussion_r1568380607 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -766,6 +769,17 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568379883 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -157,18 +164,6 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46071: URL: https://github.com/apache/spark/pull/46071#issuecomment-2060603814 thanks, merging to master!

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568379265 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -101,6 +101,9 @@ public void testContains() throws SparkException {

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46071: [SPARK-47867][SQL] Support variant in JSON scan. URL: https://github.com/apache/spark/pull/46071

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568388242 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils { ordering

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46011: URL: https://github.com/apache/spark/pull/46011#issuecomment-2060623468 thanks, merging to master!

Re: [PR] [SPARK-46935][DOCS] Consolidate error documentation [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #44971: URL: https://github.com/apache/spark/pull/44971#discussion_r1568388171 ## docs/util/build-error-docs.py: ## @@ -0,0 +1,151 @@ +""" +Generate a unified page of documentation for all error conditions. +""" +import json +import os +import r

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46011: [SPARK-47821][SQL] Implement is_variant_null expression URL: https://github.com/apache/spark/pull/46011

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46017: URL: https://github.com/apache/spark/pull/46017#issuecomment-2060634388 thanks, merging to master!

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568393760 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils { ordering.

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46017: [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type URL: https://github.com/apache/spark/pull/46017

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568399026 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,92 @@ To install PySpark from source, refer to |building_spark|_. Dependencies +

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on PR #46096: URL: https://github.com/apache/spark/pull/46096#issuecomment-2060645900 Added more dependencies and updated the screen capture in the PR description accordingly.

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568400477 ## python/pyspark/sql/column.py: ## @@ -175,46 +175,13 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]], "Column"

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568406504 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -157,18 +164,6 @@ public static boolean execICU(final UTF8String l, final

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-17 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568413626 ## python/pyspark/sql/column.py: ## @@ -175,46 +175,13 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]], "Column"

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568413994 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils { ordering

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568413581 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -99,7 +99,10 @@ public static boolean execLowercase(final UTF8String

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2060677068 The vote passed. - https://lists.apache.org/thread/4cbkpvc3vr3b6k0wp6lgsw37spdpnqrc Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-44444][SQL] Use ANSI SQL mode by default [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun closed pull request #46013: [SPARK-44444][SQL] Use ANSI SQL mode by default URL: https://github.com/apache/spark/pull/46013

[PR] [WIP][SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-17 Thread via GitHub
xi-db opened a new pull request, #46098: URL: https://github.com/apache/spark/pull/46098 ### What changes were proposed in this pull request? In [the previous PR](https://github.com/apache/spark/pull/46012), we cache plans in AnalyzePlan requests. We're also enabling it for ExecutePla

Re: [PR] [SPARK-44444][SQL] Use ANSI SQL mode by default [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2060686852 Since ANSI is on by default, shall we remove the daily ANSI test job?

[PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun opened a new pull request, #46099: URL: https://github.com/apache/spark/pull/46099 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568438791 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -99,7 +99,10 @@ public static boolean execLowercase(final UTF8String l,

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46099: URL: https://github.com/apache/spark/pull/46099#issuecomment-2060694544 Could you review this, @HyukjinKwon ?

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568444549 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46096: URL: https://github.com/apache/spark/pull/46096#issuecomment-2060705481 cc @zhengruifeng and @WeichenXu123 would you mind reviewing this please?

[PR] [SPARK-47885][PYTHON][CONNECT] Make pyspark.resource compatible with pyspark-connect [spark]

2024-04-17 Thread via GitHub
HyukjinKwon opened a new pull request, #46100: URL: https://github.com/apache/spark/pull/46100 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.resource` compatible with `pyspark-connect`. ### Why are the changes needed? In order for

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568479342 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568479784 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568481821 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568485463 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568486652 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568483558 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568484563 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568485173 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568485937 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568486468 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

[PR] [SPARK-47883][SQL] Make CollectTailExec.doExecute lazy with RowQueue [spark]

2024-04-17 Thread via GitHub
zhengruifeng opened a new pull request, #46101: URL: https://github.com/apache/spark/pull/46101 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was th

[PR] [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values [spark]

2024-04-17 Thread via GitHub
yaooqinn opened a new pull request, #46102: URL: https://github.com/apache/spark/pull/46102 ### What changes were proposed in this pull request? This PR added tests and doc for Postgres special numeric values. Postgres supports special numeric values "NaN", "infinity

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568525373 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -101,6 +101,9 @@ public void testContains() throws SparkException {

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568487322 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568490894 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568488642 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

[PR] [FOLLOWUP][SPARK-47765] Disable SET COLLATION when collations are disabled [spark]

2024-04-17 Thread via GitHub
mihailom-db opened a new pull request, #46103: URL: https://github.com/apache/spark/pull/46103 ### What changes were proposed in this pull request? Disable SET COLLATION when collations are disabled. ### Why are the changes needed? We do not want users to use syntax that is no

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568490593 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568600521 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568490051 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies -

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568431472 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression, rep

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568607999 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -47,6 +47,14 @@ object DataTypeUtils { DataType.equalsIgnoreCaseAnd

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568564309 ## python/pyspark/errors/utils.py: ## @@ -119,3 +127,74 @@ def get_message_template(self, error_class: str) -> str: message_template = main_message_tem

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568616448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -47,6 +47,14 @@ object DataTypeUtils { DataType.equalsIgnoreCase

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568617954 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568584880 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression, r

Re: [PR] [SPARK-47883][SQL] Make CollectTailExec.doExecute lazy with RowQueue [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on code in PR #46101: URL: https://github.com/apache/spark/pull/46101#discussion_r1568624495 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -118,18 +118,52 @@ case class CollectLimitExec(limit: Int = -1, child: SparkPlan, offs

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568627121 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase extends Ben

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568630997 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Core Pro

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568631300 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Co

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568631652 ## sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568635378 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase extends Ben

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568638560 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -130,6 +215,9 @@ object CollationBenchmark extends CollationBenchmar

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568638593 ## sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure AMD EPYC 776

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568639177 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568645699 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false }

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568652210 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression, rep

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568652156 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataTy

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568607199 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false }

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
HyukjinKwon closed pull request #46099: [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job URL: https://github.com/apache/spark/pull/46099

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568629092 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -36,18 +36,19 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568622375 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568662503 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2320,9 +2328,9 @@ case class Levenshtein( case class SoundEx(c

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568664744 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -564,6 +564,31 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHe

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568613066 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2320,9 +2328,9 @@ case class Levenshtein( case class Sou

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568665548 ## core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala: ## @@ -125,23 +128,26 @@ class SparkThrowableSuite extends SparkFunSuite { s"Error classes

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568666443 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/EncoderUtils.scala: ## @@ -77,6 +77,7 @@ object EncoderUtils { case _: DecimalType => class

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568644785 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46099: URL: https://github.com/apache/spark/pull/46099#issuecomment-2060916645 Merged to master.

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568599038 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2320,9 +2328,9 @@ case class Levenshtein( case class Sou

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568665456 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -564,6 +564,31 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHe
