Re: [PR] [WIP][SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin [spark]

2024-06-07 Thread via GitHub
JoshRosen closed pull request #46908: [WIP][SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin URL: https://github.com/apache/spark/pull/46908 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [WIP][SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin [spark]

2024-06-07 Thread via GitHub
JoshRosen commented on PR #46908: URL: https://github.com/apache/spark/pull/46908#issuecomment-2154434517 I'm switching this back to WIP pending some further performance profiling over a wider array of workloads. It turns out that the Scala compiler generates specialized

Re: [PR] [WIP][SPARK-47103][ML] Make the default storage level of intermediate datasets for MLlib configurable [spark]

2024-06-06 Thread via GitHub
github-actions[bot] closed pull request #45182: [WIP][SPARK-47103][ML] Make the default storage level of intermediate datasets for MLlib configurable URL: https://github.com/apache/spark/pull/45182

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-06-06 Thread via GitHub
uros-db commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1628887306 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -174,10 +174,10 @@ class CollationExpressionSuite

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-06-05 Thread via GitHub
mkaravel commented on PR #46772: URL: https://github.com/apache/spark/pull/46772#issuecomment-2151433489 > As noted in one of the resolved comments - there's no loss of coverage. However, some tests have been (temporarily) removed because `StringTrim` no longer supports UNICODE collation

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-06-05 Thread via GitHub
mkaravel commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1628787249 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -174,10 +174,10 @@ class CollationExpressionSuite

Re: [PR] [WIP][SPARK-47103][ML] Make the default storage level of intermediate datasets for MLlib configurable [spark]

2024-06-05 Thread via GitHub
github-actions[bot] commented on PR #45182: URL: https://github.com/apache/spark/pull/45182#issuecomment-2151159015 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-2147009357 ``` "grpc-default-executor-1" #82 daemon prio=5 os_prio=31 cpu=9.92ms elapsed=678.54s tid=0x000128b9c800 nid=0xe10b runnable [0x00030cdbc000] java.lang.Thread.State:

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
panbingkun closed pull request #45326: [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 URL: https://github.com/apache/spark/pull/45326

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-2146999566 I will close it.

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-2146955610 Is this a bug in log4j 2.23.1?

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-2146952585 https://github.com/apache/spark/assets/15246973/20b90f92-da0a-44e7-9f22-47117394894f

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-2146913844 ``` "pool-31-thread-1" #62 prio=5 os_prio=31 cpu=1.84ms elapsed=681.43s tid=0x00013c5dac00 nid=0xe303 waiting for monitor entry [0x00030cbb5000]

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-06-04 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-2146842740 Make a record of where it hangs: ``` q.processAllAvailable() ```

Re: [PR] [WIP][SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries [spark]

2024-05-31 Thread via GitHub
WweiL commented on PR #46819: URL: https://github.com/apache/spark/pull/46819#issuecomment-2142598236 tagging myself @WweiL

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
yaooqinn closed pull request #46806: [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects URL: https://github.com/apache/spark/pull/46806

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db commented on PR #46806: URL: https://github.com/apache/spark/pull/46806#issuecomment-2139508692 @yaooqinn is this considered green? All CIs that were run passed, but most of them were skipped.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-30 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1620668890 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -1,54 +1,79 @@ -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit Server VM

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-30 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1620662883 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -1,54 +1,79 @@ -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-30 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1620663435 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -1,54 +1,79 @@ -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
yaooqinn commented on PR #46806: URL: https://github.com/apache/spark/pull/46806#issuecomment-2139427708 Thank you for the update. Would you mind raising another PR to target the master branch? Then, if CI is green here, we can merge that one with the backports together.

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db commented on PR #46806: URL: https://github.com/apache/spark/pull/46806#issuecomment-2139410598 @yaooqinn Switched base branch in new PR. Does this PR description seem plausible?

[PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db opened a new pull request, #46806: URL: https://github.com/apache/spark/pull/46806 ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed?

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db closed pull request #46803: [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects URL: https://github.com/apache/spark/pull/46803

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
yaooqinn commented on PR #46803: URL: https://github.com/apache/spark/pull/46803#issuecomment-2139375639 Can you change the target branch to 3.5 and elaborate a bit more about the issue and solution?

[PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db opened a new pull request, #46803: URL: https://github.com/apache/spark/pull/46803 ### What changes were proposed in this pull request? Fix for daily job failure on 3.4 branch. ### Why are the changes needed? Tests failing. ### Does this PR introduce

[PR] [WIP][SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-05-30 Thread via GitHub
mihailom-db opened a new pull request, #46801: URL: https://github.com/apache/spark/pull/46801 ### What changes were proposed in this pull request? Addition of a test. ### Why are the changes needed? Collations introduced a lot of changes to many functions and this test aims

[PR] [WIP][SPARK-47415][SQL] Add collation support for Levenshtein expression [spark]

2024-05-29 Thread via GitHub
uros-db opened a new pull request, #46788: URL: https://github.com/apache/spark/pull/46788 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on PR #46772: URL: https://github.com/apache/spark/pull/46772#issuecomment-2136550324 As noted in one of the resolved comments - there's no loss of coverage. However, some tests have been (temporarily) removed because `StringTrim` no longer supports UNICODE collation

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1618200968 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -994,20 +996,6 @@ public void testStringTrim() throws SparkException {

Re: [PR] [WIP] [SPARK-48442] [PYSPARK] Add parenthesis to awaitTermination [spark]

2024-05-28 Thread via GitHub
riyaverm-db commented on PR #46779: URL: https://github.com/apache/spark/pull/46779#issuecomment-2136209502 @chaoqin-li1123

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
mkaravel commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1617945708 ## common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala: ## @@ -48,7 +48,7 @@ class CollationFactorySuite extends AnyFunSuite with

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-05-28 Thread via GitHub
pkotikalapudi commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2135858753 > hey guys, which version will this land in? We have to get reviews and approvals from PMC members and our Shepherd (@HeartSaVioR) before setting a timeline on when it can be

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-28 Thread via GitHub
uros-db commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2134820952 We can leave `PandasMode` for a separate PR, but we'll definitely need to take care of it at one point now that you've explored various options and finished the `groupMapReduce`

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on PR #46772: URL: https://github.com/apache/spark/pull/46772#issuecomment-2134806618 another note for future reference: as per [ICU docs](https://unicode-org.github.io/icu/userguide/collation/concepts.html), we've decided to stick with **TERTIARY** collation strength for

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1616816500 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -245,12 +245,12 @@ public CollationIdentifier identifier() {

[PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db opened a new pull request, #46772: URL: https://github.com/apache/spark/pull/46772 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-27 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1615951326 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-26 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2132241154 @uros-db ?

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-05-26 Thread via GitHub
stym06 commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2132173624 hey guys, which version will this land in?

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) [spark]

2024-05-24 Thread via GitHub
GideonPotok closed pull request #46404: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) URL: https://github.com/apache/spark/pull/46404

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419 @uros-db I forgot but should I add collation support to `org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`? The only difference will be 1. Support for null

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613363048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613361676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613358004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613358004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613351378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes:

[PR] [WIP][SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations [spark]

2024-05-24 Thread via GitHub
uros-db opened a new pull request, #46732: URL: https://github.com/apache/spark/pull/46732 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612994451 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -208,87 +208,99 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612993851 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -208,87 +208,99 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612949455 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612937162 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612936409 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -208,87 +208,99 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612933590 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-23 Thread via GitHub
mkaravel commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612654646 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

[PR] [WIP][SPARK-48318][SQL] Enable hash join support for all collations (complex types) [spark]

2024-05-23 Thread via GitHub
uros-db opened a new pull request, #46722: URL: https://github.com/apache/spark/pull/46722 ### What changes were proposed in this pull request? Enable collation support for hash join on complex types. - Logical plan is rewritten in analysis to (recursively) replace all

Re: [PR] [WIP][SPARK-47353][SQL][Prototype of alternative algorithm] Enable collation support for the Mode expression using multiple experimental approaches [spark]

2024-05-23 Thread via GitHub
GideonPotok closed pull request #46488: [WIP][SPARK-47353][SQL][Prototype of alternative algorithm] Enable collation support for the Mode expression using multiple experimental approaches URL: https://github.com/apache/spark/pull/46488

Re: [PR] [WIP][SPARK-48397][SQL] Add data write time metric to FileFormatDataWriter [spark]

2024-05-23 Thread via GitHub
jiwen624 commented on code in PR #46714: URL: https://github.com/apache/spark/pull/46714#discussion_r1611925610 ## sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala: ## @@ -837,7 +837,8 @@ class SQLMetricsSuite extends SharedSparkSession with

[PR] [WIP][SPARK-48397][SQL] Add data write time metric to FileFormatDataWriter [spark]

2024-05-23 Thread via GitHub
jiwen624 opened a new pull request, #46714: URL: https://github.com/apache/spark/pull/46714 ### What changes were proposed in this pull request? For FileFormatDataWriter we currently record metrics of "task commit time" and "job commit time" in

[PR] [WIP][SPARK-48392][CORE] Also load `spark-defaults.conf` when provided `--properties-file` [spark]

2024-05-22 Thread via GitHub
sunchao opened a new pull request, #46709: URL: https://github.com/apache/spark/pull/46709 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-22 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1610561627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,22 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2123419758 @uros-db I have made changes for all but your latest suggestion (re whitelists -- will add that soon)

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608866089 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,22 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608865653 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,22 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608247812 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,18 +102,56 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608240127 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608242763 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608240127 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608236375 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

[PR] [WIP][SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-21 Thread via GitHub
uros-db opened a new pull request, #46682: URL: https://github.com/apache/spark/pull/46682 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-20 Thread via GitHub
mkaravel commented on PR #46589: URL: https://github.com/apache/spark/pull/46589#issuecomment-2121389075 Please update the PR description.

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-20 Thread via GitHub
mkaravel commented on PR #46511: URL: https://github.com/apache/spark/pull/46511#issuecomment-2121384473 Please fill in the PR description.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-20 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2121332325 @uros-db I agree that we should avoid auxiliary structures. And I don't see a good way to move the changes to implementation of `merge` and `update` without keeping an auxiliary

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-19 Thread via GitHub
GideonPotok closed pull request #46526: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce URL: https://github.com/apache/spark/pull/46526

[PR] [WIP] [spark]

2024-05-17 Thread via GitHub
ericm-db opened a new pull request, #46644: URL: https://github.com/apache/spark/pull/46644 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47227][FOLLOW][DOCS] Building Extensions [spark]

2024-05-17 Thread via GitHub
nchammas commented on code in PR #45340: URL: https://github.com/apache/spark/pull/45340#discussion_r1605122743 ## docs/spark-connect-extending.md: ## @@ -0,0 +1,248 @@ +--- +layout: global +title: Extending Spark Connect with Custom Functionality +license: | + Licensed to the

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604485097 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -709,12 +774,24 @@ public void testLocate() throws SparkException {

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604482878 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -709,12 +774,24 @@ public void testLocate() throws SparkException {

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1603739103 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604439268 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604437739 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604352238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604346112 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

Re: [PR] [WIP][Spark 44646] Reduce usage of log4j core [spark]

2024-05-16 Thread via GitHub
github-actions[bot] closed pull request #45001: [WIP][Spark 44646] Reduce usage of log4j core URL: https://github.com/apache/spark/pull/45001 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46589: URL: https://github.com/apache/spark/pull/46589#discussion_r1603752983 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -278,47 +431,29 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1603716669 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603420522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603381135 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1051,6 +1052,153 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603377917 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603376505 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603374449 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -397,7 +398,11 @@ trait JoinSelectionHelper extends Logging { protected

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603367540 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationKey.scala: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation

[PR] [WIP][SPARK-48298][Core] Add TCP mode to StatsD sink [spark]

2024-05-15 Thread via GitHub
jiwen624 opened a new pull request, #46604: URL: https://github.com/apache/spark/pull/46604 ### What changes were proposed in this pull request? Working on it... ### Why are the changes needed? As mentioned in the Jira ticket: https://issues.apache.org/jira/browse/SPARK-48298

Re: [PR] [WIP][Spark 44646] Reduce usage of log4j core [spark]

2024-05-15 Thread via GitHub
github-actions[bot] commented on PR #45001: URL: https://github.com/apache/spark/pull/45001#issuecomment-2113687236 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-15 Thread via GitHub
uros-db opened a new pull request, #46599: URL: https://github.com/apache/spark/pull/46599 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [V2] [spark]

2024-05-15 Thread via GitHub
GideonPotok opened a new pull request, #46597: URL: https://github.com/apache/spark/pull/46597 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubStringIndex) [spark]

2024-05-15 Thread via GitHub
uros-db opened a new pull request, #46589: URL: https://github.com/apache/spark/pull/46589 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
GideonPotok commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600668920 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -22,7 +22,7 @@ import

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1600495650 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String
