(incubator-gluten) branch main updated: [GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference (#5143)
This is an automated email from the ASF dual-hosted git repository.

ulyssesyou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new 3bc5387c0 [GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference (#5143)
3bc5387c0 is described below

commit 3bc5387c0e50f3e012f6ffad55dabbb7c52229c9
Author: Nicholas Jiang
AuthorDate: Wed Mar 27 13:44:40 2024 +0800

    [GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference (#5143)
---
 docs/get-started/ClickHouse.md | 8
 docs/get-started/Velox.md      | 8
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/get-started/ClickHouse.md b/docs/get-started/ClickHouse.md
index ad7183b90..4167af2ee 100644
--- a/docs/get-started/ClickHouse.md
+++ b/docs/get-started/ClickHouse.md
@@ -629,14 +629,14 @@ public read-only account:gluten/hN2xX3uQ4m
 ### Celeborn support
-Gluten with clickhouse backend has not yet supportted [Celeborn](https://github.com/apache/incubator-celeborn) natively as remote shuffle service using columar shuffle. However, you can still use Celeborn with row shuffle, which means a ColumarBatch will be converted to a row during shuffle.
+Gluten with clickhouse backend has not yet supportted [Celeborn](https://github.com/apache/celeborn) natively as remote shuffle service using columar shuffle. However, you can still use Celeborn with row shuffle, which means a ColumarBatch will be converted to a row during shuffle.
 Below introduction is used to enable this feature:
-First refer to this URL(https://github.com/apache/incubator-celeborn) to setup a celeborn cluster.
+First refer to this URL(https://github.com/apache/celeborn) to setup a celeborn cluster.
 Then add the Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
-- Celeborn: celeborn-client-spark-3-shaded_2.12-0.3.0-incubating.jar
+- Celeborn: celeborn-client-spark-3-shaded_2.12-[celebornVersion].jar
 Currently to use Celeborn following configurations are required in `spark-defaults.conf`
@@ -666,7 +666,7 @@ spark.sql.adaptive.localShuffleReader.enabled false
 spark.celeborn.storage.hdfs.dir hdfs:///celeborn
 # If you want to use dynamic resource allocation,
-# please refer to this URL (https://github.com/apache/incubator-celeborn/tree/main/assets/spark-patch) to apply the patch into your own Spark.
+# please refer to this URL (https://github.com/apache/celeborn/tree/main/assets/spark-patch) to apply the patch into your own Spark.
 spark.dynamicAllocation.enabled false
 ```
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index 7c3d77abc..1fabfc0fe 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -203,11 +203,11 @@ Currently there are several ways to asscess S3 in Spark. Please refer [Velox S3]
 ## Celeborn support
-Gluten with velox backend supports [Celeborn](https://github.com/apache/incubator-celeborn) as remote shuffle service. Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
+Gluten with velox backend supports [Celeborn](https://github.com/apache/celeborn) as remote shuffle service. Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
 Below introduction is used to enable this feature
-First refer to this URL(https://github.com/apache/incubator-celeborn) to setup a celeborn cluster.
+First refer to this URL(https://github.com/apache/celeborn) to setup a celeborn cluster.
 When compiling the Gluten Java module, it's required to enable `rss` profile, as follows:
@@ -217,7 +217,7 @@ mvn clean package -Pbackends-velox -Pspark-3.3 -Prss -DskipTests
 Then add the Gluten and Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
-- Celeborn: celeborn-client-spark-3-shaded_2.12-0.3.0-incubating.jar
+- Celeborn: celeborn-client-spark-3-shaded_2.12-[celebornVersion].jar
 - Gluten: gluten-velox-bundle-spark3.x_2.12-xx_xx_xx-SNAPSHOT.jar, gluten-thirdparty-lib-xx-xx.jar
 Currently to use Gluten following configurations are required in `spark-defaults.conf`
@@ -248,7 +248,7 @@ spark.sql.adaptive.localShuffleReader.enabled false
 spark.celeborn.storage.hdfs.dir hdfs:///celeborn
 # If you want to use dynamic resource allocation,
-# please refer to this URL (https://github.com/apache/incubator-celeborn/tree/main/assets/spark-patch) to apply the patch into your own Spark.
+# please refer to this URL (https://github.com/apache/celeborn/tree/main/assets/spark-patch) to apply the patch into your own Spark.
 spark.dynamicAllocation.enabled false
 ```

- To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org
Re: [I] Remove Incubating of Celeborn from reference [incubator-gluten]
ulysses-you closed issue #5142: Remove Incubating of Celeborn from reference URL: https://github.com/apache/incubator-gluten/issues/5142 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org For additional commands, e-mail: commits-h...@gluten.apache.org
Re: [PR] [GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference [incubator-gluten]
ulysses-you merged PR #5143: URL: https://github.com/apache/incubator-gluten/pull/5143
Re: [PR] [VL] Velox patch to avoid installing libunwind-dev no longer works [incubator-gluten]
PHILO-HE commented on PR #5127: URL: https://github.com/apache/incubator-gluten/pull/5127#issuecomment-2021997304 Sorry for the late response. Looks good! Thanks!
Re: [PR] [GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference [incubator-gluten]
github-actions[bot] commented on PR #5143: URL: https://github.com/apache/incubator-gluten/pull/5143#issuecomment-2021995102 https://github.com/apache/incubator-gluten/issues/5142
Re: [PR] [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion [incubator-gluten]
ulysses-you commented on PR #5141: URL: https://github.com/apache/incubator-gluten/pull/5141#issuecomment-2021974259 I see, thank you for the explanation!
[I] Remove Incubating of Celeborn from reference [incubator-gluten]
SteNicholas opened a new issue, #5142: URL: https://github.com/apache/incubator-gluten/issues/5142
### Description
The ASF board has approved a resolution to graduate Celeborn into a full Top Level Project. References to Celeborn should therefore drop "Incubating".
Re: [PR] [GLUTEN-4964][CORE]Fallback complex data type in parquet write for Spark32 & Spark33 [incubator-gluten]
github-actions[bot] commented on PR #5107: URL: https://github.com/apache/incubator-gluten/pull/5107#issuecomment-2021963550 Run Gluten Clickhouse CI
Re: [PR] [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion [incubator-gluten]
zhztheplayer commented on PR #5141: URL: https://github.com/apache/incubator-gluten/pull/5141#issuecomment-2021963407
> Thank you @zhztheplayer for the quick fix. After this pr if there is no c2r, the duplicate keys issue is still existed right ?

After the fix is applied, we should no longer have any related issues with BHJ, barring unknown ones. The issue this PR fixes only occurred when the broadcast exchange falls back but the BHJ does not. That is a corner case in current Gluten: usually both fall back or neither does. So ideally this issue should not arise in normal BHJ processing.
Re: [PR] [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion [incubator-gluten]
ulysses-you commented on PR #5141: URL: https://github.com/apache/incubator-gluten/pull/5141#issuecomment-2021947563 Thank you @zhztheplayer for the quick fix. After this PR, if there is no C2R, the duplicate keys issue still exists, right?
Re: [PR] [CORE] Basic runnable version of ACBO (Advanced CBO) [incubator-gluten]
github-actions[bot] commented on PR #5058: URL: https://github.com/apache/incubator-gluten/pull/5058#issuecomment-2021943260 Run Gluten Clickhouse CI
Re: [PR] [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml [incubator-gluten]
zhouyuan commented on PR #5124: URL: https://github.com/apache/incubator-gluten/pull/5124#issuecomment-2021938997 The feature itself looks good to me. CC @PHILO-HE
Re: [PR] [VL] Add uniffle integration [incubator-gluten]
github-actions[bot] commented on PR #3767: URL: https://github.com/apache/incubator-gluten/pull/3767#issuecomment-2021937111 Run Gluten Clickhouse CI
(incubator-gluten) branch main updated: [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion (#5141)
hongze pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new e4fe9baec [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion (#5141)
e4fe9baec is described below

commit e4fe9baeccde07e2938d5f186151c43591e91720
Author: Hongze Zhang
AuthorDate: Wed Mar 27 12:54:29 2024 +0800

    [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion (#5141)
---
 .../apache/spark/sql/execution/BroadcastUtils.scala | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/backends-velox/src/main/scala/org/apache/spark/sql/execution/BroadcastUtils.scala b/backends-velox/src/main/scala/org/apache/spark/sql/execution/BroadcastUtils.scala
index a0f28c5ab..ad7694ea2 100644
--- a/backends-velox/src/main/scala/org/apache/spark/sql/execution/BroadcastUtils.scala
+++ b/backends-velox/src/main/scala/org/apache/spark/sql/execution/BroadcastUtils.scala
@@ -26,7 +26,7 @@ import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.UnsafeRow
 import org.apache.spark.sql.catalyst.plans.physical.{BroadcastMode, BroadcastPartitioning, IdentityBroadcastMode, Partitioning}
-import org.apache.spark.sql.execution.joins.{HashedRelation, HashedRelationBroadcastMode}
+import org.apache.spark.sql.execution.joins.{HashedRelation, HashedRelationBroadcastMode, LongHashedRelation}
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.sql.vectorized.ColumnarBatch
 import org.apache.spark.util.TaskResources
@@ -96,9 +96,8 @@ object BroadcastUtils {
         // HashedRelation to ColumnarBuildSideRelation.
         val fromBroadcast = from.asInstanceOf[Broadcast[HashedRelation]]
         val fromRelation = fromBroadcast.value.asReadOnlyCopy()
-        val keys = fromRelation.keys()
         val toRelation = TaskResources.runUnsafe {
-          val batchItr: Iterator[ColumnarBatch] = fn(keys.flatMap(key => fromRelation.get(key)))
+          val batchItr: Iterator[ColumnarBatch] = fn(reconstructRows(fromRelation))
           val serialized: Array[Array[Byte]] = serializeStream(batchItr) match {
             case ColumnarBatchSerializeResult.EMPTY =>
               Array()
@@ -170,4 +169,17 @@ object BroadcastUtils {
     }
     serializeResult
   }
+
+  private def reconstructRows(relation: HashedRelation): Iterator[InternalRow] = {
+    // It seems that LongHashedRelation and UnsafeHashedRelation don't follow the same
+    // criteria while getting values from them.
+    // Should review the internals of this part of code.
+    relation match {
+      case relation: LongHashedRelation if relation.keyIsUnique =>
+        relation.keys().map(k => relation.getValue(k))
+      case relation: LongHashedRelation if !relation.keyIsUnique =>
+        relation.keys().flatMap(k => relation.get(k))
+      case other => other.valuesWithKeyIndex().map(_.getValue)
+    }
+  }
 }
Re: [PR] [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion [incubator-gluten]
zhztheplayer merged PR #5141: URL: https://github.com/apache/incubator-gluten/pull/5141
Re: [PR] [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion [incubator-gluten]
zhztheplayer commented on PR #5141: URL: https://github.com/apache/incubator-gluten/pull/5141#issuecomment-2021931432 cc @ulysses-you
Re: [PR] [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml [incubator-gluten]
zhouyuan commented on PR #5124: URL: https://github.com/apache/incubator-gluten/pull/5124#issuecomment-2021930759 Hi @dcoliversun, this patch seems to be trying to generate a package for each OS. The package built on CentOS 7 should work on other platforms, since it uses static packaging via vcpkg. Could you please check whether the CentOS 7 package works in your case? Thanks, -yuan
Re: [I] [VL] Vanilla Spark broadcast exchange + R2C is slow sometimes [incubator-gluten]
zhztheplayer commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021919347
The major issue I have found is that the `flatMap` approach causes `UnsafeHashedRelation` to produce duplicated rows in my case (TPCDS q14a with the current version of ACBO), while the `map` approach causes `LongHashedRelation` to lose rows (TPCDS q2). The following fix (the same as in #5141) works, but I haven't dug deeply enough to find the root cause of the inconsistency (maybe related to `keyIsUnique`? I am not sure).
```scala
private def reconstructRows(relation: HashedRelation): Iterator[InternalRow] = {
  // It seems that LongHashedRelation and UnsafeHashedRelation don't follow the same
  // criteria while getting values from them.
  // Should review the internals of this part of code.
  relation match {
    case relation: LongHashedRelation if relation.keyIsUnique =>
      relation.keys().map(k => relation.getValue(k))
    case relation: LongHashedRelation if !relation.keyIsUnique =>
      relation.keys().flatMap(k => relation.get(k))
    case other => other.valuesWithKeyIndex().map(_.getValue)
  }
}
```
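The row loss and duplication described in this comment can be reproduced with a toy model. The sketch below uses plain Scala collections rather than Spark's classes: `RelationSketch`, `ToyRelation`, `distinctKeys`, `keyPerRow`, `rowsViaMap` and `rowsViaFlatMap` are hypothetical names. The assumption that one relation's key iterator yields each distinct key once while another's yields a key once per stored row is an illustration only; the thread does not confirm that this is how `LongHashedRelation` or `UnsafeHashedRelation` behave.

```scala
object RelationSketch {
  // Toy stand-in for a hashed relation; this is NOT Spark's HashedRelation API.
  // `keys` is whatever the key iterator yields, `rows` maps a key to its stored rows.
  final case class ToyRelation(keys: Seq[Long], rows: Map[Long, Seq[String]]) {
    def get(key: Long): Seq[String] = rows.getOrElse(key, Seq.empty)
    def getValue(key: Long): String = get(key).head
  }

  // Assumed behaviour A: the key iterator yields each distinct key once.
  def distinctKeys(rows: Map[Long, Seq[String]]): ToyRelation =
    ToyRelation(rows.keys.toSeq.sorted, rows)

  // Assumed behaviour B: the key iterator yields a key once per stored row.
  def keyPerRow(rows: Map[Long, Seq[String]]): ToyRelation =
    ToyRelation(rows.toSeq.sortBy(_._1).flatMap { case (k, vs) => vs.map(_ => k) }, rows)

  // `map` keeps one row per yielded key.
  def rowsViaMap(r: ToyRelation): Seq[String] = r.keys.map(k => r.getValue(k))

  // `flatMap` keeps every stored row per yielded key.
  def rowsViaFlatMap(r: ToyRelation): Seq[String] = r.keys.flatMap(k => r.get(k))
}
```

With `Map(1L -> Seq("a", "b"), 2L -> Seq("c"))` as the stored rows, `rowsViaMap` over `distinctKeys` returns only `a, c` (a row is lost), while `rowsViaFlatMap` over `keyPerRow` returns `a, b, a, b, c` (rows are duplicated). Under these assumptions, that is why a single strategy cannot serve both kinds of relation.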
Re: [I] [VL] Vanilla Spark broadcast exchange + R2C is slow sometimes [incubator-gluten]
zhztheplayer commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021916024 I don't have dedicated UTs for it, so it was incorporated into the other PR. Still, I can open one for it if you think it's needed: https://github.com/apache/incubator-gluten/pull/5141. The change has already been tested, so I will proceed to merge once the code style check passes, if that's OK with you.
Re: [PR] [CORE] Enable from_utc_timestamp Spark function [incubator-gluten]
github-actions[bot] commented on PR #5140: URL: https://github.com/apache/incubator-gluten/pull/5140#issuecomment-2021909505
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename ***commit message*** and ***pull request title*** in the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
See also:
* [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
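The title convention the bot asks for can be checked mechanically. The sketch below is a hypothetical helper (`TitleCheck` and `parsePrefix` are not part of Gluten's tooling) and assumes a conforming title begins with `[GLUTEN-<digits>][<COMPONENT>]`. It deliberately ignores the `feat/fix:` part, since the merged titles in this thread omit it.

```scala
object TitleCheck {
  // Assumed shape: "[GLUTEN-<issue id>][<COMPONENT>]" at the start of the title.
  // The id is capped at 9 digits so that `toInt` cannot overflow.
  private val Prefix = """^\[GLUTEN-(\d{1,9})\]\[([A-Za-z]+)\].*""".r

  /** Returns (issue id, component) when the title carries the assumed prefix. */
  def parsePrefix(title: String): Option[(Int, String)] = title match {
    case Prefix(id, component) => Some((id.toInt, component))
    case _                     => None
  }
}
```

For example, `[GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference (#5143)` parses to issue `5142` and component `CELEBORN`, whereas `[CORE] Enable from_utc_timestamp Spark function` is rejected because it has no issue id.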
Re: [PR] [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion [incubator-gluten]
github-actions[bot] commented on PR #5141: URL: https://github.com/apache/incubator-gluten/pull/5141#issuecomment-2021911803 https://github.com/apache/incubator-gluten/issues/5136
Re: [PR] [CORE] Enable from_utc_timestamp Spark function [incubator-gluten]
github-actions[bot] commented on PR #5140: URL: https://github.com/apache/incubator-gluten/pull/5140#issuecomment-2021909730 Run Gluten Clickhouse CI
[PR] [CORE] Enable from_utc_timestamp Spark function [incubator-gluten]
acvictor opened a new pull request, #5140: URL: https://github.com/apache/incubator-gluten/pull/5140
## What changes were proposed in this pull request?
Enable from_utc_timestamp Spark function
## How was this patch tested?
Added UT
Re: [PR] [CORE] Enable to_utc_timestamp Spark function [incubator-gluten]
github-actions[bot] commented on PR #5139: URL: https://github.com/apache/incubator-gluten/pull/5139#issuecomment-2021897306 Run Gluten Clickhouse CI
Re: [PR] [CORE] Enable to_utc_timestamp Spark function [incubator-gluten]
acvictor commented on PR #5139: URL: https://github.com/apache/incubator-gluten/pull/5139#issuecomment-2021894818 @PHILO-HE can you please review? Thanks!
[PR] [CORE] Enable to_utc_timestamp Spark function [incubator-gluten]
acvictor opened a new pull request, #5139: URL: https://github.com/apache/incubator-gluten/pull/5139
## What changes were proposed in this pull request?
Enable to_utc_timestamp
## How was this patch tested?
Added a UT.
Re: [PR] [CORE] Enable to_utc_timestamp Spark function [incubator-gluten]
github-actions[bot] commented on PR #5139: URL: https://github.com/apache/incubator-gluten/pull/5139#issuecomment-2021894673 Run Gluten Clickhouse CI
Re: [PR] [CORE] Enable to_utc_timestamp Spark function [incubator-gluten]
github-actions[bot] commented on PR #5139: URL: https://github.com/apache/incubator-gluten/pull/5139#issuecomment-2021894438
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename ***commit message*** and ***pull request title*** in the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
See also:
* [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
Re: [PR] [DNM] Velox test [incubator-gluten]
github-actions[bot] commented on PR #4929: URL: https://github.com/apache/incubator-gluten/pull/4929#issuecomment-2021890936 Run Gluten Clickhouse CI
Re: [I] [VL] Vanilla Spark broadcast exchange + R2C is slow sometimes [incubator-gluten]
ulysses-you commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021873724 Thank you @zhztheplayer. That's a good point: columnar broadcast broadcasts the original binary data, whereas vanilla Spark broadcasts the hashed relation. So I think this issue is a common case even when there is no R2C. Is it possible to create a new PR for this issue?
Re: [PR] [VL] Add uniffle integration [incubator-gluten]
github-actions[bot] commented on PR #3767: URL: https://github.com/apache/incubator-gluten/pull/3767#issuecomment-2021856789 Run Gluten Clickhouse CI
(incubator-gluten) branch main updated: [GLUTEN-5083][CH] Invalid result with mergeTwoPhasesHashBaseAggregateIfNeed enable (#5137)
This is an automated email from the ASF dual-hosted git repository.

zhangzc pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new d3e4f2e4d [GLUTEN-5083][CH] Invalid result with mergeTwoPhasesHashBaseAggregateIfNeed enable (#5137)
d3e4f2e4d is described below

commit d3e4f2e4dea31b8f19e1cf86772cf34c1688d364
Author: lgbo
AuthorDate: Wed Mar 27 11:15:26 2024 +0800

    [GLUTEN-5083][CH] Invalid result with mergeTwoPhasesHashBaseAggregateIfNeed enable (#5137)

    [CH] Invalid result with mergeTwoPhasesHashBaseAggregateIfNeed enable
---
 cpp-ch/local-engine/Operator/GraceMergingAggregatedStep.cpp | 4
 cpp-ch/local-engine/Operator/StreamingAggregatingStep.cpp   | 4
 cpp-ch/local-engine/Parser/AggregateRelParser.cpp           | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/cpp-ch/local-engine/Operator/GraceMergingAggregatedStep.cpp b/cpp-ch/local-engine/Operator/GraceMergingAggregatedStep.cpp
index 9294d9719..00d2e3116 100644
--- a/cpp-ch/local-engine/Operator/GraceMergingAggregatedStep.cpp
+++ b/cpp-ch/local-engine/Operator/GraceMergingAggregatedStep.cpp
@@ -67,6 +67,10 @@ GraceMergingAggregatedStep::GraceMergingAggregatedStep(
 void GraceMergingAggregatedStep::transformPipeline(DB::QueryPipelineBuilder & pipeline, const DB::BuildQueryPipelineSettings &)
 {
+    if (params.max_bytes_before_external_group_by)
+    {
+        throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "max_bytes_before_external_group_by is not supported in GraceMergingAggregatedStep");
+    }
     auto num_streams = pipeline.getNumStreams();
     auto transform_params = std::make_shared(pipeline.getHeader(), params, true);
     pipeline.resize(1);
diff --git a/cpp-ch/local-engine/Operator/StreamingAggregatingStep.cpp b/cpp-ch/local-engine/Operator/StreamingAggregatingStep.cpp
index ff81ee294..698d353b1 100644
--- a/cpp-ch/local-engine/Operator/StreamingAggregatingStep.cpp
+++ b/cpp-ch/local-engine/Operator/StreamingAggregatingStep.cpp
@@ -286,6 +286,10 @@ StreamingAggregatingStep::StreamingAggregatingStep(
 void StreamingAggregatingStep::transformPipeline(DB::QueryPipelineBuilder & pipeline, const DB::BuildQueryPipelineSettings &)
 {
+    if (params.max_bytes_before_external_group_by)
+    {
+        throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "max_bytes_before_external_group_by is not supported in StreamingAggregatingStep");
+    }
     pipeline.dropTotalsAndExtremes();
     auto transform_params = std::make_shared(pipeline.getHeader(), params, false);
     pipeline.resize(1);
diff --git a/cpp-ch/local-engine/Parser/AggregateRelParser.cpp b/cpp-ch/local-engine/Parser/AggregateRelParser.cpp
index 02248d74a..a3ab329f0 100644
--- a/cpp-ch/local-engine/Parser/AggregateRelParser.cpp
+++ b/cpp-ch/local-engine/Parser/AggregateRelParser.cpp
@@ -310,7 +310,7 @@ void AggregateRelParser::addCompleteModeAggregatedStep()
             settings.group_by_overflow_mode,
             settings.group_by_two_level_threshold,
             settings.group_by_two_level_threshold_bytes,
-            settings.max_bytes_before_external_group_by,
+            0, /*settings.max_bytes_before_external_group_by*/
             settings.empty_result_for_aggregation_by_empty_set,
             getContext()->getTempDataOnDisk(),
             settings.max_threads,

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org
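The commit above combines two moves: the merging steps now refuse to build a pipeline when `max_bytes_before_external_group_by` is non-zero, and `AggregateRelParser::addCompleteModeAggregatedStep()` passes `0` for that parameter. The following is a minimal, self-contained sketch of that guard pattern only. `AggregatorParams`, `MergingStep`, and `tryBuild` are hypothetical stand-ins invented for illustration, and `std::logic_error` is used in place of `DB::Exception`; none of this is the actual ClickHouse or Gluten API.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the aggregator parameters; only the one field
// that the commit guards on is modelled here.
struct AggregatorParams
{
    std::size_t max_bytes_before_external_group_by = 0;
};

// Hypothetical stand-in for a step such as GraceMergingAggregatedStep.
class MergingStep
{
public:
    MergingStep(std::string name_, AggregatorParams params_)
        : name(std::move(name_)), params(params_)
    {
    }

    // Fail fast at pipeline-build time if the unsupported setting is enabled,
    // instead of continuing with a configuration the step cannot honour.
    void transformPipeline() const
    {
        if (params.max_bytes_before_external_group_by)
            throw std::logic_error(
                "max_bytes_before_external_group_by is not supported in " + name);
        // The real steps would add their transforms to the pipeline here.
    }

private:
    std::string name;
    AggregatorParams params;
};

// Returns true if the step accepted the given threshold, false if the guard fired.
bool tryBuild(std::size_t external_group_by_threshold)
{
    try
    {
        MergingStep("MergingStep", AggregatorParams{external_group_by_threshold}).transformPipeline();
        return true;
    }
    catch (const std::logic_error & e)
    {
        std::cerr << e.what() << '\n';
        return false;
    }
}
```

In this sketch, `tryBuild(0)` corresponds to the caller-side change (passing `0` instead of the session setting), while any non-zero value trips the guard.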
Re: [PR] [GLUTEN-5083][CH] Invalid result with `mergeTwoPhasesHashBaseAggregateIfNeed` enable [incubator-gluten]
zzcclp merged PR #5137: URL: https://github.com/apache/incubator-gluten/pull/5137 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org For additional commands, e-mail: commits-h...@gluten.apache.org
Re: [I] [CH] Invalid result with `mergeTwoPhasesHashBaseAggregateIfNeed` enable [incubator-gluten]
zzcclp closed issue #5083: [CH] Invalid result with `mergeTwoPhasesHashBaseAggregateIfNeed` enable URL: https://github.com/apache/incubator-gluten/issues/5083
(incubator-gluten) branch main updated: [CORE] Support JDK17 (#5120)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git The following commit(s) were added to refs/heads/main by this push: new 7942701c3 [CORE] Support JDK17 (#5120) 7942701c3 is described below commit 7942701c3b67c72230f34286f837c3a6f13fd002 Author: Xiduo You AuthorDate: Wed Mar 27 11:10:11 2024 +0800 [CORE] Support JDK17 (#5120) * Support JDK17 * address comment - Co-authored-by: Kent Yao --- .github/workflows/velox_docker.yml | 114 ++--- docs/developers/NewToGluten.md | 12 docs/get-started/Velox.md | 28 - pom.xml| 47 ++- tools/gluten-it/pom.xml| 23 +++- tools/gluten-it/sbin/gluten-it.sh | 21 ++- 6 files changed, 152 insertions(+), 93 deletions(-) diff --git a/.github/workflows/velox_docker.yml b/.github/workflows/velox_docker.yml index f2b73e81d..6329750d2 100644 --- a/.github/workflows/velox_docker.yml +++ b/.github/workflows/velox_docker.yml @@ -73,6 +73,17 @@ jobs: matrix: os: ["ubuntu:20.04", "ubuntu:22.04"] spark: ["spark-3.2", "spark-3.3", "spark-3.4", "spark-3.5"] +java: [ "java-8", "java-17" ] +# Spark supports JDK17 since 3.3 and later, see https://issues.apache.org/jira/browse/SPARK-33772 +exclude: + - spark: spark-3.2 +java: java-17 + - spark: spark-3.4 +java: java-17 + - spark: spark-3.5 +java: java-17 + - os: ubuntu:22.04 +java: java-17 runs-on: ubuntu-20.04 container: ${{ matrix.os }} steps: @@ -84,69 +95,45 @@ jobs: path: ./cpp/build/releases - name: Setup java and maven run: | - apt-get update && \ - apt-get install -y openjdk-8-jdk maven && \ + if [ "${{ matrix.java }}" = "java-17" ]; then +apt-get update && apt-get install -y openjdk-17-jdk maven + else +apt-get update && apt-get install -y openjdk-8-jdk maven + fi apt remove openjdk-11* -y - - name: Build for Spark ${{ matrix.spark }} -run: | - cd $GITHUB_WORKSPACE/ && \ - mvn clean install -P${{ matrix.spark }} -Pbackends-velox -DskipTests - - name: Build and run TPCH/DS ${{ 
matrix.spark }} -run: | - cd $GITHUB_WORKSPACE/tools/gluten-it && \ - mvn clean install -P${{ matrix.spark }} \ - && GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \ ---local --preset=velox --benchmark-type=h --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 \ - && GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \ ---local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 - - - run-tpc-test-centos7: -needs: build-native-lib -strategy: - fail-fast: false - matrix: -spark: ["spark-3.2", "spark-3.3", "spark-3.4", "spark-3.5"] -runs-on: ubuntu-20.04 -container: centos:7 -steps: - - uses: actions/checkout@v2 - - name: Download All Artifacts -uses: actions/download-artifact@v2 -with: - name: velox-native-lib-${{github.sha}} - path: ./cpp/build/releases - - name: Setup java and maven -run: | - yum update -y && yum install -y java-1.8.0-openjdk-devel wget - wget https://downloads.apache.org/maven/maven-3/3.8.8/binaries/apache-maven-3.8.8-bin.tar.gz - tar -xvf apache-maven-3.8.8-bin.tar.gz - mv apache-maven-3.8.8 /usr/lib/maven - - name: Build for Spark ${{ matrix.spark }} + - name: Build and run TPCH/DS run: | cd $GITHUB_WORKSPACE/ - export MAVEN_HOME=/usr/lib/maven - export PATH=${PATH}:${MAVEN_HOME}/bin - mvn clean install -P${{ matrix.spark }} -Pbackends-velox -DskipTests - - name: Build and run TPCH/DS ${{ matrix.spark }} -run: | - cd $GITHUB_WORKSPACE/tools/gluten-it - export MAVEN_HOME=/usr/lib/maven - export PATH=${PATH}:${MAVEN_HOME}/bin - mvn clean install -P${{ matrix.spark }} \ + export JAVA_HOME=/usr/lib/jvm/${{ matrix.java }}-openjdk-amd64 + echo "JAVA_HOME: $JAVA_HOME" + mvn clean install -P${{ matrix.spark }} -P${{ matrix.java }} -Pbackends-velox -DskipTests + cd $GITHUB_WORKSPACE/tools/gluten-it + mvn clean install -P${{ matrix.spark }} -P${{ matrix.java }} \ && GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \ --local --preset=velox 
--benchmark-type=h --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 \ && GLUTEN_IT_JVM_ARGS=-Xmx5G
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
github-actions[bot] commented on PR #5120: URL: https://github.com/apache/incubator-gluten/pull/5120#issuecomment-2021843842 Run Gluten Clickhouse CI
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
yaooqinn merged PR #5120: URL: https://github.com/apache/incubator-gluten/pull/5120
Re: [PR] [DNM] Velox test [incubator-gluten]
github-actions[bot] commented on PR #4929: URL: https://github.com/apache/incubator-gluten/pull/4929#issuecomment-2021800771 Run Gluten Clickhouse CI
[PR] [VL] Daily Update Velox Version (2024_03_27) [incubator-gluten]
marin-ma opened a new pull request, #5138: URL: https://github.com/apache/incubator-gluten/pull/5138

```
7fc09667d (upstream/main) Add estimateSerializedSize to BatchVectorSerializer (#8712)
c354c31f1 Reuse result vector in Alpha reader (#9226)
3fbb4754f Create UnitLoader (#9259)
6ec8f26d9 Remove logical types in min_by and max_by tests (#8999)
494b8881b Add support for kurtosis Spark aggregate function (#9233)
2d832eef4 Delete unused ExpressionFuzzer::generateXxxArgs methods (#9256)
0618c7f69 Fix integer overflow for window ROWS frame (#8870)
4f3d32fd5 Clean up FOLLY nullable annotation (#9247)
```
Re: [PR] [VL] Daily Update Velox Version (2024_03_27) [incubator-gluten]
github-actions[bot] commented on PR #5138: URL: https://github.com/apache/incubator-gluten/pull/5138#issuecomment-2021777264

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename ***commit message*** and ***pull request title*** in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

* [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
Re: [PR] [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml [incubator-gluten]
dcoliversun commented on PR #5124: URL: https://github.com/apache/incubator-gluten/pull/5124#issuecomment-2021764604 @zhouyuan @wangyum please review this PR if you have time.
(incubator-gluten) branch main updated: [Gluten-5018][CH] support minmax/bloomfilter/set skip index (#5019)
This is an automated email from the ASF dual-hosted git repository. mahongbin pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git The following commit(s) were added to refs/heads/main by this push: new 972597184 [Gluten-5018][CH] support minmax/bloomfilter/set skip index (#5019) 972597184 is described below commit 972597184e147fcf488fd6cda4b447356d61136d Author: Hongbin Ma AuthorDate: Wed Mar 27 09:33:40 2024 +0800 [Gluten-5018][CH] support minmax/bloomfilter/set skip index (#5019) * temp, by defualt all cols minmax index basically works, dealing with nullable nullable/not-null ok remove unneceesary change fix compile * add ut * remove dataschema * fix spark32 bug --- .../source/DeltaMergeTreeFileFormat.scala | 17 +- .../source/DeltaMergeTreeFileFormat.scala | 17 +- .../java/io/glutenproject/metrics/MetricsStep.java | 11 + .../backendsapi/clickhouse/CHIteratorApi.scala | 3 + .../backendsapi/clickhouse/CHMetricsApi.scala | 1 + .../execution/GlutenMergeTreePartition.scala | 3 + .../metrics/FileSourceScanMetricsUpdater.scala | 2 + .../delta/ClickhouseOptimisticTransaction.scala| 7 +- .../sql/delta/catalog/ClickHouseTableV2.scala | 35 ++- .../utils/MergeTreePartsPartitionsUtil.scala | 33 +++ .../datasources/v1/CHMergeTreeWriterInjects.scala | 29 ++- .../v1/clickhouse/MergeTreeFileFormatWriter.scala | 9 + ...GlutenClickHouseTPCHNotNullSkipIndexSuite.scala | 271 ...lutenClickHouseTPCHNullableSkipIndexSuite.scala | 277 + .../apache/spark/affinity/MixedAffinitySuite.scala | 3 + cpp-ch/local-engine/Common/MergeTreeTool.cpp | 84 ++- cpp-ch/local-engine/Common/MergeTreeTool.h | 3 + cpp-ch/local-engine/Parser/MergeTreeRelParser.cpp | 18 +- cpp-ch/local-engine/Parser/RelMetric.cpp | 3 + cpp-ch/local-engine/Parser/TypeParser.cpp | 4 +- cpp-ch/local-engine/Parser/TypeParser.h| 50 ++-- .../substrait/rel/ExtensionTableBuilder.java | 6 + .../substrait/rel/ExtensionTableNode.java | 12 + .../datasource/GlutenFormatWriterInjects.scala | 
4 +- 24 files changed, 843 insertions(+), 59 deletions(-) diff --git a/backends-clickhouse/src/main/delta-20/org/apache/spark/sql/execution/datasources/v2/clickhouse/source/DeltaMergeTreeFileFormat.scala b/backends-clickhouse/src/main/delta-20/org/apache/spark/sql/execution/datasources/v2/clickhouse/source/DeltaMergeTreeFileFormat.scala index fef109d35..d4ca321a9 100644 --- a/backends-clickhouse/src/main/delta-20/org/apache/spark/sql/execution/datasources/v2/clickhouse/source/DeltaMergeTreeFileFormat.scala +++ b/backends-clickhouse/src/main/delta-20/org/apache/spark/sql/execution/datasources/v2/clickhouse/source/DeltaMergeTreeFileFormat.scala @@ -17,7 +17,6 @@ package org.apache.spark.sql.execution.datasources.v2.clickhouse.source import org.apache.spark.sql.SparkSession -import org.apache.spark.sql.catalyst.expressions.Attribute import org.apache.spark.sql.delta.DeltaParquetFileFormat import org.apache.spark.sql.delta.actions.Metadata import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory} @@ -31,9 +30,11 @@ class DeltaMergeTreeFileFormat(metadata: Metadata) protected var database = "" protected var tableName = "" - protected var dataSchemas = Seq.empty[Attribute] protected var orderByKeyOption: Option[Seq[String]] = None protected var lowCardKeyOption: Option[Seq[String]] = None + protected var minmaxIndexKeyOption: Option[Seq[String]] = None + protected var bfIndexKeyOption: Option[Seq[String]] = None + protected var setIndexKeyOption: Option[Seq[String]] = None protected var primaryKeyOption: Option[Seq[String]] = None protected var partitionColumns: Seq[String] = Seq.empty[String] protected var clickhouseTableConfigs: Map[String, String] = Map.empty @@ -42,18 +43,22 @@ class DeltaMergeTreeFileFormat(metadata: Metadata) metadata: Metadata, database: String, tableName: String, - schemas: Seq[Attribute], orderByKeyOption: Option[Seq[String]], lowCardKeyOption: Option[Seq[String]], + minmaxIndexKeyOption: Option[Seq[String]], + 
bfIndexKeyOption: Option[Seq[String]], + setIndexKeyOption: Option[Seq[String]], primaryKeyOption: Option[Seq[String]], clickhouseTableConfigs: Map[String, String], partitionColumns: Seq[String]) { this(metadata) this.database = database this.tableName = tableName -this.dataSchemas = schemas this.orderByKeyOption = orderByKeyOption this.lowCardKeyOption = lowCardKeyOption +this.minmaxIndexKeyOption = minmaxIndexKeyOption +this.bfIndexKeyOption = bfIndexKeyOption +this.setIndexKeyOption = setIndexKeyOption
Re: [I] [CH] basically support set/bloomfilter/minmax index for clickhouse tables [incubator-gluten]
binmahone closed issue #5018: [CH] basically support set/bloomfilter/minmax index for clickhouse tables URL: https://github.com/apache/incubator-gluten/issues/5018
Re: [PR] [CH] Issue 5018 [incubator-gluten]
binmahone merged PR #5019: URL: https://github.com/apache/incubator-gluten/pull/5019
(incubator-gluten) branch main updated: [VL] Enable SPARK-10634 timestamp test case (#5090)
This is an automated email from the ASF dual-hosted git repository.

rui pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new b962e7cc7 [VL] Enable SPARK-10634 timestamp test case (#5090)
b962e7cc7 is described below

commit b962e7cc74f7a7114770e9a882f10d5eaa59a355
Author: Joey
AuthorDate: Wed Mar 27 09:32:41 2024 +0800

    [VL] Enable SPARK-10634 timestamp test case (#5090)
---
 .../src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala | 2 --
 .../src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala | 2 --
 .../src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala | 2 --
 3 files changed, 6 deletions(-)

diff --git a/gluten-ut/spark32/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala b/gluten-ut/spark32/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
index 5f66df1a0..2d92c5ca2 100644
--- a/gluten-ut/spark32/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
+++ b/gluten-ut/spark32/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
@@ -857,7 +857,6 @@ class VeloxTestSettings extends BackendTestSettings {
     // decimal failed ut
     .exclude("SPARK-34212 Parquet should read decimals correctly")
     // Timestamp is read as INT96.
-    .exclude("SPARK-10634 timestamp written and read as INT64 - truncation")
     .exclude("Migration from INT96 to TIMESTAMP_MICROS timestamp type")
     .exclude("SPARK-10365 timestamp written and read as INT64 - TIMESTAMP_MICROS")
     // Rewrite because the filter after datasource is not needed.
@@ -869,7 +868,6 @@ class VeloxTestSettings extends BackendTestSettings {
     // decimal failed ut
     .exclude("SPARK-34212 Parquet should read decimals correctly")
     // Timestamp is read as INT96.
-    .exclude("SPARK-10634 timestamp written and read as INT64 - truncation")
     .exclude("Migration from INT96 to TIMESTAMP_MICROS timestamp type")
     .exclude("SPARK-10365 timestamp written and read as INT64 - TIMESTAMP_MICROS")
     // Rewrite because the filter after datasource is not needed.
diff --git a/gluten-ut/spark33/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala b/gluten-ut/spark33/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
index f2e75f84f..dd14a604b 100644
--- a/gluten-ut/spark33/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
+++ b/gluten-ut/spark33/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
@@ -682,7 +682,6 @@ class VeloxTestSettings extends BackendTestSettings {
     // decimal failed ut
     .exclude("SPARK-34212 Parquet should read decimals correctly")
     // Timestamp is read as INT96.
-    .exclude("SPARK-10634 timestamp written and read as INT64 - truncation")
     .exclude("Migration from INT96 to TIMESTAMP_MICROS timestamp type")
     .exclude("SPARK-10365 timestamp written and read as INT64 - TIMESTAMP_MICROS")
     .exclude("SPARK-36182: read TimestampNTZ as TimestampLTZ")
@@ -698,7 +697,6 @@ class VeloxTestSettings extends BackendTestSettings {
     // decimal failed ut
     .exclude("SPARK-34212 Parquet should read decimals correctly")
     // Timestamp is read as INT96.
-    .exclude("SPARK-10634 timestamp written and read as INT64 - truncation")
     .exclude("Migration from INT96 to TIMESTAMP_MICROS timestamp type")
     .exclude("SPARK-10365 timestamp written and read as INT64 - TIMESTAMP_MICROS")
     .exclude("SPARK-36182: read TimestampNTZ as TimestampLTZ")
diff --git a/gluten-ut/spark34/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala b/gluten-ut/spark34/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
index 1c37e787b..d2555007b 100644
--- a/gluten-ut/spark34/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
+++ b/gluten-ut/spark34/src/test/scala/io/glutenproject/utils/velox/VeloxTestSettings.scala
@@ -668,7 +668,6 @@ class VeloxTestSettings extends BackendTestSettings {
     // decimal failed ut
     .exclude("SPARK-34212 Parquet should read decimals correctly")
     // Timestamp is read as INT96.
-    .exclude("SPARK-10634 timestamp written and read as INT64 - truncation")
     .exclude("Migration from INT96 to TIMESTAMP_MICROS timestamp type")
     .exclude("SPARK-10365 timestamp written and read as INT64 - TIMESTAMP_MICROS")
     .exclude("SPARK-36182: read TimestampNTZ as TimestampLTZ")
@@ -684,7 +683,6 @@ class VeloxTestSettings extends BackendTestSettings {
     // decimal failed ut
     .exclude("SPARK-34212 Parquet should read decimals correctly")
     // Timestamp is read as INT96.
-    .exclude("SPARK-10634 timestamp written and read as INT64 - truncation")
     .exclude("Migration from INT96 to TIMESTAMP_MICROS timestamp type")
     .exclude("SPARK-10365 timestamp written and read as INT64 - TIMESTAMP_MICROS")
Re: [PR] [VL] Enable SPARK-10634 timestamp test case [incubator-gluten]
rui-mo merged PR #5090: URL: https://github.com/apache/incubator-gluten/pull/5090
Re: [PR] [VL] Support YearMonthIntervalType and enable make_ym_interval [incubator-gluten]
marin-ma commented on PR #4798: URL: https://github.com/apache/incubator-gluten/pull/4798#issuecomment-2021758296 @zzcclp CH CI passed. Could you help to review? Thanks!
Re: [PR] [DNM] Velox test [incubator-gluten]
github-actions[bot] commented on PR #4929: URL: https://github.com/apache/incubator-gluten/pull/4929#issuecomment-2021752956 Run Gluten Clickhouse CI
(incubator-gluten) branch main updated: [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file (#5129)
This is an automated email from the ASF dual-hosted git repository.

ulyssesyou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new 6dc7885f6 [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file (#5129)
6dc7885f6 is described below

commit 6dc7885f6c54f4ea0f773e920fb455e09298b3b7
Author: Zhen Wang <643348...@qq.com>
AuthorDate: Wed Mar 27 09:15:13 2024 +0800

    [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file (#5129)
---
 .../io/glutenproject/backendsapi/clickhouse/CHBackend.scala    | 6 +++---
 .../io/glutenproject/backendsapi/velox/VeloxBackend.scala      | 6 +++---
 gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala | 6 --
 .../src/main/scala/io/glutenproject/backendsapi/Backend.scala  | 10 +++---
 .../io/glutenproject/backendsapi/BackendsApiManager.scala      | 4 +---
 5 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHBackend.scala b/backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHBackend.scala
index fbcb804a3..a7c5c9980 100644
--- a/backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHBackend.scala
+++ b/backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHBackend.scala
@@ -16,7 +16,7 @@
  */
 package io.glutenproject.backendsapi.clickhouse
-import io.glutenproject.{CH_BRANCH, CH_COMMIT, GlutenConfig, GlutenPlugin}
+import io.glutenproject.{CH_BRANCH, CH_COMMIT, GlutenConfig}
 import io.glutenproject.backendsapi._
 import io.glutenproject.expression.WindowFunctionsBuilder
 import io.glutenproject.extension.ValidationResult
@@ -41,8 +41,8 @@ import scala.util.control.Breaks.{break, breakable}
 class CHBackend extends Backend {
   override def name(): String = CHBackend.BACKEND_NAME
-  override def buildInfo(): GlutenPlugin.BackendBuildInfo =
-    GlutenPlugin.BackendBuildInfo("ClickHouse", CH_BRANCH, CH_COMMIT, "UNKNOWN")
+  override def buildInfo(): BackendBuildInfo =
+    BackendBuildInfo("ClickHouse", CH_BRANCH, CH_COMMIT, "UNKNOWN")
   override def iteratorApi(): IteratorApi = new CHIteratorApi
   override def sparkPlanExecApi(): SparkPlanExecApi = new CHSparkPlanExecApi
   override def transformerApi(): TransformerApi = new CHTransformerApi
diff --git a/backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/VeloxBackend.scala b/backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/VeloxBackend.scala
index 0ff2bd0d7..3293abe3e 100644
--- a/backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/VeloxBackend.scala
+++ b/backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/VeloxBackend.scala
@@ -16,7 +16,7 @@
  */
 package io.glutenproject.backendsapi.velox
-import io.glutenproject.{GlutenConfig, GlutenPlugin, VELOX_BRANCH, VELOX_REVISION, VELOX_REVISION_TIME}
+import io.glutenproject.{GlutenConfig, VELOX_BRANCH, VELOX_REVISION, VELOX_REVISION_TIME}
 import io.glutenproject.backendsapi._
 import io.glutenproject.exception.GlutenNotSupportException
 import io.glutenproject.execution.WriteFilesExecTransformer
@@ -44,8 +44,8 @@ import scala.util.control.Breaks.breakable
 class VeloxBackend extends Backend {
   override def name(): String = VeloxBackend.BACKEND_NAME
-  override def buildInfo(): GlutenPlugin.BackendBuildInfo =
-    GlutenPlugin.BackendBuildInfo("Velox", VELOX_BRANCH, VELOX_REVISION, VELOX_REVISION_TIME)
+  override def buildInfo(): BackendBuildInfo =
+    BackendBuildInfo("Velox", VELOX_BRANCH, VELOX_REVISION, VELOX_REVISION_TIME)
   override def iteratorApi(): IteratorApi = new IteratorApiImpl
   override def sparkPlanExecApi(): SparkPlanExecApi = new SparkPlanExecApiImpl
   override def transformerApi(): TransformerApi = new TransformerApiImpl
diff --git a/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala b/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala
index c54b78da9..5fa3083c2 100644
--- a/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala
+++ b/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala
@@ -278,10 +278,4 @@ private[glutenproject] object GlutenPlugin {
   implicit def sparkConfImplicit(conf: SparkConf): SparkConfImplicits = {
     new SparkConfImplicits(conf)
   }
-
-  case class BackendBuildInfo(
-      backend: String,
-      backendBranch: String,
-      backendRevision: String,
-      backendRevisionTime: String)
 }
diff --git a/gluten-core/src/main/scala/io/glutenproject/backendsapi/Backend.scala b/gluten-core/src/main/scala/io/glutenproject/backendsapi/Backend.scala
index 438194a36..09799cdb1 100644
--- a/gluten-core/src/main/scala/io/glutenproject/backendsapi/Backend.scala
+++ b/gluten-core/src/main/scala/io/glutenproject/backendsapi/Backend.scala
@@ -16,12 +16,10 @@
  */
(incubator-gluten) branch main updated: [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectExecTransformer (#5134)
This is an automated email from the ASF dual-hosted git repository.

ulyssesyou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new 5b8b96e25 [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectExecTransformer (#5134)
5b8b96e25 is described below

commit 5b8b96e2541525544ba1e80c957a2bd8c5c1e95b
Author: guixiaowen <58287738+guixiao...@users.noreply.github.com>
AuthorDate: Wed Mar 27 09:14:48 2024 +0800

    [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectExecTransformer (#5134)
---
 .../glutenproject/execution/TakeOrderedAndProjectExecTransformer.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gluten-core/src/main/scala/io/glutenproject/execution/TakeOrderedAndProjectExecTransformer.scala b/gluten-core/src/main/scala/io/glutenproject/execution/TakeOrderedAndProjectExecTransformer.scala
index 0f0137b5d..f7b1fe2f4 100644
--- a/gluten-core/src/main/scala/io/glutenproject/execution/TakeOrderedAndProjectExecTransformer.scala
+++ b/gluten-core/src/main/scala/io/glutenproject/execution/TakeOrderedAndProjectExecTransformer.scala
@@ -49,7 +49,7 @@ case class TakeOrderedAndProjectExecTransformer(
     val orderByString = truncatedString(sortOrder, "[", ",", "]", maxFields)
     val outputString = truncatedString(output, "[", ",", "]", maxFields)
-    s"TakeOrderedAndProjectExecTransform(limit=$limit, " +
+    s"TakeOrderedAndProjectExecTransformer (limit=$limit, " +
       s"orderBy=$orderByString, output=$outputString)"
   }
Re: [I] [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectExecTransformer [incubator-gluten]
ulysses-you closed issue #5133: [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectExecTransformer URL: https://github.com/apache/incubator-gluten/issues/5133
Re: [PR] [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectE… [incubator-gluten]
ulysses-you merged PR #5134: URL: https://github.com/apache/incubator-gluten/pull/5134
Re: [PR] [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file [incubator-gluten]
ulysses-you merged PR #5129: URL: https://github.com/apache/incubator-gluten/pull/5129
Re: [PR] [VL][DNM]Test Q95 post probe spill [incubator-gluten]
github-actions[bot] commented on PR #5063: URL: https://github.com/apache/incubator-gluten/pull/5063#issuecomment-2021746292 Run Gluten Clickhouse CI
Re: [PR] [GLUTEN-5083][CH] Invalid result with `mergeTwoPhasesHashBaseAggregateIfNeed` enable [incubator-gluten]
github-actions[bot] commented on PR #5137: URL: https://github.com/apache/incubator-gluten/pull/5137#issuecomment-2021735086 Run Gluten Clickhouse CI
Re: [PR] [GLUTEN-5083][CH] Invalid result with `mergeTwoPhasesHashBaseAggregateIfNeed` enable [incubator-gluten]
github-actions[bot] commented on PR #5137: URL: https://github.com/apache/incubator-gluten/pull/5137#issuecomment-2021734843 https://github.com/apache/incubator-gluten/issues/5083
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
ulysses-you commented on code in PR #5120:
URL: https://github.com/apache/incubator-gluten/pull/5120#discussion_r1540297481

## docs/get-started/Velox.md: ##
@@ -5,28 +5,34 @@ nav_order: 1
 parent: Getting-Started
 ---
 # Supported Version
-| Type | Version |
-|---|--|
-| Spark | 3.2.2, 3.3.1 |
-| OS| Ubuntu20.04/22.04, Centos7/8 |
-| jdk | openjdk8 |
-| scala | 2.12
-Spark3.4.0 support is still WIP. TPCH/DS can pass, UT is not yet passed.
+| Type | Version |
+|---|-|
+| Spark | 3.2.2, 3.3.1, 3.4.2, 3.5.1(wip) |
+| OS| Ubuntu20.04/22.04, Centos7/8|
+| jdk | openjdk8/jdk17 |
+| scala | 2.12|
-There are pending PRs for jdk11 support.
+**JDK17**

Review Comment:
I moved it to `NewToGluten.md`
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
github-actions[bot] commented on PR #5120: URL: https://github.com/apache/incubator-gluten/pull/5120#issuecomment-2021727351 Run Gluten Clickhouse CI
[I] [VL] Vanilla Spark broadcast exchange + C2R is slow sometimes [incubator-gluten]
zhztheplayer opened a new issue, #5136:
URL: https://github.com/apache/incubator-gluten/issues/5136

### Backend
VL (Velox)

### Bug description
The slowdown occurs because the code that converts vanilla Spark's hashed relation to Gluten's sometimes produces duplicated rows. The fix will be incorporated in https://github.com/apache/incubator-gluten/pull/5058, since it can be tested by the ACBO changes.

### Spark version
None

### Spark configurations
_No response_

### System information
_No response_

### Relevant logs
_No response_
Re: [PR] [CORE] Basic runnable version of ACBO (Advanced CBO) [incubator-gluten]
github-actions[bot] commented on PR #5058: URL: https://github.com/apache/incubator-gluten/pull/5058#issuecomment-2021706928 Run Gluten Clickhouse CI
(incubator-gluten) branch main updated: [VL] Velox patch to avoid installing libunwind-dev no longer works (#5127)
This is an automated email from the ASF dual-hosted git repository.

hongze pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new 2aa60d0ea [VL] Velox patch to avoid installing libunwind-dev no longer works (#5127)
2aa60d0ea is described below

commit 2aa60d0eae8fdd0f4020842c5233ca8a3197bd5e
Author: Hongze Zhang
AuthorDate: Wed Mar 27 08:26:33 2024 +0800

    [VL] Velox patch to avoid installing libunwind-dev no longer works (#5127)
---
 ep/build-velox/src/get_velox.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ep/build-velox/src/get_velox.sh b/ep/build-velox/src/get_velox.sh
index 767585e91..26e7a9cd0 100755
--- a/ep/build-velox/src/get_velox.sh
+++ b/ep/build-velox/src/get_velox.sh
@@ -86,7 +86,7 @@ function process_setup_ubuntu {
   # need set BUILD_SHARED_LIBS flag for thrift
   sed -i "/facebook\/fbthrift/{n;s/cmake_install -DBUILD_TESTS=OFF/cmake_install -DBUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF/;}" scripts/setup-ubuntu.sh
   # Do not install libunwind which can cause interruption when catching native exception.
-  sed -i 's/sudo --preserve-env apt install -y libunwind-dev && //' scripts/setup-ubuntu.sh
+  sed -i 's/${SUDO} apt install -y libunwind-dev//' scripts/setup-ubuntu.sh
   sed -i '/ccache/a\ *thrift* \\' scripts/setup-ubuntu.sh
   sed -i '/ccache/a\ libiberty-dev \\' scripts/setup-ubuntu.sh
   sed -i '/ccache/a\ libxml2-dev \\' scripts/setup-ubuntu.sh
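The commit title says the old patch "no longer works". A minimal sketch of why that might be follows, under the assumption that the upstream install line now begins with a literal `${SUDO}` (inferred from the new pattern, not copied from Velox's `scripts/setup-ubuntu.sh`). The helper names `old_patch` and `new_patch` are hypothetical, and both read stdin rather than editing a file in place.

```shell
#!/bin/sh
# The two sed substitutions from the diff, applied to stdin instead of a file.
# Single quotes keep ${SUDO} literal, so sed matches the text "${SUDO}" itself.
old_patch() { sed 's/sudo --preserve-env apt install -y libunwind-dev && //'; }
new_patch() { sed 's/${SUDO} apt install -y libunwind-dev//'; }

# Assumed sample line, for illustration only.
sample='${SUDO} apt install -y libunwind-dev'

# The old pattern does not match the sample, so the line passes through unchanged.
printf 'old: [%s]\n' "$(printf '%s\n' "$sample" | old_patch)"
# The new pattern matches and removes the install command.
printf 'new: [%s]\n' "$(printf '%s\n' "$sample" | new_patch)"
```

A substitution that silently stops matching leaves the script untouched without reporting an error, which would be consistent with the problem the commit title describes.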
Re: [PR] [VL] Velox patch to avoid installing libunwind-dev no longer works [incubator-gluten]
zhztheplayer merged PR #5127: URL: https://github.com/apache/incubator-gluten/pull/5127
Re: [PR] [VL] Velox patch to avoid installing libunwind-dev no longer works [incubator-gluten]
zhztheplayer commented on PR #5127: URL: https://github.com/apache/incubator-gluten/pull/5127#issuecomment-2021704715 cc @PHILO-HE
Re: [PR] [VL][MINOR] Refactor operator/function tests [incubator-gluten]
GlutenPerfBot commented on PR #5037:
URL: https://github.com/apache/incubator-gluten/pull/5037#issuecomment-2021667403

Performance report for TPCH SF2000 with Velox backend, for reference only

| query | log/native_master_03_26_2024_time.csv | log/native_master_03_25_2024_2ce826995_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 38.64 | 34.62 | -4.022 | 89.59% |
| q2 | 24.52 | 23.57 | -0.953 | 96.11% |
| q3 | 38.16 | 36.77 | -1.391 | 96.35% |
| q4 | 39.39 | 38.42 | -0.974 | 97.53% |
| q5 | 69.01 | 67.14 | -1.866 | 97.30% |
| q6 | 5.96 | 7.58 | 1.616 | 127.11% |
| q7 | 82.13 | 85.17 | 3.034 | 103.69% |
| q8 | 84.26 | 86.22 | 1.961 | 102.33% |
| q9 | 119.85 | 121.73 | 1.884 | 101.57% |
| q10 | 43.07 | 44.57 | 1.492 | 103.46% |
| q11 | 20.06 | 20.62 | 0.562 | 102.80% |
| q12 | 26.89 | 27.49 | 0.598 | 102.22% |
| q13 | 47.06 | 46.45 | -0.604 | 98.72% |
| q14 | 21.39 | 17.85 | -3.538 | 83.46% |
| q15 | 30.98 | 29.13 | -1.854 | 94.02% |
| q16 | 13.81 | 15.29 | 1.474 | 110.67% |
| q17 | 101.16 | 98.96 | -2.201 | 97.82% |
| q18 | 141.47 | 141.61 | 0.146 | 100.10% |
| q19 | 13.65 | 13.63 | -0.017 | 99.87% |
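Judging from the rows of the report, the `difference` column appears to be the second timing minus the first, and `percentage` the second divided by the first. That reading is inferred from the numbers, not from GlutenPerfBot's source. A minimal sketch of the arithmetic, with `compare_times` as a hypothetical helper name:

```shell
#!/bin/sh
# compare_times FIRST SECOND
# Prints "difference percentage", assuming difference = SECOND - FIRST and
# percentage = SECOND / FIRST * 100 (an interpretation of the report columns).
compare_times() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f %.2f%%\n", b - a, b / a * 100 }'
}

# Round numbers chosen for illustration; prints "-10.000 75.00%".
compare_times 40 30
```

Feeding the two-decimal values from the report into this helper can differ from the published figures in the last digit, presumably because the report is computed from unrounded timings.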
Re: [PR] [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240327) [incubator-gluten]
github-actions[bot] commented on PR #5135: URL: https://github.com/apache/incubator-gluten/pull/5135#issuecomment-2021611455 Run Gluten Clickhouse CI
[PR] [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240327) [incubator-gluten]
lwz9103 opened a new pull request, #5135: URL: https://github.com/apache/incubator-gluten/pull/5135 Auto commit by gluten daily build, please check the build status and merge it if it's green.
Re: [PR] [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240327) [incubator-gluten]
github-actions[bot] commented on PR #5135: URL: https://github.com/apache/incubator-gluten/pull/5135#issuecomment-2021611241 https://github.com/apache/incubator-gluten/issues/1632
Re: [I] Crash when writing an array of struct [incubator-gluten]
clee704 commented on issue #4964:
URL: https://github.com/apache/incubator-gluten/issues/4964#issuecomment-2021599373

@JkSelf Actually it crashes on Spark 3.4 too.

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7fa4a369e2c5, pid=915090, tid=915112
#
# JRE version: OpenJDK Runtime Environment (11.0.22+7) (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.22+7-post-Ubuntu-0ubuntu222.04.1, mixed mode, tiered, g1 gc, linux-amd64)
# Problematic frame:
# C [libvelox.so+0x229e2c5] (anonymous namespace)::makeRowVector(std::vector, std::allocator > > const&)+0xa5
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /ssd/chungmin/repos/spark/core.915090)
#
# An error report file with more information is saved as:
# /ssd/chungmin/repos/spark/hs_err_pid915090.log
#
# If you would like to submit a bug report, please visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Spark 3.4.2
Gluten 58a459bf487120208a774d7959f7c7db417f490b
Re: [PR] [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectE… [incubator-gluten]
github-actions[bot] commented on PR #5134: URL: https://github.com/apache/incubator-gluten/pull/5134#issuecomment-2020923742 Run Gluten Clickhouse CI
Re: [PR] [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectE… [incubator-gluten]
github-actions[bot] commented on PR #5134: URL: https://github.com/apache/incubator-gluten/pull/5134#issuecomment-2020922735 https://github.com/apache/incubator-gluten/issues/5133
[PR] [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectE… [incubator-gluten]
guixiaowen opened a new pull request, #5134:
URL: https://github.com/apache/incubator-gluten/pull/5134

…xecTransformer #5133

## What changes were proposed in this pull request?

In TakeOrderedAndProjectExecTransformer, the prompt information (the name shown in the plan output) differs from that of the other transformers. For example:

spark-sql>explain select a from test.tablea order by a limit 5
plan
== Physical Plan ==
VeloxColumnarToRowExec
+- TakeOrderedAndProjectExecTransform(limit=5, orderBy=[a#13 ASC NULLS FIRST], output=[a#13])
   +- ^(3) NativeScan hive test.tablea [a#13], HiveTableRelation [test.tablea, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#13], Partition Cols: []]

"TakeOrderedAndProjectExecTransform" is changed to "TakeOrderedAndProjectExecTransformer", which makes it consistent with the style used by the other transformers.

(Fixes: \#5133)

After this PR:

spark-sql>explain select a from test.tablea order by a limit 5
plan
== Physical Plan ==
VeloxColumnarToRowExec
+- TakeOrderedAndProjectExecTransformer (limit=5, orderBy=[a#13 ASC NULLS FIRST], output=[a#13])
   +- ^(3) NativeScan hive test.tablea [a#13], HiveTableRelation [test.tablea, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#13], Partition Cols: []]

## How was this patch tested?
[I] Modify the prompt information for TakeOrderedAndProjectExecTransformer [incubator-gluten]
guixiaowen opened a new issue, #5133:
URL: https://github.com/apache/incubator-gluten/issues/5133

### Description

In TakeOrderedAndProjectExecTransformer, the prompt information (the name shown in the plan output) differs from that of the other transformers. For example:

spark-sql>explain select a from test.tablea order by a limit 5
plan
== Physical Plan ==
VeloxColumnarToRowExec
+- TakeOrderedAndProjectExecTransform(limit=5, orderBy=[a#13 ASC NULLS FIRST], output=[a#13])
   +- ^(3) NativeScan hive test.tablea [a#13], HiveTableRelation [`test`.`tablea`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#13], Partition Cols: []]

"TakeOrderedAndProjectExecTransform" should be changed to "TakeOrderedAndProjectExecTransformer", which will make it consistent with the style used by the other transformers.
[I] Modify the prompt information for TakeOrderedAndProjectExecTransformer's simpleString [incubator-gluten]
guixiaowen opened a new issue, #5132:
URL: https://github.com/apache/incubator-gluten/issues/5132

### Description

In TakeOrderedAndProjectExecTransformer, the prompt information (the name shown in the plan output) differs from that of the other transformers. For example:

spark-sql>explain select a from test.tablea order by a limit 5
plan
== Physical Plan ==
VeloxColumnarToRowExec
+- TakeOrderedAndProjectExecTransform(limit=5, orderBy=[a#13 ASC NULLS FIRST], output=[a#13])
   +- ^(3) NativeScan hive test.tablea [a#13], HiveTableRelation [`test`.`tablea`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#13], Partition Cols: []]

"TakeOrderedAndProjectExecTransform" should be changed to "TakeOrderedAndProjectExecTransformer", which will make it consistent with the style used by the other transformers.
Re: [PR] [VL] Support YearMonthIntervalType and enable make_ym_interval [incubator-gluten]
github-actions[bot] commented on PR #4798: URL: https://github.com/apache/incubator-gluten/pull/4798#issuecomment-2020860684 Run Gluten Clickhouse CI
Re: [I] [VL] Unsupported spark function list [please leave a comment if you plan to pick some] [incubator-gluten]
supermem613 commented on issue #4039: URL: https://github.com/apache/incubator-gluten/issues/4039#issuecomment-2020700016 I'd like to pick up base64 and unbase64, please. (FYI, looks like there was a PR above for unbase64, but it seems to have been closed without committing ~45-55 days ago, so hopefully I am not conflicting with any work).
Re: [PR] [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml [incubator-gluten]
github-actions[bot] commented on PR #5124: URL: https://github.com/apache/incubator-gluten/pull/5124#issuecomment-2020688726 Run Gluten Clickhouse CI
Re: [PR] [CORE] Enable second Spark function [incubator-gluten]
github-actions[bot] commented on PR #5131: URL: https://github.com/apache/incubator-gluten/pull/5131#issuecomment-2020597809 Run Gluten Clickhouse CI
Re: [I] [VL] spark.read.csv("/tmp/test.csv") throws Exception [incubator-gluten]
xumingming commented on issue #5044:
URL: https://github.com/apache/incubator-gluten/issues/5044#issuecomment-2020579428

@PHILO-HE Thanks for the information! I tried with parquet data (nation table in TPCH), the details are the following:

```
== Fallback Summary ==
(4) Project: Not supported to map spark function name to substrait function name: toprettystring(n_nationkey#23, Some(Asia/Shanghai)), class name: ToPrettyString.
(5) CollectLimit: Gluten does not touch it or does not support it

== Physical Plan ==
CollectLimit (5)
+- Project (4)
   +- VeloxColumnarToRowExec (3)
      +- ^ Scan parquet (1)
```

Is the fallback for `Project` expected?
Re: [PR] [VL][CI] Enable Celeborn tests & Gluten CPP tests [incubator-gluten]
github-actions[bot] commented on PR #5114: URL: https://github.com/apache/incubator-gluten/pull/5114#issuecomment-2020420744 Run Gluten Clickhouse CI
Re: [PR] [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml [incubator-gluten]
github-actions[bot] commented on PR #5124: URL: https://github.com/apache/incubator-gluten/pull/5124#issuecomment-2020416623 Run Gluten Clickhouse CI
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
PHILO-HE commented on code in PR #5120:
URL: https://github.com/apache/incubator-gluten/pull/5120#discussion_r1539216963

## .github/workflows/velox_docker.yml: ##
@@ -73,6 +73,17 @@ jobs:
       matrix:
         os: ["ubuntu:20.04", "ubuntu:22.04"]
         spark: ["spark-3.2", "spark-3.3", "spark-3.4", "spark-3.5"]
+        java: [ "java-8", "java-17" ]
+        # Spark supports JDK17 since 3.3 and later, see https://issues.apache.org/jira/browse/SPARK-33772
+        exclude:

Review Comment:
It looks like `include` cannot make this more concise. Please ignore this comment. Thanks!
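The quoted hunk ends at `exclude:`, so the actual excluded entries are not visible here. As a rough sketch of what such an exclusion expresses, the loop below assumes that only the `java-17` with `spark-3.2` pair is skipped, based solely on the comment that Spark supports JDK17 from 3.3 onward. `is_supported_combo` is a hypothetical helper, not part of the workflow.

```shell
#!/bin/sh
# is_supported_combo SPARK JAVA
# Succeeds unless the pair is java-17 with spark-3.2 (assumed exclusion).
is_supported_combo() {
  case "$2:$1" in
    java-17:spark-3.2) return 1 ;;
    *) return 0 ;;
  esac
}

# Walk the same spark x java matrix as the workflow and report each pair.
for spark in spark-3.2 spark-3.3 spark-3.4 spark-3.5; do
  for java in java-8 java-17; do
    if is_supported_combo "$spark" "$java"; then
      echo "run  $spark $java"
    else
      echo "skip $spark $java"
    fi
  done
done
```

With a single unsupported pair, listing it under `exclude` is shorter than enumerating the seven supported pairs under `include`, which fits the reviewer's conclusion.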
Re: [PR] [CORE] Enable second Spark function [incubator-gluten]
github-actions[bot] commented on PR #5131: URL: https://github.com/apache/incubator-gluten/pull/5131#issuecomment-2020391322 Run Gluten Clickhouse CI
Re: [PR] [CORE] Enable second Spark function [incubator-gluten]
github-actions[bot] commented on PR #5131: URL: https://github.com/apache/incubator-gluten/pull/5131#issuecomment-2020388375 Run Gluten Clickhouse CI
Re: [PR] [CORE] Enable second Spark function [incubator-gluten]
github-actions[bot] commented on PR #5131: URL: https://github.com/apache/incubator-gluten/pull/5131#issuecomment-2020371865 Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename ***commit message*** and ***pull request title*** in the following format? [GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message} See also: * [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
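The requested title format can be checked mechanically. The sketch below is only an illustration: the bot's real check is not shown in this message, the regular expression covers just the `[GLUTEN-<id>][COMPONENT]` prefix, and `has_gluten_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# has_gluten_prefix TITLE
# Succeeds if TITLE starts with [GLUTEN-<digits>][<COMPONENT>].
has_gluten_prefix() {
  printf '%s\n' "$1" | grep -Eq '^\[GLUTEN-[0-9]+\]\[[A-Za-z]+\]'
}

# Two titles taken from this digest: the first has an issue id, the second does not.
for title in \
  '[GLUTEN-5083][CH] Invalid result with mergeTwoPhasesHashBaseAggregateIfNeed enable' \
  '[CORE] Enable second Spark function'
do
  if has_gluten_prefix "$title"; then
    echo "ok:      $title"
  else
    echo "missing: $title"
  fi
done
```

A title such as `[GLUTEN-5133]Modify the prompt information ...` would also be reported as missing here, because this sketch requires a component tag after the issue id.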
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
PHILO-HE commented on code in PR #5120: URL: https://github.com/apache/incubator-gluten/pull/5120#discussion_r1539159436 ## .github/workflows/velox_docker.yml: ## @@ -84,69 +95,49 @@ jobs: path: ./cpp/build/releases - name: Setup java and maven run: | - apt-get update && \ - apt-get install -y openjdk-8-jdk maven && \ + if [ "${{ matrix.java }}" = "java-17" ]; then +apt-get update && apt-get install -y openjdk-17-jdk maven + else +apt-get update && apt-get install -y openjdk-8-jdk maven + fi apt remove openjdk-11* -y - - name: Build for Spark ${{ matrix.spark }} -run: | - cd $GITHUB_WORKSPACE/ && \ - mvn clean install -P${{ matrix.spark }} -Pbackends-velox -DskipTests - - name: Build and run TPCH/DS ${{ matrix.spark }} -run: | - cd $GITHUB_WORKSPACE/tools/gluten-it && \ - mvn clean install -P${{ matrix.spark }} \ - && GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \ ---local --preset=velox --benchmark-type=h --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 \ - && GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \ ---local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 - - - run-tpc-test-centos7: -needs: build-native-lib -strategy: - fail-fast: false - matrix: -spark: ["spark-3.2", "spark-3.3", "spark-3.4", "spark-3.5"] -runs-on: ubuntu-20.04 -container: centos:7 -steps: - - uses: actions/checkout@v2 - - name: Download All Artifacts -uses: actions/download-artifact@v2 -with: - name: velox-native-lib-${{github.sha}} - path: ./cpp/build/releases - - name: Setup java and maven -run: | - yum update -y && yum install -y java-1.8.0-openjdk-devel wget - wget https://downloads.apache.org/maven/maven-3/3.8.8/binaries/apache-maven-3.8.8-bin.tar.gz - tar -xvf apache-maven-3.8.8-bin.tar.gz - mv apache-maven-3.8.8 /usr/lib/maven - - name: Build for Spark ${{ matrix.spark }} + - name: Build and run TPCH/DS run: | cd $GITHUB_WORKSPACE/ - export MAVEN_HOME=/usr/lib/maven - 
export PATH=${PATH}:${MAVEN_HOME}/bin - mvn clean install -P${{ matrix.spark }} -Pbackends-velox -DskipTests - - name: Build and run TPCH/DS ${{ matrix.spark }} -run: | - cd $GITHUB_WORKSPACE/tools/gluten-it - export MAVEN_HOME=/usr/lib/maven - export PATH=${PATH}:${MAVEN_HOME}/bin - mvn clean install -P${{ matrix.spark }} \ + if [ "${{ matrix.java }}" = "java-17" ]; then +export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 Review Comment: Nit: export JAVA_HOME=/usr/lib/jvm/${{ matrix.java }}-openjdk-amd64 ## .github/workflows/velox_docker.yml: ## @@ -73,6 +73,17 @@ jobs: matrix: os: ["ubuntu:20.04", "ubuntu:22.04"] spark: ["spark-3.2", "spark-3.3", "spark-3.4", "spark-3.5"] +java: [ "java-8", "java-17" ] +# Spark supports JDK17 since 3.3 and later, see https://issues.apache.org/jira/browse/SPARK-33772 +exclude: Review Comment: Nit: maybe, better to use `include` for simplicity. ## docs/get-started/Velox.md: ## @@ -5,28 +5,34 @@ nav_order: 1 parent: Getting-Started --- # Supported Version -| Type | Version | -|---|--| -| Spark | 3.2.2, 3.3.1 | -| OS| Ubuntu20.04/22.04, Centos7/8 | -| jdk | openjdk8 | -| scala | 2.12 -Spark3.4.0 support is still WIP. TPCH/DS can pass, UT is not yet passed. +| Type | Version | +|---|-| +| Spark | 3.2.2, 3.3.1, 3.4.2, 3.5.1(wip) | +| OS| Ubuntu20.04/22.04, Centos7/8| +| jdk | openjdk8/jdk17 | +| scala | 2.12| -There are pending PRs for jdk11 support. +**JDK17** Review Comment: Maybe, better to document this part in a common place, as it is not specific to Velox backend. 
Re: [PR] [CORE] Enable second Spark function [incubator-gluten]
github-actions[bot] commented on PR #5131: URL: https://github.com/apache/incubator-gluten/pull/5131#issuecomment-2020372832 Run Gluten Clickhouse CI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org For additional commands, e-mail: commits-h...@gluten.apache.org
Re: [PR] [VL] Enable SPARK-10634 timestamp test case [incubator-gluten]
github-actions[bot] commented on PR #5090: URL: https://github.com/apache/incubator-gluten/pull/5090#issuecomment-2020340206 Run Gluten Clickhouse CI
Re: [PR] [VL] Support YearMonthIntervalType and enable make_ym_interval [incubator-gluten]
zzcclp commented on PR #4798: URL: https://github.com/apache/incubator-gluten/pull/4798#issuecomment-2020336897 It seems there are some `SPARK-36830: Support reading and writing ANSI intervals` test cases that are not disabled for Spark 3.3.
Re: [PR] [VL] Enable SPARK-10634 timestamp test case [incubator-gluten]
rui-mo commented on code in PR #5090:
URL: https://github.com/apache/incubator-gluten/pull/5090#discussion_r1539108319

## ep/build-velox/src/get_velox.sh: ##
@@ -16,8 +16,8 @@
 set -exu
-VELOX_REPO=https://github.com/oap-project/velox.git
-VELOX_BRANCH=2024_03_25
+VELOX_REPO=https://github.com/liujiayi771/velox.git
+VELOX_BRANCH=2024_03_25_ts_fix

Review Comment:
Could you revert this change?
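One way to try a fork without committing changed defaults is to let the caller override them. The following is a minimal sketch under that assumption and does not describe how `get_velox.sh` actually resolves its settings: `resolve_setting` is a hypothetical helper, and reading the overrides from environment variables is an illustrative choice. The default values are the ones on the removed side of the diff above.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prefer a caller-supplied override, else the committed default.
resolve_setting() {
  # $1 = override (may be empty), $2 = committed default
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

# "${VAR:-}" expands to an empty string when VAR is unset, so `set -u` does not abort.
VELOX_REPO="$(resolve_setting "${VELOX_REPO:-}" "https://github.com/oap-project/velox.git")"
VELOX_BRANCH="$(resolve_setting "${VELOX_BRANCH:-}" "2024_03_25")"

echo "VELOX_REPO=$VELOX_REPO"
echo "VELOX_BRANCH=$VELOX_BRANCH"
```

Under this sketch, a contributor would export `VELOX_REPO` and `VELOX_BRANCH` locally to test a fix branch, and the file under review would keep its original defaults.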
Re: [PR] [VL][MINOR] Refactor operator/function tests [incubator-gluten]
PHILO-HE merged PR #5037: URL: https://github.com/apache/incubator-gluten/pull/5037
(incubator-gluten) branch main updated: [VL][MINOR] Refactor operator/function validation tests (#5037)
This is an automated email from the ASF dual-hosted git repository.

philo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new adf0566a2 [VL][MINOR] Refactor operator/function validation tests (#5037)
adf0566a2 is described below

commit adf0566a2056276612694bc980f4e6e9028eb7d1
Author: PHILO-HE
AuthorDate: Tue Mar 26 20:12:28 2024 +0800

    [VL][MINOR] Refactor operator/function validation tests (#5037)
---
 ...lutenClickHouseWholeStageTransformerSuite.scala |   1 -
 .../benchmarks/NativeBenchmarkPlanGenerator.scala  |   1 -
 .../benchmarks/ShuffleWriterFuzzerTest.scala       |   1 -
 .../io/glutenproject/execution/FallbackSuite.scala |   1 -
 ...sionSuite.scala => FunctionsValidateTest.scala} |  71 -
 ...te.scala => ScalarFunctionsValidateSuite.scala} | 162 +++--
 .../io/glutenproject/execution/TestOperator.scala  | 135 ++---
 .../execution/VeloxAggregateFunctionsSuite.scala   |   1 -
 .../execution/VeloxColumnarCacheSuite.scala        |   1 -
 .../execution/VeloxHashJoinSuite.scala             |   1 -
 .../execution/VeloxLiteralSuite.scala              |   1 -
 .../execution/VeloxMetricsSuite.scala              |   1 -
 .../VeloxOrcDataTypeValidationSuite.scala          |   1 -
 .../VeloxParquetDataTypeValidationSuite.scala      |   1 -
 .../glutenproject/execution/VeloxScanSuite.scala   |   1 -
 .../execution/VeloxStringFunctionsSuite.scala      |   1 -
 .../glutenproject/execution/VeloxTPCDSSuite.scala  |   1 -
 .../glutenproject/execution/VeloxTPCHSuite.scala   |   1 -
 .../execution/VeloxWindowExpressionSuite.scala     |   1 -
 .../execution/WindowFunctionsValidateSuite.scala   |  35 +
 .../sql/execution/VeloxParquetWriteSuite.scala     |   1 -
 .../execution/WholeStageTransformerSuite.scala     |   1 -
 .../glutenproject/execution/VeloxDeltaSuite.scala  |   1 -
 .../execution/VeloxIcebergSuite.scala              |   1 -
 24 files changed, 175 insertions(+), 248 deletions(-)

diff --git a/backends-clickhouse/src/test/scala/io/glutenproject/execution/GlutenClickHouseWholeStageTransformerSuite.scala b/backends-clickhouse/src/test/scala/io/glutenproject/execution/GlutenClickHouseWholeStageTransformerSuite.scala
index e40f3d0e7..a2de7cf51 100644
--- a/backends-clickhouse/src/test/scala/io/glutenproject/execution/GlutenClickHouseWholeStageTransformerSuite.scala
+++ b/backends-clickhouse/src/test/scala/io/glutenproject/execution/GlutenClickHouseWholeStageTransformerSuite.scala
@@ -70,7 +70,6 @@ class GlutenClickHouseWholeStageTransformerSuite extends WholeStageTransformerSu
   protected val metaStorePathAbsolute = basePath + "/meta"
   protected val hiveMetaStoreDB = metaStorePathAbsolute + "/metastore_db"
-  override protected val backend: String = "ch"
   final override protected val resourcePath: String = "" // ch not need this
   override protected val fileFormat: String = "parquet"
 }
diff --git a/backends-velox/src/test/scala/io/glutenproject/benchmarks/NativeBenchmarkPlanGenerator.scala b/backends-velox/src/test/scala/io/glutenproject/benchmarks/NativeBenchmarkPlanGenerator.scala
index c9863111a..dafe3af3e 100644
--- a/backends-velox/src/test/scala/io/glutenproject/benchmarks/NativeBenchmarkPlanGenerator.scala
+++ b/backends-velox/src/test/scala/io/glutenproject/benchmarks/NativeBenchmarkPlanGenerator.scala
@@ -35,7 +35,6 @@ import scala.collection.JavaConverters._
 object GenerateExample extends Tag("io.glutenproject.tags.GenerateExample")
 class NativeBenchmarkPlanGenerator extends VeloxWholeStageTransformerSuite {
-  override protected val backend: String = "velox"
   override protected val resourcePath: String = "/tpch-data-parquet-velox"
   override protected val fileFormat: String = "parquet"
   val generatedPlanDir = getClass.getResource("/").getPath + "../../../generated-native-benchmark/"
diff --git a/backends-velox/src/test/scala/io/glutenproject/benchmarks/ShuffleWriterFuzzerTest.scala b/backends-velox/src/test/scala/io/glutenproject/benchmarks/ShuffleWriterFuzzerTest.scala
index 9d723f04f..7f863de68 100644
--- a/backends-velox/src/test/scala/io/glutenproject/benchmarks/ShuffleWriterFuzzerTest.scala
+++ b/backends-velox/src/test/scala/io/glutenproject/benchmarks/ShuffleWriterFuzzerTest.scala
@@ -37,7 +37,6 @@ object ShuffleWriterFuzzerTest {
 @FuzzerTest
 @SkipTestTags
 class ShuffleWriterFuzzerTest extends VeloxWholeStageTransformerSuite {
-  override protected val backend: String = "velox"
   override protected val resourcePath: String = "/tpch-data-parquet-velox"
   override protected val fileFormat: String = "parquet"
diff --git a/backends-velox/src/test/scala/io/glutenproject/execution/FallbackSuite.scala b/backends-velox/src/test/scala/io/glutenproject/execution/FallbackSuite.scala
index d9b1b4604..69e5b614c 100644
---
Re: [PR] [CORE] Support JDK17 [incubator-gluten]
ulysses-you commented on PR #5120: URL: https://github.com/apache/incubator-gluten/pull/5120#issuecomment-2020149985 cc @zhztheplayer @zhouyuan @PHILO-HE if you have other comments
Re: [PR] [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml [incubator-gluten]
github-actions[bot] commented on PR #5124: URL: https://github.com/apache/incubator-gluten/pull/5124#issuecomment-2020149678 Run Gluten Clickhouse CI
Re: [PR] [MINOR] Remove redundant string format [incubator-gluten]
ulysses-you merged PR #5126: URL: https://github.com/apache/incubator-gluten/pull/5126
(incubator-gluten) branch main updated: [MINOR] Remove redundant string format (#5126)
This is an automated email from the ASF dual-hosted git repository.

ulyssesyou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git

The following commit(s) were added to refs/heads/main by this push:
     new dba4bcd3c [MINOR] Remove redundant string format (#5126)
dba4bcd3c is described below

commit dba4bcd3c4587f91296cd2387dc089c8c7f4b970
Author: Zhen Wang <643348...@qq.com>
AuthorDate: Tue Mar 26 19:02:41 2024 +0800

    [MINOR] Remove redundant string format (#5126)
---
 gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala b/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala
index 670c9411d..c54b78da9 100644
--- a/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala
+++ b/gluten-core/src/main/scala/io/glutenproject/GlutenPlugin.scala
@@ -141,7 +141,7 @@ private[glutenproject] class GlutenDriverPlugin extends DriverPlugin with Loggin
     } else {
       s"$GLUTEN_SESSION_EXTENSION_NAME"
     }
-    conf.set(SPARK_SESSION_EXTS_KEY, String.format("%s", extensions))
+    conf.set(SPARK_SESSION_EXTS_KEY, extensions)
     // off-heap bytes
     if (!conf.contains(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)) {
Re: [PR] [VL](WIP) Support native UDAF [incubator-gluten]
github-actions[bot] commented on PR #5130: URL: https://github.com/apache/incubator-gluten/pull/5130#issuecomment-2020090416 Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename ***commit message*** and ***pull request title*** in the following format? [GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message} See also: * [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
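The title format requested by the bot can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of the Gluten tooling: `title_matches_format` is a hypothetical helper, the issue ID is assumed to be numeric, and the component tag is assumed to be uppercase letters (the bot message spells out neither rule). It checks only the two bracketed prefixes and ignores the `feat/fix:` part.

```shell
#!/bin/sh
# Hypothetical check for the "[GLUTEN-<id>][COMPONENT]" prefix of a PR title.
title_matches_format() {
  # $1 = pull request title
  # Pattern: literal "[GLUTEN-", digits, "]", then "[", uppercase letters, "]".
  printf '%s\n' "$1" | grep -Eq '^\[GLUTEN-[0-9]+\]\[[[:upper:]]+\]'
}

# Two titles taken from this thread: one with an issue reference, one without.
for title in \
  "[GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml" \
  "[VL](WIP) Support native UDAF"
do
  if title_matches_format "$title"; then
    echo "matches: $title"
  else
    echo "rename:  $title"
  fi
done
```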
[PR] [VL](WIP) Support native UDAF [incubator-gluten]
marin-ma opened a new pull request, #5130: URL: https://github.com/apache/incubator-gluten/pull/5130 (no comment)
Re: [PR] [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file [incubator-gluten]
github-actions[bot] commented on PR #5129: URL: https://github.com/apache/incubator-gluten/pull/5129#issuecomment-2020078075 Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename ***commit message*** and ***pull request title*** in the following format? [GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message} See also: * [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
Re: [PR] [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file [incubator-gluten]
github-actions[bot] commented on PR #5129: URL: https://github.com/apache/incubator-gluten/pull/5129#issuecomment-2020078525 Run Gluten Clickhouse CI
[PR] [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file [incubator-gluten]
wForget opened a new pull request, #5129: URL: https://github.com/apache/incubator-gluten/pull/5129
## What changes were proposed in this pull request?
The `BackendBuildInfo` case class seems more appropriate in the `Backend` class file, so I moved it.
## How was this patch tested?