Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126290921 my sql is like: ``` select LO_SUPPKEY from SSB_1G.LINEORDER GROUP BY LO_SUPPKEY limit 20 offset 10 ``` The stand-alone physical

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2126285552 Thank you @viirya Please take a final look https://github.com/apache/datafusion-comet/pull/407/commits/32bc3148d542478e867372876b47b9e85eac1207 -- This is an automated

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1611036640 ## spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1611036092 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1611033707 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-22 Thread via GitHub
Ted-Jiang commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1611031689 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,94 @@ use crate::physical_optimizer::pruning::{PruningPredicate,

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on issue #389: URL: https://github.com/apache/datafusion-comet/issues/389#issuecomment-2126265633 Thanks. Looks like this is the history https://github.com/apache/datafusion-comet/pull/122 We can keep it as is for now. Closing -- This is an automated

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura closed issue #389: PR build for Linux Java 11 with Spark 3.4 is not running URL: https://github.com/apache/datafusion-comet/issues/389 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-22 Thread via GitHub
backkem commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2126248908 Do we split off a ticket reduce the nr of brackets emitted? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL [datafusion]

2024-05-22 Thread via GitHub
backkem commented on PR #10625: URL: https://github.com/apache/datafusion/pull/10625#issuecomment-2126244362 Also, I think it's the idea to move the entire unparser to the SQLParser package eventually. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1610984960 ## datafusion/functions-aggregate/src/expr_builder.rs: ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-22 Thread via GitHub
Ted-Jiang commented on PR #10607: URL: https://github.com/apache/datafusion/pull/10607#issuecomment-2126242974 > @Ted-Jiang I wonder if you might have time to review this PR at some point? Thanks ❤️ @alamb i will review this carefully later today -- This is an automated message

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610976334 ## spark/src/test/spark-3.4-plus/org/apache/comet/exec/CometExec3_4PlusSuite.scala: ## @@ -29,7 +29,7 @@ import org.apache.comet.CometConf /** * This test

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1610951350 ## docs/source/user-guide/tuning.md: ## @@ -39,22 +39,26 @@ It must be set before the Spark context is created. You can enable or disable Co at runtime by

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1610733007 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -724,89 +700,80 @@ class CometSparkSessionExtensions } // We

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-22 Thread via GitHub
advancedxy commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1610932255 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,38 @@ class DataGenerator(r: Random) { Range(0, n).map(_ =>

Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126184432 cc @alamb I will fix this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 opened a new issue, #10630: URL: https://github.com/apache/datafusion/issues/10630 ### Describe the bug In the definition of `AggregateExec`, we have the `limit` argument to optimize the agg operation. ``` pub struct AggregateExec { . /// Set

Re: [PR] Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL [datafusion]

2024-05-22 Thread via GitHub
phillipleblanc commented on PR #10625: URL: https://github.com/apache/datafusion/pull/10625#issuecomment-2126149300 > I wonder if we should update the comments in `Dialect` so that we can upstream the trait once we have some more experience with what features are needed in DataFusion 樂

Re: [PR] Add support for Substrait List/EmptyList literals [datafusion]

2024-05-22 Thread via GitHub
jonahgao commented on code in PR #10615: URL: https://github.com/apache/datafusion/pull/10615#discussion_r1610851138 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1278,6 +1279,45 @@ pub(crate) fn from_substrait_literal(lit: ) -> Result { s,

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610846336 ## spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [I] PlaceholderRowExec shown when select from union results. [datafusion]

2024-05-22 Thread via GitHub
xinlifoobar closed issue #10613: PlaceholderRowExec shown when select from union results. URL: https://github.com/apache/datafusion/issues/10613 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-22 Thread via GitHub
parthchandra commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1610819301 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1452,17 +1452,55 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610831950 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
jayzhan211 commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2126012154 > which means both adding many new functions as well as that they won't work for user defined aggregate functions I think they can build the function with macro in

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
jayzhan211 commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2126011843 > Thank you @jayzhan211 -- this is looking really cool. I have some feeback on the API design > > One major thing I think that might be worth considering is now to get one

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610830921 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-22 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2125995127 Looks good. It is clearer to understand the results in this way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-22 Thread via GitHub
goldmedal commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2125974338 Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] Incorrect statistics read for struct array in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on issue #10609: URL: https://github.com/apache/datafusion/issues/10609#issuecomment-2125961249 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on issue #10585: URL: https://github.com/apache/datafusion/issues/10585#issuecomment-2125939100 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] fix Incorrect statistics read for i8 i16 columns in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms opened a new pull request, #10629: URL: https://github.com/apache/datafusion/pull/10629 ## Which issue does this PR close? Closes #10585 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-22 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2125933413 Speaking of the Arrow metadata constraints, I'm still pondering how best to try and maintain some semblance of type information here. Looking through the dependency tree it seems

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on issue #10585: URL: https://github.com/apache/datafusion/issues/10585#issuecomment-2125926141 I have detected the location which causes the bug, seems like in arrow-rs, when getting DataType::Int8, we always return a Int32 Array Instead

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-22 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2125922609 @alamb I've updated things to use `METADTA(foo='bar')` though I'm not 100% on whether I've done it acceptably as I've basically only just allowed the use of `METADATA` in place of

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1610733007 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -724,89 +700,80 @@ class CometSparkSessionExtensions } // We

[PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
viirya opened a new pull request, #460: URL: https://github.com/apache/datafusion-comet/pull/460 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-22 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2125789754 @alamb That's an excellent point on the possible confusion between the two slightly different syntaxes for the same keyword in a statement. I obviously just went with what already

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-22 Thread via GitHub
parthchandra commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1610670927 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1444,13 +1483,136 @@ fn parse_str_to_time_only_timestamp(value: ) -> CometResult> {

Re: [PR] adding benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on PR #10610: URL: https://github.com/apache/datafusion/pull/10610#issuecomment-2125776581 > Thank you so much @Lordworms  > > This is perfect. I ran it locally, did some profiling, and filed #10626 to improve things. > > cc @NGA-TRAN Sure, I would

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-22 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2125749341 I had to change the `describe` test because the float / double columns have nans (or maybe -inf?). This means that the `min`, which used to be a value, is now nan (which doesn't

Re: [I] Excessive memory consumption on sorting [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10511: URL: https://github.com/apache/datafusion/issues/10511#issuecomment-2125748590 樂 that certainly seems like it is doing a Top(K) with 14 cores -- so I would expect this would hold at most 20 * 14 batches * 20 is the `k` * 14 is the number of file

Re: [PR] improve monotonicity api [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2125736201 Thanks everyone for the communication. I just want everyone working on this project to have a good experience -- I am glad to hear that it was. Thanks again @tinfoil-knight and

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10592: URL: https://github.com/apache/datafusion/pull/10592#discussion_r1610641830 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -624,20 +624,281 @@ async fn test_dates_64_diff_rg_sizes() { .run("date64"); } +// BUG: +//

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2125733405 FWI https://github.com/apache/datafusion/pull/10573 is merged! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10557: Make SQL strings generated from `Expr`s "prettier" URL: https://github.com/apache/datafusion/issues/10557 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10573: URL: https://github.com/apache/datafusion/pull/10573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Support unnest for struct data type [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10264: Support unnest for struct data type URL: https://github.com/apache/datafusion/issues/10264 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] feat: extend `unnest` to support Struct datatype [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10429: URL: https://github.com/apache/datafusion/pull/10429 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10386: URL: https://github.com/apache/datafusion/pull/10386#issuecomment-2125722281 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look -- This is an automated message from the

Re: [PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10574: URL: https://github.com/apache/datafusion/pull/10574#issuecomment-2125721583  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10549: URL: https://github.com/apache/datafusion/pull/10549#discussion_r1610626756 ## datafusion-examples/examples/parquet_index.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10549: URL: https://github.com/apache/datafusion/pull/10549#discussion_r1610627390 ## datafusion-examples/examples/parquet_index.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10549: URL: https://github.com/apache/datafusion/pull/10549#discussion_r1610626756 ## datafusion-examples/examples/parquet_index.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10549: URL: https://github.com/apache/datafusion/pull/10549#issuecomment-2125713470 This PR is now ready for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-22 Thread via GitHub
alamb opened a new issue, #10628: URL: https://github.com/apache/datafusion/issues/10628 ### Is your feature request related to a problem or challenge? The usecase of walking `LogicalPlan` to either analyze the plan or rewrite it has come up multiple times recently

Re: [I] PlaceholderRowExec shown when select from union results. [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10613: URL: https://github.com/apache/datafusion/issues/10613#issuecomment-2125679429 > This is like an optimization for empty tables. The result above is when both t3 and t4 are empty. After insert some values, the plan displayed correctly. I believe that

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1610577152 ## datafusion-examples/examples/udaf_expr.rs: ## @@ -0,0 +1,31 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Draft: Add pyi stubs for type hinting [datafusion-python]

2024-05-22 Thread via GitHub
timsaucer closed pull request #709: Draft: Add pyi stubs for type hinting URL: https://github.com/apache/datafusion-python/pull/709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-22 Thread via GitHub
westonpace opened a new pull request, #10627: URL: https://github.com/apache/datafusion/pull/10627 ## Which issue does this PR close? Closes #8031 ## Rationale for this change See the motivating issue. ## What changes are included in this PR? The

Re: [PR] feat: extend `unnest` to support Struct datatype [datafusion]

2024-05-22 Thread via GitHub
duongcongtoai commented on code in PR #10429: URL: https://github.com/apache/datafusion/pull/10429#discussion_r1610578088 ## datafusion/sqllogictest/test_files/unnest.slt: ## @@ -288,6 +308,18 @@ select unnest(array_remove(column1, 12)) from unnest_table; 5 6 +## unnest

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-22 Thread via GitHub
huaxingao commented on code in PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#discussion_r1610574040 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -1212,6 +1212,157 @@ class CometAggregateSuite extends CometTestBase with

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-22 Thread via GitHub
peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2125641924 I've rebased the PR on the latest `main`. The first commit is exactly the same as the previous state of this PR was, the second commit changes `TreeNode:apply()` and

Re: [I] Implement a benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10606: Implement a benchmark for extracting arrow statistics from parquet URL: https://github.com/apache/datafusion/issues/10606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] adding benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10610: URL: https://github.com/apache/datafusion/pull/10610 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[I] Improve performance of extracting statistics from parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb opened a new issue, #10626: URL: https://github.com/apache/datafusion/issues/10626 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10453 @Lordworms added a benchmark for extracting statistics from parquet

Re: [PR] Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1610537274 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -75,6 +75,12 @@ macro_rules! get_statistic { *scale,

Re: [PR] Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10593: URL: https://github.com/apache/datafusion/pull/10593#issuecomment-2125592185 This file had some merge conflicts so I took the liberty of merging up from main and addressing the comment https://github.com/apache/datafusion/pull/10593#discussion_r1608835566 in

Re: [PR] Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1610537274 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -75,6 +75,12 @@ macro_rules! get_statistic { *scale,

Re: [PR] Test for reading read statistics from parquet files without statistics and boolean & struct data type [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10608: URL: https://github.com/apache/datafusion/pull/10608#issuecomment-2125584421 Since this was test only code I just merged it in to keep things moving -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Test for reading read statistics from parquet files without statistics and boolean & struct data type [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10608: URL: https://github.com/apache/datafusion/pull/10608 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[I] Render tables using html in notebooks. [datafusion-python]

2024-05-22 Thread via GitHub
timsaucer opened a new issue, #713: URL: https://github.com/apache/datafusion-python/issues/713 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Many users, especially those who want to try out DataFusion for the first time

Re: [I] Automatically pick best shuffle implementation for a query [datafusion-comet]

2024-05-22 Thread via GitHub
vaibhawvipul commented on issue #459: URL: https://github.com/apache/datafusion-comet/issues/459#issuecomment-2125456674 I would like to work on this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on code in PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#discussion_r1610434498 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -1212,6 +1212,157 @@ class CometAggregateSuite extends CometTestBase with

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#issuecomment-2125433024 LGTM overall. I think this is a good addition. I left a couple of small style suggestions. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1610414004 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,38 @@ class DataGenerator(r: Random) { Range(0, n).map(_ =>

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1610422808 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,38 @@ class DataGenerator(r: Random) { Range(0, n).map(_ =>

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1610414004 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,38 @@ class DataGenerator(r: Random) { Range(0, n).map(_ =>

Re: [PR] Add support for Substrait List/EmptyList literals [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10615: URL: https://github.com/apache/datafusion/pull/10615 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] Add to_unixtime function to scalar functions doc [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10620: URL: https://github.com/apache/datafusion/pull/10620 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Add to_unixtime function to scalar functions doc [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10462: Add to_unixtime function to scalar functions doc URL: https://github.com/apache/datafusion/issues/10462 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10623: URL: https://github.com/apache/datafusion/pull/10623#discussion_r1610405921 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -64,12 +64,43 @@ pub fn wrap_partition_value_in_dict(val: ScalarValue) -> ScalarValue {

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10623: URL: https://github.com/apache/datafusion/pull/10623#discussion_r1610405386 ## datafusion-examples/examples/csv_opener.rs: ## @@ -58,16 +57,11 @@ async fn main() -> Result<()> { let path = std::path::Path::new().canonicalize()?; -

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-22 Thread via GitHub
appletreeisyellow commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2125344366 > I don't think this needs a performance test -- I was thinking just the main test suite I was thinking performance regression  Got it! I can put up an Influxdata

Re: [I] Make it easier to register object stores [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10616: URL: https://github.com/apache/datafusion/issues/10616#issuecomment-2125312965 Thanks again @aditanase  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Omit NULLS FIRST/LAST when deparsing ORDER BY clauses for MySQL [datafusion]

2024-05-22 Thread via GitHub
phillipleblanc commented on PR #10625: URL: https://github.com/apache/datafusion/pull/10625#issuecomment-2125198690 cc @backkem @devinjdangelo to take a look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[PR] Phillip/240523 fix mysql order by [datafusion]

2024-05-22 Thread via GitHub
phillipleblanc opened a new pull request, #10625: URL: https://github.com/apache/datafusion/pull/10625 ## Which issue does this PR close? Closes #10624 Closes #. ## Rationale for this change When unparsing LogicalPlans back to SQL statements for the MySQL dialect,

[I] MySQL doesn't support the `NULLS FIRST/LAST` clause in `ORDER BY` statements [datafusion]

2024-05-22 Thread via GitHub
phillipleblanc opened a new issue, #10624: URL: https://github.com/apache/datafusion/issues/10624 ### Describe the bug When unparsing LogicalPlans back to SQL statements for the MySQL dialect, any plan that includes a `Sort` node will produce a SQL statement that looks like: ```

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-22 Thread via GitHub
timsaucer commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2125128453 I think you're doing a great job, and good point on the sqllogictest. TBH I find those tests harder to wrap my head around than the rust tests, but that's more personal

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-22 Thread via GitHub
viirya commented on PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#issuecomment-2125124697 Sure. I will review this again today. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Docs: add examples for `RuntimeEnv::register_object_store`, improve error messages [datafusion]

2024-05-22 Thread via GitHub
aditanase commented on code in PR #10617: URL: https://github.com/apache/datafusion/pull/10617#discussion_r1610233571 ## datafusion/execution/src/runtime_env.rs: ## @@ -89,6 +89,27 @@ impl RuntimeEnv { /// scheme, if any. /// /// See [`ObjectStoreRegistry`] for

Re: [I] Support RangePartitioning with native shuffle [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on issue #458: URL: https://github.com/apache/datafusion-comet/issues/458#issuecomment-2125120170 I will work on this after finishing HashJoin build right support. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2125118095 > I'm curious as to why we wouldn't want to just make date_bin timezone aware? This would appear to have a lot less potential edge cases I believe @mhilton had thought

Re: [I] Support RangePartitioning with native shuffle [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on issue #458: URL: https://github.com/apache/datafusion-comet/issues/458#issuecomment-2125116224 Since the early development of Comet shuffle, this is one item on to do list. After we finish columnar shuffle which has higher coverage and support main partitioning types,

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2125115366 > I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2125114372 I don't think this needs a performance test -- I was thinking just the main test suite -- This is an automated message from the Apache Git Service. To respond to the message, please

[I] Support RangePartitioning with native shuffle [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove opened a new issue, #458: URL: https://github.com/apache/datafusion-comet/issues/458 ### What is the problem the feature request solves? We do not currently support RangePartitioning with native shuffle. Adding this support would allow us to use native shuffle for

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-22 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2125106167 Have you checked tests in [sqllogictest](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/window.slt)? If we want to make sure all the

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-22 Thread via GitHub
Blizzara commented on PR #10622: URL: https://github.com/apache/datafusion/pull/10622#issuecomment-2125102187 @jonahgao second extracted part :) this one has a bit of overlap with the list PR but I'm happy to fix conflicts once one of these is merged (order shouldn't matter) -- This is

Re: [PR] Docs: add examples for `RuntimeEnv::register_object_store`, improve error messages [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10617: URL: https://github.com/apache/datafusion/pull/10617 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

  1   2   >