Re: [I] Extract parquet statistics from `Duration` columns [datafusion]

2024-06-02 Thread via GitHub
marvinlanhenke commented on issue #10754: URL: https://github.com/apache/datafusion/issues/10754#issuecomment-2144361368 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] Extract parquet statistics from `Interval` columns [datafusion]

2024-06-02 Thread via GitHub
marvinlanhenke commented on issue #10752: URL: https://github.com/apache/datafusion/issues/10752#issuecomment-2144357666 ...while looking into this I noticed that there are no statistics written for an `Interval`, which is also described [here](https://github.com/apache/parquet-format/blob

Re: [PR] Introduce Sum UDAF [datafusion]

2024-06-02 Thread via GitHub
mustafasrepo commented on code in PR #10651: URL: https://github.com/apache/datafusion/pull/10651#discussion_r1623824226 ## datafusion/physical-expr-common/src/aggregate/mod.rs: ## @@ -49,7 +51,10 @@ pub fn create_aggregate_expr( ignore_nulls: bool, is_distinct: bool,

[PR] Extract parquet statistics from Time32 and Time64 columns [datafusion]

2024-06-02 Thread via GitHub
Lordworms opened a new pull request, #10771: URL: https://github.com/apache/datafusion/pull/10771 ## Which issue does this PR close? Closes #10751 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Release DataFusion `39.0.0` [datafusion]

2024-06-02 Thread via GitHub
waynexia commented on issue #10517: URL: https://github.com/apache/datafusion/issues/10517#issuecomment-2144218089 arrow-rs 52 upgrade PR: https://github.com/apache/datafusion/pull/10765, I'll keep track of the (near) latest unreleased HEAD of arrow-rs so we can shift to the new version onc

[PR] Minor: (Doc) Enable rt-multi-thread feature for sample code [datafusion]

2024-06-02 Thread via GitHub
hsiang-c opened a new pull request, #10770: URL: https://github.com/apache/datafusion/pull/10770 ## Which issue does this PR close? Closes #. ## Rationale for this change - The sample code in `docs/source/user-guide/example-usage.md` is not working with

Re: [I] Repeat scalar function panics on negative repeat counts. [datafusion]

2024-06-02 Thread via GitHub
waynexia closed issue #10759: Repeat scalar function panics on negative repeat counts. URL: https://github.com/apache/datafusion/issues/10759

Re: [PR] fix: fix string repeat for negative numbers [datafusion]

2024-06-02 Thread via GitHub
waynexia merged PR #10760: URL: https://github.com/apache/datafusion/pull/10760

Re: [PR] feat: Add xxhash64 function support [datafusion-comet]

2024-06-02 Thread via GitHub
advancedxy commented on PR #424: URL: https://github.com/apache/datafusion-comet/pull/424#issuecomment-2144160556 Gently ping @andygrove @viirya, do you have any more comments?

[PR] Cleanup GetIndexedField [datafusion]

2024-06-02 Thread via GitHub
lewiszlw opened a new pull request, #10769: URL: https://github.com/apache/datafusion/pull/10769 ## Which issue does this PR close? Closes #. ## Rationale for this change I noticed that `GetIndexedField` is not used anymore after https://github.com/apache/dat

Re: [PR] Introduce Sum UDAF [datafusion]

2024-06-02 Thread via GitHub
jayzhan211 commented on code in PR #10651: URL: https://github.com/apache/datafusion/pull/10651#discussion_r1623701752 ## datafusion/physical-expr-common/src/aggregate/mod.rs: ## @@ -49,7 +51,10 @@ pub fn create_aggregate_expr( ignore_nulls: bool, is_distinct: bool, )

Re: [PR] Introduce Sum UDAF [datafusion]

2024-06-02 Thread via GitHub
jayzhan211 commented on code in PR #10651: URL: https://github.com/apache/datafusion/pull/10651#discussion_r1623700021 ## datafusion/expr/src/expr_schema.rs: ## @@ -158,7 +160,29 @@ impl ExprSchemable for Expr { .iter() .map(|e| e.get_ty

Re: [PR] Introduce Sum UDAF [datafusion]

2024-06-02 Thread via GitHub
jayzhan211 commented on code in PR #10651: URL: https://github.com/apache/datafusion/pull/10651#discussion_r1623699419 ## datafusion/expr/src/expr.rs: ## @@ -2263,7 +2267,11 @@ mod test { let fun = find_df_window_func(name).unwrap(); let fun2 = find_df

Re: [PR] Introduce Sum UDAF [datafusion]

2024-06-02 Thread via GitHub
jayzhan211 commented on code in PR #10651: URL: https://github.com/apache/datafusion/pull/10651#discussion_r1623697718 ## datafusion/physical-expr-common/src/aggregate/mod.rs: ## @@ -49,7 +51,10 @@ pub fn create_aggregate_expr( ignore_nulls: bool, is_distinct: bool, )

Re: [I] DataFrame.except() does not work with structs in schema [datafusion]

2024-06-02 Thread via GitHub
jayzhan211 commented on issue #10749: URL: https://github.com/apache/datafusion/issues/10749#issuecomment-2144076698 > I just skimmed this real quick, so I might be wrong here. > > But might the issue be rooted at arrow-rs itself: https://github.com/apache/arrow-rs/blob/master/arrow-o

Re: [I] bug: log2 produces different values than Spark in some cases [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove commented on issue #485: URL: https://github.com/apache/datafusion-comet/issues/485#issuecomment-2144059894 It looks like `log` calls get mapped to this protobuf type: ``` message ScalarFunc { string func = 1; repeated Expr args = 2; DataType return_type =

[I] Adopt temporalio/snipsync for documentation [datafusion]

2024-06-02 Thread via GitHub
edmondop opened a new issue, #10768: URL: https://github.com/apache/datafusion/issues/10768 ### Is your feature request related to a problem or challenge? Datafusion documentation is amazing and examples are too. However, inline snippets might fall out of sync from the codebase and ar

Re: [I] bug: log10 returns different results than Spark in some cases [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove closed issue #484: bug: log10 returns different results than Spark in some cases URL: https://github.com/apache/datafusion-comet/issues/484

Re: [I] bug: log10 returns different results than Spark in some cases [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove commented on issue #484: URL: https://github.com/apache/datafusion-comet/issues/484#issuecomment-2144052886 This is likely a duplicate of https://github.com/apache/datafusion-comet/issues/485

Re: [PR] feat: Add "Comet Fuzz" fuzz-testing utility [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove commented on code in PR #472: URL: https://github.com/apache/datafusion-comet/pull/472#discussion_r1623668599 ## fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] feat: Add "Comet Fuzz" fuzz-testing utility [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove commented on code in PR #472: URL: https://github.com/apache/datafusion-comet/pull/472#discussion_r1623668256 ## fuzz-testing/src/main/scala/org/apache/comet/fuzz/QueryGen.scala: ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [I] bug: log2 produces different values than Spark in some cases [datafusion-comet]

2024-06-02 Thread via GitHub
PedroMDuarte commented on issue #485: URL: https://github.com/apache/datafusion-comet/issues/485#issuecomment-2144040156 Thanks for replying Andy. I was looking at this as a good first issue for me. I browsed through the code but couldn't determine where the issue should be addressed.

Re: [I] bug: log2 produces different values than Spark in some cases [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove commented on issue #485: URL: https://github.com/apache/datafusion-comet/issues/485#issuecomment-2144034230 > The queries all have an `ORDER BY` clause and single partition, so should be deterministic. There is always the possibility that there is a bug in the fuzz testing tool t

Re: [I] bug: capacity overflow in repeat function [datafusion-comet]

2024-06-02 Thread via GitHub
andygrove commented on issue #482: URL: https://github.com/apache/datafusion-comet/issues/482#issuecomment-2144027547 Thanks @tshauck. It is nice to see the fuzzing resulting in improvements upstream. This does seem like an edge case that users are unlikely to hit so I think we can wait un

Re: [PR] Extract parquet statistics from f16 columns, add `ScalarValue::Float16` [datafusion]

2024-06-02 Thread via GitHub
Lordworms commented on PR #10763: URL: https://github.com/apache/datafusion/pull/10763#issuecomment-2143996064 seems like there is a CI/CD issue

Re: [I] Extract parquet statistics from `Interval` columns [datafusion]

2024-06-02 Thread via GitHub
marvinlanhenke commented on issue #10752: URL: https://github.com/apache/datafusion/issues/10752#issuecomment-2143979816 take

Re: [PR] feat: support unparsing LogicalPlan::Window nodes [datafusion]

2024-06-02 Thread via GitHub
devinjdangelo commented on code in PR #10767: URL: https://github.com/apache/datafusion/pull/10767#discussion_r1623586842 ## datafusion/sql/src/unparser/utils.rs: ## @@ -82,3 +91,28 @@ pub(crate) fn unproject_agg_exprs(expr: &Expr, agg: &Aggregate) -> Result })

Re: [PR] feat: support unparsing LogicalPlan::Window nodes [datafusion]

2024-06-02 Thread via GitHub
yyy1000 commented on code in PR #10767: URL: https://github.com/apache/datafusion/pull/10767#discussion_r1623523993 ## datafusion/sql/src/unparser/utils.rs: ## @@ -82,3 +91,28 @@ pub(crate) fn unproject_agg_exprs(expr: &Expr, agg: &Aggregate) -> Result }) .map

Re: [I] Extract parquet statistics from `Time32` and `Time64` columns [datafusion]

2024-06-02 Thread via GitHub
Lordworms commented on issue #10751: URL: https://github.com/apache/datafusion/issues/10751#issuecomment-2143920997 take

Re: [PR] Move `DynamicFileCatalog` back to core [datafusion]

2024-06-02 Thread via GitHub
goldmedal commented on PR #10745: URL: https://github.com/apache/datafusion/pull/10745#issuecomment-2143911214 To solve the WASM building issue, I disabled the `object_store` related feature for WASM. There is one remaining issue with macOS that I don't know how to fix. [GitHub Actio

Re: [PR] feat: support unparsing LogicalPlan::Window nodes [datafusion]

2024-06-02 Thread via GitHub
alamb commented on code in PR #10767: URL: https://github.com/apache/datafusion/pull/10767#discussion_r1623495522 ## datafusion/sql/src/unparser/plan.rs: ## @@ -162,23 +162,40 @@ impl Unparser<'_> { // A second projection implies a derived tablefactor

Re: [PR] Extract parquet statistics from timestamps with timezones [datafusion]

2024-06-02 Thread via GitHub
alamb commented on code in PR #10766: URL: https://github.com/apache/datafusion/pull/10766#discussion_r1623492772 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -592,58 +739,180 @@ async fn test_timestamp_diff_rg_sizes() { } .run(); +Test { +

Re: [I] bug: log2 produces different values than Spark in some cases [datafusion-comet]

2024-06-02 Thread via GitHub
PedroMDuarte commented on issue #485: URL: https://github.com/apache/datafusion-comet/issues/485#issuecomment-2143904091 Is it possible that the difference report is not sorting the data in the same way for spark and comet? I'm surprised by the discrepancy: ``` Spark: [1.5849625007211

Re: [PR] Extract parquet statistics from f16 columns, add `ScalarValue::Float16` [datafusion]

2024-06-02 Thread via GitHub
alamb commented on code in PR #10763: URL: https://github.com/apache/datafusion/pull/10763#discussion_r1623486294 ## datafusion/common/src/scalar/mod.rs: ## @@ -1700,7 +1722,7 @@ impl ScalarValue { ); } }; - +println!("array is {:?}

Re: [PR] Minor: Refactor memory size estimation for HashTable [datafusion]

2024-06-02 Thread via GitHub
alamb commented on code in PR #10748: URL: https://github.com/apache/datafusion/pull/10748#discussion_r1623485759 ## datafusion/common/src/utils/memory.rs: ## @@ -0,0 +1,134 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [I] Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally) [datafusion]

2024-06-02 Thread via GitHub
edmondop commented on issue #4850: URL: https://github.com/apache/datafusion/issues/4850#issuecomment-2143893437 @alamb spark SQL syntax works like so: ``` select * from parquet.`s3://foo-bar` ``` what do you think? - I wouldn't rely on the extension, and I don't know

Re: [I] Extract parquet statistics from `LargeUtf8` columns [datafusion]

2024-06-02 Thread via GitHub
alamb commented on issue #10756: URL: https://github.com/apache/datafusion/issues/10756#issuecomment-2143891584 Thanks @Weijun-H !

Re: [I] Suport unparsing `LogicalPlan::Window` to SQL [datafusion]

2024-06-02 Thread via GitHub
devinjdangelo commented on issue #10664: URL: https://github.com/apache/datafusion/issues/10664#issuecomment-2143881472 I took a stab at this in #10767. It can be handled similarly to how we currently handle LogicalPlan::Aggregate.

Re: [PR] feat: support unparsing LogicalPlan::Window nodes [datafusion]

2024-06-02 Thread via GitHub
devinjdangelo commented on code in PR #10767: URL: https://github.com/apache/datafusion/pull/10767#discussion_r1623447090 ## datafusion/sql/src/unparser/expr.rs: ## @@ -513,20 +513,30 @@ impl Unparser<'_> { fn convert_bound( &self, bound: &datafusion_expr:

[PR] feat: support unparsing LogicalPlan::Window nodes [datafusion]

2024-06-02 Thread via GitHub
devinjdangelo opened a new pull request, #10767: URL: https://github.com/apache/datafusion/pull/10767 ## Which issue does this PR close? closes #10664 ## Rationale for this change Queries involving window functions are common and should be supported for unparsing a plan

[PR] Extract parquet statistics from timestamps with timezones [datafusion]

2024-06-02 Thread via GitHub
xinlifoobar opened a new pull request, #10766: URL: https://github.com/apache/datafusion/pull/10766 ## Which issue does this PR close? Closes #10758 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-06-02 Thread via GitHub
waynexia commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1623357408 ## datafusion/substrait/src/variation_const.rs: ## @@ -37,3 +38,58 @@ pub const DEFAULT_CONTAINER_TYPE_REF: u32 = 0; pub const LARGE_CONTAINER_TYPE_REF: u32 = 1;

[PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-02 Thread via GitHub
waynexia opened a new pull request, #10765: URL: https://github.com/apache/datafusion/pull/10765 ## Which issue does this PR close? Closes #. ## Rationale for this change - Previous one: https://github.com/apache/datafusion/pull/9613 - Arrow release tic

[I] Error `NamedStructField should be rewritten in OperatorToFunction with subquery` if query is wrapped in view [datafusion]

2024-06-02 Thread via GitHub
ahirner opened a new issue, #10764: URL: https://github.com/apache/datafusion/issues/10764 ### Describe the bug When selecting from a view that selects fields from structs, this error is thrown: ``` Internal error: NamedStructField should be rewritten in OperatorToFunction. `