Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-08-08 Thread via GitHub
peter-toth commented on PR #11683: URL: https://github.com/apache/datafusion/pull/11683#issuecomment-2275145283 > I am sorry -- something came up at work today and I am super behind on reviews. this is very high on my list for tomorow No worries @alamb. -- This is an automated mess

Re: [I] Explore Updating VariadicAny Signature to take 0 Args [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on issue #11522: URL: https://github.com/apache/datafusion/issues/11522#issuecomment-2275204276 The reason is that we currently use `VariadicAny` for functions that at least one args. If we now accept empty args for `VariadicAny`, we need to do tons of non-zero length c

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
mustafasrepo commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1707228670 ## datafusion/core/src/dataframe/mod.rs: ## @@ -3019,11 +3019,11 @@ mod tests { assert_batches_sorted_eq!( [ -"+-+

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
mustafasrepo commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1707228670 ## datafusion/core/src/dataframe/mod.rs: ## @@ -3019,11 +3019,11 @@ mod tests { assert_batches_sorted_eq!( [ -"+-+

Re: [I] Strengthen TypeSignature and Coercion rule. [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on issue #10507: URL: https://github.com/apache/datafusion/issues/10507#issuecomment-2275292135 Signature does 3 things 1. Length check 2. Type check 3. Coercion For length, the common length check are 1. Exact number 2. Variadic (Any number) 3

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on PR #11845: URL: https://github.com/apache/datafusion/pull/11845#issuecomment-2275337625 Thanks @ozankabak for your review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709117604 ## datafusion/functions-aggregate-common/src/accumulator.rs: ## @@ -0,0 +1,96 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
ozankabak commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1709146891 ## datafusion/core/src/physical_optimizer/enforce_sorting.rs: ## @@ -61,6 +61,7 @@ use crate::physical_plan::{Distribution, ExecutionPlan, InputOrderMode}; u

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
mustafasrepo commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1707228670 ## datafusion/core/src/dataframe/mod.rs: ## @@ -3019,11 +3019,11 @@ mod tests { assert_batches_sorted_eq!( [ -"+-+

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
mustafasrepo commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1707228670 ## datafusion/core/src/dataframe/mod.rs: ## @@ -3019,11 +3019,11 @@ mod tests { assert_batches_sorted_eq!( [ -"+-+

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
ozankabak commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1709153015 ## datafusion/core/src/physical_optimizer/sort_pushdown.rs: ## @@ -30,49 +29,73 @@ use crate::physical_plan::sorts::sort::SortExec; use crate::physical_plan::tr

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
mustafasrepo commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1707228670 ## datafusion/core/src/dataframe/mod.rs: ## @@ -3019,11 +3019,11 @@ mod tests { assert_batches_sorted_eq!( [ -"+-+

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
ozankabak commented on code in PR #11875: URL: https://github.com/apache/datafusion/pull/11875#discussion_r1709162446 ## datafusion/core/src/physical_optimizer/sort_pushdown.rs: ## @@ -132,6 +172,41 @@ fn pushdown_requirement_to_children( RequirementsCompatibility::

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
ozankabak commented on PR #11845: URL: https://github.com/apache/datafusion/pull/11845#issuecomment-2275479969 It would be great to visualize the dependency graph after this refactor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[I] Configurable null_equals_null flag [datafusion]

2024-08-08 Thread via GitHub
berkaysynnada opened a new issue, #11883: URL: https://github.com/apache/datafusion/issues/11883 ### Is your feature request related to a problem or challenge? Currently the `null_equals_null` flag in DF joins is hard-coded to false across multiple places in the code. It is not config

Re: [I] Incorrect common subexpression elimination of a subquery [datafusion]

2024-08-08 Thread via GitHub
helgikrs closed issue #8190: Incorrect common subexpression elimination of a subquery URL: https://github.com/apache/datafusion/issues/8190 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [I] Incorrect common subexpression elimination of a subquery [datafusion]

2024-08-08 Thread via GitHub
helgikrs commented on issue #8190: URL: https://github.com/apache/datafusion/issues/8190#issuecomment-2275493853 Closing this issue since it seems to be resolved. Running the repro (on v40) I get the following (correct) result. ```sql > explain select * from a where a in (select b

Re: [PR] Update ASCII scalar function to support Utf8View #11834 [datafusion]

2024-08-08 Thread via GitHub
dmitrybugakov closed pull request #11851: Update ASCII scalar function to support Utf8View #11834 URL: https://github.com/apache/datafusion/pull/11851 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Update ASCII scalar function to support Utf8View #11834 [datafusion]

2024-08-08 Thread via GitHub
dmitrybugakov commented on PR #11851: URL: https://github.com/apache/datafusion/pull/11851#issuecomment-2275503502 I will re-create the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Support `extract` on intervals [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on code in PR #11501: URL: https://github.com/apache/datafusion/pull/11501#discussion_r1709219857 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -81,6 +82,9 @@ impl DatePartFunc { Exact(vec![Utf8, Time32(Millisecond)]),

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on PR #11454: URL: https://github.com/apache/datafusion/pull/11454#issuecomment-2275549677 @nix010 we really need this, any chance you could take a look at failing tests, otherwise I'd be happy to have a go at it. -- This is an automated message from the Apache Git

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on code in PR #11454: URL: https://github.com/apache/datafusion/pull/11454#discussion_r1709238159 ## datafusion/expr/src/columnar_value.rs: ## @@ -183,6 +185,43 @@ impl ColumnarValue { Ok(args) } +fn _format_interval_string_value(&self,

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on code in PR #11454: URL: https://github.com/apache/datafusion/pull/11454#discussion_r1709239187 ## datafusion/expr/src/columnar_value.rs: ## @@ -206,7 +245,18 @@ impl ColumnarValue { scalar.to_array()?

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on code in PR #11454: URL: https://github.com/apache/datafusion/pull/11454#discussion_r1709241062 ## datafusion/sqllogictest/test_files/interval.slt: ## @@ -413,95 +413,47 @@ select '1980-01-01'::timestamp - interval '1 day' ### date / timestamp (arra

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on code in PR #11454: URL: https://github.com/apache/datafusion/pull/11454#discussion_r1709242250 ## datafusion/sqllogictest/test_files/interval.slt: ## @@ -623,5 +547,21 @@ select true true true true true true +### interval with string literal de

[PR] Update ASCII scalar function to support Utf8View #11834 [datafusion]

2024-08-08 Thread via GitHub
dmitrybugakov opened a new pull request, #11884: URL: https://github.com/apache/datafusion/pull/11884 ## Which issue does this PR close? Closes #11834. ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

2024-08-08 Thread via GitHub
alamb commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2275575262 I think this data is very interesting and we should look more deeply into why is the single group mode faster than doing a repartition / aggregate. It seems like the only di

Re: [PR] Single mode for multi column group by -- Almost 2x for ClickBench Q32 [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11792: URL: https://github.com/apache/datafusion/pull/11792#issuecomment-2275576800 Here are some more thoughts about why this approach works and an alternate idea of how we can make these queries all faster: https://github.com/apache/datafusion/issues/11680#issuecomm

Re: [PR] Support `convert_to_state` for `AVG` accumulator [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11734: URL: https://github.com/apache/datafusion/pull/11734#issuecomment-2275579264 > LGTM, thank you @alamb. > > Regarding q28 slowdown from PR description -- I suppose it's not a stable slowdown, and just a result on single benchmark run (since the regexp over

Re: [I] Incorrect common subexpression elimination of a subquery [datafusion]

2024-08-08 Thread via GitHub
alamb commented on issue #8190: URL: https://github.com/apache/datafusion/issues/8190#issuecomment-2275580319 🎉 likely related to some of @peter-toth 's great work -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] Improve comments in row_hash.rs for skipping aggregation [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11820: URL: https://github.com/apache/datafusion/pull/11820#discussion_r1709270042 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -100,22 +100,24 @@ struct SpillState { /// /// See "partial aggregation" discussion on [`GroupedHashA

Re: [PR] support `ANY()` op [datafusion]

2024-08-08 Thread via GitHub
alamb merged PR #11849: URL: https://github.com/apache/datafusion/pull/11849 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Some memory reservations of GroupedHashAggregateStream seem to be mis-tagged as spillable while they do not allow spilling [datafusion]

2024-08-08 Thread via GitHub
alamb commented on issue #11390: URL: https://github.com/apache/datafusion/issues/11390#issuecomment-2275595828 Hi @Ablu -- it could be. In general I think our support of ensuring a sort can spill might need some tweaking / help -- once the system is under memory pressure (aka the po

Re: [PR] Impl `convert_to_state` for `GroupsAccumulatorAdapter`. [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11827: URL: https://github.com/apache/datafusion/pull/11827#discussion_r1709282866 ## datafusion/physical-expr/src/aggregate/groups_accumulator/adapter.rs: ## @@ -342,6 +374,50 @@ impl GroupsAccumulator for GroupsAccumulatorAdapter { fn size(&

Re: [PR] Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2275623959 > This test may be incomplete, do you @alamb have any better test suggestions? 🤔 Hi @JasonLi-cn -- yes I think we should run the ClickBench and TPCH benchmarks using the script

[I] Add CreateArray support [datafusion-comet]

2024-08-08 Thread via GitHub
Kimahriman opened a new issue, #792: URL: https://github.com/apache/datafusion-comet/issues/792 ### What is the problem the feature request solves? As a starting point for supporting array expressions, we should add support for creating arrays. ### Describe the potential soluti

Re: [I] Incorrect common subexpression elimination of a subquery [datafusion]

2024-08-08 Thread via GitHub
waynexia commented on issue #8190: URL: https://github.com/apache/datafusion/issues/8190#issuecomment-2275630277 Well done! Appreciate it @helgikrs @peter-toth -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
timsaucer commented on PR #11845: URL: https://github.com/apache/datafusion/pull/11845#issuecomment-2275631290 [reduced datafusion dependencies.pdf](https://github.com/user-attachments/files/16542723/reduced.datafusion.dependencies.pdf) I'm trying to wrap my head around the current de

[PR] feature: `CreateArray` support [datafusion-comet]

2024-08-08 Thread via GitHub
Kimahriman opened a new pull request, #793: URL: https://github.com/apache/datafusion-comet/pull/793 ## Which issue does this PR close? Closes #792 ## Rationale for this change ## What changes are included in this PR? Adds support for the `Crea

Re: [I] Support Arrays for the Map scalar functions [datafusion]

2024-08-08 Thread via GitHub
alamb commented on issue #11436: URL: https://github.com/apache/datafusion/issues/11436#issuecomment-2275635437 > Shouldn't we have two different function to support these two use cases ? Or any ideas on how should we support both usecases using using single map functio In general, I

Re: [PR] feature: `CreateArray` support [datafusion-comet]

2024-08-08 Thread via GitHub
Kimahriman commented on code in PR #793: URL: https://github.com/apache/datafusion-comet/pull/793#discussion_r1709318923 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2348,6 +2348,27 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] Add tests for StringView / character functions, fix `regexp_like` and `regexp_match` to work with StringView [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11753: URL: https://github.com/apache/datafusion/pull/11753#discussion_r1709321531 ## datafusion/functions/src/regex/regexplike.rs: ## @@ -75,13 +75,8 @@ impl ScalarUDFImpl for RegexpLikeFunc { use DataType::*; Ok(match &arg_typ

Re: [PR] Add tests for StringView / character functions, fix `regexp_like` and `regexp_match` to work with StringView [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11753: URL: https://github.com/apache/datafusion/pull/11753#discussion_r1709323282 ## datafusion/functions/src/regex/regexplike.rs: ## @@ -75,13 +75,8 @@ impl ScalarUDFImpl for RegexpLikeFunc { use DataType::*; Ok(match &arg_typ

[I] Use `Arc` rather than `Statistics` in `PartitionedFile` [datafusion]

2024-08-08 Thread via GitHub
alamb opened a new issue, #11885: URL: https://github.com/apache/datafusion/issues/11885 ### Is your feature request related to a problem or challenge? We are trying to improve the speed of DataFusion when running the ClickBench partitioned test (which has 100 files) -- this means the

Re: [PR] Reduce clone of `Statistics` in `ListingTable` and `PartitionedFile` [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11802: URL: https://github.com/apache/datafusion/pull/11802#issuecomment-2275656062 > @alamb I found the reason finally, > > > Removed the api change label as we have now removed the `Arc` > > I agree the timings look good now. > > Would you be willing to c

Re: [I] Use `Arc` rather than `Statistics` in `PartitionedFile` [datafusion]

2024-08-08 Thread via GitHub
alamb commented on issue #11885: URL: https://github.com/apache/datafusion/issues/11885#issuecomment-2275655723 I think this is a good first issue as the code is relatively simple and mechanical. We'll help do the benchmarking The benchmarks are described in https://github.com/apache

Re: [I] `approx_percentile_cont_with_weight` result type changed from datafusion v39 to v40 [datafusion]

2024-08-08 Thread via GitHub
alamb commented on issue #11874: URL: https://github.com/apache/datafusion/issues/11874#issuecomment-2275662532 I agree we can close this issue as "working as designed" -- thanks @Michael-J-Ward and @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to t

Re: [I] `approx_percentile_cont_with_weight` result type changed from datafusion v39 to v40 [datafusion]

2024-08-08 Thread via GitHub
alamb closed issue #11874: `approx_percentile_cont_with_weight` result type changed from datafusion v39 to v40 URL: https://github.com/apache/datafusion/issues/11874 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11845: URL: https://github.com/apache/datafusion/pull/11845#issuecomment-2275666370 FYI @andygrove and @viirya as I suspect this may impact comet (though I haven't looked at it thoroughly yet) Given the size of this change and that we are getting close (days i

Re: [PR] Add tests for StringView / character functions, fix `regexp_like` and `regexp_match` to work with StringView [datafusion]

2024-08-08 Thread via GitHub
edmondop commented on code in PR #11753: URL: https://github.com/apache/datafusion/pull/11753#discussion_r1709361036 ## datafusion/functions/src/regex/regexplike.rs: ## @@ -75,13 +75,8 @@ impl ScalarUDFImpl for RegexpLikeFunc { use DataType::*; Ok(match &arg_

Re: [I] Support Arrays for the Map scalar functions [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on issue #11436: URL: https://github.com/apache/datafusion/issues/11436#issuecomment-2275680638 @dharanad I think `select Map {col4: col5} from t;` is equivalent to `SELECT MAP([col4], [col5]);`, so we just need to rewrite the syntax the existing map function. -- Th

Re: [I] Use `Arc` rather than `Statistics` in `PartitionedFile` [datafusion]

2024-08-08 Thread via GitHub
Rachelint commented on issue #11885: URL: https://github.com/apache/datafusion/issues/11885#issuecomment-2275681766 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] Use `Arc` rather than `Statistics` in `PartitionedFile` [datafusion]

2024-08-08 Thread via GitHub
Rachelint commented on issue #11885: URL: https://github.com/apache/datafusion/issues/11885#issuecomment-2275684725 May try the idea mentioned soon after the experiment about partitioned using hashmap in agg. https://github.com/apache/datafusion/blob/9503456388544788e1a881a0a80a3c61ac015

Re: [PR] feat: expose centroids in approx_percentile_count fluent api [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on code in PR #11878: URL: https://github.com/apache/datafusion/pull/11878#discussion_r1709392439 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -46,13 +46,21 @@ use datafusion_physical_expr_common::aggregate::tdigest::{ }; use dat

Re: [PR] Support NULL literal in Min/Max [datafusion]

2024-08-08 Thread via GitHub
xinlifoobar commented on code in PR #11812: URL: https://github.com/apache/datafusion/pull/11812#discussion_r1709404834 ## datafusion/core/tests/dataframe/describe.rs: ## @@ -102,8 +102,8 @@ async fn describe_null() -> Result<()> { "| null_count | 0| 1|",

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on PR #11454: URL: https://github.com/apache/datafusion/pull/11454#issuecomment-2275720951 I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs. -- This is an automated message from the Apache Git Service. To respon

Re: [PR] Support NULL literal in Min/Max [datafusion]

2024-08-08 Thread via GitHub
xinlifoobar commented on code in PR #11812: URL: https://github.com/apache/datafusion/pull/11812#discussion_r1709407323 ## datafusion/core/tests/dataframe/describe.rs: ## @@ -102,8 +102,8 @@ async fn describe_null() -> Result<()> { "| null_count | 0| 1|",

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709360078 ## datafusion/expr-common/src/lib.rs: ## @@ -0,0 +1,32 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [I] Update `INITCAP` scalar function to support `Utf8View` [datafusion]

2024-08-08 Thread via GitHub
xinlifoobar commented on issue #11853: URL: https://github.com/apache/datafusion/issues/11853#issuecomment-2275729774 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Enforce sorting handle fetchable operators. [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11875: URL: https://github.com/apache/datafusion/pull/11875#issuecomment-2275737636 > @alamb it would be great if you could take a look Will put it on my list for today -- This is an automated message from the Apache Git Service. To respond to the message, ple

[I] Float to int cast behaviour [datafusion]

2024-08-08 Thread via GitHub
oleggator opened a new issue, #11886: URL: https://github.com/apache/datafusion/issues/11886 At the moment, DataFusion casts floats to integers using the floor function. Is it possible to configure this behavior? ## Actual ``` select cast(2.8 as int) as result; ++

[PR] Remove many `crate::` imports in listing table provider module [datafusion]

2024-08-08 Thread via GitHub
findepi opened a new pull request, #11887: URL: https://github.com/apache/datafusion/pull/11887 This is part of isolating this module in order to be able to move it out of core. For https://github.com/apache/datafusion/issues/10782 -- This is an automated message from the Apache Gi

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709491881 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -44,6 +44,8 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true } datafusion-exe

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709494753 ## datafusion/physical-expr/src/expressions/column.rs: ## @@ -136,6 +138,35 @@ pub fn col(name: &str, schema: &Schema) -> Result> { Ok(Arc::new(Column::new

Re: [PR] Update ASCII scalar function to support Utf8View #11834 [datafusion]

2024-08-08 Thread via GitHub
XiangpengHao commented on code in PR #11884: URL: https://github.com/apache/datafusion/pull/11884#discussion_r1709509871 ## datafusion/functions/src/string/ascii.rs: ## @@ -87,12 +69,92 @@ impl ScalarUDFImpl for AsciiFunc { } fn invoke(&self, args: &[ColumnarValue])

[PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-08 Thread via GitHub
xinlifoobar opened a new pull request, #11888: URL: https://github.com/apache/datafusion/pull/11888 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] Update ASCII scalar function to support Utf8View #11834 [datafusion]

2024-08-08 Thread via GitHub
XiangpengHao commented on code in PR #11884: URL: https://github.com/apache/datafusion/pull/11884#discussion_r1709516929 ## datafusion/functions/src/string/ascii.rs: ## @@ -87,12 +69,92 @@ impl ScalarUDFImpl for AsciiFunc { } fn invoke(&self, args: &[ColumnarValue])

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-08 Thread via GitHub
xinlifoobar commented on code in PR #11888: URL: https://github.com/apache/datafusion/pull/11888#discussion_r1709519661 ## datafusion/functions/src/string/initcap.rs: ## @@ -88,28 +91,40 @@ fn initcap(args: &[ArrayRef]) -> Result { // first map is the iterator, second is f

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-08 Thread via GitHub
xinlifoobar commented on code in PR #11888: URL: https://github.com/apache/datafusion/pull/11888#discussion_r1709523168 ## datafusion/functions/src/string/initcap.rs: ## @@ -88,28 +91,40 @@ fn initcap(args: &[ArrayRef]) -> Result { // first map is the iterator, second is f

Re: [PR] feat: expose centroids in approx_percentile_count fluent api [datafusion]

2024-08-08 Thread via GitHub
Michael-J-Ward commented on code in PR #11878: URL: https://github.com/apache/datafusion/pull/11878#discussion_r1709524393 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -394,6 +394,21 @@ async fn test_fn_approx_percentile_cont() -> Result<()> { assert_b

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
ozankabak commented on PR #11845: URL: https://github.com/apache/datafusion/pull/11845#issuecomment-2275877237 > I do think we should consider waiting to merge until we have made a 41.0.0 release candidate so we have the maximum time to work out any kinks this might cause Reasonable,

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-08 Thread via GitHub
XiangpengHao commented on code in PR #11888: URL: https://github.com/apache/datafusion/pull/11888#discussion_r1709547559 ## datafusion/functions/src/string/initcap.rs: ## @@ -88,28 +91,40 @@ fn initcap(args: &[ArrayRef]) -> Result { // first map is the iterator, second is

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-08 Thread via GitHub
XiangpengHao commented on code in PR #11888: URL: https://github.com/apache/datafusion/pull/11888#discussion_r1709551000 ## datafusion/functions/src/string/initcap.rs: ## @@ -88,28 +91,40 @@ fn initcap(args: &[ArrayRef]) -> Result { // first map is the iterator, second is

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
ozankabak commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709553453 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -44,6 +44,8 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true } datafusion-exec

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-08 Thread via GitHub
XiangpengHao commented on code in PR #11888: URL: https://github.com/apache/datafusion/pull/11888#discussion_r1709554947 ## datafusion/functions/src/string/initcap.rs: ## @@ -153,6 +168,34 @@ mod tests { Utf8, StringArray ); +test_funct

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
andygrove commented on PR #11845: URL: https://github.com/apache/datafusion/pull/11845#issuecomment-2275904695 > > I do think we should consider waiting to merge until we have made a 41.0.0 release candidate so we have the maximum time to work out any kinks this might cause > > Reaso

Re: [PR] UDAF refactor: Add PhysicalExpr trait dependency on `datafusion-expr` and remove logical expressions requirement for creating physical aggregate expression [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709590649 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -44,6 +44,8 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true } datafusion-executio

Re: [I] Float to int cast behaviour [datafusion]

2024-08-08 Thread via GitHub
jayzhan211 commented on issue #11886: URL: https://github.com/apache/datafusion/issues/11886#issuecomment-2275937010 I think we would need to support configuration in arrow-rs first. The actual casting is here https://github.com/apache/arrow-rs/blob/e28cf44e65cdd1fefaa1b857ca4336e7a9da5d06/

[PR] chore: Prepare 41.0.0-rc1 [datafusion]

2024-08-08 Thread via GitHub
andygrove opened a new pull request, #11889: URL: https://github.com/apache/datafusion/pull/11889 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11476 ## Rationale for this change ## What changes are included in this

Re: [PR] chore: Prepare 41.0.0-rc1 [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11889: URL: https://github.com/apache/datafusion/pull/11889#discussion_r1709615047 ## dev/changelog/41.0.0.md: ## @@ -0,0 +1,363 @@ + + +# Apache DataFusion 41.0.0 Changelog + +This release consists of 245 commits from 69 contributors. See credits

Re: [PR] Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2275957483 I hit a bug https://github.com/apache/datafusion/pull/11833 that has been fixed on main when trying to run the benchmarks on this branch: ``` Query 19 avg time: 161.22 ms E

Re: [PR] Improve `AccumulatorArgs` by removing the usgaes of `input_types` [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11761: URL: https://github.com/apache/datafusion/pull/11761#issuecomment-2275964696 Superceded by https://github.com/apache/datafusion/pull/11845 I think, so marking it as draft as it isn't waiting for review -- This is an automated message from the Apache Git Servi

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
nix010 commented on PR #11454: URL: https://github.com/apache/datafusion/pull/11454#issuecomment-2275971870 > I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs. Ok, then I think this PR can be closed. -- This is an automated me

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
nix010 commented on PR #11454: URL: https://github.com/apache/datafusion/pull/11454#issuecomment-2275972593 > I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs. Ok, then I think this PR can be closed. -- This is an automated me

[PR] chore: Use DataFusion 41.0.0-rc1 [datafusion-comet]

2024-08-08 Thread via GitHub
andygrove opened a new pull request, #794: URL: https://github.com/apache/datafusion-comet/pull/794 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] chore: Use DataFusion 41.0.0-rc1 [datafusion-comet]

2024-08-08 Thread via GitHub
andygrove commented on code in PR #794: URL: https://github.com/apache/datafusion-comet/pull/794#discussion_r1709636586 ## native/Cargo.toml: ## @@ -39,13 +39,13 @@ arrow-buffer = { version = "52.2.0" } arrow-data = { version = "52.2.0" } arrow-schema = { version = "52.2.0" }

Re: [PR] Avoid copy when creating arrow buffers [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11840: URL: https://github.com/apache/datafusion/pull/11840#issuecomment-2275973242 Merged up to resolve a conflict -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[PR] Sync rust docs params for CI and dev [datafusion]

2024-08-08 Thread via GitHub
findepi opened a new pull request, #11890: URL: https://github.com/apache/datafusion/pull/11890 Since a4ac0829ecf63b3640315835b1374211dfadd953 commit there was a discrepancy between rust.yml GitHub workflow and the `dev/rust_lint.sh` script behavior. Sync the behaviors. Reuse common script

Re: [PR] Fix documentation warnings, make CsvExecBuilder and Unparsed pub [datafusion]

2024-08-08 Thread via GitHub
findepi commented on code in PR #11729: URL: https://github.com/apache/datafusion/pull/11729#discussion_r1709639566 ## .github/workflows/rust.yml: ## @@ -234,7 +234,7 @@ jobs: rust-version: stable - name: Run cargo doc run: | - export RUSTDOCF

Re: [I] `interval` cast has some problematic behaviour [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on issue #11271: URL: https://github.com/apache/datafusion/issues/11271#issuecomment-2276095327 > @samuelcolvin Just want to see what is scope that we want to cover for this case. Are we ok with just `second`? I think fixing one case doesn't really help.

Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11683: URL: https://github.com/apache/datafusion/pull/11683#discussion_r1709709933 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -227,69 +227,65 @@ impl CommonSubexprEliminate { .into_iter()

Re: [PR] Sync rust docs params for CI and dev [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11890: URL: https://github.com/apache/datafusion/pull/11890#issuecomment-2276099698 I double checked the doc check is run: https://github.com/apache/datafusion/actions/runs/10304040055/job/28521993389?pr=11890 -- This is an automated message from the Apache Git Serv

Re: [PR] Sync rust docs params for CI and dev [datafusion]

2024-08-08 Thread via GitHub
alamb merged PR #11890: URL: https://github.com/apache/datafusion/pull/11890 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on PR #11454: URL: https://github.com/apache/datafusion/pull/11454#issuecomment-2276105530 See https://github.com/apache/arrow-rs/pull/6211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[I] Can not compare [datafusion]

2024-08-08 Thread via GitHub
alamb opened a new issue, #11891: URL: https://github.com/apache/datafusion/issues/11891 ### Is your feature request related to a problem or challenge? (broken out from https://github.com/apache/datafusion/pull/11876 for more visibility) @samuelcolvin writes: We regular

Re: [PR] Fix `Duration` vs `Interval` comparisons and `Interval` as LHS [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11876: URL: https://github.com/apache/datafusion/pull/11876#discussion_r1709774179 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -3109,6 +3109,43 @@ SELECT * FROM VALUES 2024-02-01T08:00:00Z 2023-12-31T23:00:00Z +# interval vs. du

Re: [PR] feat: Add MaxTime type for a Time that returns the max on aggregation [datafusion]

2024-08-08 Thread via GitHub
alamb commented on PR #11755: URL: https://github.com/apache/datafusion/pull/11755#issuecomment-2276116454 Marking as a draft to signify this isn't waiting on another review. It isn't clear to me how to proceed with this PR given https://github.com/apache/datafusion/issues/11754 --

Re: [PR] Fix `Duration` vs `Interval` comparisons and `Interval` as LHS [datafusion]

2024-08-08 Thread via GitHub
samuelcolvin commented on code in PR #11876: URL: https://github.com/apache/datafusion/pull/11876#discussion_r1709781918 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -3109,6 +3109,43 @@ SELECT * FROM VALUES 2024-02-01T08:00:00Z 2023-12-31T23:00:00Z +# interval

Re: [PR] Support NULL literal in Min/Max [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11812: URL: https://github.com/apache/datafusion/pull/11812#discussion_r1709784405 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5520,3 +5520,9 @@ set datafusion.explain.logical_plan_only = false; statement ok drop table employee_

Re: [PR] Support NULL literal in Min/Max [datafusion]

2024-08-08 Thread via GitHub
alamb commented on code in PR #11812: URL: https://github.com/apache/datafusion/pull/11812#discussion_r1709784405 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5520,3 +5520,9 @@ set datafusion.explain.logical_plan_only = false; statement ok drop table employee_

  1   2   3   >