Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-25 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1776481456 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single val

[PR] Compare schema as logically equivalent to workaround disappearing metadata [datafusion]

2024-09-25 Thread via GitHub
itsjunetime opened a new pull request, #12631: URL: https://github.com/apache/datafusion/pull/12631 ## Which issue does this PR close? This works around some problems noted in #12560. It is not a full fix, but just unblocks some other work for me. ## Rationale for this change

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776454559 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1776419786 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbr

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1776419786 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbr

[PR] modify register table functions args [datafusion]

2024-09-25 Thread via GitHub
JasonLi-cn opened a new pull request, #12630: URL: https://github.com/apache/datafusion/pull/12630 ## Which issue does this PR close? Closes #. ## Rationale for this change Changing `&str` into `impl Into` to be consistent with the `register_table` function.

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [I] Different semantics of casting from int64 to timestamp between Comet and Spark [datafusion-comet]

2024-09-25 Thread via GitHub
justahuman1 commented on issue #146: URL: https://github.com/apache/datafusion-comet/issues/146#issuecomment-2375538517 I can take this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [I] Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-09-25 Thread via GitHub
justahuman1 commented on issue #325: URL: https://github.com/apache/datafusion-comet/issues/325#issuecomment-2375536003 Hi @sujithjay, let me know if you are still working on this. I am happy to continue #615 and add you as a co-author. Let me know, thank you -- This is an automated mess

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1776115903 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbr

Re: [PR] Rename aggregation modules, GroupColumn [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 merged PR #12619: URL: https://github.com/apache/datafusion/pull/12619 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Rename aggregation modules, GroupColumn [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on PR #12619: URL: https://github.com/apache/datafusion/pull/12619#issuecomment-2375455499 Since the change is not released, I think there is no API change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375450135 I wonder if we could change the "automatically load page index if needed" to "error if page index is needed but it is not loaded" 🤔 That might be a less surprising behavior -- This

[PR] feat: make register_csv accept a list of paths [datafusion-python]

2024-09-25 Thread via GitHub
mesejo opened a new pull request, #883: URL: https://github.com/apache/datafusion-python/pull/883 # Which issue does this PR close? N/A # Rationale for this change The method `read_csv` accepts a list of paths as input. It seems fitting that `register_csv` does the same. Moreove

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
alamb commented on code in PR #12593: URL: https://github.com/apache/datafusion/pull/12593#discussion_r1776029106 ## datafusion/core/src/datasource/physical_plan/parquet/reader.rs: ## @@ -57,9 +58,49 @@ pub trait ParquetFileReaderFactory: Debug + Send + Sync + 'static {

Re: [PR] chore: Make parquet reader options Comet options instead of Hadoop options [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on code in PR #968: URL: https://github.com/apache/datafusion-comet/pull/968#discussion_r1776038555 ## common/src/main/java/org/apache/comet/parquet/ReadOptions.java: ## @@ -173,14 +147,24 @@ public Builder(Configuration conf) { this.conf = conf;

Re: [PR] chore: Make parquet reader options Comet options instead of Hadoop options [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on code in PR #968: URL: https://github.com/apache/datafusion-comet/pull/968#discussion_r1776037634 ## common/src/test/java/org/apache/comet/parquet/TestFileReader.java: ## @@ -615,12 +617,12 @@ public void testColumnIndexReadWrite() throws Exception { @Te

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12332: URL: https://github.com/apache/datafusion/pull/12332#issuecomment-2375250760 I took the liberty of merging up from main to resolve a conflict -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [I] Automatically generate function documentation from comments / code [datafusion]

2024-09-25 Thread via GitHub
Omega359 commented on issue #12432: URL: https://github.com/apache/datafusion/issues/12432#issuecomment-2375243708 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2375243686 Marking as draft as I don't think this is waiting on review and I am trying to keep the review backlog under control -- This is an automated message from the Apache Git Service. To r

Re: [I] Improve performance of *TRIM functions for StringViewArray [datafusion]

2024-09-25 Thread via GitHub
alamb closed issue #12387: Improve performance of *TRIM functions for StringViewArray URL: https://github.com/apache/datafusion/issues/12387 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Improve performance of `trim` for string view (10%) [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12395: URL: https://github.com/apache/datafusion/pull/12395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Improve performance of `trim` for string view (10%) [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2375239941 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375216464 To give an idea of the difference. Metrics with the new `load_metadata` cached: > time_elapsed_opening=272.811692949s, time_elapsed_processing=118.675115374s, time_elapsed_sc

Re: [I] Replace `OnceLock` with `LazyLock` [datafusion]

2024-09-25 Thread via GitHub
Omega359 commented on issue #11687: URL: https://github.com/apache/datafusion/issues/11687#issuecomment-2375213450 Didn't LazyLock become stable in 1.80.0 ? https://blog.rust-lang.org/2024/07/25/Rust-1.80.0.html I just tried to use it in main and it wouldn't compile... -- This is a

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375197430 I remembered. The issue is that in this code: https://github.com/apache/arrow-rs/blob/62825b27e98e6719cb66258535c75c7490ddba44/parquet/src/arrow/async_reader/mod.rs#L212-L228

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-25 Thread via GitHub
andygrove commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2375151146 Would it make sense just to upgrade to DF 41 in this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[PR] Minor: improve documentation to StringView trim [datafusion]

2024-09-25 Thread via GitHub
alamb opened a new pull request, #12629: URL: https://github.com/apache/datafusion/pull/12629 ## Which issue does this PR close? Follow on to https://github.com/apache/datafusion/pull/12395 from @Rachelint ## Rationale for this change Part of reviewing https://github

Re: [PR] chore: Update benchmarks results based on 0.3.0-rc1 [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove merged PR #969: URL: https://github.com/apache/datafusion-comet/pull/969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375108812 I agree, it's confusing. I actually started from [advanced_parquet_index.rs](https://github.com/apache/datafusion/blob/91c8a47f2001b29bb4fdf6f41e18a65bbf0752de/datafusion-examp

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2375106042 It is great! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Improve SanityChecker error message [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12595: URL: https://github.com/apache/datafusion/pull/12595#issuecomment-2375104664 Thank you for the review @comphead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Improve SanityChecker error message [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12595: URL: https://github.com/apache/datafusion/pull/12595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Improve SanityChecker error message [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12595: URL: https://github.com/apache/datafusion/pull/12595#issuecomment-2375104297 > lgtm thanks @alamb wondering why no tests are failing 🤔 Its because this error only happens due to bugs (like https://github.com/apache/datafusion/issues/12446) so in theory it

Re: [PR] Improve performance of `trim` for string view (10%) [datafusion]

2024-09-25 Thread via GitHub
alamb commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1775902607 ## datafusion/functions/src/string/common.rs: ## @@ -72,65 +94,126 @@ pub(crate) fn general_trim( }; if use_string_view { -string_view_trim::(fun

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1775797249 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -60,11 +61,13 @@ pub trait ArrayRowEq: Send + Sync { fn take_n(&mut self, n

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-25 Thread via GitHub
alamb commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1775792378 ## datafusion/sqllogictest/test_files/string/string_view.slt: ## @@ -982,5 +982,93 @@ logical_plan 01)Projection: temp.column2 || temp.column3 02)--TableScan: temp

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1775793749 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -60,11 +61,13 @@ pub trait ArrayRowEq: Send + Sync { fn take_n(&mut self, n

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1775792354 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbrow

Re: [PR] Disallow duplicated qualified field names [datafusion]

2024-09-25 Thread via GitHub
comphead commented on PR #12608: URL: https://github.com/apache/datafusion/pull/12608#issuecomment-2374911993 we need probably to check how it works if provided the schema and catalog -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2374907418 I looked a bit at https://gitlab.softwareheritage.org/swh/devel/swh-provenance/-/merge_requests/182/diffs I wonder if you could cache the Arc rather than the `ArrowReaderMetadta

[PR] chore: Update for 0.3.0 release, prepare for 0.4.0 development [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove opened a new pull request, #970: URL: https://github.com/apache/datafusion-comet/pull/970 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-25 Thread via GitHub
palaska commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2374878019 > > I've just looked at this, I think the never completing queries just try to return too many rows and they take too long. That is because the logical optimizer pushes down

[PR] chore: Update benchmarks results based on 0.3.0-rc1 [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove opened a new pull request, #969: URL: https://github.com/apache/datafusion-comet/pull/969 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Add unhandled hook to PruningPredicate [datafusion]

2024-09-25 Thread via GitHub
adriangb commented on PR #12606: URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374868022 > It might also be good to look at https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html#method.literal_guarantees which you might be

Re: [PR] Add unhandled hook to PruningPredicate [datafusion]

2024-09-25 Thread via GitHub
adriangb commented on PR #12606: URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374862591 > I don't fully understand what a_point_lookup_column @> '{abc}'::text[] Basically I want to take the predicate `a_point_lookup_column = 'abc'` and transform that into a filte

[PR] chore: Make parquet reader options Comet options instead of Hadoop options [datafusion-comet]

2024-09-25 Thread via GitHub
parthchandra opened a new pull request, #968: URL: https://github.com/apache/datafusion-comet/pull/968 ## Which issue does this PR close? Closes #876 . ## Rationale for this change Moves configuration options of the parallel reader into CometConf and adds documentation so th

Re: [PR] Add unhandled hook to PruningPredicate [datafusion]

2024-09-25 Thread via GitHub
adriangb commented on PR #12606: URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374853474 The issue is that PruningPredicate discards (by returning `true`) any predicates it doesn't itself know how to rewrite. So if I do the rewrite before calling PruningPredicate then t

Re: [PR] Implement GROUPING aggregate function (following Postgres behavior.) [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12565: URL: https://github.com/apache/datafusion/pull/12565#discussion_r1775747830 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -578,7 +579,13 @@ impl GroupedHashAggregateStream { /// [`GroupsAccumulatorAdapter`] if not. pub

Re: [PR] Add unhandled hook to PruningPredicate [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12606: URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374842450 It might also be good to look at https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html#method.literal_guarantees which you might be able

Re: [PR] Add unhandled hook to PruningPredicate [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12606: URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374840656 > I now want to add an index for point lookups (I plan on implementing it as a column with distinct array values, but that's a bit of an implementation detail). > The point is t

Re: [PR] implement kurtosis udaf [datafusion]

2024-09-25 Thread via GitHub
jatin510 commented on PR #12613: URL: https://github.com/apache/datafusion/pull/12613#issuecomment-2374836764 sure, I will make the changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Require `Debug` for `PhysicalOptimizerRule` [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12624: URL: https://github.com/apache/datafusion/pull/12624 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Require `Debug` for `PhysicalOptimizerRule` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12624: URL: https://github.com/apache/datafusion/pull/12624#issuecomment-2374830778 I think this one is minor enough and non controversial I will merge it in, even though it is an API change -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] Update scalar_functions.md [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12627: URL: https://github.com/apache/datafusion/pull/12627#issuecomment-2374828911 Thanks again @Abdullahsab3 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] fix: Correct results for grouping sets when columns contain nulls [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12571: URL: https://github.com/apache/datafusion/pull/12571#discussion_r1775739801 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -137,6 +141,10 @@ pub struct PhysicalGroupBy { /// expression in null_expr. If `groups[i][j]` is t

Re: [I] Is it possible to support PyArrow backed UDFs in Comet natively? [datafusion-comet]

2024-09-25 Thread via GitHub
SemyonSinchenko commented on issue #957: URL: https://github.com/apache/datafusion-comet/issues/957#issuecomment-2374828920 > Your high level plan sounds good to me. Cool, thank you! I will start working on it. -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Update scalar_functions.md [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12627: URL: https://github.com/apache/datafusion/pull/12627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] fix: Correct results for grouping sets when columns contain nulls [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12571: URL: https://github.com/apache/datafusion/pull/12571#discussion_r1775739407 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -210,13 +221,114 @@ impl PhysicalGroupBy { .collect() } +/// The number of expr

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-25 Thread via GitHub
alamb commented on code in PR #12620: URL: https://github.com/apache/datafusion/pull/12620#discussion_r1775731789 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -78,6 +79,38 @@ impl GroupValuesColumn { random_state: Default::default(),

Re: [PR] fix: Correct results for grouping sets when columns contain nulls [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12571: URL: https://github.com/apache/datafusion/pull/12571#discussion_r1775707690 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -3512,6 +3512,18 @@ SELECT MIN(value), MAX(value) FROM integers_with_nulls 1 5 +# grouping_s

Re: [PR] fix: Correct results for grouping sets when columns contain nulls [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12571: URL: https://github.com/apache/datafusion/pull/12571#discussion_r1775730463 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -108,6 +110,8 @@ impl AggregateMode { } } +const INTERNAL_GROUPING_ID: &str = "grouping_id"; R

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-25 Thread via GitHub
rluvaton commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2374805365 I see, thank you, how is your experience with debugging this kind of test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[I] I'm tempted to add this image to the readme 😅 [datafusion]

2024-09-25 Thread via GitHub
rluvaton opened a new issue, #12628: URL: https://github.com/apache/datafusion/issues/12628 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

[I] Benchmarking Configurations for SparkSubmit [datafusion-benchmarks]

2024-09-25 Thread via GitHub
radhikabajaj123 opened a new issue, #14: URL: https://github.com/apache/datafusion-benchmarks/issues/14 Hello, I am using the following configurations for benchmarking: `--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ --deplo

Re: [I] Implement Common Subexpression Elimination optimizer rule [datafusion-comet]

2024-09-25 Thread via GitHub
eejbyfeldt commented on issue #942: URL: https://github.com/apache/datafusion-comet/issues/942#issuecomment-2374777297 Spark has it, but not at the plan level. Instead they do it as part of their code generation: https://github.com/apache/spark/blob/v3.5.3/sql/catalyst/src/main/scala/org/a

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2374680096 @parthchandra I know that you are working on this so I tried assigning the issue to you, but your name does not show up for me for some reason. I wondered if pinging you he

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2024-09-25 Thread via GitHub
parthchandra commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2374753686 Posting a reply in case it helps associate the issue somehow. Anyhow, confirming that I am indeed working on this. In that context, I am initially planning to only ad

Re: [I] suggest add synax `select from generate_series()` [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 commented on issue #10069: URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2374742174 That might work, but I wonder how it will be inferred and whether it will cause any conflicts. Some examples that can be ambiguous on top of my head: ```sql select ge

Re: [PR] Added array_any_value function [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 commented on code in PR #12329: URL: https://github.com/apache/datafusion/pull/12329#discussion_r1775650911 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -2131,6 +2132,7 @@ to_unixtime(expression[, ..., format_n]) - [empty](#empty) - [flatten](#flatten)

[PR] Update scalar_functions.md [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 opened a new pull request, #12627: URL: https://github.com/apache/datafusion/pull/12627 Remove extra space in reference to list_any_value ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are incl

Re: [I] avro_to_arrow: Support in memory apache_avro Value's [datafusion]

2024-09-25 Thread via GitHub
alamb commented on issue #7690: URL: https://github.com/apache/datafusion/issues/7690#issuecomment-2374710722 > To solve this, I would like to split the AvroArrowArrayReader into a reader and a convertor(Vec to RecordBatch). This sounds like a very reasonable idea to me. I thin

Re: [PR] Add Dictionary String (UTF8) type to String sqllogictests [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12621: URL: https://github.com/apache/datafusion/pull/12621 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add Dictionary String (UTF8) type to String sqllogictests [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12621: URL: https://github.com/apache/datafusion/pull/12621#issuecomment-2374666023 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [I] Remove the unsafe codes in `trim` by using `Pattern` [datafusion]

2024-09-25 Thread via GitHub
Rachelint closed issue #12597: Remove the unsafe codes in `trim` by using `Pattern` URL: https://github.com/apache/datafusion/issues/12597 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [I] Remove the unsafe codes in `trim` by using `Pattern` [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on issue #12597: URL: https://github.com/apache/datafusion/issues/12597#issuecomment-2374636629 I found we can get the index in safe way. This issue can be closed. https://github.com/Rachelint/arrow-datafusion/blob/f8174626e47d147e90c6715f5052ccfa269f0493/datafusio

Re: [I] Improve performance of *TRIM functions for StringViewArray [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on issue #12387: URL: https://github.com/apache/datafusion/issues/12387#issuecomment-2374635119 Finally, after reminding from @alamb , I found we can get the index in safe way, detail can see: https://github.com/Rachelint/arrow-datafusion/blob/f8174626e47d147e90c6715

Re: [I] Is it possible to support PyArrow backed UDFs in Comet natively? [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on issue #957: URL: https://github.com/apache/datafusion-comet/issues/957#issuecomment-2374623643 > It seems to me that re-using how spark handle python UDFs would be easier than implementing it from scratch using datafusion. But I'm not 100% sure. Yes, that is th

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2374621671 @alamb I found we indeed don't need the unsafe pointer arithmetic to get the `start_offset`, and I have swithed to a safe way here. Thanks much for suggestion! -- This is an aut

Re: [I] Add Scala & Python support to benchmarking [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on issue #967: URL: https://github.com/apache/datafusion-comet/issues/967#issuecomment-2374618914 Thanks @holdenk. That would be great. Since you mentioned Python, you may be interested in https://github.com/apache/datafusion-comet/issues/957 as well. -- This is an au

Re: [PR] Disallow duplicated qualified field names [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12608: URL: https://github.com/apache/datafusion/pull/12608#discussion_r1775609041 ## datafusion/expr/src/utils.rs: ## @@ -260,7 +252,11 @@ pub fn grouping_set_to_exprlist(group_expr: &[Expr]) -> Result> { } Ok(grouping_set.di

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12620: URL: https://github.com/apache/datafusion/pull/12620#discussion_r1775604205 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -78,6 +79,38 @@ impl GroupValuesColumn { random_state: Default::default

Re: [PR] Require `Debug` for `PhysicalOptimizerRule` [datafusion]

2024-09-25 Thread via GitHub
AnthonyZhOon commented on PR #12624: URL: https://github.com/apache/datafusion/pull/12624#issuecomment-2374597934 Okay, clippy lint should be fixed with the revert @alamb , odd that rust-analyzer misled me on the types at the time maybe it was an incomplete compile -- This is an automate

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12620: URL: https://github.com/apache/datafusion/pull/12620#discussion_r1775591280 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -78,6 +79,38 @@ impl GroupValuesColumn { random_state: Default::default

Re: [PR] Require `Debug` for `TableProvider`, `TableProviderFactory` and `PartitionStream` [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12557: URL: https://github.com/apache/datafusion/pull/12557 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[I] Add Scala & Python support to benchmarking [datafusion-comet]

2024-09-25 Thread via GitHub
holdenk opened a new issue, #967: URL: https://github.com/apache/datafusion-comet/issues/967 ### What is the problem the feature request solves? For folks considering adopting Arrow Datafusion Comet who have Dataset code in Scala & Python it would probably be useful to be able to run

Re: [PR] Require `Debug` for `TableProvider`, `TableProviderFactory` and `PartitionStream` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12557: URL: https://github.com/apache/datafusion/pull/12557#issuecomment-2374596246 Thank you @timsaucer 🙏 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2374592664 I think the tests I was talking about are described here: https://github.com/apache/datafusion/tree/4a3c09a9316fb8940aeb1c5b9b48bc3b7259d5d4/datafusion/sqllogictest#readme

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-25 Thread via GitHub
andygrove commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2374588304 > I've just looked at this, I think the never completing queries just try to return too many rows and they take too long. That is because the logical optimizer pushes down

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-25 Thread via GitHub
andygrove commented on PR #12626: URL: https://github.com/apache/datafusion/pull/12626#issuecomment-2374580198 Also, could you add/update a unit test so that we prevent future regressions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-25 Thread via GitHub
andygrove commented on code in PR #12626: URL: https://github.com/apache/datafusion/pull/12626#discussion_r1775552730 ## datafusion/proto/src/logical_plan/mod.rs: ## @@ -490,7 +490,14 @@ impl AsLogicalPlan for LogicalPlanNode { into_logical_plan!(sort.input,

Re: [I] suggest add synax `select from generate_series()` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on issue #10069: URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2374529536 FWIW I think we should consider keeping the existing `generate_series` function and add a new user defined table function with the same name -- This is an automated message from

  1   2   >