Re: [PR] doc: Update supported expressions [arrow-datafusion-comet]

2024-03-30 Thread via GitHub
viirya commented on PR #237: URL: https://github.com/apache/arrow-datafusion-comet/pull/237#issuecomment-2028578097 I created the umbrella ticket https://github.com/apache/arrow-datafusion-comet/issues/240 to cover these unsupported expressions. I will add some expressions there so the peo

Re: [PR] doc: Update supported expressions [arrow-datafusion-comet]

2024-03-30 Thread via GitHub
viirya commented on PR #237: URL: https://github.com/apache/arrow-datafusion-comet/pull/237#issuecomment-2028577300 > Is there a document listing unsupported expressions that individuals could potentially work on if they're interested? Thank you. We don't have such document. Spark ex

Re: [PR] Propagate error from filter closure in `filter_leaves`. [arrow-rs]

2024-03-30 Thread via GitHub
viirya commented on PR #5575: URL: https://github.com/apache/arrow-rs/pull/5575#issuecomment-2028576968 > Perhaps we could add a try_filter_leaves to avoid this being a breaking change, filter_leaves could just call this with unwrap Sounds good to me. Let me update this later. -- T

Re: [PR] doc: Update supported expressions [arrow-datafusion-comet]

2024-03-30 Thread via GitHub
dbtsai commented on PR #237: URL: https://github.com/apache/arrow-datafusion-comet/pull/237#issuecomment-2028573650 Is there a document listing unsupported expressions that individuals could potentially work on if they're interested? Thank you. -- This is an automated message from the Ap

Re: [PR] Propagate error from filter closure in `filter_leaves`. [arrow-rs]

2024-03-30 Thread via GitHub
tustvold commented on PR #5575: URL: https://github.com/apache/arrow-rs/pull/5575#issuecomment-2028571727 Perhaps we could add a try_filter_leaves to avoid this being a breaking change, filter_leaves could just call this with unwrap
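The suggestion above (a fallible `try_filter_leaves`, with the existing `filter_leaves` delegating to it and unwrapping) can be sketched roughly as follows. This is a minimal illustration on a toy tree, not the arrow-rs implementation: the `Node` type, the helper `try_filter_inner`, and the exact signatures are assumptions made up for the example.

```rust
use std::convert::Infallible;

/// Toy schema tree used only for illustration (hypothetical, not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Leaf(String),
    Group(Vec<Node>),
}

/// Fallible variant: the closure returns `Result<bool, E>` and the first
/// error is propagated to the caller instead of being swallowed.
fn try_filter_leaves<E, F>(node: &Node, mut filter: F) -> Result<Option<Node>, E>
where
    F: FnMut(usize, &str) -> Result<bool, E>,
{
    let mut idx = 0;
    try_filter_inner(node, &mut idx, &mut filter)
}

/// Depth-first walk that numbers leaves in visiting order.
fn try_filter_inner<E, F>(node: &Node, idx: &mut usize, filter: &mut F) -> Result<Option<Node>, E>
where
    F: FnMut(usize, &str) -> Result<bool, E>,
{
    match node {
        Node::Leaf(name) => {
            let i = *idx;
            *idx += 1;
            Ok(if filter(i, name.as_str())? { Some(node.clone()) } else { None })
        }
        Node::Group(children) => {
            let mut kept = Vec::new();
            for child in children {
                if let Some(c) = try_filter_inner(child, idx, filter)? {
                    kept.push(c);
                }
            }
            // Drop groups that end up with no surviving leaves.
            Ok(if kept.is_empty() { None } else { Some(Node::Group(kept)) })
        }
    }
}

/// Infallible wrapper that keeps the old `bool`-returning signature.
/// The error type is `Infallible`, so the `unwrap` can never panic.
fn filter_leaves(node: &Node, mut filter: impl FnMut(usize, &str) -> bool) -> Option<Node> {
    try_filter_leaves(node, |i, name| -> Result<bool, Infallible> { Ok(filter(i, name)) }).unwrap()
}
```

Keeping the original function as a thin wrapper means existing callers compile unchanged, while new callers that need error propagation opt in to the `try_` variant.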

[PR] Add `ExternalSorter`'s `spilled_rows` metric [arrow-datafusion]

2024-03-30 Thread via GitHub
erenavsarogullari opened a new pull request, #9885: URL: https://github.com/apache/arrow-datafusion/pull/9885 ## Which issue does this PR close? Closes #9884. ## What changes are included in this PR? This PR introduces following changes: 1- Currently, `ExternalSorter` exposes

[PR] Propagate error from filter closure in `filter_leaves`. [arrow-rs]

2024-03-30 Thread via GitHub
viirya opened a new pull request, #5575: URL: https://github.com/apache/arrow-rs/pull/5575 # Which issue does this PR close? Closes #5574. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chang

Re: [I] Expose `ExternalSorter`'s `spilled_rows` metric [arrow-datafusion]

2024-03-30 Thread via GitHub
erenavsarogullari commented on issue #9884: URL: https://github.com/apache/arrow-datafusion/issues/9884#issuecomment-2028565495 take

[I] Expose `ExternalSorter`'s `spilled_rows` metric [arrow-datafusion]

2024-03-30 Thread via GitHub
erenavsarogullari opened a new issue, #9884: URL: https://github.com/apache/arrow-datafusion/issues/9884 ### Is your feature request related to a problem or challenge? Currently, `ExternalSorter` exposes `BaselineMetrics`(e.g: total output rows), `spill_count` and `spilled_bytes`. `sp

[PR] minor(doc): fix dead link for catalogs example [arrow-datafusion]

2024-03-30 Thread via GitHub
yjshen opened a new pull request, #9883: URL: https://github.com/apache/arrow-datafusion/pull/9883 ## Which issue does this PR close? Closes #. ## Rationale for this change Fix the doc link since the catalog example has been moved. I also checked other links pointi

Re: [I] [R] Can't install `adbcflightsql` from CRAN [arrow-adbc]

2024-03-30 Thread via GitHub
eitsupi commented on issue #1647: URL: https://github.com/apache/arrow-adbc/issues/1647#issuecomment-2028560038 I think the `adbcflightsql` package is not released on CRAN. You may install from GitHub.

[PR] Move `Radians`, `Signum`, `Sin`, `Sinh`, `Sqrt` functions to `datafusion-functions` crate [arrow-datafusion]

2024-03-30 Thread via GitHub
erenavsarogullari opened a new pull request, #9882: URL: https://github.com/apache/arrow-datafusion/pull/9882 ## Which issue does this PR close? Closes #9860. ## What changes are included in this PR? `radians`, `signum`, `sin`, `sinh` and `sqrt` functions are moved to `datafusio

[I] Make `filter` in `filter_leaves` API propagate error [arrow-rs]

2024-03-30 Thread via GitHub
viirya opened a new issue, #5574: URL: https://github.com/apache/arrow-rs/issues/5574 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Currently `filter_leaves` API takes a closure which returns a `bool`. In some cases, e.g

Re: [PR] feat: Add display_pg_json for LogicalPlan [arrow-datafusion]

2024-03-30 Thread via GitHub
liurenjie1024 commented on code in PR #9789: URL: https://github.com/apache/arrow-datafusion/pull/9789#discussion_r1545533951 ## datafusion/expr/Cargo.toml: ## @@ -43,6 +43,7 @@ arrow-array = { workspace = true } chrono = { workspace = true } datafusion-common = { workspace =

[I] ci: Investigate running `cargo udeps` in ci. [arrow-datafusion]

2024-03-30 Thread via GitHub
liurenjie1024 opened a new issue, #9881: URL: https://github.com/apache/arrow-datafusion/issues/9881 ### Is your feature request related to a problem or challenge? See [discussion](https://github.com/apache/arrow-datafusion/pull/9789#discussion_r1543435747) , we are expecting to remo

Re: [I] [Python] bool followed by float disallowed, but float followed by bool allowed [arrow]

2024-03-30 Thread via GitHub
Wainberg commented on issue #40909: URL: https://github.com/apache/arrow/issues/40909#issuecomment-2028539713 A fifth example with `datetime.date` and `int`: ```python >>> pa.array(np.array([datetime.date(2021, 2, 3), 1], dtype=object)) [ 2021-02-03, 1970-01-02 ]

Re: [PR] doc: Update supported expressions [arrow-datafusion-comet]

2024-03-30 Thread via GitHub
viirya commented on PR #237: URL: https://github.com/apache/arrow-datafusion-comet/pull/237#issuecomment-2028530016 cc @sunchao

Re: [PR] initial version of filter for run end array with i64 run_ends [arrow-rs]

2024-03-30 Thread via GitHub
Jefffrey commented on code in PR #5573: URL: https://github.com/apache/arrow-rs/pull/5573#discussion_r1545521930 ## arrow-select/src/filter.rs: ## @@ -844,6 +888,72 @@ mod tests { assert_eq!(9, d.value(1)); } +#[test] +fn test_filter_run_end_encoding_arra

[PR] feat: Change default value of columnar shuffle config [arrow-datafusion-comet]

2024-03-30 Thread via GitHub
viirya opened a new pull request, #239: URL: https://github.com/apache/arrow-datafusion-comet/pull/239 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these change

Re: [PR] Make FirstValue UDAF based function [arrow-datafusion]

2024-03-30 Thread via GitHub
jayzhan211 commented on code in PR #9874: URL: https://github.com/apache/arrow-datafusion/pull/9874#discussion_r1545522347 ## datafusion/expr/src/function.rs: ## @@ -38,9 +39,20 @@ pub type ReturnTypeFunction = Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>> + Send + Sync>; /// Factory that returns an accu

Re: [PR] feat: Determine ordering of file groups [arrow-datafusion]

2024-03-30 Thread via GitHub
suremarc commented on code in PR #9593: URL: https://github.com/apache/arrow-datafusion/pull/9593#discussion_r1545518024 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -194,6 +203,71 @@ impl FileScanConfig { .with_repartition_file_min_siz

Re: [PR] feat: Determine ordering of file groups [arrow-datafusion]

2024-03-30 Thread via GitHub
suremarc commented on code in PR #9593: URL: https://github.com/apache/arrow-datafusion/pull/9593#discussion_r1545517463 ## datafusion/sqllogictest/test_files/parquet.slt: ## @@ -169,6 +171,38 @@ SELECT min(date_col) FROM test_table; 1970-01-02 +# Clean up +statement ok

[I] Macro based scalar function definitions make IDE harder to trace function usage [arrow-datafusion]

2024-03-30 Thread via GitHub
viirya opened a new issue, #9880: URL: https://github.com/apache/arrow-datafusion/issues/9880 ### Is your feature request related to a problem or challenge? I'm revamping the planner in Comet for some recent changes happened in DataFusion, especially for scalar functions. I fou

Re: [PR] GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface [arrow]

2024-03-30 Thread via GitHub
andygrove commented on code in PR #40043: URL: https://github.com/apache/arrow/pull/40043#discussion_r1545514735 ## java/vector/src/main/java/org/apache/arrow/vector/BaseLargeVariableWidthVector.java: ## @@ -336,6 +336,31 @@ public List<ArrowBuf> getFieldBuffers() { return result;

Re: [I] instr function doesn't correctly account for non-ascii strings [arrow-datafusion]

2024-03-30 Thread via GitHub
Omega359 closed issue #9026: instr function doesn't correctly account for non-ascii strings URL: https://github.com/apache/arrow-datafusion/issues/9026

Re: [I] instr function doesn't correctly account for non-ascii strings [arrow-datafusion]

2024-03-30 Thread via GitHub
Omega359 commented on issue #9026: URL: https://github.com/apache/arrow-datafusion/issues/9026#issuecomment-2028496373 Closing this issue as the function was removed.

Re: [PR] feat: Determine ordering of file groups [arrow-datafusion]

2024-03-30 Thread via GitHub
suremarc commented on code in PR #9593: URL: https://github.com/apache/arrow-datafusion/pull/9593#discussion_r1545511270 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -762,6 +836,171 @@ mod tests { assert_eq!(projection.fields(), schema.fiel

Re: [PR] GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface [arrow]

2024-03-30 Thread via GitHub
viirya commented on code in PR #40043: URL: https://github.com/apache/arrow/pull/40043#discussion_r1545511225 ## java/vector/src/main/java/org/apache/arrow/vector/BaseLargeVariableWidthVector.java: ## @@ -336,6 +336,31 @@ public List<ArrowBuf> getFieldBuffers() { return result; }

Re: [PR] GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface [arrow]

2024-03-30 Thread via GitHub
viirya commented on code in PR #40043: URL: https://github.com/apache/arrow/pull/40043#discussion_r1545510509 ## java/vector/src/main/java/org/apache/arrow/vector/BaseLargeVariableWidthVector.java: ## @@ -336,6 +336,31 @@ public List<ArrowBuf> getFieldBuffers() { return result; }

Re: [I] Bloom filters for i8 and i16 always return false negatives [arrow-rs]

2024-03-30 Thread via GitHub
mr-brobot commented on issue #5550: URL: https://github.com/apache/arrow-rs/issues/5550#issuecomment-2028482716 ## Overview The general issue is that Parquet types are a subset of Arrow types, so the Arrow writer must coerce to Parquet types. In some cases, this changes the physical
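One way coercion to a wider physical type can matter for a byte-oriented hash is sketched below. Parquet stores 8-bit and 16-bit integers in its 32-bit `INT32` physical type, and its Bloom filters hash the plain-encoded bytes of a value. Whether this is exactly the mechanism the comment above goes on to describe is an assumption, since the snippet is cut off. The helper names are hypothetical and no arrow-rs or parquet API is used.

```rust
/// Little-endian bytes of an 8-bit value at its own width (1 byte).
fn narrow_bytes(v: i8) -> Vec<u8> {
    v.to_le_bytes().to_vec()
}

/// Little-endian bytes of the same value after sign-extending to 32 bits,
/// i.e. the width a 32-bit physical column would store (4 bytes).
fn widened_bytes(v: i8) -> Vec<u8> {
    i32::from(v).to_le_bytes().to_vec()
}

/// True only if a hash over raw bytes would see identical input for both
/// forms. The lengths differ (1 vs 4), so this is false for every `i8`.
fn same_hash_input(v: i8) -> bool {
    narrow_bytes(v) == widened_bytes(v)
}
```

If the writer hashes one form and the reader probes with the other, the two sides feed different bytes to the hash, so a lookup can miss a value that is actually present.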

Re: [PR] GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface [arrow]

2024-03-30 Thread via GitHub
viirya commented on code in PR #40043: URL: https://github.com/apache/arrow/pull/40043#discussion_r1545503823 ## java/vector/src/main/java/org/apache/arrow/vector/BaseLargeVariableWidthVector.java: ## @@ -336,6 +336,31 @@ public List<ArrowBuf> getFieldBuffers() { return result; }

Re: [PR] move Log2, Log10, Ln to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on code in PR #9869: URL: https://github.com/apache/arrow-datafusion/pull/9869#discussion_r1545495710 ## datafusion/functions/src/math/mod.rs: ## @@ -24,10 +24,14 @@ mod nans; make_udf_function!(nans::IsNanFunc, ISNAN, isnan); make_udf_function!(abs::AbsFunc, A

[I] Request: Improve Monotonicity API [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb opened a new issue, #9879: URL: https://github.com/apache/arrow-datafusion/issues/9879 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/arrow-datafusion/pull/9869 from @tinfoil-knight I was confused about the [`ScalarUDFIm

[PR] Add benchmark for substr_index [arrow-datafusion]

2024-03-30 Thread via GitHub
Omega359 opened a new pull request, #9878: URL: https://github.com/apache/arrow-datafusion/pull/9878 ## Which issue does this PR close? Closes #9877 ## Rationale for this change substr_index function may have some optimization potential so adding a benchmark to

Re: [I] Implement Run Length Encoding (RLE) / Run End Encoding (REE) support (Epic) [arrow-rs]

2024-03-30 Thread via GitHub
alamb commented on issue #3520: URL: https://github.com/apache/arrow-rs/issues/3520#issuecomment-2028464433 > Hi, I'm working on [raphtory](https://www.raphtory.com/#) trying to get a query engine off the ground with datafusion. One of the key ingredients would be REE array support, because

Re: [PR] Introduce `TreeNodeMutator` for rewriting `TreeNode`s in place, change optimizer to rewrite `LogicalPlan` in place [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9780: URL: https://github.com/apache/arrow-datafusion/pull/9780#issuecomment-2028463475 Updated description with performance results (about a 10% improvement for this PR alone). I think we can squeeze a bunch more out of this approach with the additional changes to r

Re: [I] [BUG] Panic when querying table with wrong partition columns order [arrow-datafusion]

2024-03-30 Thread via GitHub
MohamedAbdeen21 commented on issue #9785: URL: https://github.com/apache/arrow-datafusion/issues/9785#issuecomment-2028462866 Sorry for the late response. > We could potentially return an error earlier by checking the schema of any file present during the create external table execut

Re: [PR] Add support for Bloom filters on unsigned integer columns in Parquet tables [arrow-datafusion]

2024-03-30 Thread via GitHub
progval commented on PR #9770: URL: https://github.com/apache/arrow-datafusion/pull/9770#issuecomment-2028461923 > as maybe you wrote it to add tests for this one 🤔 yes that's what I had in mind

Re: [PR] feat: Determine ordering of file groups [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9593: URL: https://github.com/apache/arrow-datafusion/pull/9593#issuecomment-2028461811 I think this PR is quite nice and it would be great if we could get the tests written and the code polished up. Marking as draft as I think this PR is no longer waiting on

Re: [PR] parquet: Add tests for pruning on Int8/Int16/Int64 columns [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9778: URL: https://github.com/apache/arrow-datafusion/pull/9778#issuecomment-2028461538 I merged up from main to make sure we have a clean CI run and then I think we can merge this one in

Re: [PR] Add support for Bloom filters on unsigned integer columns in Parquet tables [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9770: URL: https://github.com/apache/arrow-datafusion/pull/9770#issuecomment-2028460541 > Thank you very much @progval 🙏 > > Can we please add tests for this code (so that we don't accidentally break it in some future refactoring?) Hi @progval -- do you

Re: [PR] feat: explicit implementation for union's required_input_ordering [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9867: URL: https://github.com/apache/arrow-datafusion/pull/9867#issuecomment-2028460222 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Re: [PR] Minor: Move depcheck out of datafusion crate (200 less crates to compile) [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9865: URL: https://github.com/apache/arrow-datafusion/pull/9865#issuecomment-2028459863 > lgtm thanks @alamb, macos timing is good but we probably need to investigate why compile time is so slow on windows machines I agree the windows test takes too long.

Re: [I] Minor: remove extra 'regexp_match_1000' test from regx.rs benchmark [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb closed issue #9863: Minor: remove extra 'regexp_match_1000' test from regx.rs benchmark URL: https://github.com/apache/arrow-datafusion/issues/9863

Re: [PR] Minor: delete duplicate bench test [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb merged PR #9866: URL: https://github.com/apache/arrow-datafusion/pull/9866

Re: [PR] Minor: Move depcheck out of datafusion crate (200 less crates to compile) [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9865: URL: https://github.com/apache/arrow-datafusion/pull/9865#issuecomment-2028459925 Merging this in to improve the Dev experience. Thanks @comphead for the review

Re: [PR] Minor: Move depcheck out of datafusion crate (200 less crates to compile) [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb merged PR #9865: URL: https://github.com/apache/arrow-datafusion/pull/9865

Re: [I] [DISCUSSION] Sort Merge Join Experimental status [arrow-datafusion]

2024-03-30 Thread via GitHub
metesynnada commented on issue #9846: URL: https://github.com/apache/arrow-datafusion/issues/9846#issuecomment-2028459854 I believe we can add fuzz tests for SMJ to ensure it is robust.

Re: [I] Only Allow Declaring Partition Columns in `PARTITIONED BY` Clause [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb closed issue #9465: Only Allow Declaring Partition Columns in `PARTITIONED BY` Clause URL: https://github.com/apache/arrow-datafusion/issues/9465

Re: [PR] Allow declaring partition columns in `PARTITION BY` clause, backwards compatible [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9599: URL: https://github.com/apache/arrow-datafusion/pull/9599#issuecomment-2028459271 Thanks again @MohamedAbdeen21

Re: [PR] Allow declaring partition columns in `PARTITION BY` clause, backwards compatible [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb merged PR #9599: URL: https://github.com/apache/arrow-datafusion/pull/9599

Re: [PR] GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface [arrow]

2024-03-30 Thread via GitHub
andygrove commented on code in PR #40043: URL: https://github.com/apache/arrow/pull/40043#discussion_r1545490632 ## java/vector/src/main/java/org/apache/arrow/vector/BaseLargeVariableWidthVector.java: ## @@ -336,6 +336,31 @@ public List<ArrowBuf> getFieldBuffers() { return result;

Re: [PR] move `Atan`, `Acosh`, `Asinh`, `Atanh` to `datafusion-function` [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9872: URL: https://github.com/apache/arrow-datafusion/pull/9872#issuecomment-2028459104 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Re: [I] Remove `struct` UDF, and use `named_struct` everywhere [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on issue #9839: URL: https://github.com/apache/arrow-datafusion/issues/9839#issuecomment-2028458872 I think this is a good first issue as it is well specified, and there are patterns to follow

Re: [PR] Support custom struct field names with new scalar function named_struct [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9743: URL: https://github.com/apache/arrow-datafusion/pull/9743#issuecomment-2028458795 And thanks for the reviews and suggestions @yyy1000 and @jayzhan211

Re: [PR] Support custom struct field names with new scalar function named_struct [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9743: URL: https://github.com/apache/arrow-datafusion/pull/9743#issuecomment-2028458689 Thanks again @gstvg 🙏

Re: [I] Create a struct with the specified field names and values [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb closed issue #5861: Create a struct with the specified field names and values URL: https://github.com/apache/arrow-datafusion/issues/5861

Re: [PR] Support custom struct field names with new scalar function named_struct [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb merged PR #9743: URL: https://github.com/apache/arrow-datafusion/pull/9743

Re: [I] move the Translate, SubstrIndex, FindInSet functions to new datafusion-functions crate [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb closed issue #9842: move the Translate, SubstrIndex, FindInSet functions to new datafusion-functions crate URL: https://github.com/apache/arrow-datafusion/issues/9842

Re: [PR] move the Translate, SubstrIndex, FindInSet functions to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9864: URL: https://github.com/apache/arrow-datafusion/pull/9864#issuecomment-2028458293 Thanks again @Omega359

Re: [PR] move the Translate, SubstrIndex, FindInSet functions to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb merged PR #9864: URL: https://github.com/apache/arrow-datafusion/pull/9864

Re: [I] Add benchmark for substr_index [arrow-datafusion]

2024-03-30 Thread via GitHub
Omega359 commented on issue #9877: URL: https://github.com/apache/arrow-datafusion/issues/9877#issuecomment-2028457491 take

[I] Add benchmark for substr_index [arrow-datafusion]

2024-03-30 Thread via GitHub
Omega359 opened a new issue, #9877: URL: https://github.com/apache/arrow-datafusion/issues/9877 ### Is your feature request related to a problem or challenge? As noted in https://github.com/apache/arrow-datafusion/pull/9864#discussion_r1545353414 there might be some optimization avai

Re: [PR] fix(9870): common expression elimination optimization, should always re-find the correct expression during re-write. [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9871: URL: https://github.com/apache/arrow-datafusion/pull/9871#issuecomment-2028456284 Given this is a regression I will plan to merge this PR tomorrow unless someone else would like time to review

[PR] Simplify Expr::map_children [arrow-datafusion]

2024-03-30 Thread via GitHub
peter-toth opened a new pull request, #9876: URL: https://github.com/apache/arrow-datafusion/pull/9876 ## Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/9457 ## Rationale for this change The current implementation of `Expr::map_ch

Re: [PR] chore(r): Add DBItest suite to CI via adbi [arrow-adbc]

2024-03-30 Thread via GitHub
krlmlr commented on PR #1401: URL: https://github.com/apache/arrow-adbc/pull/1401#issuecomment-2028455394 - Added https://github.com/r-dbi/DBItest/pull/363 to deal with adbcsqlite apparently issuing multiple warnings instead of just one - Added https://github.com/r-dbi/adbi/pull/25 to fix

Re: [I] move Floor, Gcd, Lcm, Pi, Power to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
tinfoil-knight commented on issue #9861: URL: https://github.com/apache/arrow-datafusion/issues/9861#issuecomment-2028452242 @Omega359 I just need the `Power` function since it's dependent on `Log`. I've opened a new issue: #9875 for it. Please update this issue to not include the `P

[I] move the Log, Power functions to datafusion-functions crate [arrow-datafusion]

2024-03-30 Thread via GitHub
tinfoil-knight opened a new issue, #9875: URL: https://github.com/apache/arrow-datafusion/issues/9875 ### Is your feature request related to a problem or challenge? As part of #9285, move the Log, Power functions to datafusion-functions crate ### Describe the solution you'd like

[PR] build(deps): bump tokio from 1.36.0 to 1.37.0 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] opened a new pull request, #623: URL: https://github.com/apache/arrow-datafusion-python/pull/623 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.36.0 to 1.37.0. Release notes Sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). T

[PR] build(deps): bump regex-syntax from 0.8.2 to 0.8.3 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] opened a new pull request, #622: URL: https://github.com/apache/arrow-datafusion-python/pull/622 Bumps [regex-syntax](https://github.com/rust-lang/regex) from 0.8.2 to 0.8.3. Commits https://github.com/rust-lang/regex/commit/d895bd984537538240e175cc55bc0103072104

Re: [PR] build(deps): bump async-trait from 0.1.77 to 0.1.78 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] commented on PR #616: URL: https://github.com/apache/arrow-datafusion-python/pull/616#issuecomment-2028449039 Superseded by #621.

Re: [PR] build(deps): bump async-trait from 0.1.77 to 0.1.78 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] closed pull request #616: build(deps): bump async-trait from 0.1.77 to 0.1.78 URL: https://github.com/apache/arrow-datafusion-python/pull/616

[PR] build(deps): bump async-trait from 0.1.77 to 0.1.79 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] opened a new pull request, #621: URL: https://github.com/apache/arrow-datafusion-python/pull/621 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.77 to 0.1.79. Release notes Sourced from https://github.com/dtolnay/async-trait/releases";>async-tra

Re: [PR] build(deps): bump syn from 2.0.48 to 2.0.53 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] closed pull request #614: build(deps): bump syn from 2.0.48 to 2.0.53 URL: https://github.com/apache/arrow-datafusion-python/pull/614

Re: [PR] build(deps): bump syn from 2.0.48 to 2.0.53 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] commented on PR #614: URL: https://github.com/apache/arrow-datafusion-python/pull/614#issuecomment-2028448988 Superseded by #620.

[PR] build(deps): bump syn from 2.0.48 to 2.0.57 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] opened a new pull request, #620: URL: https://github.com/apache/arrow-datafusion-python/pull/620 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.48 to 2.0.57. Release notes Sourced from [syn's releases](https://github.com/dtolnay/syn/releases). 2.0.57

Re: [PR] move Log2, Log10, Ln to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
tinfoil-knight commented on PR #9869: URL: https://github.com/apache/arrow-datafusion/pull/9869#issuecomment-2028449030 > Note I think we can implement `simpl_log` using `ScalarUDFImpl::simplify` Yeah. I'll try this out.

[PR] build(deps): bump pyo3-build-config from 0.20.2 to 0.21.0 [arrow-datafusion-python]

2024-03-30 Thread via GitHub
dependabot[bot] opened a new pull request, #619: URL: https://github.com/apache/arrow-datafusion-python/pull/619 Bumps [pyo3-build-config](https://github.com/pyo3/pyo3) from 0.20.2 to 0.21.0. Release notes Sourced from https://github.com/pyo3/pyo3/releases";>pyo3-build-config's re

Re: [PR] move Log2, Log10, Ln to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
tinfoil-knight commented on code in PR #9869: URL: https://github.com/apache/arrow-datafusion/pull/9869#discussion_r1545482647 ## datafusion/functions/src/math/mod.rs: ## @@ -24,10 +24,14 @@ mod nans; make_udf_function!(nans::IsNanFunc, ISNAN, isnan); make_udf_function!(abs::A

Re: [PR] CrossJoin Refactor [arrow-datafusion]

2024-03-30 Thread via GitHub
korowa commented on PR #9830: URL: https://github.com/apache/arrow-datafusion/pull/9830#issuecomment-2028430188 And one more thing to consider (and this is the second concern) -- if LeftData will be relatively small (it's natural behaviour, due to the fact that CrossJoin supports input reo

[PR] initial version of filter for run end array with i64 run_ends [arrow-rs]

2024-03-30 Thread via GitHub
fabianmurariu opened a new pull request, #5573: URL: https://github.com/apache/arrow-rs/pull/5573 # Which issue does this PR close? Related to #3520 # Rationale for this change Attempt at adding filter support for RunArray for i64 run_ends # What changes are incl

Re: [PR] CrossJoin Refactor [arrow-datafusion]

2024-03-30 Thread via GitHub
korowa commented on code in PR #9830: URL: https://github.com/apache/arrow-datafusion/pull/9830#discussion_r1545459833 ## datafusion/physical-plan/src/joins/cross_join.rs: ## @@ -374,64 +376,147 @@ impl Stream for CrossJoinStream { } impl CrossJoinStream { -/// Separate

Re: [PR] GH-40896: Remove runtime dependencies on Eclipse, logback [arrow]

2024-03-30 Thread via GitHub
lidavidm commented on PR #40904: URL: https://github.com/apache/arrow/pull/40904#issuecomment-2028293998 There is still some time for feature freeze so I'll leave this for a committer to look at. I won't be able to make updates in the next couple days but should be back online by 4/2. --

Re: [PR] Add CI compile checks for feature flags in datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
waynexia commented on PR #9772: URL: https://github.com/apache/arrow-datafusion/pull/9772#issuecomment-2028275418 Looking good 👍 Let me merge it

Re: [PR] Add CI compile checks for feature flags in datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
waynexia merged PR #9772: URL: https://github.com/apache/arrow-datafusion/pull/9772

Re: [PR] GH-40893: [Java][FlightRPC] Support IntervalMonthDayNanoVector in FlightSQL JDBC Driver [arrow]

2024-03-30 Thread via GitHub
pgwhalen commented on PR #40894: URL: https://github.com/apache/arrow/pull/40894#issuecomment-2028256982 So what I'm going for: - `PeriodDuration` is still the object returned by `getObject()` to JDBC clients (matching the return type on the underlying vector) - `PeriodDuration` shou

Re: [I] [Python] bool followed by float disallowed, but float followed by bool allowed [arrow]

2024-03-30 Thread via GitHub
Wainberg commented on issue #40909: URL: https://github.com/apache/arrow/issues/40909#issuecomment-2028118026 A fourth example with `pd.Timestamp` and `int`: ```python >>> pa.array([1, pd.Timestamp('2024-01-01')]) Traceback (most recent call last): File "<stdin>", line 1, in

[PR] Make FirstValue UDAF based function [arrow-datafusion]

2024-03-30 Thread via GitHub
jayzhan211 opened a new pull request, #9874: URL: https://github.com/apache/arrow-datafusion/pull/9874 ## Which issue does this PR close? First step of #8708 ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [I] Implement Run Length Encoding (RLE) / Run End Encoding (REE) support (Epic) [arrow-rs]

2024-03-30 Thread via GitHub
fabianmurariu commented on issue #3520: URL: https://github.com/apache/arrow-rs/issues/3520#issuecomment-2028077680 Hi, I'm working on [raphtory](https://www.raphtory.com/#) trying to get a query engine off the ground with datafusion. One of the key ingredients would be REE array support, b

Re: [PR] Support ORDER BY in AggregateUDF [arrow-datafusion]

2024-03-30 Thread via GitHub
jayzhan211 commented on PR #9249: URL: https://github.com/apache/arrow-datafusion/pull/9249#issuecomment-2028077162 I plan to move `first_value`, the current implementation doesn't seem correct.

Re: [PR] move the Translate, SubstrIndex, FindInSet functions to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
Omega359 commented on code in PR #9864: URL: https://github.com/apache/arrow-datafusion/pull/9864#discussion_r1545359722 ## datafusion/functions/src/unicode/substrindex.rs: ## @@ -0,0 +1,138 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] fix(9870): common expression elimination optimization, should always re-find the correct expression during re-write. [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on code in PR #9871: URL: https://github.com/apache/arrow-datafusion/pull/9871#discussion_r154547 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -35,37 +36,75 @@ use datafusion_expr::expr::Alias; use datafusion_expr::logical_plan::{Aggregate,

Re: [PR] move the Translate, SubstrIndex, FindInSet functions to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on code in PR #9864: URL: https://github.com/apache/arrow-datafusion/pull/9864#discussion_r1545352073 ## datafusion/core/Cargo.toml: ## @@ -67,8 +67,6 @@ regex_expressions = [ ] serde = ["arrow-schema/serde"] unicode_expressions = [ -"datafusion-physical-e

Re: [PR] move `Atan`, `Acosh`, `Asinh`, `Atanh` to `datafusion-function` [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on code in PR #9872: URL: https://github.com/apache/arrow-datafusion/pull/9872#discussion_r1545350387 ## datafusion/sqllogictest/test_files/order.slt: ## @@ -527,9 +527,10 @@ Sort: atan_c11 ASC NULLS LAST TableScan: aggregate_test_100 projection=[c11] physi

Re: [PR] move Log2, Log10, Ln to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on code in PR #9869: URL: https://github.com/apache/arrow-datafusion/pull/9869#discussion_r1545348286 ## datafusion/functions/src/math/mod.rs: ## @@ -24,10 +24,14 @@ mod nans; make_udf_function!(nans::IsNanFunc, ISNAN, isnan); make_udf_function!(abs::AbsFunc, A

Re: [PR] move Log2, Log10, Ln to datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb merged PR #9869: URL: https://github.com/apache/arrow-datafusion/pull/9869

Re: [I] move the Log2, Log10, Ln functions to datafusion-functions crate [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb closed issue #9868: move the Log2, Log10, Ln functions to datafusion-functions crate URL: https://github.com/apache/arrow-datafusion/issues/9868

Re: [PR] Add CI compile checks for feature flags in datafusion-functions [arrow-datafusion]

2024-03-30 Thread via GitHub
alamb commented on PR #9772: URL: https://github.com/apache/arrow-datafusion/pull/9772#issuecomment-2028044687 > > Looks like we have to maintain the per-feature test in CI. I may look into this later to see if there is a better automatic way to do that. I agree -- the number of feat
