Re: [I] Implement a way to preserve partitioning through `UnionExec` without losing ordering [datafusion]

2024-05-21 Thread via GitHub
xinlifoobar commented on issue #10314: URL: https://github.com/apache/datafusion/issues/10314#issuecomment-2122201906 Hi @alamb, found another interesting case while testing, do you think this could apply `InterleaveExec` with same order by sets? ``` explain select count(*) from (

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on PR #10531: URL: https://github.com/apache/datafusion/pull/10531#issuecomment-2122279768 @jonahgao @alamb I think this is ready by me :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608039288 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [PR] Minor: Fix `ArrayFunctionRewriter` name reporting [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10581: URL: https://github.com/apache/datafusion/pull/10581#issuecomment-2122285943 Thanks for the review @jonahgao

Re: [PR] Minor: Fix `ArrayFunctionRewriter` name reporting [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10581: URL: https://github.com/apache/datafusion/pull/10581

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10575: URL: https://github.com/apache/datafusion/pull/10575

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #10571: UserDefindedLogicalNode::from_template does not return a Result<...>. URL: https://github.com/apache/datafusion/issues/10571

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10575: URL: https://github.com/apache/datafusion/pull/10575#discussion_r1608050237 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -76,27 +76,31 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// For example: `TopK: k=

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10576: URL: https://github.com/apache/datafusion/pull/10576#issuecomment-2122297757 > The `CommonSubexprEliminate` rule has not been migrated yet. That is a good point -- though we could do something like `#[allow(deprecated)]` until it is. I hope to work on it

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10576: URL: https://github.com/apache/datafusion/pull/10576

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10576: URL: https://github.com/apache/datafusion/pull/10576#issuecomment-2122298090 Thanks again @lewiszlw

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608069310 ## datafusion/expr/src/udf.rs: ## @@ -426,6 +467,59 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { false } +/// Computes the output interval f

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608079456 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -0,0 +1,241 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
alamb closed pull request #10117: improve monotonicity api URL: https://github.com/apache/datafusion/pull/10117

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2122333403 I believe this PR has been superseded by https://github.com/apache/datafusion/pull/10504 which removed the monotonicity API in favor of a more expressive bounds analysis, so closing thi

Re: [I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #9879: Request: Improve Monotoniciy API URL: https://github.com/apache/datafusion/issues/9879

Re: [I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #9879: URL: https://github.com/apache/datafusion/issues/9879#issuecomment-2122335700 https://github.com/apache/datafusion/pull/10504 introduces a new API for boundary propagation, so the API challenge described in this ticket is no longer relevant. Thus closing

[I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new issue, #9879: URL: https://github.com/apache/datafusion/issues/9879 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/arrow-datafusion/pull/9869 from @tinfoil-knight I was confused about the [`ScalarUDFImpl::mo

Re: [I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #9879: Request: Improve Monotoniciy API URL: https://github.com/apache/datafusion/issues/9879

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608102197 ## datafusion/expr/src/udf.rs: ## @@ -426,6 +467,59 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { false } +/// Computes the output in

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608112791 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -0,0 +1,241 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608116040 ## datafusion/expr/src/udf.rs: ## @@ -426,6 +467,59 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { false } +/// Computes the output interval f

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2122357671 > I believe this PR has been superseded by #10504 which removed the monotonicity API in favor of a more expressive bounds analysis, so closing this PR > > Thanks for the

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
tinfoil-knight commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2122366778 No worries @berkaysynnada and @alamb. It's difficult to keep track of everyone's work in such a large project. It was fun to work on this and I learnt a few things from

Re: [I] Cast String to Date ANSI Mode - Spark 3.2 - Mismatch between Spark and Comet Errors [datafusion-comet]

2024-05-21 Thread via GitHub
vidyasankarv commented on issue #440: URL: https://github.com/apache/datafusion-comet/issues/440#issuecomment-2122378853 > Is this an issue of just a mismatch between error messages? Or is the cast actually not doing the right thing with Spark 3.2? Is an issue with mismatch between e

[PR] tsaucer/run TPC-H examples in CI [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer opened a new pull request, #711: URL: https://github.com/apache/datafusion-python/pull/711 # Which issue does this PR close? Closes #696 # Rationale for this change This PR sets up a workflow to generate the TPC-H 1 GB data set in CI, runs the 22 examples, and com

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer closed pull request #710: Tsaucer/prepare tpch examples for ci URL: https://github.com/apache/datafusion-python/pull/710

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer commented on PR #710: URL: https://github.com/apache/datafusion-python/pull/710#issuecomment-2122385949 Closing in favor of https://github.com/apache/datafusion-python/pull/711

Re: [PR] tsaucer/run TPC-H examples in CI [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer commented on PR #711: URL: https://github.com/apache/datafusion-python/pull/711#issuecomment-2122437247 @Michael-J-Ward It looks like we have a _potential_ regression between 37.1.0 and 38.0.0. Namely `substr` on 37.1.0 would accept a start and length, the parameters that should

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1608186038 ## datafusion/sql/Cargo.toml: ## @@ -47,6 +47,7 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true, default-features = true } datafu

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1608186725 ## datafusion/sql/src/unparser/expr.rs: ## @@ -504,6 +508,14 @@ impl Unparser<'_> { .collect::>>() } +pub(super) fn new_ident_quoted_if_n

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2122464482 > Thanks @goldmedal I'm thinking how this will work with whitespaces columns like > > ``` > select 1 as "a a"; > ``` Thanks @comphead :) I'm not sure what you

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2122480005 I'm not sure, but I think we can merge #10573 first because it also fixes many unparsing tests. Then I'll create a PR for sqlparser to add the check rule in the dialect.

[I] Expand Test Coverage for ScalarUDF's [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada opened a new issue, #10595: URL: https://github.com/apache/datafusion/issues/10595 ### Is your feature request related to a problem or challenge? After merging PR #10504, a new file [monotonicity.rs](https://github.com/apache/datafusion/blob/main/datafusion/functions/src

[PR] Rename monotonicity as output_ordering in ScalarUDF's [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada opened a new pull request, #10596: URL: https://github.com/apache/datafusion/pull/10596 ## Which issue does this PR close? Closes #. ## Rationale for this change The signature and usage of the monotonicity API have significantly changed. The

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-21 Thread via GitHub
aditanase commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2122637271 @alamb thanks for the very quick reply! Just tested with `datafusion-cli`, you're right that it's working. I was trying from a test deployment of Ballista. Will add the object

[PR] Improve `UserDefinedLogicalNodeCore::from_template` API to return Result [datafusion]

2024-05-21 Thread via GitHub
lewiszlw opened a new pull request, #10597: URL: https://github.com/apache/datafusion/pull/10597 ## Which issue does this PR close? follow up of https://github.com/apache/datafusion/pull/10575. ## Rationale for this change ## What changes are included in t

Re: [PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-21 Thread via GitHub
crepererum commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1608400609 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -605,9 +611,6 @@ async fn test_dates_32_diff_rg_sizes() { .run("date32"); } -// BUG: same as

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608408244 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,7 +1404,84 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608413505 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,6 +1407,56 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608431571 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -505,7 +507,35 @@ pub async fn from_substrait_rel( _ => Ok(t), }

[I] Make the configuration for `StreamTable` more generic to support more stream sources [datafusion]

2024-05-21 Thread via GitHub
matthewmturner opened a new issue, #10599: URL: https://github.com/apache/datafusion/issues/10599 ### Is your feature request related to a problem or challenge? I am working on a websocket `TableProvider` and initially I went about creating my own `TableProvider` but then after review

[PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
matthewmturner opened a new pull request, #10600: URL: https://github.com/apache/datafusion/pull/10600 ## Which issue does this PR close? Closes #10599 ## Rationale for this change ## What changes are included in this PR? ## Are these chang

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
matthewmturner commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2122787433 @metesynnada @mustafasrepo I believe you were both involved in the `StreamTable` implementation, so I'm interested in getting your views on whether this is going in the right direction

[PR] Add to_date function to scalar functions doc [datafusion]

2024-05-21 Thread via GitHub
Omega359 opened a new pull request, #10601: URL: https://github.com/apache/datafusion/pull/10601 ## Which issue does this PR close? Closes #10461 ## Rationale for this change Adding missing documentation ## What changes are included in this PR? doc

[I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new issue, #10602: URL: https://github.com/apache/datafusion/issues/10602 ### Is your feature request related to a problem or challenge? Broken out from @Abdullahsab3's great ticket https://github.com/apache/datafusion/issues/10368 We would like to apply date

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122811703 The way you can perform this binning in postgres is somewhat paradoxically to convert a timestamp with a timezone back to a timestamp without timezone and then apply `date_bin`.
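
The approach described in this comment (convert the zoned timestamp to a timezone-less local timestamp, then bin it) can be sketched roughly as follows. This is a minimal illustration in Python rather than SQL, not DataFusion's or Postgres's implementation: `bin_local` and its parameters are hypothetical names, and the UTC offset is passed in as a fixed value as a simplifying assumption, whereas a real solution would look up the daylight-saving offset that applies to each timestamp.

```python
from datetime import datetime, timedelta, timezone


def bin_local(ts_utc, utc_offset, stride, origin=datetime(1970, 1, 1)):
    """Bin a UTC timestamp on local wall-clock boundaries (hypothetical helper).

    utc_offset is assumed to be the offset in effect for this timestamp;
    a real implementation would derive it from timezone rules instead.
    """
    # Step 1: shift to local wall-clock time, then drop the timezone.
    local_naive = ts_utc.astimezone(timezone(utc_offset)).replace(tzinfo=None)
    # Step 2: floor to a multiple of the stride, counted from the naive origin.
    n = (local_naive - origin) // stride
    return origin + n * stride


if __name__ == "__main__":
    ts = datetime(2024, 5, 20, 22, 30, tzinfo=timezone.utc)
    day = timedelta(days=1)
    # With a +02:00 offset the local time is 2024-05-21 00:30,
    # so the daily bin starts on 21 May rather than 20 May.
    print(bin_local(ts, timedelta(hours=2), day))  # 2024-05-21 00:00:00
    print(bin_local(ts, timedelta(0), day))        # 2024-05-20 00:00:00
```

The two calls differ only in the offset, which shows why binning directly on the UTC value can place a row in the previous local day.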

Re: [PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-21 Thread via GitHub
edmondop commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1608470231 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -75,6 +75,12 @@ macro_rules! get_statistic { *scale,

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122823905 If we cast using arrow_cast back to `Timestamp(Nanosecond, None)` the binning does appear to work correctly ```sql > create or replace view t_roundtrip as select arrow_ca

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-21 Thread via GitHub
peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2122829132 > What do we think about merging this PR and filing a follow on ticket to unify the APIs? I'm ok with merging the current state of the PR. But I was also thinking about how

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608493460 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608498362 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
andygrove commented on PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2122853631 This is very cool @comphead but it looks like it is not detecting any of the aggregate functions that we support?

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608510461 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-21 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2122870652 @timsaucer I have fixed the calls of `expr::WindowFunction` to meet the changes and added tests for those window functions in `dataframe_functions.rs`. Let me know if I missed a

[PR] Implement Unparser for `UNION ALL` [datafusion]

2024-05-21 Thread via GitHub
phillipleblanc opened a new pull request, #10603: URL: https://github.com/apache/datafusion/pull/10603 ## Which issue does this PR close? It doesn't close this issue, but is part of the work for #9494 ## Rationale for this change Adds support for turning LogicalPlans that

Re: [PR] build: bump spark version to 3.4.3 [datafusion-comet]

2024-05-21 Thread via GitHub
codecov-commenter commented on PR #292: URL: https://github.com/apache/datafusion-comet/pull/292#issuecomment-2122931674 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/292?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[I] Incorrect statistics read for unsigned integer columns in parquet [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN opened a new issue, #10604: URL: https://github.com/apache/datafusion/issues/10604 ### Describe the bug I found this bug while adding tests for reading parquet statistics https://github.com/apache/datafusion/pull/10592/. Instead of getting corresponding UInt8Array, UInt16Arr

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608584669 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122959190 Given the statement in the description, here is the best I can come up with using `arrow_cast` ```sql -- Times in brussels WITH t_brussels AS ( SELECT c

[PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao opened a new pull request, #456: URL: https://github.com/apache/datafusion-comet/pull/456 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608597228 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

[I] Incorrect statistics read for binary columns in parquet [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN opened a new issue, #10605: URL: https://github.com/apache/datafusion/issues/10605 ### Describe the bug I found this while adding tests for reading parquet statistics https://github.com/apache/datafusion/pull/10592. Instead of getting back `BinaryArray`, we get `StringArray`

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608598651 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-21 Thread via GitHub
viirya merged PR #395: URL: https://github.com/apache/datafusion-comet/pull/395

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122975595 @mhilton and I agree that if we had the functionality suggested by @Abdullahsab3 on https://github.com/apache/datafusion/issues/10368#issue-2277903243 > given a UTC ti

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-21 Thread via GitHub
viirya commented on PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#issuecomment-2122976226 Merged. Thanks @huaxingao @advancedxy @comphead @parthchandra

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao commented on PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#issuecomment-2122977103 Thanks, everyone!

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122979923 My suggested next steps for this ticket: 1. Someone prototype the "strip_timezone" function as a ScalarUDF and verify that we can in fact achieve the expected result from

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
viirya commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608605897 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_count_

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2122986254 @alamb I have created 2 more bug tickets but I cannot edit the description to add them in the subtasks. Can you help with that? 1. https://github.com/apache/datafusion/i

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608610498 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,6 +1407,56 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on PR #10592: URL: https://github.com/apache/datafusion/pull/10592#issuecomment-2122987903 @alamb : This PR is ready for review

Re: [I] Incorrect statistics read for unsigned integer columns in parquet [datafusion]

2024-05-21 Thread via GitHub
edmondop commented on issue #10604: URL: https://github.com/apache/datafusion/issues/10604#issuecomment-2122991483 This seems related to [this](https://github.com/apache/datafusion/pull/10593#discussion_r1608470231) comment

Re: [PR] Improve ContextProvider [datafusion]

2024-05-21 Thread via GitHub
comphead merged PR #10577: URL: https://github.com/apache/datafusion/pull/10577

Re: [I] Better timezone functionalities [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10368: URL: https://github.com/apache/datafusion/issues/10368#issuecomment-2122994397 I have filed https://github.com/apache/datafusion/issues/10602 with a summary of how I understand the use case of "how do we bin timestamps in timezones with daylight savings time

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608617934 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608622303 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -505,7 +507,35 @@ pub async fn from_substrait_rel( _ => Ok(t), }

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608623644 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,7 +1404,84 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

[I] Implement a benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new issue, #10606: URL: https://github.com/apache/datafusion/issues/10606 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10453 Part of https://github.com/apache/datafusion/issues/10453 is to "ef

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608634043 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
advancedxy commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608635628 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or m

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on PR #10592: URL: https://github.com/apache/datafusion/pull/10592#issuecomment-2123026061 @comphead > What I still cannot understand is this a regression test for the bug we missed earlier? I am working on new arrow statistics API https://github.com/apache

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-21 Thread via GitHub
timsaucer commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2123031753 Oh, great. Have you been able to run the [example code above](https://github.com/apache/datafusion/issues/6747#issuecomment-2090260284) using the new easy interface?

Re: [PR] tsaucer/run TPC-H examples in CI [datafusion-python]

2024-05-21 Thread via GitHub
andygrove merged PR #711: URL: https://github.com/apache/datafusion-python/pull/711

Re: [I] Ensure examples stay updated in CI. [datafusion-python]

2024-05-21 Thread via GitHub
andygrove closed issue #696: Ensure examples stay updated in CI. URL: https://github.com/apache/datafusion-python/issues/696

Re: [I] Regression in `substr` performance from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-21 Thread via GitHub
andygrove closed issue #712: Regression in `substr` performance from 37.1.0 to 38.0.0 URL: https://github.com/apache/datafusion-python/issues/712

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608699758 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608714060 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,26 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao closed pull request #456: feat: correlation support URL: https://github.com/apache/datafusion-comet/pull/456

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10592: URL: https://github.com/apache/datafusion/pull/10592#issuecomment-2123171863 > What I still cannot understand is this a regression test for the bug we missed earlier? My strong suspicion is that the bugs @NGA-TRAN is finding would manifest themselves as

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2123173647 > @alamb I have created 2 more bug tickets but I cannot edit the description to add them in the subtasks. Can you help with that? Done

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2123183351 > I'd be happy to contribute some docs / examples if you point me at something similar. Thanks @aditanase 🙏 I would recommend two things: # Suggestion 1: Change

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608748477 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -0,0 +1,241 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[PR] Refactor parquet row group pruning into a struct [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new pull request, #10607: URL: https://github.com/apache/datafusion/pull/10607 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
codecov-commenter commented on PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#issuecomment-2123243529 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/456?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[PR] fix: Specify schema when converting TPC-H csv to parquet [datafusion-benchmarks]

2024-05-21 Thread via GitHub
andygrove opened a new pull request, #3: URL: https://github.com/apache/datafusion-benchmarks/pull/3 (no comment)
