[PR] Remove `AggregateFunctionDefinition::Name` [datafusion]

2024-05-09 Thread via GitHub
lewiszlw opened a new pull request, #10441: URL: https://github.com/apache/datafusion/pull/10441 ## Which issue does this PR close? Remove `AggregateFunctionDefinition::Name` as it's useless. ## Rationale for this change ## What changes are included in thi

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2103912260 > Is it possible to make the `array_slice` function accept a vector as its parameter? > > ```rust > fn array_slice(args: Vec) { > } > ``` > > When makin

Re: [I] Type coercion when creating table [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on issue #10440: URL: https://github.com/apache/datafusion/issues/10440#issuecomment-2103891076 I also notice that but did not dig into it. We might need type coercion in `LogicalPlan::Values`. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] Move Covariance (Population) covar_pop to be a User Defined Aggregate Function [datafusion]

2024-05-09 Thread via GitHub
yyy1000 commented on code in PR #10418: URL: https://github.com/apache/datafusion/pull/10418#discussion_r1596271053 ## datafusion/physical-expr/src/aggregate/covariance.rs: ## @@ -319,281 +225,3 @@ impl Accumulator for CovarianceAccumulator { std::mem::size_of_val(self)

Re: [PR] Move Covariance (Population) covar_pop to be a User Defined Aggregate Function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on code in PR #10418: URL: https://github.com/apache/datafusion/pull/10418#discussion_r1596262433 ## datafusion/physical-expr/src/aggregate/covariance.rs: ## @@ -319,281 +225,3 @@ impl Accumulator for CovarianceAccumulator { std::mem::size_of_val(se

[I] Type coercion when creating table [datafusion]

2024-05-09 Thread via GitHub
yyy1000 opened a new issue, #10440: URL: https://github.com/apache/datafusion/issues/10440 ### Is your feature request related to a problem or challenge? When creating a table specifying the double as column datatype and giving some initial values without explicit double type, for exa

Re: [PR] Move Covariance (Population) covar_pop to be a User Defined Aggregate Function [datafusion]

2024-05-09 Thread via GitHub
yyy1000 commented on code in PR #10418: URL: https://github.com/apache/datafusion/pull/10418#discussion_r1596242627 ## datafusion/physical-expr/src/aggregate/covariance.rs: ## @@ -319,281 +225,3 @@ impl Accumulator for CovarianceAccumulator { std::mem::size_of_val(self)

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1596183866 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -833,12 +865,8 @@ mod test { signature: Signature::uniform(1, vec![DataType::Float32],

Re: [PR] Improve flight sql examples [datafusion]

2024-05-09 Thread via GitHub
lewiszlw commented on code in PR #10432: URL: https://github.com/apache/datafusion/pull/10432#discussion_r1596181005 ## datafusion-examples/examples/flight/flight_sql_server.rs: ## @@ -337,234 +346,52 @@ impl FlightSqlService for FlightSqlServiceImpl { Ok(resp) }

Re: [PR] Support nulls and empty for array functions [datafusion]

2024-05-09 Thread via GitHub
github-actions[bot] closed pull request #7338: Support nulls and empty for array functions URL: https://github.com/apache/datafusion/pull/7338

Re: [PR] Transform with payload [datafusion]

2024-05-09 Thread via GitHub
github-actions[bot] commented on PR #8664: URL: https://github.com/apache/datafusion/pull/8664#issuecomment-2103703306 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] rewrite nvl function [datafusion]

2024-05-09 Thread via GitHub
github-actions[bot] commented on PR #9413: URL: https://github.com/apache/datafusion/pull/9413#issuecomment-2103703270 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1596155186 ## datafusion/expr/src/expr_schema.rs: ## @@ -139,9 +139,10 @@ impl ExprSchemable for Expr { .map(|e| e.get_type(schema))

[PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 opened a new pull request, #10439: URL: https://github.com/apache/datafusion/pull/10439 ## Which issue does this PR close? Closes #10423 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-09 Thread via GitHub
rohitrastogi commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1596117835 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -232,6 +232,240 @@ macro_rules! cast_int_to_int_macro { }}; } +// When Spark casts to By

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-09 Thread via GitHub
rohitrastogi commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1596118952 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -232,6 +232,240 @@ macro_rules! cast_int_to_int_macro { }}; } +// When Spark casts to By

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2103614074 You can define closure in `datafusion/expr/src/function.rs`

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-09 Thread via GitHub
viirya commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2103479317 I remember only the first-time contributors need approval to trigger CI.

Re: [PR] build: Switch back to released version of DataFusion and arrow-rs after Arrow Java 16 is released [datafusion-comet]

2024-05-09 Thread via GitHub
viirya commented on PR #403: URL: https://github.com/apache/datafusion-comet/pull/403#issuecomment-2103477288 Ah, I found that I made a mistake in the Java Arrow PR that it doesn't initiate the offset buffer well. Proposed another issue at Java Arrow https://github.com/apache/arrow/issues/4

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-09 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2103476973 @viirya @andygrove Is there a way to start CI without bothering you?

Re: [PR] Add franz feature [datafusion]

2024-05-09 Thread via GitHub
emgeee closed pull request #10438: Add franz feature URL: https://github.com/apache/datafusion/pull/10438

[PR] Add franz feature [datafusion]

2024-05-09 Thread via GitHub
emgeee opened a new pull request, #10438: URL: https://github.com/apache/datafusion/pull/10438 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] improve monotonicity api [datafusion]

2024-05-09 Thread via GitHub
tinfoil-knight commented on code in PR #10117: URL: https://github.com/apache/datafusion/pull/10117#discussion_r1595938806 ## datafusion/expr/src/signature.rs: ## @@ -346,13 +346,60 @@ impl Signature { } } -/// Monotonicity of the `ScalarFunctionExpr` with respect to its

Re: [I] Support sort pushdown [datafusion]

2024-05-09 Thread via GitHub
backkem commented on issue #7871: URL: https://github.com/apache/datafusion/issues/7871#issuecomment-2103372651 The federation repo turns (part of) the query plan back into SQL. In the simple case, the query only uses table providers of one remote DBMS. In that case the entire query will be

Re: [PR] improve monotonicity api [datafusion]

2024-05-09 Thread via GitHub
tinfoil-knight commented on code in PR #10117: URL: https://github.com/apache/datafusion/pull/10117#discussion_r1595579610 ## datafusion/expr/src/signature.rs: ## @@ -346,13 +346,81 @@ impl Signature { } } -/// Monotonicity of the `ScalarFunctionExpr` with respect to its

Re: [I] Support sort pushdown [datafusion]

2024-05-09 Thread via GitHub
karlovnv commented on issue #7871: URL: https://github.com/apache/datafusion/issues/7871#issuecomment-2103286570 > For now I'm experimenting with another approach in [datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation). @backkem Could you please provide mor

Re: [PR] build: Switch back to released version of DataFusion and arrow-rs after Arrow Java 16 is released [datafusion-comet]

2024-05-09 Thread via GitHub
viirya commented on PR #403: URL: https://github.com/apache/datafusion-comet/pull/403#issuecomment-2103282710 Hmm, the error is actually different: ``` Cause: org.apache.comet.CometNativeException: Fail to process Arrow array with reason C Data interface error: The external buffe

[I] doc builds are broken [datafusion-python]

2024-05-09 Thread via GitHub
Michael-J-Ward opened a new issue, #675: URL: https://github.com/apache/datafusion-python/issues/675 @andygrove via Discord: Documentation publishing to the site is broken, likely because this does not get tested on PR builds ``` >>>--

Re: [I] Support sort pushdown [datafusion]

2024-05-09 Thread via GitHub
backkem commented on issue #7871: URL: https://github.com/apache/datafusion/issues/7871#issuecomment-210356 Indeed, my use-case is querying across remote DBMSs. For now I'm experimenting with another approach in [datafusion-federation](https://github.com/datafusion-contrib/datafusion-fed

Re: [I] Implement Spark-compatible CAST from String to Floating Point [datafusion-comet]

2024-05-09 Thread via GitHub
mattharder91 commented on issue #326: URL: https://github.com/apache/datafusion-comet/issues/326#issuecomment-2103196311 Will do

Re: [I] "Unknown frame descriptor" for ZSTD data. [datafusion]

2024-05-09 Thread via GitHub
Smotrov commented on issue #10435: URL: https://github.com/apache/datafusion/issues/10435#issuecomment-2103194344 Here is an example file [data.zst.json](https://github.com/apache/datafusion/files/15266143/data.zst.json) And the code, which shows that the file could be perfectly decod

Re: [PR] chore: Update Python release process now that DataFusion is TLP [datafusion-python]

2024-05-09 Thread via GitHub
andygrove merged PR #674: URL: https://github.com/apache/datafusion-python/pull/674

Re: [PR] chore: Add criterion benchmarks for casting between integer types [datafusion-comet]

2024-05-09 Thread via GitHub
andygrove commented on code in PR #401: URL: https://github.com/apache/datafusion-comet/pull/401#discussion_r1595801156 ## core/benches/cast_numeric.rs: ## @@ -0,0 +1,79 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] chore: Add criterion benchmarks for casting between integer types [datafusion-comet]

2024-05-09 Thread via GitHub
andygrove merged PR #401: URL: https://github.com/apache/datafusion-comet/pull/401

Re: [I] Add push down sort to the source (table provider) [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10433: URL: https://github.com/apache/datafusion/issues/10433#issuecomment-2103142141 (BTW @NGA-TRAN and I worked on a very similar feature in InfluxDB IOx -- and we implemented a special operator that knows how to do this "read only the most recent file" for quer

Re: [PR] chore: Update Python release process now that DataFusion is TLP [datafusion-python]

2024-05-09 Thread via GitHub
andygrove commented on code in PR #674: URL: https://github.com/apache/datafusion-python/pull/674#discussion_r1595785654 ## dev/release/README.md: ## @@ -103,42 +103,7 @@ git push apache 0.8.0-rc1 ./dev/release/create-tarball.sh 0.8.0 1 ``` -This will also create the email t

[PR] chore: Update Python release process now that DataFusion is TLP [datafusion-python]

2024-05-09 Thread via GitHub
andygrove opened a new pull request, #674: URL: https://github.com/apache/datafusion-python/pull/674 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

Re: [I] Add push down sort to the source (table provider) [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10433: URL: https://github.com/apache/datafusion/issues/10433#issuecomment-2103138751 > It the example TableProvider may know that it needed to provide only last record batch (or the latest parquet file from folder). The provider can tell DataFusion it produc

Re: [I] Support sort pushdown [datafusion]

2024-05-09 Thread via GitHub
karlovnv commented on issue #7871: URL: https://github.com/apache/datafusion/issues/7871#issuecomment-2103134768 > Is the idea that the table providers have some faster way to sort than what is built into DataFusion? As I understand @backkem wanted to load data from an external dataso

[PR] Minor: format comments in filter pushdown rule [datafusion]

2024-05-09 Thread via GitHub
alamb opened a new pull request, #10437: URL: https://github.com/apache/datafusion/pull/10437 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/issues/10291 ## Rationale for this change While reviewing this code for https://github.com/apache/

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-09 Thread via GitHub
viirya commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2103124845 Triggered.

Re: [I] Add push down sort to the source (table provider) [datafusion]

2024-05-09 Thread via GitHub
karlovnv commented on issue #10433: URL: https://github.com/apache/datafusion/issues/10433#issuecomment-2103119446 > Possibly related: #7871 @alamb Thank you for the reply! I've read discussion in #7871 and think that this case is different. I don't want to say that MySou

Re: [PR] make common expression alias human-readable [datafusion]

2024-05-09 Thread via GitHub
alamb commented on PR #10333: URL: https://github.com/apache/datafusion/pull/10333#issuecomment-2103085218 Revert PR: https://github.com/apache/datafusion/pull/10436

Re: [I] Make `alias_symbol` more human-readable [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10280: URL: https://github.com/apache/datafusion/issues/10280#issuecomment-2103084827 We unfortunately found issues in https://github.com/apache/datafusion/pull/10333 so we are going to revert it in https://github.com/apache/datafusion/pull/10436 See discuss

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-09 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2103082792 @viirya @andygrove Please approve to start CI

Re: [I] Add push down sort to the source (table provider) [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10433: URL: https://github.com/apache/datafusion/issues/10433#issuecomment-2103079866 Possibly related: https://github.com/apache/datafusion/issues/7871

[PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-09 Thread via GitHub
kazuyukitanimura opened a new pull request, #407: URL: https://github.com/apache/datafusion-comet/pull/407 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/372 ## Rationale for this change To be ready for Spark 4

Re: [I] "Unknown frame descriptor" for ZSTD data. [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10435: URL: https://github.com/apache/datafusion/issues/10435#issuecomment-2103074877 Thanks for the report -- can you possibly share an example of such a file (or instructions for how to create one)?

[PR] Revert 10333 [datafusion]

2024-05-09 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #10436: URL: https://github.com/apache/datafusion/pull/10436 ## Which issue does this PR close? Revert #10333 ## Rationale for this change This issue is to be revisited after #10413. ## What changes are in

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-09 Thread via GitHub
alamb commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1594710170 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -250,90 +265,67 @@ fn find_inner_join( })) } -fn intersect( -accum: &mut Vec<(Expr, Expr)>, -

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-09 Thread via GitHub
alamb commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1594710628 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Minor: Simplify + document `EliminateCrossJoin` better [datafusion]

2024-05-09 Thread via GitHub
alamb merged PR #10427: URL: https://github.com/apache/datafusion/pull/10427

Re: [PR] Minor: Simplify + document `EliminateCrossJoin` better [datafusion]

2024-05-09 Thread via GitHub
alamb commented on PR #10427: URL: https://github.com/apache/datafusion/pull/10427#issuecomment-2103048124 Thanks for the reviews @jackwener and @comphead ❤️ -- this now opens up a few more PRs I have queued up to make this code faster (by reducing cloning) https://github.com/apache/datafu

Re: [PR] Minor: Simplify + document `EliminateCrossJoin` better [datafusion]

2024-05-09 Thread via GitHub
alamb commented on code in PR #10427: URL: https://github.com/apache/datafusion/pull/10427#discussion_r1595719478 ## datafusion/expr/src/utils.rs: ## @@ -909,8 +909,8 @@ pub fn check_all_columns_from_schema( pub fn find_valid_equijoin_key_pair( left_key: &Expr, right_

Re: [PR] Add more sqllogictests for `parquet_sorted_statistics` [datafusion]

2024-05-09 Thread via GitHub
yyy1000 commented on code in PR #10381: URL: https://github.com/apache/datafusion/pull/10381#discussion_r1595718706 ## datafusion/sqllogictest/test_files/parquet_sorted_statistics.slt: ## @@ -260,3 +260,77 @@ physical_plan 01)SortPreservingMergeExec: [constant_col@0 ASC NULLS L

Re: [I] Enable GitHub discussions [datafusion-comet]

2024-05-09 Thread via GitHub
andygrove commented on issue #368: URL: https://github.com/apache/datafusion-comet/issues/368#issuecomment-2103044593 In order to enable the discussions feature, I will need to file an issue with ASF INFRA and will need to provide a link to a "consensus discussion thread". I will start a t

Re: [I] Boolean operators in expressions are ignored [datafusion-python]

2024-05-09 Thread via GitHub
andygrove closed issue #667: Boolean operators in expressions are ignored URL: https://github.com/apache/datafusion-python/issues/667

Re: [PR] Add document about basics of working with expressions [datafusion-python]

2024-05-09 Thread via GitHub
andygrove merged PR #668: URL: https://github.com/apache/datafusion-python/pull/668

Re: [PR] chore: Add criterion benchmarks for casting between integer types [datafusion-comet]

2024-05-09 Thread via GitHub
comphead commented on code in PR #401: URL: https://github.com/apache/datafusion-comet/pull/401#discussion_r1595622813 ## core/benches/cast_numeric.rs: ## @@ -0,0 +1,79 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] chore: Add criterion benchmarks for casting between integer types [datafusion-comet]

2024-05-09 Thread via GitHub
comphead commented on code in PR #401: URL: https://github.com/apache/datafusion-comet/pull/401#discussion_r1595619638 ## core/benches/cast_from_string.rs: ## @@ -73,6 +61,23 @@ fn criterion_benchmark(c: &mut Criterion) { }); } +fn create_utf8_batch() -> RecordBatch { R

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-09 Thread via GitHub
comphead commented on code in PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#discussion_r1595595448 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -135,6 +146,97 @@ class CometExpressionCoverageSuite extends CometTestBase w

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-09 Thread via GitHub
comphead commented on code in PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#discussion_r1595594959 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -135,6 +146,97 @@ class CometExpressionCoverageSuite extends CometTestBase w

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-09 Thread via GitHub
comphead commented on code in PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#discussion_r1595593862 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -123,7 +134,7 @@ class CometExpressionCoverageSuite extends CometTestBase wi

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-09 Thread via GitHub
comphead commented on PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#issuecomment-2102867711 > Just to understand, backed by datafusion does not automatically mean that has Spark compatibility? I hope in most cases yes as we have a generic wrapper, this file is more

Re: [PR] Add more sqllogictests for `parquet_sorted_statistics` [datafusion]

2024-05-09 Thread via GitHub
comphead commented on code in PR #10381: URL: https://github.com/apache/datafusion/pull/10381#discussion_r1595586048 ## datafusion/sqllogictest/test_files/parquet_sorted_statistics.slt: ## @@ -260,3 +260,77 @@ physical_plan 01)SortPreservingMergeExec: [constant_col@0 ASC NULLS

Re: [PR] Simplify Format Options [datafusion]

2024-05-09 Thread via GitHub
berkaysynnada commented on code in PR #10404: URL: https://github.com/apache/datafusion/pull/10404#discussion_r1595583918 ## datafusion/sql/src/parser.rs: ## @@ -1048,66 +959,41 @@ mod tests { name: "t".into(), columns: vec![make_column_def("c1", DataTy

Re: [PR] Simplify Format Options [datafusion]

2024-05-09 Thread via GitHub
berkaysynnada commented on code in PR #10404: URL: https://github.com/apache/datafusion/pull/10404#discussion_r1595583182 ## datafusion/sqllogictest/test_files/create_external_table.slt: ## @@ -68,18 +60,6 @@ CREATE EXTERNAL TABLE t STORED AS CSV STORED AS PARQUET LOCATION 'foo

Re: [PR] Minor: Simplify + document `EliminateCrossJoin` better [datafusion]

2024-05-09 Thread via GitHub
comphead commented on code in PR #10427: URL: https://github.com/apache/datafusion/pull/10427#discussion_r1595579395 ## datafusion/expr/src/utils.rs: ## @@ -909,8 +909,8 @@ pub fn check_all_columns_from_schema( pub fn find_valid_equijoin_key_pair( left_key: &Expr, rig

Re: [PR] Minor: Simplify + document `EliminateCrossJoin` better [datafusion]

2024-05-09 Thread via GitHub
comphead commented on code in PR #10427: URL: https://github.com/apache/datafusion/pull/10427#discussion_r1595577902 ## datafusion/expr/src/utils.rs: ## @@ -885,7 +885,7 @@ pub fn can_hash(data_type: &DataType) -> bool { /// Check whether all columns are from the schema. pub f

Re: [PR] Simplify Format Options [datafusion]

2024-05-09 Thread via GitHub
berkaysynnada commented on PR #10404: URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2102843110 This PR is now ready for review. By employing that [method](https://github.com/apache/datafusion/issues/10414#issuecomment-2099555237), users can choose to maintain the existin

Re: [PR] Simplify Format Options [datafusion]

2024-05-09 Thread via GitHub
berkaysynnada commented on code in PR #10404: URL: https://github.com/apache/datafusion/pull/10404#discussion_r1595568026 ## datafusion/sqllogictest/test_files/parquet.slt: ## @@ -66,7 +66,6 @@ CREATE EXTERNAL TABLE test_table ( date_col DATE ) STORED AS PARQUET -WITH HEADE

Re: [PR] Add document about basics of working with expressions [datafusion-python]

2024-05-09 Thread via GitHub
timsaucer commented on PR #668: URL: https://github.com/apache/datafusion-python/pull/668#issuecomment-2102810555 @andygrove Checks passed, should be ready to merge.

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-09 Thread via GitHub
viirya commented on code in PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#discussion_r1595546942 ## common/src/main/java/org/apache/comet/vector/CometPlainVector.java: ## @@ -111,7 +115,12 @@ public UTF8String getUTF8String(int rowId) { byte[] result =

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102766839 > > > We're on the same page @jayzhan211 > > > > > > If we don't need `Expr` for simplified UDAF, then we can have > > ```rust > > pub fn simplify( > > &

Re: [I] Stop copying LogicalPlan and Exprs in `PushDownFilter` [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10291: URL: https://github.com/apache/datafusion/issues/10291#issuecomment-2102764792 I am starting to unravel the remaining copies in PushDownFilter -- it is non trivial

Re: [PR] feat: Implement Spark-compatible CAST from Numeric to Binary [datafusion-comet]

2024-05-09 Thread via GitHub
mattharder91 commented on PR #406: URL: https://github.com/apache/datafusion-comet/pull/406#issuecomment-2102747381 I am getting ``` Cannot resolve "TRY_CAST(a AS BINARY)" due to data type mismatch: cannot cast "BIGINT" to "BINARY" ``` In non ANSI Mode and respectfully for a

Re: [PR] feat: Implement Spark-compatible CAST from Numeric to Binary [datafusion-comet]

2024-05-09 Thread via GitHub
mattharder91 commented on code in PR #406: URL: https://github.com/apache/datafusion-comet/pull/406#discussion_r1595509847 ## spark/src/main/scala/org/apache/comet/expressions/CometCast.scala: ## @@ -248,4 +250,12 @@ object CometCast { case _ => Unsupported } + privat

Re: [PR] feat: Implement Spark-compatible CAST from Numeric to Binary [datafusion-comet]

2024-05-09 Thread via GitHub
leoluan2009 commented on code in PR #406: URL: https://github.com/apache/datafusion-comet/pull/406#discussion_r1595505674 ## spark/src/main/scala/org/apache/comet/expressions/CometCast.scala: ## @@ -248,4 +250,12 @@ object CometCast { case _ => Unsupported } + private

Re: [I] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` [datafusion]

2024-05-09 Thread via GitHub
alamb commented on issue #10287: URL: https://github.com/apache/datafusion/issues/10287#issuecomment-2102719342 It is done over a few PRs but I have this change now working and I think it is looking quite good: https://github.com/apache/datafusion/pull/10431

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` [datafusion]

2024-05-09 Thread via GitHub
alamb commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1595486547 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -237,7 +324,7 @@ fn find_inner_join( )?); return Ok(LogicalPlan::Join(Join { -

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102706142 > > We're on the same page @jayzhan211 > > If we don't need `Expr` for simplified UDAF, then we can have > > ```rust > pub fn simplify( > &self, >

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102701229 > We're on the same page @jayzhan211 If we don't need `Expr` for simplified UDAF, then we can have ```rust pub fn simplify( &self, args: Aggregate

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102693750 We're on the same page @jayzhan211

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102684929 It should be this. ```rust // UDF fn simplify( &self, ) -> Option Result>> { ``` The reason for optional closure is that I assume we need to re

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102672197 > > My final answer for today is > > ```rust > fn simplify( > &self, > args: AggregateArgs, > _info: &dyn SimplifyInfo, > ) ->

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102665705 > > After playing around, I feel like maybe optional closure is our answer 🤔 > > Damn! I should have created a branch with that code 😀, if we have consensus on that directio

Re: [PR] During expression equality, check for new ordering information [datafusion]

2024-05-09 Thread via GitHub
ozankabak commented on code in PR #10434: URL: https://github.com/apache/datafusion/pull/10434#discussion_r1595444209 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -198,6 +198,61 @@ impl EquivalenceProperties { left: &Arc, right: &Arc, )

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102642821 > > After playing around, I feel like maybe optional closure is our answer 🤔 > > Damn! I should have created a branch with that code 😀, if we have consensus on that directio

[I] "Unknown frame descriptor" for ZSTD data. [datafusion]

2024-05-09 Thread via GitHub
Smotrov opened a new issue, #10435: URL: https://github.com/apache/datafusion/issues/10435 ### Describe the bug When reading a partition of big NDJSON files compressed with ZSTD, an error appears. `Error: Custom { kind: Other, error: External(ArrowError(ExternalError(IoEr

[PR] feat: Implement Spark-compatible CAST from Numeric to Binary [datafusion-comet]

2024-05-09 Thread via GitHub
mattharder91 opened a new pull request, #406: URL: https://github.com/apache/datafusion-comet/pull/406 ## Which issue does this PR close? https://github.com/apache/datafusion-comet/issues/405, https://github.com/apache/datafusion-comet/issues/377 Closes #. ## Rationale f

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102610136 > After playing around, I feel like maybe optional closure is our answer 🤔 Damn! I should have created a branch with that code 😀, if we have consensus on that direction I'

Re: [I] EnforceDistribution fails, seems to turn all the types of the schema to UInt64 [datafusion]

2024-05-09 Thread via GitHub
fabianmurariu commented on issue #10421: URL: https://github.com/apache/datafusion/issues/10421#issuecomment-2102606499 Strange, I'm encountering this with custom TableProviders. I'll be able to share more next week, though.

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-09 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2102602155 > @jayzhan211 and @alamb whenever you get chance please have a look, IMHO I find proposal with closure to tick all the boxes, but this one is definitely simpler. After playing

Re: [PR] make common expression alias human-readable [datafusion]

2024-05-09 Thread via GitHub
alamb commented on PR #10333: URL: https://github.com/apache/datafusion/pull/10333#issuecomment-2102581042 Sounds like the consensus is to revert this PR -- could you possibly make a revert PR, @MohamedAbdeen21?

Re: [PR] Add `LogicalPlan::recompute_schema` for handling rewrite passes [datafusion]

2024-05-09 Thread via GitHub
alamb commented on code in PR #10410: URL: https://github.com/apache/datafusion/pull/10410#discussion_r1595387773 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -467,6 +468,200 @@ impl LogicalPlan { self.with_new_exprs(self.expressions(), inputs.to_vec()) } +

Re: [I] EnforceDistribution fails, seems to turn all the types of the schema to UInt64 [datafusion]

2024-05-09 Thread via GitHub
mustafasrepo commented on issue #10421: URL: https://github.com/apache/datafusion/issues/10421#issuecomment-2102576381 > Thanks @fabianmurariu > > cc @mustafasrepo in case you have any thoughts I have tried to reproduce the problem by defining only the absolutely necessary fields in the que
