Re: [PR] chore: Fix arrow deprecation [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 closed pull request #14603: chore: Fix arrow deprecation URL: https://github.com/apache/datafusion/pull/14603 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [I] [EPIC] A(nother) list of performance improvement tickets [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14482: URL: https://github.com/apache/datafusion/issues/14482#issuecomment-2650473653 > I know the dataframe api isn't used by many but it needs some love too: [#14563](https://github.com/apache/datafusion/issues/14563) Added! -- This is an automated messa

Re: [I] [DISCUSSION] Lowering the barrier to new users (Lessons from 15-799 CMU Optimizer Class) [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2650705661 - I filed https://github.com/apache/datafusion/issues/14608 to track the idea of making it easier to run tpch in datafusion-cli -- This is an automated message from the Apache G

[I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-02-11 Thread via GitHub
alamb opened a new issue, #14608: URL: https://github.com/apache/datafusion/issues/14608 ### Is your feature request related to a problem or challenge? [TPC-H](https://www.tpc.org/tpch/) is an important and well studied benchmark. It is used for testing many database optimizations and

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1950779945 ## datafusion/common/src/dfschema.rs: ## @@ -1028,20 +1028,48 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_typ

Re: [PR] Fix typo [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #14605: URL: https://github.com/apache/datafusion/pull/14605#issuecomment-2650715000 Thanks @byte-sourcerer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[PR] Draft: LogicalScalar [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 opened a new pull request, #14609: URL: https://github.com/apache/datafusion/pull/14609 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

[I] Optimize `repeat` function [datafusion]

2025-02-11 Thread via GitHub
alamb opened a new issue, #14610: URL: https://github.com/apache/datafusion/issues/14610 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/datafusion/pull/14575 from @wForget I noticed that the implementation of `repeat` could be

Re: [I] Optimize `repeat` function [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14610: URL: https://github.com/apache/datafusion/issues/14610#issuecomment-2650888782 I think this is a good first issue as the code is self contained, the issue explained well and there are benchmarks You can run ```shell cargo bench --bench repeat

Re: [PR] Make it easier to create a ScalarValure representing typed null (#14548) [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 merged PR #14558: URL: https://github.com/apache/datafusion/pull/14558 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Make it easier to create a ScalarValure representing typed null (#14548) [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 commented on PR #14558: URL: https://github.com/apache/datafusion/pull/14558#issuecomment-2650483941 Thanks @cj-zhukov and the reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[PR] Revert modification of build dependency [datafusion]

2025-02-11 Thread via GitHub
ugoa opened a new pull request, #14606: URL: https://github.com/apache/datafusion/pull/14606 ## Which issue does this PR close? - Closes [#14435](https://github.com/apache/datafusion/issues/14435) ## Rationale for this change Remove unused build dependency `cmake`

Re: [PR] refactor: Move FileSinkConfig out of Core [datafusion]

2025-02-11 Thread via GitHub
alamb merged PR #14585: URL: https://github.com/apache/datafusion/pull/14585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[I] Logically repartition files by row splits [datafusion]

2025-02-11 Thread via GitHub
AdamGS opened a new issue, #14607: URL: https://github.com/apache/datafusion/issues/14607 ### Is your feature request related to a problem or challenge? We’re implementing a file format [Vortex](https://github.com/spiraldb/vortex), which has no “row groups” or similar concept, meanin

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-11 Thread via GitHub
ugoa commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2650677977 @alamb Sure, here it is https://github.com/apache/datafusion/pull/14606, please help review. -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-11 Thread via GitHub
iffyio commented on code in PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#discussion_r1950766697 ## src/parser/mod.rs: ## @@ -13337,6 +13376,42 @@ impl<'a> Parser<'a> { }) } +/// Parse an expression, optionally followed by ASC or

Re: [PR] feat: Add `array_max` function support [datafusion]

2025-02-11 Thread via GitHub
findepi commented on PR #14470: URL: https://github.com/apache/datafusion/pull/14470#issuecomment-2650811918 Per the project guidelines proposal https://github.com/apache/datafusion/pull/13706 it feels to me like it belongs in core. It would be great to either finalize those guidelines or close th

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-11 Thread via GitHub
findepi commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1950837919 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1717,56 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'foo%'`, w

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-11 Thread via GitHub
findepi commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2650821795 > but some users do not want the casts introduced by `TypeCoercion` This is a very broad statement. I don't like the DataFusion coercion logic per se because it has bugs (

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1951127062 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -81,11 +77,9 @@ STORED AS arrow LOCATION 'test_files/scratch/insert_to_external/arrow_dict

Re: [PR] fix: Reduce timestamp issues in native_datafusion and native_icerberg_compat Parquet modes [datafusion-comet]

2025-02-11 Thread via GitHub
mbutrovich commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2651252516 utils.rs's `array_with_timezone` mentions it doesn't support converting to NTZ, but we were calling it. I am gonna try an assertion in there and see if that breaks anything.

Re: [PR] fix: Reduce timestamp issues in native_datafusion and native_icerberg_compat Parquet modes [datafusion-comet]

2025-02-11 Thread via GitHub
mbutrovich commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1951123998 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResult {

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-11 Thread via GitHub
iffyio commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1951078414 ## src/ast/operator.rs: ## @@ -53,6 +53,16 @@ pub enum UnaryOperator { PGAbs, /// Unary logical not operator: e.g. `! false` (Hive-specific)

[I] Set default value of parse_float_as_decimal to true [datafusion]

2025-02-11 Thread via GitHub
andygrove opened a new issue, #14612: URL: https://github.com/apache/datafusion/issues/14612 ### Is your feature request related to a problem or challenge? According to the ANSI SQL specification, floating point numbers in SQL should be interpreted as decimals. We currently interpret

Re: [I] Set default value of parse_float_as_decimal to true [datafusion]

2025-02-11 Thread via GitHub
jatin510 commented on issue #14612: URL: https://github.com/apache/datafusion/issues/14612#issuecomment-2651278826 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1951145929 ## datafusion/sqllogictest/test_files/aggregate_skip_partial.slt: ## @@ -228,7 +228,7 @@ CREATE TABLE aggregate_test_100_null ( c11 FLOAT ); -statement o

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-11 Thread via GitHub
andygrove commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1951150363 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_disco

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14576: URL: https://github.com/apache/datafusion/issues/14576#issuecomment-2650589502 Here is a PR to disable the test: - https://github.com/apache/datafusion/pull/14604 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] refactor: remove uses of arrow_schema and use reexport in arrow instead [datafusion]

2025-02-11 Thread via GitHub
alamb merged PR #14597: URL: https://github.com/apache/datafusion/pull/14597 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Ad-hoc or scheduled mutation based testing [datafusion]

2025-02-11 Thread via GitHub
edmondop commented on issue #14589: URL: https://github.com/apache/datafusion/issues/14589#issuecomment-2650966080 @alamb they did, although they ran out of space! If you click on the "this" hyperlink in the text of the issue, you get here https://github.com/edmondop/arrow-datafusion/actio

Re: [PR] Revert modification of build dependency [datafusion]

2025-02-11 Thread via GitHub
alamb merged PR #14606: URL: https://github.com/apache/datafusion/pull/14606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fallback to Utf8View for `Dict(_, Utf8View)` in `type_union_resolution_coercion` [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 commented on PR #14602: URL: https://github.com/apache/datafusion/pull/14602#issuecomment-2650968135 It seems binary view is problematic, so I removed it for now. Add it once there is test coverage -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
jonahgao commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1950956643 ## datafusion/common/src/dfschema.rs: ## @@ -1028,20 +1028,48 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_types

Re: [I] Simple Functions [datafusion]

2025-02-11 Thread via GitHub
Omega359 commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2650993466 I love the work that you've put into this! You use the regexp_like as an example of what could be handled by if you look at that function it actually doesn't operate on individ

Re: [PR] Disable extended tests (`extended_tests`) that are failing on runner [datafusion]

2025-02-11 Thread via GitHub
Omega359 commented on PR #14604: URL: https://github.com/apache/datafusion/pull/14604#issuecomment-2651000934 lgtm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
jonahgao commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1950969156 ## datafusion/common/src/dfschema.rs: ## @@ -1028,20 +1028,48 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_types

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
comphead commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1951169297 ## .github/actions/rust-test/action.yaml: ## @@ -29,7 +29,7 @@ runs: shell: bash run: | cd native -cargo clippy --color=never -

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-11 Thread via GitHub
andygrove commented on code in PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#discussion_r1951175661 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -429,65 +429,131 @@ abstract class CometTestBase makeParquetFileAllTypes(path, d

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
andygrove commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1951159290 ## native/hdfs/Cargo.toml: ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2651322116 In looking into this issue I have a question for the db experts that happen to be following this issue. The with_column code builds a `Vec` called fields in the dataframe

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
comphead commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2651378376 > > For now `libhdfs` on JVM provides more HDFS client configuration, which is critical on production sites, compared to https://github.com/Kimahriman/hdfs-native?tab=readme-ov-fil

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
comphead commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2651312412 > > From the org of [datafusion-contrib](https://github.com/datafusion-contrib?q=hdfs&type=all&language=&sort=), I see many hdfs crates, which one is best for comet? > >

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
andygrove commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1951162034 ## native/hdfs/Cargo.toml: ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1951159851 ## datafusion/sqllogictest/test_files/aggregate_skip_partial.slt: ## @@ -228,6 +231,27 @@ CREATE TABLE aggregate_test_100_null ( c11 FLOAT ); +statement

Re: [PR] feat: [wip] experimental fuzz testing in test suite [datafusion-comet]

2025-02-11 Thread via GitHub
andygrove commented on code in PR #1374: URL: https://github.com/apache/datafusion-comet/pull/1374#discussion_r1951169353 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
comphead commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1951171393 ## native/core/Cargo.toml: ## @@ -78,6 +78,7 @@ object_store = { workspace = true } url = { workspace = true } chrono = { workspace = true } parking_lot =

Re: [PR] feat: [wip] experimental fuzz testing in test suite [datafusion-comet]

2025-02-11 Thread via GitHub
andygrove commented on code in PR #1374: URL: https://github.com/apache/datafusion-comet/pull/1374#discussion_r1951169966 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -116,12 +116,49 @@ abstract class CometTestBase require(absTol > 0 && absTol <=

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-11 Thread via GitHub
comphead commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2651328950 > Can we have a README that covers any changes we have made (if any), and what versions of hdfs client are supported (and on what platforms)? Also, if we have made changes, can

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1951115943 ## datafusion/common/src/dfschema.rs: ## @@ -1028,20 +1028,48 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_ty

Re: [PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-11 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1951117633 ## datafusion/common/src/dfschema.rs: ## @@ -1028,20 +1028,48 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_ty

[PR] chore: Use `MIN_DECIMAL128_FOR_EACH_PRECISION` and `MAX_DECIMAL128_FOR_EACH_PRECISION` [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 opened a new pull request, #14603: URL: https://github.com/apache/datafusion/pull/14603 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] Add documentation about releasing to conda-forge [datafusion-python]

2025-02-11 Thread via GitHub
timsaucer closed issue #142: Add documentation about releasing to conda-forge URL: https://github.com/apache/datafusion-python/issues/142 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Add documentation about releasing to conda-forge [datafusion-python]

2025-02-11 Thread via GitHub
timsaucer commented on issue #142: URL: https://github.com/apache/datafusion-python/issues/142#issuecomment-2650563180 I believe this is no longer needed, now that we have the conda feedstock that automatically pulls from pypi. Please reopen if you feel this is still needed. -- This is

Re: [I] Configure `statistics_truncate_length` in Parquet writer [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14601: URL: https://github.com/apache/datafusion/issues/14601#issuecomment-2650667459 Thanks @patchwork01 -- I agree it sounds like a good idea -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #14592: URL: https://github.com/apache/datafusion/pull/14592#issuecomment-2650548346 > I recommend we disable that test until someone is able to spend more time looking into why it is using so much disk space I will do so -- This is an automated message from t

Re: [PR] Disable extended tests (`extended_tests`) that are failing on runner [datafusion]

2025-02-11 Thread via GitHub
alamb commented on code in PR #14604: URL: https://github.com/apache/datafusion/pull/14604#discussion_r1950709829 ## .github/workflows/extended.yml: ## @@ -52,28 +52,30 @@ jobs: cargo check --profile ci --all-targets cargo clean - # Run extended tests (w

Re: [PR] chore(deps): bump substrait from 0.53.0 to 0.53.1 [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #14599: URL: https://github.com/apache/datafusion/pull/14599#issuecomment-2650590819 Thank you for the review @mbrobbel -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] refactor: Move FileSinkConfig out of Core [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #14585: URL: https://github.com/apache/datafusion/pull/14585#issuecomment-2650610088 I merged this PR up from main to resolve some conflicts -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] refactor: Move FileSinkConfig out of Core [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #14585: URL: https://github.com/apache/datafusion/pull/14585#issuecomment-2650677458 Thanks again @logan-keede -- let's keep the train moving! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-11 Thread via GitHub
hayman42 commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2650539273 @kazuyukitanimura I am not sure but I think the slowness comes from CometExchange that is executed after the join with BuildLeft this is for original Comet ``

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2650539265 > Hey @alamb , btw we probably need to remove the installation of `cmake` in the `.github/actions/setup-builder/action.yaml` since it was added for `snmalloc-rs = "0.3"` which we don't

[PR] Fix typo [datafusion]

2025-02-11 Thread via GitHub
byte-sourcerer opened a new pull request, #14605: URL: https://github.com/apache/datafusion/pull/14605 ## What changes are included in this PR? Fix typo. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] chore(deps): bump substrait from 0.53.0 to 0.53.1 [datafusion]

2025-02-11 Thread via GitHub
alamb merged PR #14599: URL: https://github.com/apache/datafusion/pull/14599 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] A 'cache control' header is missing or empty webkit [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14542: URL: https://github.com/apache/datafusion/issues/14542#issuecomment-2651028880 Can you please explain why the following is a problem? I am sorry it is not obvious to me (I am not very knowledgeable about these technologies) > A cache control header is

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-02-11 Thread via GitHub
alamb commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2651062245 Thanks @lmwnshn -- the Java implementation might be easier to transliterate to Rust... Also BTW I am pretty sure other rust data projects would be interested in a Rust imp

Re: [PR] chore(deps): bump sqllogictest from 0.26.4 to 0.27.0 [datafusion]

2025-02-11 Thread via GitHub
alamb commented on code in PR #14598: URL: https://github.com/apache/datafusion/pull/14598#discussion_r1951402064 ## datafusion/sqllogictest/src/engines/postgres_engine/mod.rs: ## @@ -300,6 +305,15 @@ impl sqllogictest::AsyncDB for Postgres { fn engine_name(&self) -> &str {

Re: [PR] Minor: remove some unnecessary dependencies [datafusion]

2025-02-11 Thread via GitHub
alamb merged PR #14615: URL: https://github.com/apache/datafusion/pull/14615 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652036656 A lot of `TreeNodeRecursion::visit_sibling`... may be related to https://github.com/apache/datafusion/issues/13748 ? ![Image](https://github.com/user-attachments/ass

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-11 Thread via GitHub
berkaysynnada commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2652041558 > I wonder why tpch_mem_sf10 is slower for some queries? Might it be possible the created memtable is not created evenly because of the new round robin (that might be fixable e

[PR] WIP: Add LogicalScalar [datafusion]

2025-02-11 Thread via GitHub
tobixdev opened a new pull request, #14617: URL: https://github.com/apache/datafusion/pull/14617 ## Which issue does this PR close? This change is related to #12622. While the PR is not yet complete, I won't be able to work on it for at least a few days. Therefore, I'd like to

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-11 Thread via GitHub
logan-keede commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2652024062 cc @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-11 Thread via GitHub
rkrishn7 commented on code in PR #14538: URL: https://github.com/apache/datafusion/pull/14538#discussion_r1951574316 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -0,0 +1,264 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Th

Re: [PR] WIP: Add LogicalScalar [datafusion]

2025-02-11 Thread via GitHub
tobixdev commented on code in PR #14617: URL: https://github.com/apache/datafusion/pull/14617#discussion_r1951588187 ## datafusion/sql/src/unparser/expr.rs: ## @@ -1831,15 +1625,15 @@ mod tests { ), ( Expr::Literal(ScalarValue::Date64(S

Re: [PR] Require all zero argument UDFs use `Signature::Nullary`, improve error messages [datafusion]

2025-02-11 Thread via GitHub
alamb commented on PR #13871: URL: https://github.com/apache/datafusion/pull/13871#issuecomment-2652009530 I don't have time to finish this work now. If anyone else wants to try and make progress, feel free -- This is an automated message from the Apache Git Service. To respond to the mes

Re: [PR] Speedup `date_trunc` (~20% time reduction) [datafusion]

2025-02-11 Thread via GitHub
simonvandel commented on code in PR #14593: URL: https://github.com/apache/datafusion/pull/14593#discussion_r1951592139 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,10 +185,10 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let par

Re: [PR] WIP: Add LogicalScalar [datafusion]

2025-02-11 Thread via GitHub
tobixdev commented on code in PR #14617: URL: https://github.com/apache/datafusion/pull/14617#discussion_r1951589583 ## datafusion/common/src/types/builtin.rs: ## @@ -47,3 +50,53 @@ singleton!(LOGICAL_FLOAT64, logical_float64, Float64); singleton!(LOGICAL_DATE, logical_date, Da

Re: [PR] minor: check size overflow before string repeat build [datafusion]

2025-02-11 Thread via GitHub
comphead merged PR #14575: URL: https://github.com/apache/datafusion/pull/14575 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652057779 Stacktrace may also help https://github.com/user-attachments/assets/83ea287f-5312-4624-bc70-3824fb55c203 -- This is an automated message from the Ap

Re: [PR] minor: check size overflow before string repeat build [datafusion]

2025-02-11 Thread via GitHub
comphead commented on PR #14575: URL: https://github.com/apache/datafusion/pull/14575#issuecomment-2652071549 Thanks @wForget and @alamb for finding a bench -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] Concat error while testing "array_repeat" [datafusion-comet]

2025-02-11 Thread via GitHub
kazuyukitanimura commented on issue #1347: URL: https://github.com/apache/datafusion-comet/issues/1347#issuecomment-2652086731 cc @comphead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652180854 Okay, so I think the issue is that with every `.with_column_renamed` / `.with_column` we add a new projection - that creates a lot of layers and each time adding a new one is m

Re: [PR] enable full decimal to decimal support [datafusion-comet]

2025-02-11 Thread via GitHub
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1951694041 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -872,6 +872,13 @@ fn cast_array( let array = array_with_timezone(array, cast_options.timezone.

Re: [PR] Introducing mutation testing [datafusion]

2025-02-11 Thread via GitHub
comphead commented on PR #14590: URL: https://github.com/apache/datafusion/pull/14590#issuecomment-2652206326 Thanks @edmondop. I cannot see this flow in the list of PR checks; how long does it take? -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-11 Thread via GitHub
benrsatori opened a new pull request, #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723 Add support for PostgreSQL and Redshift geometric operators. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] refactor: remove uses of arrow_schema and use reexport in arrow instead [datafusion]

2025-02-11 Thread via GitHub
Chen-Yuan-Lai commented on PR #14597: URL: https://github.com/apache/datafusion/pull/14597#issuecomment-2650178191 Because the `record_batch!` macro is still used in some places, I kept the `arrow-schema` dependency in physical-plan and datafusion-examples -- This is an automated message from the Apache Git

[I] Configure `statistics_truncate_length` in Parquet writer [datafusion]

2025-02-11 Thread via GitHub
patchwork01 opened a new issue, #14601: URL: https://github.com/apache/datafusion/issues/14601 ### Is your feature request related to a problem or challenge? DataFusion has deprecated the configuration option `datafusion.execution.parquet.max_statistics_size`, because it's not used:

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950526095 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd,

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-11 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950529828 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd,

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-11 Thread via GitHub
kazuyukitanimura commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2650118134 We have not enabled SHJ as default because we haven't implemented spilling IIRC. Regardless, thank you for reporting this @hayman42 Did you have a chance to div
