Re: [I] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 closed issue #10838: Convert `approx_median` to UDAF URL: https://github.com/apache/datafusion/issues/10838 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 merged PR #10840: URL: https://github.com/apache/datafusion/pull/10840 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on PR #10840: URL: https://github.com/apache/datafusion/pull/10840#issuecomment-2157455761 Thanks @goldmedal -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Convert approx_distinct to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal commented on code in PR #10851: URL: https://github.com/apache/datafusion/pull/10851#discussion_r1632615003 ## datafusion/physical-expr/src/aggregate/hyperloglog.rs: ## @@ -115,6 +115,7 @@ where /// Get the register histogram (each value in register index into

Re: [PR] Minor: Clarify `SessionContext::state` docs [datafusion]

2024-06-09 Thread via GitHub
tshauck commented on code in PR #10847: URL: https://github.com/apache/datafusion/pull/10847#discussion_r1632596084 ## datafusion/core/src/execution/context/mod.rs: ## @@ -1249,8 +1249,18 @@ impl SessionContext { Arc::new(TaskContext::from(self)) } -/// Snaps

Re: [PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on code in PR #10843: URL: https://github.com/apache/datafusion/pull/10843#discussion_r1632564227 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -766,10 +769,108 @@ impl<'a> StatisticsConverter<'a> { Ok(Arc::new(UIn

Re: [PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on code in PR #10843: URL: https://github.com/apache/datafusion/pull/10843#discussion_r1632556874 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -766,10 +769,108 @@ impl<'a> StatisticsConverter<'a> { Ok(Arc::new(UIn

Re: [PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-09 Thread via GitHub
abhiaagarwal commented on PR #10765: URL: https://github.com/apache/datafusion/pull/10765#issuecomment-2157141004 All good, just wanted to bring it to your attention before 39 formally drops :) -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [I] Implement `ScalarValue::TimestampMillisecond` -> String Support [datafusion]

2024-06-09 Thread via GitHub
pingsutw commented on issue #10797: URL: https://github.com/apache/datafusion/issues/10797#issuecomment-2157134523 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

[PR] Convert approx_distinct to UDAF [datafusion]

2024-06-09 Thread via GitHub
Lordworms opened a new pull request, #10851: URL: https://github.com/apache/datafusion/pull/10851 ## Which issue does this PR close? Closes #10837 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] Convert `VariancePopulation` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 merged PR #10836: URL: https://github.com/apache/datafusion/pull/10836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] Convert Variance Population to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 closed issue #10668: Convert Variance Population to UDAF URL: https://github.com/apache/datafusion/issues/10668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Convert `VariancePopulation` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on PR #10836: URL: https://github.com/apache/datafusion/pull/10836#issuecomment-2157080515 Thanks @mknaw and @goldmedal -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1632502092 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632500900 ## datafusion/physical-expr-common/Cargo.toml: ## @@ -37,6 +37,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-array = { workspac

[PR] Minor: add `Window::try_new_with_schema` constructor [datafusion]

2024-06-09 Thread via GitHub
sadboy opened a new pull request, #10850: URL: https://github.com/apache/datafusion/pull/10850 ## Which issue does this PR close? Closes N/A ## Rationale for this change This change makes the `Window` node API more uniform with other `LogicalPlan` nodes,

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-06-09 Thread via GitHub
erratic-pattern commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1632477484 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-06-09 Thread via GitHub
erratic-pattern commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1632476923 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,420 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[PR] feat: Add method to add analyzer rules to SessionContext [datafusion]

2024-06-09 Thread via GitHub
pingsutw opened a new pull request, #10849: URL: https://github.com/apache/datafusion/pull/10849 ## Which issue does this PR close? Closes #10846 ## Rationale for this change The new session context works fine with the rules I added, but there is no way to register the rules to

Re: [PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-09 Thread via GitHub
waynexia commented on PR #10765: URL: https://github.com/apache/datafusion/pull/10765#issuecomment-2156967694 Sorry for causing this! Opened https://github.com/apache/datafusion/pull/10848 to fix it. -- This is an automated message from the Apache Git Service. To respond to the message, p

[PR] MINOR: use workspace deps in proto-common [datafusion]

2024-06-09 Thread via GitHub
waynexia opened a new pull request, #10848: URL: https://github.com/apache/datafusion/pull/10848 ## Which issue does this PR close? Resolve https://github.com/apache/datafusion/pull/10765#issuecomment-2156823540 ## Rationale for this change Some deps in `

Re: [I] API to add analyzer_rules [datafusion]

2024-06-09 Thread via GitHub
pingsutw commented on issue #10846: URL: https://github.com/apache/datafusion/issues/10846#issuecomment-2156924635 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632439168 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -509,6 +509,10 @@ SELECT approx_median(c12) FROM aggregate_test_100 0.555006541052 +# csv_qu

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632439168 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -509,6 +509,10 @@ SELECT approx_median(c12) FROM aggregate_test_100 0.555006541052 +# csv_qu

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632439168 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -509,6 +509,10 @@ SELECT approx_median(c12) FROM aggregate_test_100 0.555006541052 +# csv_qu

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632437906 ## datafusion/physical-expr-common/Cargo.toml: ## @@ -37,6 +37,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-array = { workspa

[PR] Minor: Clarify `SessionContext::state` docs [datafusion]

2024-06-09 Thread via GitHub
alamb opened a new pull request, #10847: URL: https://github.com/apache/datafusion/pull/10847 ## Which issue does this PR close? N/A ## Rationale for this change @goldmedal reports on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1249052209388

[I] API to add analyzer_rules [datafusion]

2024-06-09 Thread via GitHub
alamb opened a new issue, #10846: URL: https://github.com/apache/datafusion/issues/10846 ### Is your feature request related to a problem or challenge? @goldmedal reports on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1249052209388326922 Hi, I

[I] [Proposal] DataFusion monthly online meetup [datafusion]

2024-06-09 Thread via GitHub
edmondop opened a new issue, #10845: URL: https://github.com/apache/datafusion/issues/10845 ### Is your feature request related to a problem or challenge? DataFusion is such an exciting project, with so many great contributors and so many users that are building useful systems on top

Re: [PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-09 Thread via GitHub
abhiaagarwal commented on PR #10765: URL: https://github.com/apache/datafusion/pull/10765#issuecomment-2156823540 @waynexia I was inspecting the dependency tree for delta-rs and it seems the dependency didn't get bumped for object-store in `proto-common` fyi, so 0.9.1 gets pulled. ``

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156816494 Thank you so much - I quickly skimmed the draft you uploaded (will take a closer look tomorrow). However, the main question should be answered, for now we are iterating o

[PR] docs(variance): Correct typos in comments [datafusion]

2024-06-09 Thread via GitHub
pingsutw opened a new pull request, #10844: URL: https://github.com/apache/datafusion/pull/10844 ## Which issue does this PR close? Closes #. ## Rationale for this change Typo ## What changes are included in this PR? Correct typos in comments ## Are t

Re: [PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on code in PR #10842: URL: https://github.com/apache/datafusion/pull/10842#discussion_r1632401139 ## datafusion/substrait/tests/testdata/tpch/lineitem.csv: ## @@ -0,0 +1,2 @@ +l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
richtia commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156807602 > > Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for yo

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156804973 > Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for yo

Re: [PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10843: URL: https://github.com/apache/datafusion/pull/10843#discussion_r1632396117 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -1724,7 +1808,7 @@ async fn test_boolean() { .build() .await; -Test { +TestBoth { Re

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156798952 @marvinlanhenke -- I whipped up something (actually I had been playing with it yesterday) https://github.com/apache/datafusion/pull/10843 -- This is an automated message from t

Re: [PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10843: URL: https://github.com/apache/datafusion/pull/10843#discussion_r1632395830 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -766,10 +769,108 @@ impl<'a> StatisticsConverter<'a> { Ok(Arc::new(UInt64Array:

[PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
alamb opened a new pull request, #10843: URL: https://github.com/apache/datafusion/pull/10843 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10806 ## Rationale for this change @marvinlanhenke asked some excellent questions on https://

Re: [PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on PR #10842: URL: https://github.com/apache/datafusion/pull/10842#issuecomment-2156797223 another problem here is that the json file seems to do projection after aggregation https://github.com/apache/datafusion/assets/48054792/42213070-9f6d-4293-a0c1-5e3f54312391";>

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156787656 Thank you @marvinlanhenke -- excellent analysis. > * a lot of work the StatisticConverter does is already done [here](https://github.com/apache/datafusion/blob/main/datafus

Re: [PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on code in PR #10842: URL: https://github.com/apache/datafusion/pull/10842#discussion_r1632389768 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1253,6 +1335,8 @@ fn from_substrait_type( r#type::Kind::Struct(s) => Ok(DataType::Stru

Re: [I] Convert `approx_distinct` to UDAF [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on issue #10837: URL: https://github.com/apache/datafusion/issues/10837#issuecomment-2156781960 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms opened a new pull request, #10842: URL: https://github.com/apache/datafusion/pull/10842 ## Which issue does this PR close? part of #10710 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
richtia commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156781036 Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156779263 > actually, I have made tpch_1 work just now, I'll draft a PR, and hope to get a review from you, Thanks a lot. Wow -- nice work! -- This is an automated message from the

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156779183 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Can't serialize example `ExecutionPlan` to substrait [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #9299: URL: https://github.com/apache/datafusion/issues/9299#issuecomment-2156779111 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [I] Complete substrait support for ParquetExec round trip [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #9347: URL: https://github.com/apache/datafusion/issues/9347#issuecomment-2156779039 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [I] Substrait serializer clippy error: not calling `truncate` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #9727: URL: https://github.com/apache/datafusion/issues/9727#issuecomment-2156779002 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [I] Substrait consumer doesn't respect final output column names [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10817: URL: https://github.com/apache/datafusion/issues/10817#issuecomment-2156778930 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156778662 @alamb I'm currently thinking about how to integrate `StatisticsConverter` with the existing code `prune_pages_in_one_row_group`. This is what I originally had i

Re: [I] Projects require unique expressions names error in substrait producer/consumer [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10815: URL: https://github.com/apache/datafusion/issues/10815#issuecomment-2156778917 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156778859 > Thank you @Lordworms -- indeed it is a great job to find issues when testing > > I wonder if there is some way to get the integration test partly working and checked

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156778295 Thank you @Lordworms -- indeed it is a great job to find issues when testing I wonder if there is some way to get the integration test partly working and checked in somewhe

Re: [PR] Add support for reading CSV files with comments [datafusion]

2024-06-09 Thread via GitHub
bbannier commented on PR #10467: URL: https://github.com/apache/datafusion/pull/10467#issuecomment-2156768164 This is now rebased on `main` which recently bumped arrow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on PR #10813: URL: https://github.com/apache/datafusion/pull/10813#issuecomment-2156767428 BTW I am happy to make additional corrections as follow on PRs if anyone has additional notes cc @advancedxy @thinkharderdev @crepererum @NGA-TRAN and @Ted-Jiang @xinlifoobar

Re: [I] API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) [datafusion]

2024-06-09 Thread via GitHub
alamb closed issue #9929: API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) URL: https://github.com/apache/datafusion/issues/9929 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` [datafusion]

2024-06-09 Thread via GitHub
alamb merged PR #10813: URL: https://github.com/apache/datafusion/pull/10813 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] fix: null character not permitted in chr function [datafusion-comet]

2024-06-09 Thread via GitHub
codecov-commenter commented on PR #513: URL: https://github.com/apache/datafusion-comet/pull/513#issuecomment-2156709695 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/513?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [I] Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally) [datafusion]

2024-06-09 Thread via GitHub
edmondop commented on issue #4850: URL: https://github.com/apache/datafusion/issues/4850#issuecomment-2156694846 I wanted to confirm I understood correctly what options we are picking. It seems to me the following are viable: 1. Support "implicit" tables via select from (moving the change

Re: [I] Add `read_parquet` SQL UDF [datafusion]

2024-06-09 Thread via GitHub
edmondop commented on issue #3773: URL: https://github.com/apache/datafusion/issues/3773#issuecomment-2156689753 In Databricks SQL dialect, they support ```sql SELECT * from json.`s3:/foo/bar/table/ ``` -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10813: URL: https://github.com/apache/datafusion/pull/10813#discussion_r1632342300 ## datafusion/core/src/datasource/physical_plan/parquet/opener.rs: ## @@ -212,3 +213,34 @@ impl FileOpener for ParquetOpener { })) } } + +/// Return t

Re: [PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
alamb merged PR #10841: URL: https://github.com/apache/datafusion/pull/10841 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
alamb commented on PR #10841: URL: https://github.com/apache/datafusion/pull/10841#issuecomment-2156671355 Let's merge this and iterate on the main branch. Thanks again @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632322840 ## core/src/execution/sort.rs: ## @@ -159,12 +159,14 @@ where pos += 1; } } else { -unsafe {

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632321812 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -78,8 +78,12 @@ pub(crate) fn spark_compatible_murmur3_hash>(data: T, seed: u32) } h1

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632321751 ## core/src/parquet/util/hash_util.rs: ## @@ -45,17 +47,19 @@ const MURMUR_R: i32 = 47; unsafe fn murmur_hash2_64a(data_bytes: &[u8], seed: u64) -> u64 {

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632319699 ## core/src/parquet/util/hash_util.rs: ## @@ -106,16 +108,12 @@ unsafe fn crc32_hash(bytes: &[u8], seed: u32) -> u32 { let num_words = num_bytes / u32_num

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632311422 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing A

Re: [PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10841: URL: https://github.com/apache/datafusion/pull/10841#discussion_r1632309893 ## dev/changelog/39.0.0.md: ## @@ -1,23 +1,25 @@ - -## [39.0.0](https://github.com/apache/datafusion/tree/39.0.0) (2024-06-07) +# Apache DataFusion 39.0.0 Changelo

[PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove opened a new pull request, #546: URL: https://github.com/apache/datafusion-comet/pull/546 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/507 ## Rationale for this change Rust 1.78 has better debug asse

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632307253 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing Alias

Re: [I] chore: Make Comet compatible with Rust 1.78 [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on issue #507: URL: https://github.com/apache/datafusion-comet/issues/507#issuecomment-2156614044 These are the failing tests: ``` test execution::datafusion::spark_hash::tests::test_str ... thread caused non-unwinding panic. aborting. test execution::sort::

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632296297 ## datafusion/sqllogictest/test_files/tpch/q1.slt.part: ## @@ -41,8 +41,8 @@ explain select logical_plan 01)Sort: lineitem.l_returnflag ASC NULLS L

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632298786 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing A

Re: [I] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal commented on issue #10838: URL: https://github.com/apache/datafusion/issues/10838#issuecomment-2156610120 > We can add `self.args` to `AccumulatorArgs` and compute `percentile` and `tdigest max size` in `accumulator`. > Thanks for the suggestion. I can make a PR after #10

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632296297 ## datafusion/sqllogictest/test_files/tpch/q1.slt.part: ## @@ -41,8 +41,8 @@ explain select logical_plan 01)Sort: lineitem.l_returnflag ASC NULLS L

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632295422 ## datafusion/sqllogictest/test_files/tpch/q1.slt.part: ## @@ -41,8 +41,8 @@ explain select logical_plan 01)Sort: lineitem.l_returnflag ASC NULLS LAST,

[PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
andygrove opened a new pull request, #10841: URL: https://github.com/apache/datafusion/pull/10841 ## Which issue does this PR close? N/A ## Rationale for this change - Improve the quality of generated changelogs - Reduce manual effort required when gene

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632290625 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632289500 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing Alias

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632287838 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -962,6 +980,26 @@ mod test { Ok(()) } +#[test] Review Comment: Thanks for

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632287690 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -801,15 +814,19 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { let expr_

Re: [PR] Stop copying LogicalPlan and Exprs in `CommonSubexprEliminate` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10835: URL: https://github.com/apache/datafusion/pull/10835#discussion_r1632235419 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -184,90 +189,138 @@ impl CommonSubexprEliminate { Ok((rewrite_exprs, new_input)) } -

Re: [I] Feedback request for providing configurable UDF functions [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10744: URL: https://github.com/apache/datafusion/issues/10744#issuecomment-2156497086 We just merged the aggregate builder in https://github.com/apache/datafusion/pull/10560 -- I am quite happy with how it turned out, in case you want to take a friendly look --

Re: [I] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on issue #10838: URL: https://github.com/apache/datafusion/issues/10838#issuecomment-2156459652 > Hi @jayzhan211 I tried to convert `ApproxPercentileCont`, but I encountered some challenges. `ApproxPercentileCont` creates the accumulator based on the `percentile` parame

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632243037 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] fix: Support double quotes in `date_part` [datafusion]

2024-06-09 Thread via GitHub
Weijun-H commented on code in PR #10833: URL: https://github.com/apache/datafusion/pull/10833#discussion_r1632239784 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -127,7 +127,12 @@ impl ScalarUDFImpl for DatePartFunc { ColumnarValue::Scalar(scalar) => sca

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632238045 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632237470 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632237470 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632225209 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632234343 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632227588 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] fix: Support double quotes in `date_part` [datafusion]

2024-06-09 Thread via GitHub
Jefffrey commented on code in PR #10833: URL: https://github.com/apache/datafusion/pull/10833#discussion_r1632234019 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -127,7 +127,12 @@ impl ScalarUDFImpl for DatePartFunc { ColumnarValue::Scalar(scalar) => sca

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632227588 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156435325 @alamb ...just to confirm my current understanding. Since we can have more than one page per column per row group, I can get multiple statistics. Or put differe

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632227588 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632225209 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

  1   2   >