Re: [PR] Reject CREATE TABLE/VIEW with duplicate column names [datafusion]

2024-11-22 Thread via GitHub
findepi commented on code in PR #13517: URL: https://github.com/apache/datafusion/pull/13517#discussion_r1853489879 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -633,4 +1160,87 @@ mod test { assert_eq!(drop_view.partial_cmp(&catalog), Some(Ordering::Greater));

Re: [PR] Add `ScalarUDFImpl::invoke_with_args` to support passing the return type created for the udf instance [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on PR #13290: URL: https://github.com/apache/datafusion/pull/13290#issuecomment-2493483002 In https://github.com/apache/datafusion/pull/13491 I have migrated over to using invoke_batch in all tests. I think if we are quick we can rename `invoke_with_args` back to `inv

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
alamb commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853770380 ## datafusion/expr/src/udf.rs: ## @@ -546,7 +546,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { /// to arrays, which will likely be simpler code, but be sl

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-11-22 Thread via GitHub
crepererum commented on code in PR #13524: URL: https://github.com/apache/datafusion/pull/13524#discussion_r1853764821 ## datafusion/physical-plan/src/aggregates/group_values/row.rs: ## @@ -216,18 +216,18 @@ impl GroupValues for GroupValuesRows { }

Re: [PR] Add `ScalarUDFImpl::invoke_with_args` to support passing the return type created for the udf instance [datafusion]

2024-11-22 Thread via GitHub
alamb commented on PR #13290: URL: https://github.com/apache/datafusion/pull/13290#issuecomment-2493538976 > In #13491 I have migrated over to using invoke_batch in all test. I think if we are quick we can rename `invoke_with_args` back to `invoke`. @alamb @findepi The more I think a

[PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-11-22 Thread via GitHub
crepererum opened a new pull request, #13524: URL: https://github.com/apache/datafusion/pull/13524 ## Which issue does this PR close? For #13433, but only parts of it. ## Rationale for this change Prepare `hashbrown` 0.15 upgrade. ## What changes are included in this PR?

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853763755 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -1954,32 +1954,6 @@ The following intervals are supported: - years - century - Example Revi

Re: [PR] feat(function): add greatest function [datafusion]

2024-11-22 Thread via GitHub
rluvaton commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2493197545 Thank you @waynexia see you on Monday at your CMU talk, I'll join 10 minutes before so we can talk about data fusion 😊 -- This is an automated message from the Apache Git Service.

Re: [PR] Reject CREATE TABLE/VIEW with duplicate column names [datafusion]

2024-11-22 Thread via GitHub
findepi commented on code in PR #13517: URL: https://github.com/apache/datafusion/pull/13517#discussion_r1853498966 ## datafusion/common/src/error.rs: ## @@ -150,6 +150,11 @@ pub enum SchemaError { qualifier: Box, name: String, }, +/// Schema duplicate

Re: [PR] Reject CREATE TABLE/VIEW with duplicate column names [datafusion]

2024-11-22 Thread via GitHub
findepi commented on code in PR #13517: URL: https://github.com/apache/datafusion/pull/13517#discussion_r1853493794 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -288,8 +290,234 @@ impl PartialOrd for CreateExternalTable { } } +impl CreateExternalTable { +pub fn
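
As a rough illustration of the check this PR's title describes, the sketch below finds the first repeated column name in a list. It is not the PR's code: `first_duplicate_column` is a hypothetical helper, and exact, case-sensitive comparison of names is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not from the PR): returns the first column name
/// that occurs more than once, or `None` if all names are distinct.
/// Assumption: names are compared exactly, including case.
fn first_duplicate_column<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already in the set,
    // so the first name for which it fails is the first duplicate.
    names.iter().copied().find(|name| !seen.insert(*name))
}
```

A caller could turn a `Some(name)` result into a schema error that names the offending column and reject the CREATE statement.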

[PR] Subquery add unnest [datafusion]

2024-11-22 Thread via GitHub
kosiew opened a new pull request, #13523: URL: https://github.com/apache/datafusion/pull/13523 ## Which issue does this PR close? Closes #13498. ## Rationale for this change Subquery's check_inner_plan support LogicalPlan::Unnest ## What changes are

Re: [I] Subquery check_internal_plan does not support LogicalPlan::Unnest [datafusion]

2024-11-22 Thread via GitHub
kosiew commented on issue #13498: URL: https://github.com/apache/datafusion/issues/13498#issuecomment-2493205798 Thanks @alamb

Re: [I] Implement GroupColumn Decimal128Array [datafusion]

2024-11-22 Thread via GitHub
alamb commented on issue #13505: URL: https://github.com/apache/datafusion/issues/13505#issuecomment-2493611071 > @alamb For this pr, will it need its own custom column implementation for decimal128 instead of instantiate_primitive!, similar to how byte, byteview, stringview, etc. are dealt

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-11-22 Thread via GitHub
crepererum commented on PR #13524: URL: https://github.com/apache/datafusion/pull/13524#issuecomment-2493648473 The test failure (full logs [here](https://productionresultssa18.blob.core.windows.net/actions-results/32eefbe2-6c26-46f7-acbc-650b6992f146/workflow-job-run-44face5e-4cc4-54f4-de84

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853899040 ## datafusion/functions/src/utils.rs: ## @@ -170,7 +171,8 @@ pub mod test { } else { // in

Re: [PR] feat: enable decimal to decimal cast of different precision and scale [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove merged PR #1086: URL: https://github.com/apache/datafusion-comet/pull/1086

Re: [I] [DISCUSSION] Make it easy and fast to query files on remote files (S3, iceberg, etc) [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on issue #13456: URL: https://github.com/apache/datafusion/issues/13456#issuecomment-2493825646 @alamb from a datafusion perspective which parts do you think are missing? I ask about just the datafusion perspective because i am assuming the owners of the relevant ta

Re: [PR] feat: support array_insert [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove commented on code in PR #1073: URL: https://github.com/apache/datafusion-comet/pull/1073#discussion_r1853910995 ## native/spark-expr/src/list.rs: ## @@ -413,14 +426,297 @@ impl PartialEq for GetArrayStructFields { } } +#[derive(Debug, Hash)] +pub struct ArrayIn

Re: [PR] feat: support array_insert [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove merged PR #1073: URL: https://github.com/apache/datafusion-comet/pull/1073

Re: [PR] feat(function): add greatest function [datafusion]

2024-11-22 Thread via GitHub
rluvaton commented on code in PR #12474: URL: https://github.com/apache/datafusion/pull/12474#discussion_r1853941000 ## datafusion/functions/src/core/greatest.rs: ## @@ -0,0 +1,272 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[I] [DISCUSSION] Making it easier (lessons from GlareDB) [datafusion]

2024-11-22 Thread via GitHub
alamb opened a new issue, #13525: URL: https://github.com/apache/datafusion/issues/13525 ### Is your feature request related to a problem or challenge? I recently watched the [Biting the Bullet: Rebuilding GlareDB from the Ground Up (Sean Smith)](https://www.youtube.com/watch?v=Sor3K

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2024-11-22 Thread via GitHub
scsmithr commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2494375757 # Choosing DataFusion We chose to use DataFusion for the base of our product ~June 2022 and have kept up with it until earlier this year. When deciding, we looked

Re: [I] Unparse Map plan to SQL string [datafusion]

2024-11-22 Thread via GitHub
delamarch3 commented on issue #13478: URL: https://github.com/apache/datafusion/issues/13478#issuecomment-2494339342 I implemented it for `map` in #13532. I'm unsure about `get_field` because I'm not able to tell if it's a map column from the `Expr`

Re: [PR] feat: Optimize `SortPreservingMergeExec` to avoid merging non-overlapping partitions [datafusion]

2024-11-22 Thread via GitHub
2010YOUY01 commented on PR #13296: URL: https://github.com/apache/datafusion/pull/13296#issuecomment-2493719309 The implementation is really nice. I'm wondering is it convenient to move the stream concat logic into `StreamingMergeBuilder`, like ```rust let result = StreamingMergeBu

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2494002084 @scsmithr In your follow up I think it would be useful to know if there is anything the DataFusion project could do better that would have made you more likely to contrib

Re: [I] [DISCUSSION] Make it easy and fast to query files on remote files (S3, iceberg, etc) [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on issue #13456: URL: https://github.com/apache/datafusion/issues/13456#issuecomment-2494393992 The way i was thinking about it the thing that makes it interesting / difficult is getting the semantics of each of the formats as part of datafusion which i would presum

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2024-11-22 Thread via GitHub
scsmithr commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2494399003 > @scsmithr In your follow up I think it would be useful to know if there is anything the DataFusion project could do better that would have made you more likely to contribute

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
alamb commented on code in PR #13424: URL: https://github.com/apache/datafusion/pull/13424#discussion_r1854397735 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,206 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
alamb commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2494506254 Ok, I am pretty happy with where this PR is now. It shows the entire process running end to end, with the `DedicatedExecutor` and running always on the dedicated executor Things
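
For readers unfamiliar with the pattern, here is a minimal std-only sketch of handing CPU-bound work to a dedicated thread. It is a simplified stand-in, not the PR's `DedicatedExecutor`: `CpuPool` and `run` are hypothetical names, and a single worker thread with blocking channels replaces whatever async runtime the real example uses.

```rust
use std::sync::mpsc;
use std::thread;

/// A unit of CPU-bound work to run on the dedicated thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical, simplified stand-in for a dedicated executor:
/// one worker thread that runs submitted jobs in order.
struct CpuPool {
    sender: Option<mpsc::Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl CpuPool {
    fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        // The worker loop ends once every sender has been dropped.
        let worker = thread::spawn(move || {
            for job in receiver {
                job();
            }
        });
        Self { sender: Some(sender), worker: Some(worker) }
    }

    /// Runs `work` on the dedicated thread and blocks until it finishes.
    fn run<T, F>(&self, work: F) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller stopped waiting.
            let _ = tx.send(work());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("worker thread is alive");
        rx.recv().expect("worker dropped the result")
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Dropping the sender closes the channel so the worker can exit.
        drop(self.sender.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

The point of the separation is that a long computation occupies only the dedicated thread, so the thread that services I/O stays responsive. In an async setting the caller would await the result instead of blocking on `recv`.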

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-11-22 Thread via GitHub
Dandandan commented on PR #13524: URL: https://github.com/apache/datafusion/pull/13524#issuecomment-2494524212 > The test failure (full logs [here](https://productionresultssa18.blob.core.windows.net/actions-results/32eefbe2-6c26-46f7-acbc-650b6992f146/workflow-job-run-44face5e-4cc4-54f4-de8

Re: [PR] Unparse map to sql [datafusion]

2024-11-22 Thread via GitHub
alamb commented on code in PR #13532: URL: https://github.com/apache/datafusion/pull/13532#discussion_r1854484463 ## datafusion/sql/src/unparser/expr.rs: ## @@ -567,6 +568,43 @@ impl Unparser<'_> { Ok(ast::Expr::CompoundIdentifier(id)) } +fn map_to_sql(&self,

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2494530521 > Ok, I am pretty happy with where this PR is now. It shows the entire process running end to end, with the `DedicatedExecutor` and running always on the dedicated executor

Re: [PR] Make easier to create custom schedulers and executors [datafusion-ballista]

2024-11-22 Thread via GitHub
andygrove merged PR #1118: URL: https://github.com/apache/datafusion-ballista/pull/1118

Re: [PR] Unparse map to sql [datafusion]

2024-11-22 Thread via GitHub
delamarch3 commented on PR #13532: URL: https://github.com/apache/datafusion/pull/13532#issuecomment-2494608235 Thanks for the review @alamb

[PR] Fix panic when hashing empty FixedSizeList Array [datafusion]

2024-11-22 Thread via GitHub
findepi opened a new pull request, #13533: URL: https://github.com/apache/datafusion/pull/13533 Previously it would panic due to division by zero.
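
The PR description says only that the panic came from a division by zero; it does not say which quantity was zero. As a generic sketch of guarding such a division in Rust, the hypothetical helper below derives a per-row value count and falls back to zero for an empty array. The name `values_per_row` and the choice of operands are assumptions, not taken from the PR.

```rust
/// Hypothetical helper (not from the PR): number of child values per row.
/// `checked_div` returns `None` when the divisor is zero instead of
/// panicking, so an array with no rows yields 0.
fn values_per_row(total_values: usize, num_rows: usize) -> usize {
    total_values.checked_div(num_rows).unwrap_or(0)
}
```

Plain `total_values / num_rows` panics when `num_rows` is zero, which is the failure mode a zero-length input can trigger.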

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2494661569 I had the impression that this example was for illustration purposes for what it would look like to have fully separate io and cpu runtimes - although not the desired end stat

Re: [I] Implement GroupColumn Decimal128Array [datafusion]

2024-11-22 Thread via GitHub
jonathanc-n commented on issue #13505: URL: https://github.com/apache/datafusion/issues/13505#issuecomment-2494696505 Yep, thanks!

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
alamb commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2494584685 > Can you expand on point 1? My naive expectation was that all network io went through the main runtime. Yes, that is what should happen. The problem is here. As written, calling

Re: [PR] Allow example CLI to read from stdin [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
alamb merged PR #1536: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1536

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2494628563 @alamb perfect makes sense. I just wasn't originally clear what you meant by running it on the dedicated executor.

Re: [PR] refactor: Move BallistaRegistry to better location [datafusion-ballista]

2024-11-22 Thread via GitHub
andygrove merged PR #1126: URL: https://github.com/apache/datafusion-ballista/pull/1126

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
tustvold commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2494641789 At the risk of repeating myself from https://github.com/datafusion-contrib/datafusion-dft/pull/248#issuecomment-2489110287 I would strongly discourage overloading the ObjectStore tr

Re: [PR] fix: [WIP] Stop dropping metrics [datafusion-comet]

2024-11-22 Thread via GitHub
mbutrovich commented on PR #: URL: https://github.com/apache/datafusion-comet/pull/#issuecomment-2494641304 My initial thoughts: - Approach makes sense. This is good infrastructure to have for managing when Spark plans and native plans diverge in node count and structure. - I

Re: [PR] Add an example of `invoke_batch_with_return_type` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs closed pull request #13289: Add an example of `invoke_batch_with_return_type` URL: https://github.com/apache/datafusion/pull/13289

Re: [PR] recursive select calls are parsed with bad trailing_commas parameter [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
tomershaniii commented on code in PR #1521: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1521#discussion_r1854655104 ## test.sql: ## @@ -0,0 +1,229 @@ +WITH EventData AS ( Review Comment: Fixed, sorry about that

Re: [PR] fix: [WIP] Stop dropping metrics [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove commented on PR #: URL: https://github.com/apache/datafusion-comet/pull/#issuecomment-2494835653 Some progress! ## Before ![Screenshot from 2024-11-22 11-44-21](https://github.com/user-attachments/assets/81098376-c546-4966-80aa-10ae679de270) ## After

Re: [I] [DISCUSSION] More SqlLogicTest test coverage for queries, including join queries [datafusion]

2024-11-22 Thread via GitHub
Dandandan commented on issue #13470: URL: https://github.com/apache/datafusion/issues/13470#issuecomment-2494164562 One way to extend join coverage is to have some more "answer checking" for different benchmarks, such as: https://github.com/apache/datafusion/issues/13073 (Earli

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853902300 ## datafusion/expr/src/udf.rs: ## @@ -213,7 +213,6 @@ impl ScalarUDF { self.inner.is_nullable(args, schema) } -#[deprecated(since = "43.0

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853902782 ## datafusion/functions/benches/ltrim.rs: ## @@ -141,8 +141,8 @@ fn run_with_string_type( ), |b| { b.iter(|| { -

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853905179 ## datafusion/functions/src/datetime/date_bin.rs: ## @@ -181,30 +185,6 @@ Calculates time intervals and returns the start of the interval nearest to the s

Re: [PR] Move many udf implementations from `invoke` to `invoke_batch` [datafusion]

2024-11-22 Thread via GitHub
joseph-isaacs commented on code in PR #13491: URL: https://github.com/apache/datafusion/pull/13491#discussion_r1853905787 ## datafusion/functions/src/utils.rs: ## @@ -170,7 +171,8 @@ pub mod test { } else { // in

Re: [I] [EPIC] Support LogicalPlan --> `SQL String` [datafusion]

2024-11-22 Thread via GitHub
alamb closed issue #8661: [EPIC] Support LogicalPlan --> `SQL String` URL: https://github.com/apache/datafusion/issues/8661

Re: [I] [EPIC] Support converting Exprs and LogicalPlans --> SQL Strings [datafusion]

2024-11-22 Thread via GitHub
alamb commented on issue #9494: URL: https://github.com/apache/datafusion/issues/9494#issuecomment-2493841934 Let's claim this is complete 🎉 All the existing subtasks are closed and now we are working on this in our ongoing operations

Re: [I] [EPIC] Support converting Exprs and LogicalPlans --> SQL Strings [datafusion]

2024-11-22 Thread via GitHub
alamb closed issue #9494: [EPIC] Support converting Exprs and LogicalPlans --> SQL Strings URL: https://github.com/apache/datafusion/issues/9494

Re: [I] add examples and description to scalar/aggregate functions? [datafusion]

2024-11-22 Thread via GitHub
alamb closed issue #8366: add examples and description to scalar/aggregate functions? URL: https://github.com/apache/datafusion/issues/8366

Re: [I] add examples and description to scalar/aggregate functions? [datafusion]

2024-11-22 Thread via GitHub
alamb commented on issue #8366: URL: https://github.com/apache/datafusion/issues/8366#issuecomment-2493843391 We have completed this work as part of https://github.com/apache/datafusion/issues/12432 (thanks @Omega359 @jonathanc-n and others)

Re: [I] add examples and description to scalar/aggregate functions? [datafusion]

2024-11-22 Thread via GitHub
alamb commented on issue #8366: URL: https://github.com/apache/datafusion/issues/8366#issuecomment-2493844479 For example https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.AggregateUDFImpl.html#method.documentation and you can set various fields like this: https:

Re: [PR] minor: allow dead code for Scenario::UTF8 [datafusion]

2024-11-22 Thread via GitHub
alamb commented on PR #13531: URL: https://github.com/apache/datafusion/pull/13531#issuecomment-2494517218 Thanks @zhuliquan - it would also be really nice to figure out how you are running the code/tests to see this error. Maybe there is some better way we can come up with to fix it

Re: [PR] Add support for Utf8View to string_to_array and array_to_string [datafusion]

2024-11-22 Thread via GitHub
Omega359 commented on PR #13403: URL: https://github.com/apache/datafusion/pull/13403#issuecomment-2494565464 > I agree -- there is no guideline that I know of. Any chance you would be willing to propose one? I'll have to think about it tbh @alamb. Part of that is to know the cost to

Re: [PR] minor: allow dead code for Scenario::UTF8 [datafusion]

2024-11-22 Thread via GitHub
findepi commented on PR #13531: URL: https://github.com/apache/datafusion/pull/13531#issuecomment-2494767825 We should really avoid `#[allow(dead_code)]` for stuff used only sometimes, because we won't know when it can be removed. This is used in `external_access_plan` which is dis
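
One common alternative to `#[allow(dead_code)]` for code that is only used on some targets is to gate the helper with the same `cfg` as its only caller, so the compiler never sees it as unused. The sketch below assumes the caller is compiled out on Windows, which the later retitling of this PR ("allow external_access_plan run on windows") suggests but the truncated comment does not confirm. All function names are hypothetical.

```rust
// Hypothetical fixture used by a single platform-gated caller.
// Gating it with the same cfg means no dead-code warning on targets
// where the caller is compiled out, and no blanket allow to forget.
#[cfg(not(target_os = "windows"))]
fn utf8_fixture() -> Vec<&'static str> {
    vec!["a", "b"]
}

// Hypothetical stand-in for the platform-gated test module.
#[cfg(not(target_os = "windows"))]
fn external_access_plan_case() -> usize {
    utf8_fixture().len()
}
```

If the caller later becomes available on every platform, both `cfg` attributes can be deleted together, which is easier to spot than a stale `allow`.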

Re: [PR] fix: [WIP] Stop dropping metrics [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove commented on PR #: URL: https://github.com/apache/datafusion-comet/pull/#issuecomment-2494989060 We now have metrics for all operators showing time for fetching batches from JVM. ![Screenshot from 2024-11-22 15-35-34](https://github.com/user-attachments/assets/fd8e

[PR] Set timezone for group column timestamp type [datafusion]

2024-11-22 Thread via GitHub
jayzhan211 opened a new pull request, #13535: URL: https://github.com/apache/datafusion/pull/13535 ## Which issue does this PR close? Closes #13534. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [I] [DISCUSSION] Make it easy and fast to query files on remote files (S3, iceberg, etc) [datafusion]

2024-11-22 Thread via GitHub
alamb commented on issue #13456: URL: https://github.com/apache/datafusion/issues/13456#issuecomment-2494998002 > @alamb from a datafusion perspective which parts do you think are missing? I ask about just the datafusion perspective because i am assuming the owners of the relevant table for

Re: [PR] Minor: Exclude all DDL statements from Ray scheduling [datafusion-ray]

2024-11-22 Thread via GitHub
edmondop commented on PR #42: URL: https://github.com/apache/datafusion-ray/pull/42#issuecomment-2495145875 cc @andygrove

Re: [I] [DISCUSSION] Make it easy and fast to query files on remote files (S3, iceberg, etc) [datafusion]

2024-11-22 Thread via GitHub
matthewmturner commented on issue #13456: URL: https://github.com/apache/datafusion/issues/13456#issuecomment-2495169457 Thanks @alamb I plan to work on the second item - probably in December. I was really hoping to get a dft release out shortly where all the custom table providers

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
djanderson commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2495173120 > I appreciate this is a more intrusive approach, but I don't really think DataFusion can continue to leave this sort of thing as an exercise for the reader, especially given the

Re: [PR] fix: Stop dropping metrics [datafusion-comet]

2024-11-22 Thread via GitHub
parthchandra commented on PR #: URL: https://github.com/apache/datafusion-comet/pull/#issuecomment-2495076058 Approach looks good (though I cannot say I understand it completely). The results are definitely what we wanted!

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
tustvold commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2495131293 > but I don't know how to translate your suggestions into actual code The basic idea is rather than shoehorning the runtime dispatch into the ObjectStore trait, instead make t

Re: [I] Implement GroupColumn Decimal128Array [datafusion]

2024-11-22 Thread via GitHub
jayzhan211 commented on issue #13505: URL: https://github.com/apache/datafusion/issues/13505#issuecomment-2495206147 > > @alamb For this pr, will it need its own custom column implementation for decimal128 instead of instantiate_primitive!, similar to how byte, byteview, stringview, etc. ar

[I] Simple query fails with `column types must match schema types` [datafusion]

2024-11-22 Thread via GitHub
adriangb opened a new issue, #13534: URL: https://github.com/apache/datafusion/issues/13534 ### Describe the bug Found in our production system using datafusion internals, but reproducible in `datafusion-cli`: ```sql SELECT 'foo' AS text, arrow_cast('2024-01-01

Re: [I] Simple query fails with `column types must match schema types` [datafusion]

2024-11-22 Thread via GitHub
adriangb commented on issue #13534: URL: https://github.com/apache/datafusion/issues/13534#issuecomment-2495114051 cc @jonathanc-n @jayzhan211 @alamb

Re: [PR] fix: Stop dropping metrics [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove commented on PR #: URL: https://github.com/apache/datafusion-comet/pull/#issuecomment-2495103006 I can possibly break this down into some smaller PRs as well. I may do that.

Re: [PR] Support Databricks struct literal [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1542: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1542#discussion_r1855088307 ## src/parser/mod.rs: ## @@ -2328,19 +2327,25 @@ impl<'a> Parser<'a> { } } -/// Bigquery specific: Parse a struct literal /// S

Re: [PR] Support snowflake double dot notation for object name [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1540: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1540#discussion_r1855090361 ## tests/sqlparser_snowflake.rs: ## @@ -2846,3 +2846,32 @@ fn test_parse_show_columns_sql() { snowflake().verified_stmt("SHOW COLUMNS IN TABLE abc"

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-11-22 Thread via GitHub
alamb commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2495013307 > At the risk of repeating myself from [datafusion-contrib/datafusion-dft#248 (comment)](https://github.com/datafusion-contrib/datafusion-dft/pull/248#issuecomment-2489110287) I would

Re: [PR] Minor: Exclude all DDL statements from Ray scheduling [datafusion-ray]

2024-11-22 Thread via GitHub
edmondop commented on PR #42: URL: https://github.com/apache/datafusion-ray/pull/42#issuecomment-2495145678 What concerns me about this change is that we had a bug that we should have prevented in unit tests. I can try to add a unit test for this part @ccciudatu and then maybe you can fix

Re: [PR] POC: Fusing repart and partial aggr [datafusion]

2024-11-22 Thread via GitHub
github-actions[bot] commented on PR #12526: URL: https://github.com/apache/datafusion/pull/12526#issuecomment-2495210019 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Enhance the nested type access for Generic and DuckDB dialect [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1541: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1541#discussion_r1855136738 ## src/parser/mod.rs: ## @@ -2935,12 +2935,23 @@ impl<'a> Parser<'a> { }) } else if Token::LBracket == tok { if diale

Re: [PR] Add support for data type specific methods [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1535: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1535#discussion_r1855132514 ## src/parser/mod.rs: ## @@ -1224,6 +1224,17 @@ impl<'a> Parser<'a> { body: Box::new(self.parse_expr()?),

Re: [PR] fix: Use RDD partition index [datafusion-comet]

2024-11-22 Thread via GitHub
codecov-commenter commented on PR #1112: URL: https://github.com/apache/datafusion-comet/pull/1112#issuecomment-2495379723 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1112?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: Support partition values in feature branch comet-parquet-exec [datafusion-comet]

2024-11-22 Thread via GitHub
viirya merged PR #1106: URL: https://github.com/apache/datafusion-comet/pull/1106

Re: [PR] Support snowflake double dot notation for object name [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1540: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1540#discussion_r1855103049 ## src/parser/mod.rs: ## @@ -8349,6 +8349,13 @@ impl<'a> Parser<'a> { pub fn parse_object_name(&mut self, in_table_clause: bool) -> Result {

Re: [I] How to best add support for IDENTIFIER() clause [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on issue #1412: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1412#issuecomment-2495338309 Related PR https://github.com/apache/datafusion-sqlparser-rs/pull/1539

Re: [PR] Enhance object name path segments [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1539: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1539#discussion_r1855110077 ## src/ast/mod.rs: ## @@ -195,14 +199,36 @@ impl fmt::Display for Ident { #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)] #[cfg_attr(fea

[I] CometExecIterator uses incorrect partition index [datafusion-comet]

2024-11-22 Thread via GitHub
viirya opened a new issue, #1113: URL: https://github.com/apache/datafusion-comet/issues/1113 ### Describe the bug Currently we get the partition index from TaskContext in CometExecIterator. It is actually incorrect. For example, in a query like ``` CartesianProductExec (25

[PR] fix: Use RDD partition index [datafusion-comet]

2024-11-22 Thread via GitHub
viirya opened a new pull request, #1112: URL: https://github.com/apache/datafusion-comet/pull/1112 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes te

Re: [PR] test: allow external_access_plan run on windows [datafusion]

2024-11-22 Thread via GitHub
zhuliquan commented on PR #13531: URL: https://github.com/apache/datafusion/pull/13531#issuecomment-2495389426 > we should really avoid `#[allow(dead_code)]` for stuff used only sometimes, because we won't know when this can be removed/ > > This is used in `external_access_plan` which

Re: [I] Add `SessionConfig` reference to `ScalarFunctionArgs` [datafusion]

2024-11-22 Thread via GitHub
Omega359 commented on issue #13519: URL: https://github.com/apache/datafusion/issues/13519#issuecomment-2494020943 take

[PR] Feature/scalar func args session config [datafusion]

2024-11-22 Thread via GitHub
Omega359 opened a new pull request, #13527: URL: https://github.com/apache/datafusion/pull/13527 ## Which issue does this PR close? Closes #13519 ## Rationale for this change Allow udf's to access df config ## What changes are included in this PR?

Re: [I] Add `SessionConfig` reference to `ScalarFunctionArgs` [datafusion]

2024-11-22 Thread via GitHub
alamb commented on issue #13519: URL: https://github.com/apache/datafusion/issues/13519#issuecomment-2494024621 > It makes sense to me to move `SessionConfig` to common or common-runtime crate I agree

Re: [I] [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench [datafusion]

2024-11-22 Thread via GitHub
alamb closed issue #12821: [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench URL: https://github.com/apache/datafusion/issues/12821

Re: [PR] Support duplicate column aliases in queries [datafusion]

2024-11-22 Thread via GitHub
findepi commented on PR #13489: URL: https://github.com/apache/datafusion/pull/13489#issuecomment-2494043790 > We might need to introduce [column index](https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/column.rs#L71) to differentiate them. This

Re: [PR] Minor: clean up error entries [datafusion]

2024-11-22 Thread via GitHub
comphead merged PR #13521: URL: https://github.com/apache/datafusion/pull/13521

[PR] minor: allow dead code for Scenario::UTF8 [datafusion]

2024-11-22 Thread via GitHub
zhuliquan opened a new pull request, #13531: URL: https://github.com/apache/datafusion/pull/13531 ## Which issue does this PR close? Closes #13530. ## Rationale for this change I always receive this warning when running `cargo clippy`, and I would like to

Re: [PR] recursive select calls are parsed with bad trailing_commas parameter [datafusion-sqlparser-rs]

2024-11-22 Thread via GitHub
iffyio commented on code in PR #1521: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1521#discussion_r1854227528 ## test.sql: ## @@ -0,0 +1,229 @@ +WITH EventData AS ( Review Comment: oh I think this and the other files were mistakenly checked in?

[I] warning: variant `UTF8` is never constructed [datafusion]

2024-11-22 Thread via GitHub
zhuliquan opened a new issue, #13530: URL: https://github.com/apache/datafusion/issues/13530 ### Describe the bug When I run cargo clippy on my Windows machine, I get the following warning: ```text warning: variant `UTF8` is never constructed --> datafusion\core\tests\pa

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2024-11-22 Thread via GitHub
findepi commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2494046463 > # Implicit assumptions in LogicalPlans > Another thing that was mentioned was the challenge of writing custom optimizer rules was challenging because there were implicit ass

[PR] fix: [WIP] Stop dropping metrics [datafusion-comet]

2024-11-22 Thread via GitHub
andygrove opened a new pull request, #: URL: https://github.com/apache/datafusion-comet/pull/ ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1109 ## Rationale for this change We currently drop some nativ

Re: [I] Add `SessionConfig` reference to `ScalarFunctionArgs` [datafusion]

2024-11-22 Thread via GitHub
Omega359 commented on issue #13519: URL: https://github.com/apache/datafusion/issues/13519#issuecomment-2494038740 I mistyped above - I meant we couldn't use SessionContext, not SessionConfig. Sorry about the confusion.

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2024-11-22 Thread via GitHub
findepi commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2494063561 thank you @alamb for the list, this is great! I know GlareDB's perspective is important and it feels like "lost customer deal" for an OSS project. As with all deals, it's equall
