andygrove opened a new pull request, #2325:
URL: https://github.com/apache/datafusion-comet/pull/2325
## Which issue does this PR close?
Closes https://github.com/apache/datafusion-comet/issues/2310
## Rationale for this change
Improve documentation
alamb commented on code in PR #17266:
URL: https://github.com/apache/datafusion/pull/17266#discussion_r2327043729
##
datafusion-cli/src/object_storage.rs:
##
@@ -563,6 +563,592 @@ pub(crate) async fn get_object_store(
Ok(store)
}
+pub mod instrumented {
+use core::fm
andygrove opened a new pull request, #2327:
URL: https://github.com/apache/datafusion-comet/pull/2327
## Which issue does this PR close?
N/A
## Rationale for this change
Preparing for 0.10.0 release
## What changes are included in this PR?
berkaysynnada commented on code in PR #14813:
URL: https://github.com/apache/datafusion/pull/14813#discussion_r2327272898
##
datafusion/physical-plan/src/windows/mod.rs:
##
@@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties(
input: &Arc,
window_exprs: &
comphead merged PR #17431:
URL: https://github.com/apache/datafusion/pull/17431
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@dataf
comphead opened a new issue, #17453:
URL: https://github.com/apache/datafusion/issues/17453
### Is your feature request related to a problem or challenge?
DF and DuckDB return different nullability flags for `map_keys`, which makes
comparing arrays later challenging, as in
https:
comphead commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3262515544
> [@timsaucer](https://github.com/timsaucer) I'll make the branch-50 on
Sunday, so we still have time.
Thanks @xudong963 I'm also doing a quick fix for
https://github.co
comphead commented on issue #2321:
URL:
https://github.com/apache/datafusion-comet/issues/2321#issuecomment-3262550177
Will be included in DF 50 and depends on
https://github.com/apache/datafusion-comet/pull/2286
adriangb commented on PR #17454:
URL: https://github.com/apache/datafusion/pull/17454#issuecomment-3262607436
I've experienced this with I think Polars as well. I guess from the test
failures we need to update the schemas as well?
milenkovicm opened a new pull request, #1311:
URL: https://github.com/apache/datafusion-ballista/pull/1311
# Which issue does this PR close?
Closes #.
# Rationale for this change
Part of a test should have been enabled once we updated DataFusion
# What
alamb commented on PR #17430:
URL: https://github.com/apache/datafusion/pull/17430#issuecomment-3261748751
Thanks @petern48 and @zhuqi-lucas ❤️
alamb commented on PR #17281:
URL: https://github.com/apache/datafusion/pull/17281#issuecomment-3261770095
@kosiew -- is there any way to break this PR into smaller PRs? It is very
challenging to review large PRs (as it requires a large amount of contiguous
time).
I think @adriangb
alamb commented on code in PR #17299:
URL: https://github.com/apache/datafusion/pull/17299#discussion_r2326823989
##
datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs:
##
@@ -62,7 +62,17 @@ pub async fn from_project_rel(
// to transform it into a
alamb commented on PR #17084:
URL: https://github.com/apache/datafusion/pull/17084#issuecomment-3261785857
While this is a cool idea, I don't think this needs to be in the DataFusion
repository itself. Specifically, this is an object store specific feature and
nothing specific to DataFusin
alamb closed issue #17389: Re-export apache-avro when feature is set, similar
to parquet
URL: https://github.com/apache/datafusion/issues/17389
alamb commented on PR #17357:
URL: https://github.com/apache/datafusion/pull/17357#issuecomment-3261789341
> > Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and
probably can be merged. My only potential concern is that we may mess up comet.
Let's see if we get any more
alamb merged PR #17364:
URL: https://github.com/apache/datafusion/pull/17364
berkaysynnada commented on issue #17401:
URL: https://github.com/apache/datafusion/issues/17401#issuecomment-3262089632
Perhaps we can find a way of detecting redundancy of the order propagation
over window ops (or just a simple rule) and skip those high complexity
calculations
alamb commented on issue #17211:
URL: https://github.com/apache/datafusion/issues/17211#issuecomment-3261899782
> Ultimately I'd like partitioned datasets to operate with similar
performance to flat datasets, and have caching mechanisms available to both.
Based on the structure of the exist
alamb commented on code in PR #17232:
URL: https://github.com/apache/datafusion/pull/17232#discussion_r2326884782
##
datafusion/sqllogictest/test_files/listing_table_partitions.slt:
##
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more cont
comphead merged PR #17429:
URL: https://github.com/apache/datafusion/pull/17429
xudong963 commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3253652700
I will do it later, thanks!
chenkovsky commented on issue #17422:
URL: https://github.com/apache/datafusion/issues/17422#issuecomment-3261561599
take
chenkovsky commented on issue #17425:
URL: https://github.com/apache/datafusion/issues/17425#issuecomment-3261561209
take
alamb commented on code in PR #17337:
URL: https://github.com/apache/datafusion/pull/17337#discussion_r2325676588
##
datafusion/optimizer/src/push_down_sort.rs:
##
@@ -0,0 +1,580 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license ag
jeff-99 commented on issue #1728:
URL:
https://github.com/apache/datafusion-sqlparser-rs/issues/1728#issuecomment-3261699551
I have the same issue. For example when parsing the following query:
```
WITH DIM_DATE_TIME_BASE([DateTime])
AS
(
SELEC
alamb commented on PR #17364:
URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3261799314
I also tested the reproducer from
https://github.com/apache/datafusion/pull/17364 locally with this PR and it
works great:
```shell
DataFusion CLI v49.0.2
> CREATE EXT
xiedeyantu commented on PR #17364:
URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3261826200
Thank you for your help and guidance! @alamb
alamb commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3261791301
> [@timsaucer](https://github.com/timsaucer) I'll make the branch-50 on
Sunday, so we still have time.
Once we create a `branch-50` I'll start testing the upgrade with delta
alamb merged PR #17430:
URL: https://github.com/apache/datafusion/pull/17430
alamb closed issue #16302: Improved experience when remote object store URL
does not end in `/`
URL: https://github.com/apache/datafusion/issues/16302
andygrove opened a new pull request, #2326:
URL: https://github.com/apache/datafusion-comet/pull/2326
## Which issue does this PR close?
N/A
## Rationale for this change
This removes some manual steps during the release process.
## What changes are
davidlghellin commented on PR #17424:
URL: https://github.com/apache/datafusion/pull/17424#issuecomment-3262417201
In Spark 3.5, when the years overflow:
https://github.com/user-attachments/assets/5b4f6f6b-7bb7-403d-8530-4f4b23290559
shivbhatia10 commented on PR #17388:
URL: https://github.com/apache/datafusion/pull/17388#issuecomment-3261580451
Hi @alamb, I think I accidentally merged in the main branch which stopped
the CI from running, may need another approval from you, sorry about that!
alamb commented on PR #17266:
URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3261932459
It does work when I ran it with the CLI flag:
```
> select * from nyc_taxi_rides limit 1;
+-++---+--+--+
alamb commented on PR #17364:
URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3261933881
Thank you for sticking with it!
thinkharderdev commented on code in PR #16983:
URL: https://github.com/apache/datafusion/pull/16983#discussion_r2327178463
##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -7390,6 +7392,41 @@ query error Error during planning: ORDER BY and WITHIN
GROUP clauses cannot
adriangb commented on issue #17451:
URL: https://github.com/apache/datafusion/issues/17451#issuecomment-3262036211
> Without some synchronization, the behavior is racy and it's not guaranteed
that the dynamic filter is built prior to initiating the right side's
execution plan.
I
valkum commented on issue #17446:
URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3262124560
Yes. We have a one to many relationship of some data and it doesn't make
sense to store this normalized in a different file. My understanding of the
parquet format, or rather dre
comphead commented on PR #17454:
URL: https://github.com/apache/datafusion/pull/17454#issuecomment-3262946182
> I've experienced this with I think Polars as well. I guess from the test
failures we need to update the schemas as well?
Correct, the `return type` needed to be updated as w
davidlghellin commented on PR #17424:
URL: https://github.com/apache/datafusion/pull/17424#issuecomment-3262839840
in this commit
https://github.com/apache/datafusion/pull/17424/commits/f812157f265152b6c4de925e61ead14b7ac44259
test sqllogictests return blank line always with empty params an
comphead merged PR #17454:
URL: https://github.com/apache/datafusion/pull/17454
dependabot[bot] opened a new pull request, #1228:
URL: https://github.com/apache/datafusion-python/pull/1228
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.18.0 to 1.18.1.
Release notes
Sourced from uuid's releases (https://github.com/uuid-rs/uuid/releases).
v1.18.1
notfilippo commented on issue #17405:
URL: https://github.com/apache/datafusion/issues/17405#issuecomment-3253292531
take
xudong963 commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3253915218
@timsaucer I'll make the branch-50 on Sunday, so we still have time.
findepi opened a new pull request, #17438:
URL: https://github.com/apache/datafusion/pull/17438
Before the changes, `PartialOrd` could return `Some(Equal)` for two values
that are not equal in the `PartialEq` sense. This is a violation of the
`PartialOrd` contract.
The fix is to consult eq ins
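The contract issue described above can be illustrated with a small standalone sketch. It is not the code from the PR (the description is cut off here); `Plan`, `name`, and `schema_id` are hypothetical names, and the approach shown, downgrading an `Equal` ordering to `None` whenever `PartialEq` disagrees, is only one assumed way to keep the two traits consistent.

```rust
use std::cmp::Ordering;

/// Hypothetical type: ordering looks only at `name`, equality at every field.
#[derive(Debug, PartialEq)]
struct Plan {
    name: String,
    /// Assumed to have no meaningful ordering, so it is skipped when comparing.
    schema_id: u32,
}

impl PartialOrd for Plan {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match self.name.partial_cmp(&other.name) {
            // The ordered fields tie, but the values differ elsewhere:
            // report "incomparable" instead of a misleading `Equal`.
            Some(Ordering::Equal) if self != other => None,
            cmp => cmp,
        }
    }
}

/// Convenience constructor used only by this sketch.
fn plan(name: &str, schema_id: u32) -> Plan {
    Plan { name: name.to_string(), schema_id }
}
```

With this shape, `plan("scan", 1).partial_cmp(&plan("scan", 2))` yields `None`, while two fully identical values still compare as `Some(Ordering::Equal)`.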
dependabot[bot] opened a new pull request, #17410:
URL: https://github.com/apache/datafusion/pull/17410
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.4.0
to 5.0.0.
Release notes
Sourced from https://github.com/actions/setup-node/releases";>actions/setup-n
Jefffrey commented on code in PR #17399:
URL: https://github.com/apache/datafusion/pull/17399#discussion_r2324182515
##
datafusion/spark/src/function/url/url_decode.rs:
##
@@ -0,0 +1,195 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor li
adriangb merged PR #17415:
URL: https://github.com/apache/datafusion/pull/17415
jonathanc-n commented on code in PR #17431:
URL: https://github.com/apache/datafusion/pull/17431#discussion_r2324191348
##
datafusion/physical-plan/src/joins/sort_merge_join/exec.rs:
##
@@ -1923,6 +1974,100 @@ mod tests {
Ok(())
}
+#[tokio::test]
+async f
andygrove opened a new pull request, #2308:
URL: https://github.com/apache/datafusion-comet/pull/2308
## Which issue does this PR close?
Closes https://github.com/apache/datafusion-comet/issues/2305
## Rationale for this change
## What changes are included
2010YOUY01 opened a new issue, #17458:
URL: https://github.com/apache/datafusion/issues/17458
### Describe the bug
`datafusion-cli` tests are failing on the latest main (see the below commit
hash)
```sh
yongting@Yongtings-MacBook-Pro-2 ~/C/datafusion (main=) [SIGINT]> gi
dependabot[bot] opened a new pull request, #17435:
URL: https://github.com/apache/datafusion/pull/17435
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.46 to 4.5.47.
Release notes
Sourced from clap's releases (https://github.com/clap-rs/clap/releases).
v4.5.47
[4.5.
nuno-faria commented on code in PR #17382:
URL: https://github.com/apache/datafusion/pull/17382#discussion_r2321240569
##
datafusion/sql/src/unparser/plan.rs:
##
@@ -696,13 +696,6 @@ impl Unparser<'_> {
join_filters.as_ref(),
)?;
-
alamb commented on code in PR #17438:
URL: https://github.com/apache/datafusion/pull/17438#discussion_r2325757148
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -2114,7 +2116,9 @@ pub struct Values {
// Manual implementation needed because of `schema` field. Comparison excl
valkum opened a new issue, #17420:
URL: https://github.com/apache/datafusion/issues/17420
### Describe the bug
When reading a parquet hive that was stored with
`datafusion.execution.keep_partition_by_columns = TRUE`, the created table has
two columns with the same name, raising a `Sc
comphead commented on code in PR #2057:
URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2325377904
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -711,8 +715,53 @@ object QueryPlanSerde extends Logging with CometExprShim {
wForget commented on PR #2301:
URL:
https://github.com/apache/datafusion-comet/pull/2301#issuecomment-3257001596
> Thanks @wForget, would you mind attaching how the reasons look before and after
Thanks, I have edited the description to add more test information.
jonathanc-n commented on issue #16820:
URL: https://github.com/apache/datafusion/issues/16820#issuecomment-3258707807
https://github.com/apache/datafusion/blob/50e073c425afd0eda309b80d004ee0aa619cbafe/benchmarks/src/nlj.rs#L64
In here we can add queries for NLJ to test performance. We
berkaysynnada commented on code in PR #17337:
URL: https://github.com/apache/datafusion/pull/17337#discussion_r2327167687
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -2525,6 +2525,8 @@ pub struct TableScan {
pub filters: Vec,
/// Optional number of rows to read
lewiszlw commented on code in PR #17290:
URL: https://github.com/apache/datafusion/pull/17290#discussion_r2323956044
##
parquet-testing:
##
Review Comment:
Thanks for reverting submodule update.
dependabot[bot] opened a new pull request, #1229:
URL: https://github.com/apache/datafusion-python/pull/1229
Bumps [log](https://github.com/rust-lang/log) from 0.4.27 to 0.4.28.
Release notes
Sourced from log's releases (https://github.com/rust-lang/log/releases).
0.4.28
W
alamb commented on PR #17307:
URL: https://github.com/apache/datafusion/pull/17307#issuecomment-3253688640
Yes, for sure -- sorry I was away. Please feel free to ping other committers
to merge it too
alamb commented on issue #17348:
URL: https://github.com/apache/datafusion/issues/17348#issuecomment-3259221010
I believe this is very similar to what @karlovnv is proposing in
- https://github.com/apache/datafusion/issues/10433
rkrishn7 commented on code in PR #17452:
URL: https://github.com/apache/datafusion/pull/17452#discussion_r2326578311
##
datafusion/core/tests/physical_optimizer/filter_pushdown/util.rs:
##
@@ -61,6 +62,12 @@ impl FileOpener for TestOpener {
_file_meta: FileMeta,
mbutrovich merged PR #2318:
URL: https://github.com/apache/datafusion-comet/pull/2318
findepi commented on code in PR #14813:
URL: https://github.com/apache/datafusion/pull/14813#discussion_r2321368510
##
datafusion/physical-plan/src/windows/mod.rs:
##
@@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties(
input: &Arc,
window_exprs: &[Arc],
adriangb commented on issue #8078:
URL: https://github.com/apache/datafusion/issues/8078#issuecomment-3259100142
Reading through the issues and posting my thoughts as I go. I am
particularly interested in improving the `Statistics` that gets attached to
files and partitions:
https:/
alamb merged PR #17388:
URL: https://github.com/apache/datafusion/pull/17388
adriangb commented on code in PR #17452:
URL: https://github.com/apache/datafusion/pull/17452#discussion_r2328467149
##
datafusion/core/tests/physical_optimizer/filter_pushdown/util.rs:
##
@@ -61,6 +62,12 @@ impl FileOpener for TestOpener {
_file_meta: FileMeta,
coderfender commented on PR #2136:
URL:
https://github.com/apache/datafusion-comet/pull/2136#issuecomment-3263351268
Resolved issues with failing tests caused by incorrect diff file generation.
comphead opened a new pull request, #17459:
URL: https://github.com/apache/datafusion/pull/17459
## Which issue does this PR close?
- Closes #.
## Rationale for this change
## What changes are included in this PR?
## Are these changes tested
comphead opened a new issue, #17460:
URL: https://github.com/apache/datafusion/issues/17460
(no comment)
comphead commented on code in PR #17459:
URL: https://github.com/apache/datafusion/pull/17459#discussion_r2328511102
##
datafusion/sql/src/statement.rs:
##
@@ -2024,9 +2024,9 @@ impl SqlToRel<'_, S> {
let mut value_indices = vec![None; table_schema.fields().len()];
rishvin opened a new pull request, #2328:
URL: https://github.com/apache/datafusion-comet/pull/2328
## Which issue does this PR close?
Addresses part of #1941
## Rationale for this change
Introduces `map_from_list` which converts a `ListArray` to
`MapArray`.
LucaCappelletti94 opened a new issue, #2020:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2020
The enum
[`Statement`](https://docs.rs/sqlparser/latest/sqlparser/ast/enum.Statement.html)
has several variants of the type `Variant(VariantStruct)`, such as `Set(Set)`
or
`Crea
nuno-faria commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3262612288
> I just gave this a read through and think it's looking great! I'd like to
add a benchmark showing join performance numbers (@nuno-faria I think you had
something already, would
adriangb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3263259303
Maybe https://github.com/apache/datafusion/pull/17452 will help with
determinism?
rkrishn7 commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3263346558
> Maybe
[apache/datafusion#17452](https://github.com/apache/datafusion/pull/17452) will
help with determinism?
I ran the same test as @nuno-faria against my branch and consi
comphead commented on PR #2181:
URL:
https://github.com/apache/datafusion-comet/pull/2181#issuecomment-3262550613
Depends on https://github.com/apache/datafusion-comet/pull/2286
adriangb commented on issue #17451:
URL: https://github.com/apache/datafusion/issues/17451#issuecomment-3263184068
I took a look at the PR and it looks really nice. I think it's what we want. I
just have to double-check it when I have some more time. Nice work!
SparkApplicationMaster commented on PR #17456:
URL: https://github.com/apache/datafusion/pull/17456#issuecomment-3263235043
Some caveats:
1) Tried to implement type signature like this:
```rust
Signature::arrays(2, Some(ListCoercion::FixedSizedListToList),
Volatility::Immutable)
coderfender commented on PR #17357:
URL: https://github.com/apache/datafusion/pull/17357#issuecomment-3262887016
@alamb, @mbutrovich I made changes to comet to fall back to a CASE statement
to replicate `lazy` evaluation mode with coalesce (and then plan to work on
this PR). Glad to see tha
alamb commented on issue #17446:
URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3261747645
A `GroupsAccumulator` will be nontrivial for ArrayAgg, so I recommend you
start with a simple type like Int64 first, and then we can make it generic for
all primitives and then oth
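To make the "start with Int64" suggestion above concrete, here is a rough standalone sketch of per-group state for an array-aggregating function. It does not implement DataFusion's actual `GroupsAccumulator` trait and uses no Arrow types; `Int64ArrayAggGroups`, its method names, and the choice to keep nulls in the output are all assumptions made for illustration.

```rust
/// Hypothetical per-group state: one growing list of values per group index.
#[derive(Debug, Default)]
struct Int64ArrayAggGroups {
    groups: Vec<Vec<Option<i64>>>,
}

impl Int64ArrayAggGroups {
    /// Append each input value to the list of the group it belongs to.
    /// `group_indices[i]` is the group of `values[i]`; `total_num_groups`
    /// is the number of groups seen so far (assumed to cover every index).
    fn update_batch(
        &mut self,
        values: &[Option<i64>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Make room for groups that first appear in this batch.
        if self.groups.len() < total_num_groups {
            self.groups.resize_with(total_num_groups, Vec::new);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            self.groups[group].push(*value);
        }
    }

    /// Hand back one list per group and reset the internal state.
    fn evaluate(&mut self) -> Vec<Vec<Option<i64>>> {
        std::mem::take(&mut self.groups)
    }
}
```

Keeping every group's values in one structure indexed by group number, rather than one accumulator object per group, is the general idea; making it generic over other primitive types would be a later step.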