viirya commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3082735431
So the basic idea, as I understand it, is that instead of adapting the data
batch using `SchemaAdapter` against the schema, the new approach involves
rewriting or transforming th
zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082712617
Updated parquet results from my local machine using the 1e8 dataset; it is even faster:
```sh
./bench.sh run h2o_medium_join_parquet
***
DataFusi
milenkovicm commented on PR #1275:
URL: https://github.com/apache/datafusion-ballista/pull/1275#issuecomment-3082666517
I have no access to a computer at the moment. Can you please push a change? It
should trigger a new job.
Two questions:
- What's the reason to remove cancel job? It w
zhuqi-lucas commented on PR #16804:
URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3082646775
> @zhuqi-lucas Just in case falsa generates too much noise in stdout
(and the runner's output), I can add a command-line argument to suppress it.
Something like `--silent`.
SemyonSinchenko commented on PR #16804:
URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3082635479
@zhuqi-lucas Just in case falsa generates too much noise in stdout (and the
runner's output), I can add a command-line argument to suppress it. Something
like `--silent`.
zhuqi-lucas commented on PR #16804:
URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3082603389
Updated: it works now. falsa has merged the fix and released it:
https://github.com/mrpowers-io/falsa/pull/28
```sh
./bench.sh data h2o_small_join_parquet
yoavcloud opened a new pull request, #1951:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1951
Extended the already existing generic support for DROP statements to USER.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to G
zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082400648
> > > > [@UBarney](https://github.com/UBarney) - here are the 1e7 join
results on my M3 Macbook with 16GB of RAM:
> > >
> > >
> > > [@MrPowers](https://github.com/
UBarney commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082362965
> > > [@UBarney](https://github.com/UBarney) - here are the 1e7 join results
on my M3 Macbook with 16GB of RAM:
> >
> >
> > [@MrPowers](https://github.com/MrPowers) I
Huy1Ng commented on PR #1275:
URL: https://github.com/apache/datafusion-ballista/pull/1275#issuecomment-3082356061
@milenkovicm can you rerun the CI? I pushed 2 commits too close together so
some jobs failed because they didn't get an available machine. The CI run on my
fork succeeded:
ht
zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082347855
@mrpowers-wb
I submitted the PR for the h2o benchmark to support the parquet format in DataFusion,
but it is blocked by falsa's join dataset generation. Details:
https://github.com/a
zhuqi-lucas commented on PR #16804:
URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3082341745
Created an issue on the falsa side: generating parquet data fails for the join
set, but it works well for group by.
https://github.com/mrpowers-io/falsa/issues/27
zhuqi-lucas commented on PR #16804:
URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3082332726
Updated: generating parquet join data fails, but it works for group by:
```sh
./bench.sh data h2o_medium_join_parquet
***
DataFusion Benchma
zhuqi-lucas opened a new pull request, #16804:
URL: https://github.com/apache/datafusion/pull/16804
## Which issue does this PR close?
Currently, we only support the CSV format for the h2o benchmark, but the
comparison results for other databases use parquet, so this ticket tries to
kosiew merged PR #16734:
URL: https://github.com/apache/datafusion/pull/16734
kosiew closed issue #16717: Better parallelize large input batches (speed up
dataframe access)
URL: https://github.com/apache/datafusion/issues/16717
adriangb commented on code in PR #16803:
URL: https://github.com/apache/datafusion/pull/16803#discussion_r2212054428
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -433,7 +433,7 @@ impl ListingTableConfig {
/// `SchemaAdapterFactory` is set, in which case only th
adriangb commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3082294797
@parthchandra @mbutrovich please take a look at
https://github.com/apache/datafusion/pull/16803.
As per the comments in the example it looks like Comet already has a custom
adriangb opened a new pull request, #16803:
URL: https://github.com/apache/datafusion/pull/16803
https://github.com/apache/datafusion/issues/16800#issuecomment-3080175396
2010YOUY01 commented on PR #16798:
URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3082274998
What about using explicit casting in applications? For example:
```sh
> select not(arrow_cast(1, 'Boolean'));
+--+
| NOT arrow_ca
zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082269096
> > [@UBarney](https://github.com/UBarney) - here are the 1e7 join results
on my M3 Macbook with 16GB of RAM:
>
> [@MrPowers](https://github.com/MrPowers) I am using t
comphead commented on issue #2004:
URL: https://github.com/apache/datafusion-comet/issues/2004#issuecomment-3082214873
DF has similar work for q1-q4 H2O benchmarks
https://github.com/apache/datafusion/issues/16710
comphead commented on code in PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2212009586
##
native/hdfs/src/object_store/hdfs.rs:
##
@@ -88,19 +88,33 @@ impl HadoopFileSystem {
fn read_range(range: &Range, file: &HdfsFile) -> Result {
github-actions[bot] closed pull request #15423: Introduce selection vector
repartitioning
URL: https://github.com/apache/datafusion/pull/15423
adriangb merged PR #16791:
URL: https://github.com/apache/datafusion/pull/16791
UBarney commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082082344
> [@UBarney](https://github.com/UBarney) - here are the 1e7 join results on
my M3 Macbook with 16GB of RAM:
@MrPowers I am using the **1e8** dataset.
```
target
parthchandra commented on PR #1987:
URL: https://github.com/apache/datafusion-comet/pull/1987#issuecomment-3082076029
@hsiang-c created https://github.com/apache/datafusion-comet/issues/2033 to
track this issue
parthchandra opened a new issue, #2033:
URL: https://github.com/apache/datafusion-comet/issues/2033
### Describe the bug
Reproducing issue and steps to reproduce from this
https://github.com/apache/datafusion-comet/pull/1987#issuecomment-3075575929
Many Iceberg Spark SQL Tes
parthchandra commented on code in PR #1996:
URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2211898302
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -677,7 +677,14 @@ object QueryPlanSerde extends Logging with CometExprShim {
parthchandra commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3081929167
> Having looked at your implementation I think it may not be that bad! It
seems like most of what your SchemaAdapter is doing is customizing casting
rules, right?
Al
parthchandra commented on code in PR #2032:
URL: https://github.com/apache/datafusion-comet/pull/2032#discussion_r2211880244
##
common/src/main/java/org/apache/comet/parquet/BatchReader.java:
##
@@ -183,9 +183,7 @@ public BatchReader(
this.taskContext = TaskContext$.MODULE$
adriangb commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3081902444
Having looked at your implementation I think it may not be that bad! It
seems like most of what your SchemaAdapter is doing is customizing casting
rules, right?
parthchandra commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3081890593
I feel it may be a fair amount of work in Comet to move from `SchemaAdapter`
to `PhysicalExprAdapter` but from the pseudocode example it appears tractable.
I think we'll be
parthchandra commented on PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3081817670
@Kontinuation @andygrove @comphead, updated based on review comments
parthchandra commented on code in PR #2018:
URL: https://github.com/apache/datafusion-comet/pull/2018#discussion_r2211835763
##
spark/src/main/scala/org/apache/comet/serde/arithmetic.scala:
##
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+
akoshchiy commented on code in PR #16790:
URL: https://github.com/apache/datafusion/pull/16790#discussion_r2210860376
##
datafusion/sqllogictest/test_files/push_down_filter.slt:
##
@@ -128,12 +128,31 @@ physical_plan
06)--ProjectionExec: expr=[column1@0 as column1, colu
adriangb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211616561
##
datafusion/core/tests/parquet/schema_adapter.rs:
##
@@ -0,0 +1,92 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
alamb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211267704
##
docs/source/library-user-guide/upgrading.md:
##
@@ -120,6 +120,17 @@ SET datafusion.execution.spill_compression = 'zstd';
For more details about this configura
codecov-commenter commented on PR #2032:
URL: https://github.com/apache/datafusion-comet/pull/2032#issuecomment-3080720676
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2032?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
yoavcloud opened a new pull request, #1950:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1950
Added support for the `CREATE USER` statement in Snowflake. Enhanced the
KeyValueOptions struct with:
1. A custom delimiter
2. Optional parentheses
3. Optional keywords that
akupchinskiy commented on code in PR #2010:
URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2211570646
##
native/spark-expr/src/nondetermenistic_funcs/randn.rs:
##
@@ -0,0 +1,265 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more
parthchandra closed issue #2029: Expected: decimal(7,2), Found: DOUBLE when
running TPC-DS benchmarks on Spark 3.5
URL: https://github.com/apache/datafusion-comet/issues/2029
parthchandra commented on issue #2029:
URL: https://github.com/apache/datafusion-comet/issues/2029#issuecomment-3080492782
> One surprising thing to note is that Gluten and Blaze were working fine
with the data containing the flag
That is, in fact, quite surprising. Double is not a good
adriangb commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3080175396
Here's some more examples:
https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/json_shredding.rs,
https://github.com/apache/datafusion/blob/main/datafu
adriangb commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3080077103
Here's an example from Comet:
https://github.com/vaibhawvipul/datafusion-comet/blob/main/native/core/src/parquet/schema_adapter.rs.
As you can see it's _a lot_ of code with [a
MrPowers commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3080025409
@UBarney - here are the 1e7 join results on my M3 Macbook with 16GB of RAM:

findepi commented on PR #16625:
URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3080003484
@ozankabak @alamb can you please help me understand where you would want to
go with this?
Or maybe DF doesn't need to support ordered array_aggs (more than one in a
query)?
viirya commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3079995268
I got here from @alamb's post on dev. Although I saw what is proposed, it is
unclear to me how it will work or what steps there will be. Could you at least
describe the
aharpervc commented on PR #1949:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#issuecomment-3079970637
@alamb fyi, [as previously
discussed](https://github.com/apache/datafusion-sqlparser-rs/pull/1937#issuecomment-3070806780)
ShreyeshArangath commented on issue #2029:
URL: https://github.com/apache/datafusion-comet/issues/2029#issuecomment-3079962752
Yeah, I was able to fix this issue by fixing the data-generation. We are
using https://github.com/maropu/spark-tpcds-datagen for our datagen, removing
the `--use-d
aharpervc opened a new pull request, #1949:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949
This PR is a followup
([ref](https://github.com/apache/datafusion-sqlparser-rs/pull/1937#issuecomment-3070806780))
to recent work on parsing without requiring semicolon statement del
adriangb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211280296
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -468,10 +468,50 @@ impl FileSource for ParquetSource {
let projection = base_config
.f
adriangb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211279282
##
datafusion/core/tests/parquet/schema_adapter.rs:
##
@@ -0,0 +1,92 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
alamb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211266501
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -468,10 +468,50 @@ impl FileSource for ParquetSource {
let projection = base_config
.file
alamb commented on issue #16235:
URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3079804143
I am hoping to start with some of the various release tasks tomorrow (like
ensuring the upgrade guide is in a good place) but I have been very busy with
other things recently. Hop
Loaki07 commented on issue #16795:
URL: https://github.com/apache/datafusion/issues/16795#issuecomment-3079789813
take
adriangb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211108359
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1095,4 +1124,167 @@ mod test {
assert_eq!(num_batches, 0);
assert_eq!(num_rows, 0);
huaxingao opened a new pull request, #2032:
URL: https://github.com/apache/datafusion-comet/pull/2032
## Which issue does this PR close?
Closes #.
## Rationale for this change
## What changes are included in this PR?
## How are these changes
GitHub user zheniasigayev added a comment to the discussion: Best practices for
memory-efficient deduplication of pre-sorted Parquet files
The above results were obtained with the following setup:
* `datafusion-cli -m 8G -d 50G --top-memory-consumers 25`
* The default `datafusion.execution.par
GitHub user zheniasigayev added a comment to the discussion: Best practices for
memory-efficient deduplication of pre-sorted Parquet files
Addressing Question 2)
It's not possible to remove the `first_value()` aggregate from the above query
since `col_7` and `col_8` won't appear in the `GROUP
comphead opened a new pull request, #16802:
URL: https://github.com/apache/datafusion/pull/16802
## Which issue does this PR close?
- Closes https://github.com/apache/datafusion/issues/16187.
## Rationale for this change
Add tests proving the issue is fixed after
GitHub user zheniasigayev added a comment to the discussion: Best practices for
memory-efficient deduplication of pre-sorted Parquet files
Addressing Question 1.
The query plan for the original query:
```sql
CREATE EXTERNAL TABLE example (
col_1 VARCHAR(50) NOT NULL,
col_2 BIGINT NOT
parthchandra commented on code in PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210968794
##
native/hdfs/src/object_store/hdfs.rs:
##
@@ -88,19 +88,18 @@ impl HadoopFileSystem {
fn read_range(range: &Range, file: &HdfsFile) -> Result {
adriangb commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3079371233
Thank you for chiming in!
> We'll be able to start making the migration with the DF 49.0 release?
Yes that's the plan. I'm trying to figure out the way to make the
zhuqi-lucas commented on PR #16771:
URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3079295059
> > My guess is that some of the new slowdown / less predictability is due
to many more `Box`es (and thus allocations) -- I suggest we reconsider Boxing
frequently used structure
mbutrovich commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3079257969
Comet makes increasing use of `SchemaAdapter`, but nothing you describe here
sounds like a dealbreaker for Comet at first glance. I think we'd be able to
make the necessary c
mbutrovich commented on code in PR #2010:
URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2210833524
##
native/spark-expr/src/nondetermenistic_funcs/randn.rs:
##
@@ -0,0 +1,265 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more co
adriangb opened a new issue, #16801:
URL: https://github.com/apache/datafusion/issues/16801
### Describe the bug
There's something wrong with
[datafusion/core/tests/integration_tests/schema_adapter_integration_tests.rs](https://github.com/apache/datafusion/blob/main/datafusion/core/te
adriangb commented on issue #16801:
URL: https://github.com/apache/datafusion/issues/16801#issuecomment-3079240904
@kosiew could you take a look at this?
adriangb commented on code in PR #16790:
URL: https://github.com/apache/datafusion/pull/16790#discussion_r2210812065
##
datafusion/sqllogictest/test_files/push_down_filter.slt:
##
@@ -128,12 +128,31 @@ physical_plan
06)--ProjectionExec: expr=[column1@0 as column1, colum
Kontinuation commented on PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3079205816
> Sorry @Kontinuation if I check your references
https://github.com/datafusion-contrib/fs-hdfs/blob/8c03c5ef0942b75abc79ed673931355fa9552131/c_src/libhdfs/hdfs.c#L1564C15-L1
mbutrovich commented on code in PR #2010:
URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2210768385
##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -2765,6 +2765,26 @@ class CometExpressionSuite extends CometTestBase with
Adapti
zhuqi-lucas commented on PR #16771:
URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3079128237
> My guess is that some of the new slowdown / less predictability is due to
many more `Box`es (and thus allocations) -- I suggest we reconsider Boxing
frequently used structures
Kontinuation commented on code in PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210729874
##
native/hdfs/src/object_store/hdfs.rs:
##
@@ -141,13 +140,15 @@ impl ObjectStore for HadoopFileSystem {
let file_status = file.get_file_sta
Kontinuation commented on code in PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210716251
##
native/hdfs/src/object_store/hdfs.rs:
##
@@ -88,19 +88,18 @@ impl HadoopFileSystem {
fn read_range(range: &Range, file: &HdfsFile) -> Result {
mbutrovich commented on PR #2010:
URL: https://github.com/apache/datafusion-comet/pull/2010#issuecomment-3079038284
> In those scenarios we do have reproducibility and I believe a native
implementation should also have this property.
Thank you for the great explanation! This makes se
alamb commented on PR #16755:
URL: https://github.com/apache/datafusion/pull/16755#issuecomment-3079019727
What is the purpose of this PR?
alamb commented on PR #16771:
URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3079000603
My guess is that some of the new slowdown / less predictability is due to
many more `Box`es (and thus allocations) -- I suggest we reconsider Boxing
frequently used structures (like Co
alamb commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3078987206
> Marked the reduce Expr size task here:
>
> [#16771](https://github.com/apache/datafusion/pull/16771)
Added
comphead commented on code in PR #2031:
URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210658711
##
native/hdfs/src/object_store/hdfs.rs:
##
@@ -88,19 +88,18 @@ impl HadoopFileSystem {
fn read_range(range: &Range, file: &HdfsFile) -> Result {
comphead closed issue #2005: q9
URL: https://github.com/apache/datafusion-comet/issues/2005
adriangb commented on PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#issuecomment-3078933391
I opened https://github.com/apache/datafusion/issues/16800 to track the big
picture
adriangb opened a new issue, #16800:
URL: https://github.com/apache/datafusion/issues/16800
As discussed in https://github.com/apache/datafusion/pull/16791 the long
term plan in my mind (and that I would like to discuss with the community) is
to replace `SchemaAdapter` with `PhysicalExprAda
UBarney commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078931663
> Thanks [@nuno-faria](https://github.com/nuno-faria) that's a great insight
(for TPC-H / very nested joins we probably should implement a smarter join
order algorithm).
>
Loaki07 commented on issue #16795:
URL: https://github.com/apache/datafusion/issues/16795#issuecomment-3078924010
take
iffyio commented on code in PR #1927:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2210549842
##
src/parser/mod.rs:
##
@@ -7724,6 +7737,27 @@ impl<'a> Parser<'a> {
return option;
}
+self.with_state(
+Co
adriangb commented on code in PR #16791:
URL: https://github.com/apache/datafusion/pull/16791#discussion_r2210616673
##
datafusion/datasource-parquet/src/row_filter.rs:
##
@@ -140,6 +143,8 @@ impl ArrowPredicate for DatafusionArrowPredicate {
}
fn evaluate(&mut self,
zhuqi-lucas commented on PR #16771:
URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078771877
> After opening the DF50.0.0 release issue, you can add it to the list
Thank you @xudong963, added it in
https://github.com/apache/datafusion/issues/16799#issuecomment-307
zhuqi-lucas commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3078770965
Marked the reduce Expr size task here:
https://github.com/apache/datafusion/pull/16771
zhuqi-lucas commented on PR #16771:
URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078761333
> 🤖: Benchmark completed
>
> Details
>
> ```
> group main
reduce_expr_size
> -
alamb commented on PR #16771:
URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078726061
🤖: Benchmark completed
Details
```
group main
reduce_expr_size
-
alamb commented on PR #16711:
URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3078620431
I looked into this failure running clickbench:
```
│ QQuery 27│ 2328.28 ms │ FAIL │ incomparable │
```
I ran the
[`q27.sql`](https://github.com/apache
GitHub user alamb added a comment to the discussion: Best practices for
memory-efficient deduplication of pre-sorted Parquet files
👋
Given your description, I am surprised that this query is using a
HashAggregateStream -- the hash aggregate needs to buffer the entire dataset in
RAM / spill i
GitHub user leoDYL edited a discussion: DISCUSSION: DataFusion Meetup in New
York, NY, USA - Sep 15, 2025
We are organizing an NYC meetup to celebrate the upcoming release 50. Currently
planning on Sept 15th, 2025. We will organize it in the same location as #11213
Registration link: https://
GitHub user NGA-TRAN edited a discussion: DISCUSSION: DataFusion Meetup in
Boston, USA - Nov 12, 2025
With the upcoming New York meetup on the horizon, the DataDog Boston team is
excited to plan a local DataFusion-themed gathering this fall!
**Date:** Wednesday, November 12
📍 Location: Data
alamb opened a new issue, #16799:
URL: https://github.com/apache/datafusion/issues/16799
### Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
Previous release will be https://crates.io/crates/d
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3078462264
> > select t1.value from range(100) t1 join range(819200) t2 on t1.value +
t2.value < t1.value * t2.value;
>
> I'm happy to include this benchmark in the bench suite this week,