dependabot[bot] opened a new pull request, #16440:
URL: https://github.com/apache/datafusion/pull/16440
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.173 to 0.2.174.
Release notes
Sourced from libc's releases (https://github.com/rust-lang/libc/releases).
0.2.174
nirnayroy commented on code in PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#discussion_r2154895463
##
datafusion/functions/src/regex/regexpcount.rs:
##
@@ -550,7 +550,7 @@ where
}
}
-fn compile_and_cache_regex<'strings, 'cache>(
+pub fn compile_and_cac
dependabot[bot] opened a new pull request, #16441:
URL: https://github.com/apache/datafusion/pull/16441
Bumps [bzip2](https://github.com/trifectatechfoundation/bzip2-rs) from 0.5.2
to 0.6.0.
Release notes
Sourced from https://github.com/trifectatechfoundation/bzip2-rs/releases bz
xudong963 merged PR #16441:
URL: https://github.com/apache/datafusion/pull/16441
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@data
epgif commented on code in PR #16401:
URL: https://github.com/apache/datafusion/pull/16401#discussion_r2154521172
##
datafusion/catalog/src/schema.rs:
##
@@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send {
name: &str,
) -> Result<Option<Arc<dyn TableProvider>>, DataFusionError>;
Dandandan commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2984064963
2. Easier to serialize across the wire
Yeah that part is of course true (especially larger tables you probably want
to avoid sending over the network).
the `1. More
xudong963 commented on code in PR #16424:
URL: https://github.com/apache/datafusion/pull/16424#discussion_r2154535222
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -524,6 +512,91 @@ fn should_enable_page_index(
.unwrap_or(false)
}
+/// Prune based on part
YanivKunda commented on code in PR #1830:
URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2154538985
##
dev/.DS_Store:
##
Review Comment:
macOS local file
##
dev/diffs/4.0.0-diff.patch:
##
Review Comment:
looks like a temporary
xudong963 commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2984129180
From my past experience, a bloom filter mostly has a negative impact.
In most cases, min-max works fine.
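As a rough illustration of the min-max idea mentioned above (a sketch under stated assumptions, not DataFusion's implementation): the build side's keys are reduced to an inclusive `[min, max]` range, and probe keys outside that range are skipped before any hash lookup. The names `Bounds`, `from_keys`, `may_contain`, and `prune_probe` are hypothetical, and keys are assumed to be `i64`.

```rust
/// Inclusive min/max bounds collected from the build side of a join.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub min: i64,
    pub max: i64,
}

impl Bounds {
    /// Compute bounds over the build-side keys; `None` if there are no keys.
    pub fn from_keys(keys: &[i64]) -> Option<Bounds> {
        let min = *keys.iter().min()?;
        let max = *keys.iter().max()?;
        Some(Bounds { min, max })
    }

    /// A probe key outside the bounds can never match a build-side key.
    /// A key inside the bounds may or may not match (false positives remain).
    pub fn may_contain(&self, key: i64) -> bool {
        self.min <= key && key <= self.max
    }
}

/// Keep only the probe keys that could possibly match the build side.
pub fn prune_probe(bounds: &Bounds, probe: &[i64]) -> Vec<i64> {
    probe
        .iter()
        .copied()
        .filter(|k| bounds.may_contain(*k))
        .collect()
}
```

Unlike a bloom filter, this check costs two comparisons per row and needs no hashing, but it prunes nothing when the build-side keys span the whole probe range.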
comphead merged PR #16423:
URL: https://github.com/apache/datafusion/pull/16423
comphead merged PR #16440:
URL: https://github.com/apache/datafusion/pull/16440
adriangb commented on code in PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#discussion_r2155006506
##
datafusion/physical-plan/src/joins/hash_join.rs:
##
@@ -943,10 +978,71 @@ impl ExecutionPlan for HashJoinExec {
try_embed_projection(projection, s
andygrove commented on code in PR #1898:
URL: https://github.com/apache/datafusion-comet/pull/1898#discussion_r2155022740
##
native/core/src/execution/planner.rs:
##
@@ -555,7 +555,21 @@ impl PhysicalPlanner {
fail_on_error,
)))
adriangb commented on code in PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#discussion_r2155025406
##
datafusion/physical-plan/src/joins/hash_join.rs:
##
@@ -943,10 +978,71 @@ impl ExecutionPlan for HashJoinExec {
try_embed_projection(projection, s
eliaperantoni opened a new pull request, #1891:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1891
A Snowflake query like:
```sql
CREATE VIEW X (COL WITH TAG (pii='email') COMMENT 'foobar') AS SELECT * FROM
Y
```
Would've previously failed because it contain
eliaperantoni opened a new pull request, #1892:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1892
In e.g. Snowflake, tags can have qualifying elements in their name:
```sql
CREATE VIEW foo AS (SELECT 1 AS COL WITH TAG foo.bar.baz)
```
But the parser currentl
eliaperantoni opened a new pull request, #1893:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1893
At the moment, the `Span` of an `Ident` influences its ordering, but
this doesn't seem consistent with the implementations of `PartialEq`, `Hash`,
and the general guideli
eliaperantoni commented on PR #1892:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1892#issuecomment-2983066319
It would be nice if #1891 was merged first so that I can write a Snowflake
test like the example in this PR's description, since `WITH TAG ...` is not
currently sup
samuelcolvin opened a new issue, #16438:
URL: https://github.com/apache/datafusion/issues/16438
When searching "datafusion create_udf", the first result is
`https://datafusion.apache.org/library-user-guide/adding-udfs.html` which is
showing a 404 page.
samuelcolvin commented on issue #16438:
URL: https://github.com/apache/datafusion/issues/16438#issuecomment-2983122187
If docs pages are moved, a redirect should be used to both help users and
maintain SEO.
xudong963 commented on issue #15885:
URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2983943967
> DuckDB's implementation: transform dependent join, flatten dependent join,
eliminate delimjoin
I believe we're almost the same, except the implementation of Databend doesn
mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983928323
> We already have the full table in memory, so we can not really save
anything by compressing it into a bloom filter.
Agreed: if we're not concerned with larger-than-me
samuelcolvin commented on PR #14837:
URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2983962469
https://github.com/apache/datafusion/blob/630aa7b0c7b44ea8e77f9e0d685bf79f2a3cd3bd/datafusion/core/src/execution/context/mod.rs#L1766
Needs an option for async UDFs I gues
mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983990746
> 2. Easier to serialize across the wire
This is actually something I've started looking at in the last day and got
stuck pretty quickly trying to serialize the HashBro
Dandandan commented on code in PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#discussion_r2154953337
##
datafusion/physical-plan/src/joins/hash_join.rs:
##
@@ -943,10 +978,71 @@ impl ExecutionPlan for HashJoinExec {
try_embed_projection(projection,
comphead commented on code in PR #16401:
URL: https://github.com/apache/datafusion/pull/16401#discussion_r2154936751
##
datafusion/catalog/src/schema.rs:
##
@@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send {
name: &str,
) -> Result<Option<Arc<dyn TableProvider>>, DataFusionError
Dandandan closed pull request #16445: Add dynamic filter (bounds) pushdown to
HashJoinExec
URL: https://github.com/apache/datafusion/pull/16445
Dandandan commented on PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2984776797
I think we should also consider a heuristic for not evaluating the filter if
it's not useful.
Also I think doing only the lookup is preferable above also computing /
checking
Dandandan commented on PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2984782413
Sorry, misclicked a button.
adriangb commented on PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2984788592
> I think doing only the lookup is preferable above also computing /
checking the bounds, I think the latter might create more overhead
My thought was that for some cases the
adriangb commented on code in PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#discussion_r2154970369
##
datafusion/physical-plan/src/joins/hash_join.rs:
##
@@ -943,10 +978,71 @@ impl ExecutionPlan for HashJoinExec {
try_embed_projection(projection, s
2010YOUY01 commented on code in PR #16268:
URL: https://github.com/apache/datafusion/pull/16268#discussion_r2154095345
##
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##
@@ -1324,6 +1326,7 @@ impl Stream for SortMergeJoinStream {
impl SortMergeJoinStream {
#[allo
Dandandan commented on code in PR #16433:
URL: https://github.com/apache/datafusion/pull/16433#discussion_r2154078467
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -319,13 +341,88 @@ impl TopK {
/// (a > 2 OR (a = 2 AND b < 3))
/// ```
fn update_filter(&mut s
alamb commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983585902
FYI @mbutrovich -- I believe you were working on something like this
related to Comet -- maybe it is worth a look / review here to make sure the
design works with comet too if po
alamb commented on PR #15958:
URL: https://github.com/apache/datafusion/pull/15958#issuecomment-2983591412
> > @irenjj - I wonder if you would be willing to help pick this PR back up
now that we have merged a PR with a bunch of tests from @shehabgamin here:
> >
> > * [chore: generate
rluvaton opened a new pull request, #1899:
URL: https://github.com/apache/datafusion-comet/pull/1899
## Which issue does this PR close?
N/A
## Rationale for this change
Some registration exists in the `prepare_datafusion_session_context` and
some in the `PhysicalPlanner:
codecov-commenter commented on PR #1899:
URL:
https://github.com/apache/datafusion-comet/pull/1899#issuecomment-2984407095
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1899?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
andygrove opened a new pull request, #1903:
URL: https://github.com/apache/datafusion-comet/pull/1903
## Which issue does this PR close?
N/A
## Rationale for this change
We recently started to make QueryPlanSerde more maintainable by moving
expression ser
andygrove commented on code in PR #1903:
URL: https://github.com/apache/datafusion-comet/pull/1903#discussion_r2154774569
##
docs/source/user-guide/expressions.md:
##
@@ -127,7 +127,7 @@ The following Spark expressions are currently available.
Any known compatibility
| Log10
2010YOUY01 commented on PR #16268:
URL: https://github.com/apache/datafusion/pull/16268#issuecomment-2983191181
close and reopen to trigger CI again
2010YOUY01 closed pull request #16268: Add compression option to SpillManager
URL: https://github.com/apache/datafusion/pull/16268
SKY-ALIN opened a new issue, #1900:
URL: https://github.com/apache/datafusion-comet/issues/1900
### Describe the bug
This is what I get when I use `CAST(time AS TIMESTAMP)` as key
```shell
25/06/18 12:51:12 WARN CometExecRule: Comet cannot execute some parts of
this plan nat
SKY-ALIN opened a new pull request, #1901:
URL: https://github.com/apache/datafusion-comet/pull/1901
## Which issue does this PR close?
Closes #1900.
## Rationale for this change
This type is supported, but missed on the proto stage + message formatting
is incorr
adriangb commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983932272
@Dandandan the two ways I thought a bloom filter would be advantageous:
1. More performant if applied to each row than the full hash table, although
I admit I haven't poked a
adriangb commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983934401
Either way I think we can decouple the two things: there seems to be some
interest in adding a bloom filter expression, that can be developed in parallel
with the hash join pus
robert3005 opened a new issue, #16444:
URL: https://github.com/apache/datafusion/issues/16444
### Describe the bug
When upgrading to DataFusion 48 our continuous benchmarking infra detected a
25x regression in Q6 and 10x in Q0 of Clickbench. This query was previously
answered all from
alamb commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2984503937
> > We already have the full table in memory, so we can not really save
anything by compressing it into a bloom filter.
>
> Agreed: if we're not concerned with larger-than-m
adriangb opened a new pull request, #16445:
URL: https://github.com/apache/datafusion/pull/16445
Part of #7955.
My goal here is to lay the groundwork for pushing down joins.
I am only implementing bounds pushdown because I am sure that is cheap and
it will probably be quite effecti
adriangb commented on issue #7955:
URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2984574454
Took an initial stab at this in
https://github.com/apache/datafusion/pull/16445
2010YOUY01 merged PR #16437:
URL: https://github.com/apache/datafusion/pull/16437
alamb commented on PR #1890:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1890#issuecomment-2983657757
π
xudong963 merged PR #16399:
URL: https://github.com/apache/datafusion/pull/16399
codecov-commenter commented on PR #1903:
URL:
https://github.com/apache/datafusion-comet/pull/1903#issuecomment-2984688644
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1903?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
nirnayroy commented on code in PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#discussion_r2154383754
##
datafusion/functions/src/regex/regexpinstr.rs:
##
@@ -0,0 +1,804 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983856838
I had also played with building one with the `fastbloom` crate in the hash
join operator, but lacked the ability to push it anywhere useful in the plan,
which we now have.
mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983871673
> I can point to the relevant code for interest, but we may want a different
solution for core DF.
Maybe this code would at least make it easy for us to have a performa
mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983844232
So the high level for Spark is that there's a BloomFilterAgg aggregate
function that returns a byte sequence representing the bloom filter. The
BloomFilterMightContain scalar
Dandandan commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983912966
I believe it should also be possible to share the `Arc` within
the created `PhysicalExpr`.
This avoids building a bloom filter. We already have the full table in
memory
samuelcolvin commented on PR #14837:
URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2983627011
This would be extremely useful for us. @alamb please would this get merged.
SKY-ALIN commented on PR #1901:
URL:
https://github.com/apache/datafusion-comet/pull/1901#issuecomment-2983795662
It fixes formatting also, now it looks like this:
```shell
25/06/18 12:53:47 WARN CometExecRule: Comet cannot execute some parts of
this plan natively (set spark.comet
UBarney opened a new pull request, #16443:
URL: https://github.com/apache/datafusion/pull/16443
## Which issue does this PR close?
part of #16364
## Rationale for this change
see issue
## What changes are included in this PR?
1. Limit intermediate_batch Siz
andygrove commented on code in PR #1901:
URL: https://github.com/apache/datafusion-comet/pull/1901#discussion_r2154673214
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -2168,7 +2168,8 @@ object QueryPlanSerde extends Logging with CometExprShim {
andygrove commented on code in PR #1892:
URL: https://github.com/apache/datafusion-comet/pull/1892#discussion_r2154704183
##
spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala:
##
@@ -232,6 +232,21 @@ class CometArrayExpressionSuite extends CometTestBase with
andygrove commented on code in PR #1892:
URL: https://github.com/apache/datafusion-comet/pull/1892#discussion_r2154723041
##
spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala:
##
@@ -232,6 +232,21 @@ class CometArrayExpressionSuite extends CometTestBase with
andygrove opened a new issue, #1902:
URL: https://github.com/apache/datafusion-comet/issues/1902
### What is the problem the feature request solves?
Most of the array functions are marked as `IncompatExpr` and are disabled by
default. We should review whether this is necessary and imp
dependabot[bot] opened a new pull request, #16439:
URL: https://github.com/apache/datafusion/pull/16439
Bumps the proto group with 1 update:
[prost-build](https://github.com/tokio-rs/prost).
Updates `prost-build` from 0.13.5 to 0.14.1
Changelog
Sourced from https://github.co
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2154205143
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,363 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
findepi opened a new issue, #16442:
URL: https://github.com/apache/datafusion/issues/16442
### Is your feature request related to a problem or challenge?
This should work
```diff
$ git diff datafusion/sqllogictest/test_files/array.slt
diff --git datafusion/sqllogictest
mbutrovich commented on PR #1901:
URL:
https://github.com/apache/datafusion-comet/pull/1901#issuecomment-2984157981
Could we add a test case with timestamps as the join key?
timsaucer opened a new pull request, #1156:
URL: https://github.com/apache/datafusion-python/pull/1156
# Which issue does this PR close?
Closes #1091
# Rationale for this change
This PR builds on top of
https://github.com/apache/datafusion-python/pull/1137 and adds pyt
timsaucer commented on PR #1156:
URL:
https://github.com/apache/datafusion-python/pull/1156#issuecomment-2984224547
FYI @renato2099 getting the python based providers ended up being a blocking
issue for some of my work so I took a stab at implementing it. Please tell me
what you think if y
comphead commented on code in PR #1903:
URL: https://github.com/apache/datafusion-comet/pull/1903#discussion_r2155251353
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -61,6 +61,39 @@ import org.apache.comet.shims.CometExprShim
* An utility object f
epgif commented on code in PR #16401:
URL: https://github.com/apache/datafusion/pull/16401#discussion_r2155260272
##
datafusion/catalog/src/schema.rs:
##
@@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send {
name: &str,
) -> Result<Option<Arc<dyn TableProvider>>, DataFusionError>;
jonathanc-n commented on PR #16450:
URL: https://github.com/apache/datafusion/pull/16450#issuecomment-2985477515
I will try to run a benchmark on a table with smaller rows and return the
result when finished.
jonathanc-n opened a new pull request, #16450:
URL: https://github.com/apache/datafusion/pull/16450
## Which issue does this PR close?
- Closes #.
## Rationale for this change
We want to support equijoins in `NestedLoopJoin` in the case where one of
the tables in the
codecov-commenter commented on PR #1911:
URL:
https://github.com/apache/datafusion-comet/pull/1911#issuecomment-2986043661
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1911?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
parthchandra commented on code in PR #1911:
URL: https://github.com/apache/datafusion-comet/pull/1911#discussion_r2155736023
##
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala:
##
@@ -1946,6 +1946,52 @@ class ParquetReadV1Suite extends ParquetReadSuite with
codecov-commenter commented on PR #1912:
URL:
https://github.com/apache/datafusion-comet/pull/1912#issuecomment-2986069982
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1912?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
andygrove commented on code in PR #1903:
URL: https://github.com/apache/datafusion-comet/pull/1903#discussion_r2155747372
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -61,6 +61,39 @@ import org.apache.comet.shims.CometExprShim
* An utility object
AdamGS commented on PR #16447:
URL: https://github.com/apache/datafusion/pull/16447#issuecomment-2986090162
Our benchmarks show this change fixes the performance regression we saw -
https://github.com/vortex-data/vortex/pull/3567
andygrove commented on code in PR #1910:
URL: https://github.com/apache/datafusion-comet/pull/1910#discussion_r2155751838
##
dev/diffs/3.5.6.diff:
##
@@ -1938,7 +1938,17 @@ index 8e88049f51e..d3c0737d52e 100644
import testImplicits._
// keep() should take effect on S
kazuyukitanimura commented on code in PR #1911:
URL: https://github.com/apache/datafusion-comet/pull/1911#discussion_r2155755872
##
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala:
##
@@ -1946,6 +1946,52 @@ class ParquetReadV1Suite extends ParquetReadSuite w
kazuyukitanimura commented on code in PR #1910:
URL: https://github.com/apache/datafusion-comet/pull/1910#discussion_r2155756956
##
dev/diffs/3.5.6.diff:
##
@@ -1938,7 +1938,17 @@ index 8e88049f51e..d3c0737d52e 100644
import testImplicits._
// keep() should take effe
mbutrovich commented on PR #1907:
URL:
https://github.com/apache/datafusion-comet/pull/1907#issuecomment-2985229525
I think I'll need to generate new golden plans for this.
timsaucer opened a new pull request, #1157:
URL: https://github.com/apache/datafusion-python/pull/1157
# **DO NOT MERGE**
This PR exists to make it easy to see the differences between `main` and our
current working branch, `rerun`.
Dandandan commented on PR #16433:
URL: https://github.com/apache/datafusion/pull/16433#issuecomment-2985955904
It seems in some cases it's faster:
```
┏━━┳━┳━━┳━━━┓
┃ Query┃ topk-dynamic-filter ┃ topk-filters ┃
adriangb commented on PR #16433:
URL: https://github.com/apache/datafusion/pull/16433#issuecomment-2985967718
Seems like a bug in my implementation right? I'd be surprised if the update
checks I added are that heavy compared to other work...
dfinninger opened a new issue, #1274:
URL: https://github.com/apache/datafusion-ballista/issues/1274
Hi, we're trying to make Ballista read parquet files in Google Cloud
Storage. It looks like support for GCS was added in 2023:
https://github.com/apache/datafusion-ballista/pull/805. However
parthchandra commented on PR #1901:
URL:
https://github.com/apache/datafusion-comet/pull/1901#issuecomment-2985979040
> Thanks for the contribution, @SKY-ALIN! Could we add a test case with
timestamps as the join key?
The test should have the left side and the right side timestamps b
milenkovicm commented on issue #1274:
URL:
https://github.com/apache/datafusion-ballista/issues/1274#issuecomment-2985986246
In short, users should extend Ballista to support the object store they need. S3
is a bit of a special case.
You can find more details on how to do that in the examples.
adriangb commented on PR #16371:
URL: https://github.com/apache/datafusion/pull/16371#issuecomment-2985997261
I'll try to review tomorrow.
I took a look the other day and my thought was that while it's complex code
that is a bit hard for me to fully wrap my head around it's well teste
parthchandra commented on code in PR #1892:
URL: https://github.com/apache/datafusion-comet/pull/1892#discussion_r2155694477
##
spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala:
##
@@ -232,6 +232,21 @@ class CometArrayExpressionSuite extends CometTestBase wi
parthchandra opened a new pull request, #1911:
URL: https://github.com/apache/datafusion-comet/pull/1911
## Which issue does this PR close?
Adds a new unit test. Also adds a method to generate a complex type parquet
file that can be used to test various complex type cases.
blaginin commented on code in PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#discussion_r2155673833
##
datafusion/functions/src/regex/regexpcount.rs:
##
@@ -29,10 +30,10 @@ use datafusion_expr::{
use datafusion_macros::user_doc;
use itertools::izip;
use regex
adriangb commented on PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2985988638
> I think it makes sense to only filter on the shared hashmap and not
bothering with the min/max values - creating hashes and doing a single table
lookup is quite fast, so I think w
blaginin commented on code in PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#discussion_r2155683469
##
datafusion/functions/src/regex/regexpinstr.rs:
##
@@ -0,0 +1,804 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lice
andygrove commented on PR #1910:
URL:
https://github.com/apache/datafusion-comet/pull/1910#issuecomment-2986017103
One test failure, as expected:
```
2025-06-18T22:31:07.6082754Z [info] - SPARK-17091: Convert IN predicate to
Parquet filter push-down *** FAILED *** (297 millisecond
miclegr commented on code in PR #1154:
URL:
https://github.com/apache/datafusion-python/pull/1154#discussion_r2155423158
##
python/datafusion/context.py:
##
@@ -535,7 +535,7 @@ def register_listing_table(
self,
name: str,
path: str | pathlib.Path,
-