Re: [D] Indexing Support in DataFusion? [datafusion]

2025-11-19 Thread via GitHub


GitHub user PierreZ added a comment to the discussion: Indexing Support in 
DataFusion?

Thanks @adriangb for your comment and your questions!

GitHub link: 
https://github.com/apache/datafusion/discussions/9963#discussioncomment-15015003


This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [D] Indexing Support in DataFusion? [datafusion]

2025-11-19 Thread via GitHub


GitHub user adriangb added a comment to the discussion: Indexing Support in 
DataFusion?

I'll take a look out of interest! My feeling (maybe misguided) is that, since 
there is currently nothing in the repo, the bar to merge is pretty low as long 
as the code is generally good quality. It's not worth blocking on API design 
discussion, etc.

GitHub link: 
https://github.com/apache/datafusion/discussions/9963#discussioncomment-15011931





Re: [D] Indexing Support in DataFusion? [datafusion]

2025-11-19 Thread via GitHub


GitHub user PierreZ added a comment to the discussion: Indexing Support in 
DataFusion?

Hey everyone! 👋 

Quick update: I've finally completed the initial implementation of the index 
provider we discussed: 
https://github.com/datafusion-contrib/datafusion-index-provider/pull/2

It implements the "Option 2" approach (APIs to pass additional knowledge about 
indexes) that @alamb mentioned above. The crate provides:
- Index-based query acceleration for `TableProvider` implementations
- Automatic handling of complex predicates (AND/OR/multiple indexes)
- Clean trait-based API (`Index`, `RecordFetcher`, `IndexedTableProvider`)
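
To make the AND/OR handling above concrete, here is a minimal, std-only sketch. It is not the crate's actual API: the trait names `Index` and `RecordFetcher` are borrowed from the list above, but every signature, the `Predicate` enum, `HashIndex`, `InMemoryTable`, and `resolve` are hypothetical and only illustrate the general idea (intersect row ids for AND, union them for OR, then fetch the surviving rows).

```rust
use std::collections::{BTreeSet, HashMap};

/// Row identifier returned by an index lookup (hypothetical).
type RowId = u64;

/// Toy predicate tree; far simpler than DataFusion's real `Expr`.
enum Predicate {
    Eq { column: String, value: String },
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
}

/// Hypothetical stand-in for an index trait: value -> matching row ids.
trait Index {
    fn column(&self) -> &str;
    fn lookup(&self, value: &str) -> BTreeSet<RowId>;
}

/// Hypothetical stand-in for a record fetcher: row ids -> full rows.
trait RecordFetcher {
    type Row;
    fn fetch(&self, ids: &BTreeSet<RowId>) -> Vec<Self::Row>;
}

/// In-memory equality index over a single column.
struct HashIndex {
    column: String,
    entries: HashMap<String, BTreeSet<RowId>>,
}

impl HashIndex {
    fn new(column: &str, pairs: &[(&str, RowId)]) -> Self {
        let mut entries: HashMap<String, BTreeSet<RowId>> = HashMap::new();
        for (value, id) in pairs {
            entries.entry(value.to_string()).or_default().insert(*id);
        }
        HashIndex { column: column.to_string(), entries }
    }
}

impl Index for HashIndex {
    fn column(&self) -> &str {
        &self.column
    }
    fn lookup(&self, value: &str) -> BTreeSet<RowId> {
        // An unknown value simply matches no rows.
        self.entries.get(value).cloned().unwrap_or_default()
    }
}

/// In-memory table keyed by row id.
struct InMemoryTable {
    rows: HashMap<RowId, String>,
}

impl RecordFetcher for InMemoryTable {
    type Row = String;
    fn fetch(&self, ids: &BTreeSet<RowId>) -> Vec<String> {
        // BTreeSet iterates in ascending id order, so output is deterministic.
        ids.iter().filter_map(|id| self.rows.get(id).cloned()).collect()
    }
}

/// Resolve a predicate to row ids: AND intersects, OR unions.
/// Returns `None` if any leaf has no index; in this simplified sketch
/// the caller would then fall back to a full scan.
fn resolve(pred: &Predicate, indexes: &[Box<dyn Index>]) -> Option<BTreeSet<RowId>> {
    match pred {
        Predicate::Eq { column, value } => indexes
            .iter()
            .find(|idx| idx.column() == column.as_str())
            .map(|idx| idx.lookup(value)),
        Predicate::And(left, right) => {
            let (l, r) = (resolve(left, indexes)?, resolve(right, indexes)?);
            Some(l.intersection(&r).copied().collect())
        }
        Predicate::Or(left, right) => {
            let (l, r) = (resolve(left, indexes)?, resolve(right, indexes)?);
            Some(l.union(&r).copied().collect())
        }
    }
}
```

With one index on `city` and one on `lang`, `resolve` on `city = 'paris' AND lang = 'rust'` yields only the row ids present in both lookups, which are then handed to `fetch`. A real planner integration would also need to handle partially indexed predicates and non-equality operators, which this sketch deliberately ignores.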

This has been running at my company for a few months without issues on top of 
FoundationDB. The design is somewhat oriented toward small queries and low data 
volumes due to FoundationDB's 5s transaction timeout and 10MB transaction 
limits. That said, I'd love feedback, especially on whether the approach makes 
sense for larger-scale scenarios. I don't work with query planners often and 
there are probably better ways to structure some of this.

**Since this is landing in the datafusion-contrib organization:**
- Who would be the right person(s) to review this PR?
- What are the general contribution/review guidelines for datafusion-contrib 
repos?

GitHub link: 
https://github.com/apache/datafusion/discussions/9963#discussioncomment-15011862





Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub


GitHub user Epicism added a comment to the discussion: Indexing Support in 
DataFusion?

This is amazing! I can't wait to go through this. 

GitHub link: 
https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210286





Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub


GitHub user Epicism deleted a comment on the discussion: Indexing Support in 
DataFusion?

This is amazing! I can't wait to go through this.


GitHub link: 
https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210271






Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-18 Thread via GitHub


GitHub user PierreZ added a comment to the discussion: Indexing Support in 
DataFusion?

It took me far too long to find time to work on this (a full year 
:see_no_evil:), but I have published an experimental design on a 
[branch](https://github.com/PierreZ/datafusion-index-provider/tree/init-v2). 
Before opening the MR, I will first integrate it into our software to validate 
the API.

GitHub link: 
https://github.com/apache/datafusion/discussions/9963#discussioncomment-13189642

