Re: [D] Indexing Support in DataFusion? [datafusion]
GitHub user PierreZ added a comment to the discussion: Indexing Support in DataFusion? Thanks @adriangb for your comment and your questions! GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-15015003 This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [D] Indexing Support in DataFusion? [datafusion]
GitHub user adriangb added a comment to the discussion: Indexing Support in DataFusion? I'll take a look out of interest! My feeling (maybe misguided) is that since there is currently nothing in the repo, the bar to merge is pretty low as long as the code is generally good quality; it's not worth blocking on API design discussion, etc. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-15011931
Re: [D] Indexing Support in DataFusion? [datafusion]
GitHub user PierreZ added a comment to the discussion: Indexing Support in DataFusion?

Hey everyone! 👋

Quick update: I've finally completed the initial implementation of the index provider we discussed: https://github.com/datafusion-contrib/datafusion-index-provider/pull/2

It implements the "Option 2" approach (APIs to pass additional knowledge about indexes) that @alamb mentioned above. The crate provides:

- Index-based query acceleration for `TableProvider` implementations
- Automatic handling of complex predicates (AND/OR/multiple indexes)
- Clean trait-based API (`Index`, `RecordFetcher`, `IndexedTableProvider`)

This has been running at my company for a few months without issues on top of FoundationDB. The design is somewhat oriented toward small queries and low data volumes due to FoundationDB's 5s transaction timeout and 10MB transaction limits. That said, I'd love feedback, especially on whether the approach makes sense for larger-scale scenarios. I don't work with query planners often and there are probably better ways to structure some of this.

**Since this is landing in the datafusion-contrib organization:**

- Who would be the right person(s) to review this PR?
- What are the general contribution/review guidelines for datafusion-contrib repos?

GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-15011862
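
For readers unfamiliar with the idea of combining several indexes for AND/OR predicates, here is a minimal sketch of one common way to do it: each index lookup yields a set of row identifiers, AND intersects the sets, and OR unions them. This is not the crate's code or API. `RowId`, `IndexPredicate`, and `resolve` are hypothetical names invented for this illustration, and the real implementation in the PR may work quite differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical row identifier (an assumption made for this sketch).
type RowId = u64;

/// Hypothetical predicate tree over index lookups; not a type from the crate.
enum IndexPredicate {
    /// Row ids already returned by a single index scan.
    Lookup(BTreeSet<RowId>),
    /// Every child must match: intersect the children's row ids.
    And(Vec<IndexPredicate>),
    /// At least one child must match: union the children's row ids.
    Or(Vec<IndexPredicate>),
}

/// Resolve a predicate tree to the set of matching row ids.
fn resolve(pred: &IndexPredicate) -> BTreeSet<RowId> {
    match pred {
        IndexPredicate::Lookup(ids) => ids.clone(),
        IndexPredicate::And(children) => {
            let mut sets = children.iter().map(resolve);
            match sets.next() {
                // An empty AND is treated as "no rows" in this sketch.
                None => BTreeSet::new(),
                Some(first) => sets.fold(first, |acc, s| {
                    acc.intersection(&s).copied().collect()
                }),
            }
        }
        IndexPredicate::Or(children) => {
            children.iter().flat_map(|c| resolve(c)).collect()
        }
    }
}
```

The resulting row ids would then be handed to whatever component fetches the actual records. Set-based merging like this keeps all matching ids in memory, which is acceptable for the small result sets described above but would likely need a streaming merge at larger scale.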
Re: [D] Indexing Support in DataFusion? [datafusion]
GitHub user Epicism added a comment to the discussion: Indexing Support in DataFusion? This is amazing! I can't wait to go through this. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210286
Re: [D] Indexing Support in DataFusion? [datafusion]
GitHub user PierreZ added a comment to the discussion: Indexing Support in DataFusion? It took me way too long to find time to work on this (a full year :see_no_evil:), but I have an experimental design published on a [branch](https://github.com/PierreZ/datafusion-index-provider/tree/init-v2). Before opening the MR, I will integrate it into our software first to validate the API. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13189642
