Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3045304111 I went over this again and messed around with the wording but not the content. I also made the conclusion a bit stronger and the wording a bit more concise. I plan to publish this next Monday. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]
JigaoLuo commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041731900 I made my final pass. It looks very nice :fire:
Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041575868
> I pushed some non-trivial changes to this blog:
>
> 1. Added @JigaoLuo as an author (hope this is ok @zhuqi-lucas)
> 2. Added a section with a high-level overview of adding user defined indexes
> 3. Focused the example section on reading/writing the index and integrating it into DataFusion

Thank you @alamb @JigaoLuo, I am very excited to see this blog come together with your help!
Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]
alamb commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2188284374

## content/blog/2025-07-07-user-defined-parquet-indexes.md:
## @@ -0,0 +1,542 @@
+---
+layout: post
+title: Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion
+date: 2025-07-07
+author: Qi Zhu, Jigao Luo, and Andrew Lamb
+categories: [features]
+---
+
+
+It’s a common misconception that [Apache Parquet] files can only store basic Min/Max/Null Count statistics and Bloom filters, and that adding anything "smarter" requires a change to the specification or an entirely new file format. In fact, footer metadata and offset-based addressing already provide everything needed to embed user defined index structures within Parquet files without breaking compatibility with other Parquet readers.

Review Comment:
FYI I am very pleased with this intro, as I think it succinctly summarizes the value of this blog and will get people excited.

(screenshot: https://github.com/user-attachments/assets/a86a5d3f-c90e-426a-b75d-493b78adb1da)
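The intro's claim rests on how Parquet readers locate data: they read the last 8 bytes of the file (a 4-byte little-endian footer length followed by the `PAR1` magic), decode the footer, and then seek only to the byte ranges the footer lists. The Python sketch below is a toy model of that layout, not the parquet crate or DataFusion API. Several simplifications are assumptions: the footer is JSON rather than Thrift-encoded `FileMetaData`, each column's metadata is reduced to an offset and a length, and the `toy_index_offset`/`toy_index_length` keys and the helper names are hypothetical. It shows index bytes placed between the data and the footer and referenced only through key-value metadata, so a reader that ignores those keys still reads every column correctly.

```python
from __future__ import annotations

import json
import struct

MAGIC = b"PAR1"


def write_file(columns: dict[str, bytes], index: bytes) -> bytes:
    """Toy writer: magic, column chunks, an extra index blob, then the footer."""
    buf = bytearray(MAGIC)
    chunks = {}
    for name, data in columns.items():
        # Record absolute offsets, as Parquet column chunk metadata does.
        chunks[name] = {"offset": len(buf), "length": len(data)}
        buf += data
    # The index sits between the data and the footer. No column metadata
    # points into it, so readers that only follow column offsets skip it.
    index_offset = len(buf)
    buf += index
    footer = {
        "columns": chunks,
        # Hypothetical key-value entries telling index-aware readers where to look.
        "key_value_metadata": {
            "toy_index_offset": str(index_offset),
            "toy_index_length": str(len(index)),
        },
    }
    footer_bytes = json.dumps(footer).encode()
    buf += footer_bytes
    buf += struct.pack("<I", len(footer_bytes))  # 4-byte little-endian footer length
    buf += MAGIC
    return bytes(buf)


def read_footer(file: bytes) -> dict:
    """Locate and decode the footer by reading from the end of the file."""
    assert file[:4] == MAGIC and file[-4:] == MAGIC, "not a toy Parquet file"
    (footer_len,) = struct.unpack("<I", file[-8:-4])
    return json.loads(file[-8 - footer_len:-8])


def read_column(file: bytes, name: str) -> bytes:
    """Index-unaware reader: follows only the column offsets in the footer."""
    meta = read_footer(file)["columns"][name]
    return file[meta["offset"]:meta["offset"] + meta["length"]]


def read_index(file: bytes) -> bytes | None:
    """Index-aware reader: uses key-value metadata to find the extra bytes."""
    kv = read_footer(file)["key_value_metadata"]
    if "toy_index_offset" not in kv:
        return None
    start = int(kv["toy_index_offset"])
    return file[start:start + int(kv["toy_index_length"])]


f = write_file({"a": b"\x01\x02\x03", "b": b"xyz"}, index=b"distinct:a=1,2,3")
print(read_column(f, "b"))  # b'xyz'
print(read_index(f))        # b'distinct:a=1,2,3'
```

Because readers find the footer from the end of the file and column offsets are absolute, extra bytes written before the footer do not shift anything an index-unaware reader depends on; only a reader that knows the (here hypothetical) metadata keys ever looks at them.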
