Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-07 Thread via GitHub


alamb commented on PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3045304111

   I went over this again and messed around with the wording but not the 
content. I also made the conclusion a bit stronger and made the wording a bit 
more concise.
   
   I plan to publish this next Monday.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-06 Thread via GitHub


JigaoLuo commented on PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041731900

   I have done my final pass. It looks very nice :fire:





Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-06 Thread via GitHub


zhuqi-lucas commented on PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041575868

   > I pushed some non-trivial changes to this blog:
   > 
   > 1. Added @JigaoLuo as an author (hope this is ok @zhuqi-lucas )
   > 2. Added a section with a high-level overview of adding user defined 
   > indexes
   > 3. Focused the example section on reading/writing the index and 
   > integrating it into DataFusion
   
   Thank you @alamb and @JigaoLuo, I am very excited to see this blog getting 
so polished with your help!





Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-06 Thread via GitHub


alamb commented on code in PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2188284374


##
content/blog/2025-07-07-user-defined-parquet-indexes.md:
##
@@ -0,0 +1,542 @@
+---
+layout: post
+title: Extending Apache Parquet with User Defined Indexes to Accelerate Query 
Processing with DataFusion
+date: 2025-07-07
+author: Qi Zhu, Jigao Luo, and Andrew Lamb
+categories: [features]
+---
+
+
+It’s a common misconception that [Apache Parquet] files can only store basic 
Min/Max/Null Count statistics and Bloom filters, and that adding anything 
"smarter" requires a change to the specification or an entirely new file 
format. In fact, footer metadata and offset based addressing already provide 
everything needed to embed user defined index structures within Parquet Files 
without breaking compatibility with other Parquet readers. 

Review Comment:
   FYI, I am very pleased with this intro, as I think it succinctly summarizes 
the value of this blog and will get people excited.
   
   https://github.com/user-attachments/assets/a86a5d3f-c90e-426a-b75d-493b78adb1da
   


