Re: [EXT] Re: Bucketing in Hudi

2020-10-22 Thread Balaji Varadarajan
Hi Roopa, Bucketing is a more general concept. I think what you are referring to is how to integrate with the Spark SQL bucketing syntax. I was proposing a Hudi-native solution where we can implement bucket indexing, which gives the same end result of writing compacted (Parquet) files with keys

Re: [EXT] Re: Bucketing in Hudi

2020-10-22 Thread Roopa Murthy
Hi Balaji, Thanks for your response. I went through HoodieIndex in the source code, but I am not sure how indexing alone could help with bucketing. Spark bucketing would involve writing the compacted files in a bucketed/clustered fashion such that when a Spark SQL query has a certain id, only the
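
For readers following the thread, here is a rough sketch of the general bucketing idea under discussion: each record key is hashed to one of a fixed number of buckets at write time, so a lookup by key only has to read one bucket. This is not Hudi's or Spark's actual implementation. The names `bucket_for`, `write_bucketed`, `lookup`, and `NUM_BUCKETS` are hypothetical, in-memory lists stand in for files, and CRC32 is used only as a convenient stable hash.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a record key to a bucket id using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_bucketed(records, num_buckets: int = NUM_BUCKETS):
    """Group records by bucket id; each list stands in for one bucketed file."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[bucket_for(rec["id"], num_buckets)].append(rec)
    return buckets


def lookup(buckets, key: str, num_buckets: int = NUM_BUCKETS):
    """Scan only the bucket the key hashes to, skipping all other buckets."""
    candidates = buckets.get(bucket_for(key, num_buckets), [])
    return [rec for rec in candidates if rec["id"] == key]


if __name__ == "__main__":
    records = [{"id": f"k{i}", "v": i} for i in range(20)]
    buckets = write_bucketed(records)
    print(lookup(buckets, "k7"))
```

Because the writer and the reader use the same hash function and bucket count, the reader can prune every bucket except one for a query on a single id.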