Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-11-14 Thread Vinoth Chandar
Since attachments don't really work on the mailing list, can you maybe attach them to comments on the RFC itself? In this scenario, we will get a larger range than is probably in the newly compacted base file, correct? Current thinking is, yes, it will lead to less efficient pruning by ranges,
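To make the pruning trade-off concrete, here is a minimal Python sketch. It assumes, hypothetically, that each log block records a min/max record-key range and that a file's range is the union of its blocks' ranges; `KeyRange`, `merge_ranges` and `may_contain` are illustrative names, not Hudi APIs. As long as the range still covers every key actually present, a range that is wider than the data only produces false positives (extra files to check), never false negatives. That is why a stale range after compaction costs pruning efficiency rather than correctness.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class KeyRange:
    """Hypothetical min/max record-key range stored per log block."""
    min_key: str
    max_key: str

    def contains(self, key: str) -> bool:
        return self.min_key <= key <= self.max_key


def merge_ranges(ranges: Iterable[KeyRange]) -> Optional[KeyRange]:
    """Union of block ranges as one [min, max] span; gaps between blocks are lost."""
    ranges = list(ranges)
    if not ranges:
        return None
    return KeyRange(min(r.min_key for r in ranges), max(r.max_key for r in ranges))


def may_contain(file_range: Optional[KeyRange], key: str) -> bool:
    """Range-based pruning: False means the file can be skipped for this key."""
    return file_range is not None and file_range.contains(key)
```

For example, merging block ranges `a..c` and `x..z` yields `a..z`, so a lookup for `m` can no longer skip the file even though neither block holds it.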

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-11-14 Thread Sivabalan
I have a doubt about the design; I guess this is the right place to discuss it. I want to understand how compaction interplays with this new scheme. Let's assume all log blocks are in the new format only. Once compaction completes, those log blocks/files not compacted will have range info pertaining to

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-10-30 Thread Nishith
Thanks for the detailed design write-up, Vinoth. I concur with the others on option 2: default indexing to off, and enable it once we have enough confidence in its stability and performance. That said, I do think it might be practically useful to have the code in place for users who might revert to an

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-10-30 Thread Balaji Varadarajan
Thanks, Vinoth, for proposing a clean and extensible design. The overall design looks great. Another rollout option is to use the consolidated log index for index lookup only if the latest "valid" log block has been written in the new format. If that is not the case, we can revert to scanning previous log
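A rough Python sketch of this fallback, under stated assumptions: each block is modeled with a format flag, a validity flag (e.g. false for a rolled-back block), and, for new-format blocks, a hypothetical consolidated index holding every key written to the log file up to and including that block. `LogBlock`, `latest_valid_block` and `lookup` are made-up names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class LogBlock:
    """Hypothetical log block model (not Hudi's actual block classes)."""
    keys: Set[str]
    new_format: bool
    valid: bool = True  # assumption: False for rolled-back/invalid blocks
    # Assumption: new-format blocks carry all keys written to the log so far.
    consolidated_index: Optional[Set[str]] = None


def latest_valid_block(blocks: List[LogBlock]) -> Optional[LogBlock]:
    """Walk the log from the end and return the last valid block, if any."""
    for block in reversed(blocks):
        if block.valid:
            return block
    return None


def lookup(blocks: List[LogBlock], key: str) -> bool:
    """Return True if the key is present in the log file."""
    latest = latest_valid_block(blocks)
    if latest is None:
        return False
    if latest.new_format and latest.consolidated_index is not None:
        # Fast path: the newest valid block already indexes the whole log.
        return key in latest.consolidated_index
    # Fallback: the newest valid block is old format, so scan every valid block.
    return any(key in b.keys for b in blocks if b.valid)
```

The fallback keeps lookups correct even when an older writer appends old-format blocks after new ones, at the cost of a full scan in that case.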

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-10-29 Thread Bhavani Sudha
I vote for the second option. It also gives us time to analyze how to deal with backwards compatibility. I'll take a look at the RFC later tonight and get back. On Sun, Oct 27, 2019 at 10:24 AM Vinoth Chandar wrote: > One issue I have some open questions myself > > Is it ok to assume log

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-10-27 Thread Vinoth Chandar
One issue I have some open questions on myself: is it OK to assume the log will have old data block versions followed by new data block versions? For example, if we roll out the new code and then revert it, there could be an arbitrary mix of new and old data blocks. Handling this might make the design/code fairly
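The ordering assumption in question can be stated as a simple check. This is a hypothetical helper, assuming each block carries an integer format version; `is_old_then_new` is not an existing Hudi function. A roll-out-then-revert sequence such as versions 1, 2, 1 fails the check, which is exactly the arbitrary-mix case described above.

```python
from typing import Sequence


def is_old_then_new(versions: Sequence[int], new_version: int) -> bool:
    """Return True if all old-format blocks (version < new_version) precede
    all new-format blocks (version >= new_version) in log order."""
    seen_new = False
    for v in versions:
        if v >= new_version:
            seen_new = True
        elif seen_new:
            # Old-format block after a new-format one, e.g. roll out, then revert.
            return False
    return True
```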

DISCUSS RFC 6 - Add indexing support to the log file

2019-10-27 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/RFC-6+Add+indexing+support+to+the+log+file Feedback welcome on this RFC, which tackles HUDI-86.