Re: Inline storage of parquet data in logs

2019-10-24 Thread Vinoth Chandar
Yes, although I think we can also store the bloom filter in the log block footer; that way, if we want to log Avro, we can still do it, and indexing would still work. I plan to explain this clearly in an RFC. Yes, we will definitely add other metadata into the headers :) Our log format is pretty flexible.
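For readers following the thread, here is a minimal sketch of the idea above: a log block whose header carries metadata, whose content is an opaque payload (Avro or Parquet bytes), and whose footer carries a bloom filter over the record keys, so key lookups do not depend on the payload format. This is not Hudi's actual log format. The byte layout, the `BloomFilter` class, and the `write_block`/`read_block` helpers are all hypothetical names made up for illustration.

```python
import hashlib
import json
import struct


class BloomFilter:
    """Tiny Bloom filter; the bit count and hash count are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(bits) if bits is not None else bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def write_block(header, content, bloom):
    """Hypothetical layout: three length-prefixed parts (header JSON, content, footer)."""
    parts = [json.dumps(header).encode("utf-8"), content, bytes(bloom.bits)]
    return b"".join(struct.pack(">I", len(part)) + part for part in parts)


def read_block(data):
    """Split a block back into header dict, raw content bytes, and footer bloom filter."""
    parts, offset = [], 0
    for _ in range(3):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        parts.append(data[offset:offset + length])
        offset += length
    header = json.loads(parts[0].decode("utf-8"))
    bloom = BloomFilter(num_bits=len(parts[2]) * 8, bits=parts[2])
    return header, parts[1], bloom


# Usage: the payload stays opaque, yet the footer can still answer key lookups.
bloom = BloomFilter()
for record_key in ["key-1", "key-2"]:
    bloom.add(record_key)
block = write_block({"format": "avro"}, b"opaque payload bytes", bloom)
header, content, footer = read_block(block)
```

The point of the sketch is only that the footer is independent of the content encoding: swapping the payload from Avro to Parquet bytes would change the header's `format` entry and nothing about how the bloom filter is read.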

Re: Inline storage of parquet data in logs

2019-10-23 Thread Kabeer Ahmed
Vinoth, Thanks for the clarification :-) I only skimmed the email without getting into the details. I will review it thoroughly in a few days and catch up. Thanks, On Oct 23 2019, at 3:06 pm, Vinoth Chandar wrote: > yes. to the append log, that is used for compaction. My guess was

Re: Inline storage of parquet data in logs

2019-10-23 Thread Vinoth Chandar
Yes, to the append log that is used for compaction. My guess was that Kabeer's concern was around actually sending user data into debug logs (slf4j/log4j), which we don't do. On the second part: yes, we want the option to write Parquet data inline instead of Avro. Once we harden this, any other format e.g

Re: Inline storage of parquet data in logs

2019-10-23 Thread Vinoth Chandar
Sure. Take your time! Just to clarify, "log" here refers to the Hudi append log, not the user's log4j or similar logs. Yes, that would be very strange to do. :) On Wed, Oct 23, 2019 at 3:06 AM Kabeer Ahmed wrote: > Hi Vinoth, > > Have crazy week and the next 2 to 3 weeks are going to be very busy. I >

Re: Inline storage of parquet data in logs

2019-10-23 Thread Kabeer Ahmed
Hi Vinoth, I'm having a crazy week, and the next 2 to 3 weeks are going to be very busy. I haven't had a chance to look into this. My thoughts are around security. The idea of building external indexes comes with loads of advantages, but throwing user data into the logs etc. makes me anxious. Let me do a

Re: Inline storage of parquet data in logs

2019-10-21 Thread Vinoth Chandar
Any thoughts? :) anyone? On Wed, Oct 9, 2019 at 11:06 AM Vinoth Chandar wrote: > Hi all, > > Wanted to share some prototyping I was doing for HUDI-46. The idea here is > to see if we can embed a parquet file "inline" into an outer file (our > log), so that if the user chooses to they can also

Inline storage of parquet data in logs

2019-10-09 Thread Vinoth Chandar
Hi all, I wanted to share some prototyping I was doing for HUDI-46. The idea here is to see if we can embed a Parquet file "inline" into an outer file (our log), so that users who choose to can also get Parquet data in the logs to speed up real-time view queries. We would be using the