Re: bloomfilter and tokenisation

2019-06-12 Thread Wes McKinney
Hi Manik,

You could store "raw" as a LIST (so you have to tokenize in your ETL step) instead of BYTE_ARRAY and you then reap dictionary encoding benefits.

- Wes

On Wed, Jun 12, 2019 at 12:08 PM Manik Singla wrote:
> could someone guide on this one
>
> Regards
> Manik Singla
> +91-9996008893
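The suggestion above (tokenize in the ETL step, then let dictionary encoding work on the tokens) can be illustrated with a minimal sketch. This is plain Python, not the Parquet API: the helpers `tokenize` and `dictionary_encode` and the sample log lines are hypothetical and only model the idea of storing each distinct value once plus a list of indices. Whitespace splitting is an assumption; a real pipeline would likely need a log-aware tokenizer.

```python
def tokenize(line):
    # Assumption: simple whitespace tokenization stands in for the ETL step.
    return line.split()


def dictionary_encode(values):
    """Return (dictionary, indices): each distinct value is stored once,
    and every occurrence is replaced by its position in the dictionary."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# Hypothetical log lines; no two lines are identical.
raw_lines = [
    "GET /api/users 200 12ms",
    "GET /api/users 200 15ms",
    "GET /api/orders 500 98ms",
]

# Encoding whole lines: the dictionary is as large as the data itself.
line_dict, line_idx = dictionary_encode(raw_lines)

# Encoding tokens: repeated terms such as "GET" collapse to one entry.
tokens = [tok for line in raw_lines for tok in tokenize(line)]
token_dict, token_idx = dictionary_encode(tokens)

print(len(raw_lines), "lines ->", len(line_dict), "dictionary entries")
print(len(tokens), "tokens ->", len(token_dict), "dictionary entries")
```

With whole lines the dictionary has one entry per line, so nothing is saved; with tokens, repeated terms share an entry. How much this helps in a real Parquet file depends on the writer's dictionary page limits and on how repetitive the tokens are.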

Re: bloomfilter and tokenisation

2019-06-12 Thread Manik Singla
could someone guide on this one

Regards
Manik Singla
+91-9996008893
+91-9665639677

"Life doesn't consist in holding good cards but playing those you hold well."

On Tue, Jun 11, 2019 at 5:58 PM Manik Singla wrote:
> Hey Team
>
> I have started using parquet recently.
>
> Kind of data I save is

bloomfilter and tokenisation

2019-06-11 Thread Manik Singla
Hey Team,

I have started using Parquet recently.

The kind of data I save is something like

*raw hostname cluster serviceName*

where raw is the actual log line.

For raw, dictionary encoding doesn't work, as no two log lines are the same. But if we tokenise terms in dictionary, then dictionary can help here to f