Hi Manik,
You could store "raw" as a LIST of tokens (so you would have to tokenize
in your ETL step) instead of a single BYTE_ARRAY, and you would then reap
the benefits of dictionary encoding.
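To illustrate why tokenizing helps, here is a rough sketch in plain Python.
It is only an illustration of the idea, not how Parquet implements dictionary
encoding internally. The sample log lines, the whitespace tokenizer, and the
dictionary_encode helper are all made up for this example.

```python
# Hypothetical sample log lines; not taken from any real data set.
log_lines = [
    "GET /api/users 200 12ms",
    "GET /api/users 200 15ms",
    "GET /api/orders 404 12ms",
    "POST /api/orders 200 15ms",
]


def dictionary_encode(values):
    """Return (dictionary, indices) so that each distinct value is stored once."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> index into `dictionary`
    indices = []      # one small integer per input value
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# Whole lines: every line is distinct, so the dictionary is as large as the data.
line_dict, line_idx = dictionary_encode(log_lines)

# Tokenized in the ETL step (splitting on whitespace is an assumption).
tokens = [tok for line in log_lines for tok in line.split()]
token_dict, token_idx = dictionary_encode(tokens)

print(len(line_dict), "distinct lines out of", len(log_lines))
print(len(token_dict), "distinct tokens out of", len(tokens))
```

With real logs the effect should be stronger, since terms such as paths and
status codes tend to repeat far more often than complete lines do.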
- Wes
On Wed, Jun 12, 2019 at 12:08 PM Manik Singla wrote:
Could someone guide me on this one?
Regards
Manik Singla
+91-9996008893
+91-9665639677
"Life doesn't consist in holding good cards but playing those you hold
well."
On Tue, Jun 11, 2019 at 5:58 PM Manik Singla wrote:
Hey Team
I have started using Parquet recently.
The kind of data I save is something like
*raw hostname cluster serviceName*
where raw is the actual log line.
For raw, dictionary encoding doesn't work, as no two log lines are the same. But if we
tokenise the log lines into terms, then the dictionary can help here to f