Hello all,

 

I have a question regarding batch indexing. As far as I can see, the data are stored in JSON format in HDFS. However, this uses a lot of storage because of JSON's verbosity, the enrichment fields, etc. Is there any way to use Parquet, for example? I guess it is possible to do it afterwards, i.e. read the JSON with Spark and save it in another format (something like the sketch below), but is it possible to choose the format at the batch indexing configuration level?
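
For clarity, this is the kind of post-processing workaround I have in mind: a minimal PySpark sketch that reads the indexed JSON and rewrites it as Parquet. The HDFS paths here are hypothetical placeholders, not the indexer's actual output layout.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

    # Hypothetical paths -- adjust to the actual batch indexing output.
    input_path = "hdfs:///data/indexing/json/2019-01-01"
    output_path = "hdfs:///data/indexing/parquet/2019-01-01"

    # Read the JSON written by batch indexing; Spark infers the schema.
    df = spark.read.json(input_path)

    # Rewrite the same records as columnar Parquet, which compresses
    # much better than verbose JSON.
    df.write.mode("overwrite").parquet(output_path)

    spark.stop()

Doing this as a second pass works, but it duplicates the data for a while and adds another job to schedule, which is why writing Parquet directly from the indexer would be preferable.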

 

Thanks a lot

 

Stéphane

 

 
