Hello all,
I have a question regarding batch indexing. As far as I can see, data are stored in JSON format in HDFS. However, this uses a lot of storage because of JSON's verbosity, the enrichment fields, etc. Is there any way to use Parquet, for example?

I guess it is possible to convert after the fact, say the next day: read the JSON with Spark and save it in another format. But is it possible to choose the format at the batch indexing configuration level?

Thanks a lot,
Stéphane
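
P.S. To make the next-day conversion concrete, here is a rough PySpark sketch of what I have in mind. It is only an illustration, not something taken from the indexing configuration: the function names and the application name are my own placeholders, the input and output paths would have to be supplied, and I am assuming the indexed files are newline-delimited JSON so that Spark can infer the schema.

```python
def json_to_parquet(spark, json_path, parquet_path):
    """Read newline-delimited JSON and rewrite it as Parquet.

    `spark` is an existing SparkSession; both paths are hypothetical
    placeholders (for example an HDFS directory for one day of data).
    """
    # Spark infers the schema by scanning the JSON records.
    df = spark.read.json(json_path)
    # "overwrite" replaces any previous output at the target path.
    df.write.mode("overwrite").parquet(parquet_path)
    return df


def main(json_path, parquet_path):
    # Imported here so the helper above can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()
    try:
        json_to_parquet(spark, json_path, parquet_path)
    finally:
        spark.stop()
```

Calling `main(...)` with the JSON directory for a given day and a target directory would produce the Parquet copy. It works, but it means the data is written twice, which is why I would prefer to pick the format in the batch indexing configuration itself.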
