Hi, all

I have some IIS log files whose format depends on the "#Fields" line inside
each log, which makes the files not splittable and unsuitable for an MR job.
So I want to preprocess them into Avro files. Transforming each line into an
Avro record is simple and fast, but the serialization and compression are
too slow.

Is there a way to serialize and compress in parallel while still writing
sequentially? In principle I could even split the records across several
files, which could be serialized and compressed in parallel, but I can't find
a way to combine them afterwards.
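
To make the pattern concrete, here is a rough sketch of what I mean, written in Python with plain zlib standing in for Avro's serialization and codec. The names compress_block, write_blocks and read_blocks, and the length-prefixed block layout, are made up for illustration; this is not Avro's file format or API.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(records):
    """Serialize and compress one batch of records (the CPU-heavy part)."""
    # Joining with newlines is a stand-in for real record serialization.
    payload = b"\n".join(records)
    return zlib.compress(payload)


def write_blocks(batches, out, workers=4):
    """Compress batches in parallel, but write them to `out` in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so writes stay sequential
        # even if a later batch finishes compressing first.
        for block in pool.map(compress_block, batches):
            out.write(struct.pack(">I", len(block)))  # 4-byte length prefix
            out.write(block)


def read_blocks(data):
    """Read the length-prefixed blocks back and return the records in order."""
    records, pos = [], 0
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        records.extend(zlib.decompress(data[pos:pos + size]).split(b"\n"))
        pos += size
    return records
```

Only the compression runs on the worker threads; the single loop in write_blocks is the only thing that touches the output. Note that map() submits every batch up front, so a real version would probably need to bound how many batches are in flight.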

Any suggestions? Thanks!
