Hi all, I have some IIS log files whose format depends on the "#Fields" line inside each log, which makes the files non-splittable and unsuitable for an MR job. So I want to preprocess them into Avro files. Transforming each line into an Avro record is simple and fast, but serialization and compression are too slow.
Is there a way to serialize and compress in parallel while still writing sequentially? In principle I could even split the records across several files, which could be serialized and compressed in parallel, but I can't find a way to combine them afterwards. Any suggestions? Thanks!
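
To make the idea concrete, here is a rough Python sketch of the pattern I have in mind. It is not Avro code: JSON and zlib stand in for Avro serialization and the compression codec, the length-prefixed block layout is invented purely for illustration, and all function names (encode_block, write_blocks, read_blocks) are hypothetical. Batches are serialized and compressed by worker threads, and a single writer appends the finished blocks in their original order.

```python
import io
import json
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_block(records):
    """Serialize one batch and compress it (runs in a worker thread).

    Assumption: JSON + zlib are placeholders for Avro encoding + codec.
    """
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return len(records), zlib.compress(payload.encode("utf-8"))


def batches(records, size):
    """Yield consecutive batches of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def write_blocks(records, out, batch_size=1000, workers=4):
    """Compress batches in parallel, but write them to `out` sequentially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so blocks are
        # written in the same order as the input even if a later batch
        # finishes compressing first.
        for count, blob in pool.map(encode_block, batches(records, batch_size)):
            # Hypothetical block layout: record count, byte length, data.
            out.write(struct.pack(">II", count, len(blob)))
            out.write(blob)


def read_blocks(inp):
    """Read the blocks back and yield the records in their original order."""
    while True:
        header = inp.read(8)
        if not header:
            break
        count, size = struct.unpack(">II", header)
        lines = zlib.decompress(inp.read(size)).decode("utf-8").split("\n")
        assert len(lines) == count
        for line in lines:
            yield json.loads(line)


if __name__ == "__main__":
    sample = [{"id": i, "uri": "/page/%d" % i} for i in range(2500)]
    buf = io.BytesIO()
    write_blocks(sample, buf, batch_size=1000, workers=3)
    buf.seek(0)
    restored = list(read_blocks(buf))
    print(len(restored), restored == sample)
```

Note that pool.map submits every batch up front, so for very large inputs the in-flight batches would need to be bounded. What I am looking for is the equivalent of this with real Avro container files.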