Hello groups, I am running some comparison tests on a data set that I converted to Avro with the deflate codec set to level 6. The original data consists of 2880 uncompressed HTTP access logs with a total size of 1.4 TB. The compressed Avro data is about 2/3 of the original size. However, when I ran a Pig job on the raw logs, the initial map phase was blazing fast and finished in under 40 minutes. When I ran the same Pig job on the Avro files, the initial map phase took 8 minutes to reach only 10%. Is there any way to figure out what is slowing down the map phase?
Thanks, Felix