Martin said it already, but I will emphasize: Avro data files are splittable and can support multiple mappers no matter what codec is used for compression. This is because Avro files are block based, and compression is applied only within each block. I recommend starting with the deflate codec (gzip-style compression), and moving to snappy only if deflate at compression level 1 is not fast enough.
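To make the "block based" point concrete, here is a minimal toy sketch in Python. It is not Avro's actual on-disk format and does not use the Avro library: the length prefix, the newline-joined records, and the names write_blocks and read_split are all assumptions made up for illustration. The only things borrowed from Avro are the ideas of a 16-byte sync marker between blocks and of compressing each block on its own, which is what lets a reader start at an arbitrary byte offset.

```python
import os
import zlib

SYNC_SIZE = 16  # Avro also uses a 16-byte sync marker; everything else here is a toy layout


def write_blocks(records, sync, per_block=2, level=1):
    """Write a toy container: a sync marker, then independently compressed blocks,
    each followed by the same sync marker. (Hypothetical layout, not Avro's.)"""
    out = bytearray(sync)  # stands in for the marker that ends the file header
    for i in range(0, len(records), per_block):
        payload = "\n".join(records[i:i + per_block]).encode("utf-8")
        compressed = zlib.compress(payload, level)  # deflate, level 1 by default
        out += len(compressed).to_bytes(4, "big")   # toy length prefix
        out += compressed
        out += sync
    return bytes(out)


def read_split(data, sync, start, end):
    """Return the records of every block whose preceding sync marker
    starts inside the byte range [start, end)."""
    records = []
    pos = data.find(sync, start)  # skip ahead to the next block boundary
    while pos != -1 and pos < end:
        pos += len(sync)
        if pos >= len(data):      # trailing marker: no block follows it
            break
        size = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        payload = zlib.decompress(data[pos:pos + size])
        records.extend(payload.decode("utf-8").split("\n"))
        pos += size               # now at the marker that closes this block
    return records


if __name__ == "__main__":
    sync = os.urandom(SYNC_SIZE)
    recs = ["record-%d" % i for i in range(10)]
    blob = write_blocks(recs, sync)
    half = len(blob) // 2
    # Two "mappers", each handed an arbitrary byte range of the same file.
    print(read_split(blob, sync, 0, half))
    print(read_split(blob, sync, half, len(blob)))
```

Each reader only needs to find the next sync marker at or after its start offset, so the split point can land anywhere, including in the middle of a compressed block, and no separate index is needed. The codec only ever sees one block at a time, which is why the choice of codec does not affect splittability.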
For more information on Avro data files, see:
http://avro.apache.org/docs/current/spec.html#Object+Container+Files

On 4/22/13 11:47 PM, "nir_zamir" <nir.za...@gmail.com> wrote:

>Thanks Martin.
>
>What will happen if I try to use an indexed LZO-compressed avro file? Will
>it work and utilize the index to allow multiple mappers?
>
>I think that for Snappy for example, the file is splittable and can use
>multiple mappers, but I haven't tested it yet - would be glad if anyone
>has any experience with that.
>
>Thanks!
>Nir.