Martin said it already, but I will emphasize:

Avro data files are splittable and can support multiple mappers no matter
what codec is used for compression.  This is because Avro files are block
based: each block is compressed independently and followed by a sync
marker, so a mapper can start reading at any block boundary.  I recommend
starting with deflate compression (the algorithm behind gzip), and moving to
snappy only if deflate compression level '1' is not fast enough.
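To make the block-based argument concrete, here is a minimal Python sketch of a simplified, hypothetical block container. It is not the real Avro layout: it omits the magic bytes, the metadata map and the zigzag-varint counts, and the names `write_container` and `read_split` are illustrative only. Each block is compressed on its own with raw deflate, which is what the spec's "deflate" codec uses, and is followed by a random 16-byte sync marker. A reader given an arbitrary byte range scans forward to the first sync marker at or after its start. It then decodes every block whose preceding marker lies inside the range, so each block is read by exactly one split, whatever the codec.

```python
import os
import zlib

SYNC_SIZE = 16  # Avro sync markers are 16 bytes


def deflate_raw(data: bytes, level: int = 1) -> bytes:
    # wbits=-15 produces raw deflate (RFC 1951) with no zlib/gzip header.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_raw(data: bytes) -> bytes:
    return zlib.decompress(data, -15)


def write_container(records: list[bytes], block_size: int, sync: bytes, level: int = 1) -> bytes:
    """Hypothetical layout: sync, then per block
    [4-byte count][4-byte compressed size][deflated payload][sync]."""
    out = bytearray(sync)
    for i in range(0, len(records), block_size):
        block = records[i:i + block_size]
        # Length-prefix each record so a block can be decoded on its own.
        payload = b"".join(len(r).to_bytes(4, "big") + r for r in block)
        compressed = deflate_raw(payload, level)
        out += len(block).to_bytes(4, "big")
        out += len(compressed).to_bytes(4, "big")
        out += compressed
        out += sync
    return bytes(out)


def read_split(data: bytes, sync: bytes, start: int, end: int) -> list[bytes]:
    """Decode the blocks whose preceding sync marker starts in [start, end)."""
    records = []
    pos = data.find(sync, start)  # first sync marker at or after the split start
    while pos != -1 and pos < end:
        block_start = pos + SYNC_SIZE
        if block_start >= len(data):  # trailing marker of the last block
            break
        count = int.from_bytes(data[block_start:block_start + 4], "big")
        size = int.from_bytes(data[block_start + 4:block_start + 8], "big")
        payload = inflate_raw(data[block_start + 8:block_start + 8 + size])
        off = 0
        for _ in range(count):
            n = int.from_bytes(payload[off:off + 4], "big")
            records.append(payload[off + 4:off + 4 + n])
            off += 4 + n
        pos = block_start + 8 + size  # position of this block's trailing sync
    return records


if __name__ == "__main__":
    records = [f"record-{i}".encode() for i in range(100)]
    sync = os.urandom(SYNC_SIZE)  # Avro also uses a per-file random marker
    data = write_container(records, block_size=10, sync=sync)
    mid = len(data) // 2  # an arbitrary split point, like a MapReduce input split
    first = read_split(data, sync, 0, mid)
    second = read_split(data, sync, mid, len(data))
    print(len(first), len(second), first + second == records)
```

Cutting the file at any byte offset and reading both halves returns every record exactly once. Like Avro, this relies on the random marker never occurring by chance inside the compressed bytes.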

For more information on Avro data files, see:
http://avro.apache.org/docs/current/spec.html#Object+Container+Files



On 4/22/13 11:47 PM, "nir_zamir" <nir.za...@gmail.com> wrote:

>Thanks Martin.
>
>What will happen if I try to use an indexed LZO-compressed avro file? Will
>it work and utilize the index to allow multiple mappers?
>
>I think that for Snappy for example, the file is splittable and can use
>multiple mappers, but I haven't tested it yet; I would be glad to hear from
>anyone who has experience with that.
>
>Thanks!
>Nir.
>
>
>
>--
>View this message in context:
>http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
>Sent from the Avro - Users mailing list archive at Nabble.com.

