It depends on your usage (when and how you read the data).
Are the smaller files you were thinking about also larger than the HDFS
block size?
I would not go for anything smaller than a block.
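
To make the block-size point concrete, here is a rough sketch of how the number of input splits changes with file layout. It assumes one split per HDFS block and a 128 MiB block size (both are assumptions; real input formats apply their own min/max split settings and Avro splits align to sync markers), and the function names are made up for illustration:

```python
import math

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed HDFS block size; check dfs.blocksize on your cluster


def estimate_splits(file_size_bytes, block_size_bytes=BLOCK_SIZE):
    """Rough split count for one splittable file: one split per block."""
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / block_size_bytes)


def total_splits(file_sizes, block_size_bytes=BLOCK_SIZE):
    """Sum of the per-file estimates; a file never shares a split with another file."""
    return sum(estimate_splits(size, block_size_bytes) for size in file_sizes)


# The same 10 GiB of data laid out two ways.
one_large_file = [10 * 1024 * MB]
many_small_files = [16 * MB] * 640

print(total_splits(one_large_file))    # 80
print(total_splits(many_small_files))  # 640
```

Under these assumptions the single large file still fans out to 80 tasks, while the files smaller than a block produce 8x as many tasks (and 640 NameNode entries) for the same data.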

Usually, if it is relevant to the way you read the data, the partitioning
scheme helps determine that.
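
As a sketch of how partitioning can drive the file count, the snippet below picks a number of files per partition so that no file ends up smaller than a block unless the whole partition is. The partition names and sizes are hypothetical, and the 128 MiB block size is again an assumption:

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed HDFS block size


def files_per_partition(partition_bytes, block_size_bytes=BLOCK_SIZE):
    """Largest file count that keeps every file at or above one block.

    A partition smaller than a block still gets a single file.
    """
    return max(1, partition_bytes // block_size_bytes)


# Hypothetical daily partitions and their total data sizes.
partitions = {
    "dt=2019-11-01": 1024 * MB,
    "dt=2019-11-02": 300 * MB,
    "dt=2019-11-03": 50 * MB,
}

for name, size in partitions.items():
    print(name, files_per_partition(size))
```

If you usually read one partition at a time, this keeps each read at a handful of block-sized files instead of one huge file or many tiny ones.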

Yaniv Harpaz
[ yaniv.harpaz at gmail.com ]


On Mon, Nov 4, 2019 at 7:03 PM Sam <games2013....@gmail.com> wrote:

> Hi,
>
> How do we choose between single large avro file (size much larger than
> HDFS block size) vs multiple smaller avro files (close to HDFS block size)?
>
> Since avro is splittable, is there even a need to split a very large avro
> file into smaller files?
>
> I’m assuming that a single large avro file can also be split into multiple
> mappers/reducers/executors during processing.
>
> Thanks.
>
