It depends on your usage (when and how you read the data). Are the smaller files you were thinking about also larger than the HDFS block size? I would not go for anything smaller than a block.
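To make the block-size point concrete, here is a rough sketch of how many input splits (and so parallel tasks) each layout would produce. It assumes a 128 MiB block size and that one split equals one block. Real frameworks add some slack and have their own split-size settings, so treat the numbers as estimates. The function name `estimated_splits` is made up for illustration.

```python
import math

# Assumed HDFS block size (128 MiB); check dfs.blocksize on your cluster.
BLOCK_SIZE = 128 * 1024 * 1024


def estimated_splits(file_size_bytes, split_size_bytes=BLOCK_SIZE):
    """Rough number of input splits for one splittable file.

    Simplification: one split per block-sized chunk, rounded up.
    """
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / split_size_bytes)


# The same 10 GiB of data laid out three ways.
one_large = estimated_splits(10 * 1024**3)               # one 10 GiB file
many_block_sized = 80 * estimated_splits(128 * 1024**2)  # 80 files of 128 MiB
many_small = 10240 * estimated_splits(1024**2)           # 10240 files of 1 MiB

print(one_large, many_block_sized, many_small)
```

Under these assumptions the single large file and the block-sized files give the same parallelism, while files far smaller than a block multiply the number of splits (and the number of files the NameNode has to track).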
Usually (if it is relevant to the way you read the data), the partitioning helps determine the file sizes.

Yaniv Harpaz
[ yaniv.harpaz at gmail.com ]

On Mon, Nov 4, 2019 at 7:03 PM Sam <games2013....@gmail.com> wrote:

> Hi,
>
> How do we choose between a single large Avro file (size much larger than
> the HDFS block size) vs. multiple smaller Avro files (close to the HDFS
> block size)?
>
> Since Avro is splittable, is there even a need to split a very large Avro
> file into smaller files?
>
> I’m assuming that a single large Avro file can also be split across multiple
> mappers/reducers/executors during processing.
>
> Thanks.
>