Having a file per record is pretty inefficient on almost any file system, since every file pays its own open and metadata cost no matter how small its contents are.
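
One common way around this is to compact the small files into a single larger one before loading. Below is a minimal sketch of that idea in plain Python. It is only an illustration under a few assumptions: it uses one-record JSON files as a stand-in for the Avro files (the standard library has no Avro reader), and the names compact_records and demo are made up for this example, not part of spark-avro or Spark.

```python
import json
import tempfile
from pathlib import Path


def compact_records(src_dir: Path, dest_file: Path) -> int:
    """Merge every one-record *.json file in src_dir into one JSON Lines file.

    Hypothetical helper: JSON stands in for Avro here.
    Returns the number of records written.
    """
    count = 0
    with dest_file.open("w", encoding="utf-8") as out:
        # Sort so the output order is deterministic.
        for path in sorted(src_dir.glob("*.json")):
            record = json.loads(path.read_text(encoding="utf-8"))
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


def demo(n: int = 100) -> int:
    """Create n one-record files, compact them, and read the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "small"
        src.mkdir()
        for i in range(n):
            small_file = src / f"rec_{i:05d}.json"
            small_file.write_text(json.dumps({"id": i}), encoding="utf-8")

        merged = Path(tmp) / "merged.jsonl"
        written = compact_records(src, merged)

        # Reading back now opens one file instead of n.
        with merged.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f]
        assert len(rows) == written
        return written


if __name__ == "__main__":
    print(demo(100))
```

The compaction step still has to open every small file once, but it only has to be done once; every later load then reads a single file.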

On Tuesday, 22 September 2015, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> We are trying to load around 10k avro files (each file holds only one
> record) using spark-avro but it takes over 15 minutes to load.
> It seems that most of the work is being done at the driver, where it
> creates a broadcast variable for each file.
>
> Any idea why it is behaving that way?
> Thank you.
> Daniel
>
