Having a file per record is pretty inefficient on almost any file system.

On Tuesday, 22 September 2015, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
> Hi,
> We are trying to load around 10k Avro files (each file holds only one
> record) using spark-avro, but it takes over 15 minutes to load.
> It seems that most of the work is being done at the driver, where it
> creates a broadcast variable for each file.
>
> Any idea why it is behaving that way?
> Thank you.
> Daniel
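
As a rough illustration of the reply's point about one file per record: one possible workaround is to group the small files into batches up front, so that each batch can be rewritten as a single larger file before loading. This is only a sketch under assumptions, not something spark-avro provides. The helper name `batch_paths`, the file names, and the batch size of 1000 are all hypothetical, and the actual merge step (reading each record and appending it to one Avro file) is left out because it depends on the Avro library in use.

```python
from typing import Iterator, List, Sequence


def batch_paths(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size paths."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Hypothetical input: 10,000 single-record files, as in the question.
paths = [f"record-{i:05d}.avro" for i in range(10_000)]

# Each batch would then be merged into one larger file (merge step not shown).
batches = list(batch_paths(paths, batch_size=1000))
print(len(batches))  # 10
```

With batches like these, the loader would see 10 input files instead of 10,000, which should reduce the per-file overhead described in the question.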