While I haven't benchmarked Java performance, I have looked closely at Ruby vs C with regard to reading large Avro files. With C, I have processed ~900 MB files with 25+ million rows in ~42s, and I routinely process 270 MB / 7.5M-record files with C in 15s on average. These numbers were observed on a 2012 MacBook Pro (exact specs elude me at the moment). Not scientific, but it may give you a ballpark of what is possible.

I am using Java. I did play with the size of the buffered reader, but I found that the default size of 8K gave me the best performance.

thanks, Yael
On Fri, May 23, 2014 at 4:14 AM, Martin Kleppmann <mkleppm...@linkedin.com> wrote:
> Which language are you using? Afaik, most language implementations of Avro
> only have an interface for reading one record at a time, but they do buffer
> the input file internally, so there shouldn't be a performance disadvantage
> to reading one record at a time.
>
> If you have an example that is particularly slow, you could be a great
> help to the Avro community by getting out a profiler and finding the
> bottleneck :)
>
> Thanks,
> Martin
>
> On 14 May 2014, at 20:13, yael aharon <yael.aharo...@gmail.com> wrote:
>
> I am building a Java utility that reads large Avro files and does some
> processing. These files have millions of records in them, and it can take
> minutes to read them using DataFileReader.next().
>
> Is there a way to read more than one record at a time?
>
> thanks, Yael
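[Not part of the original thread: a minimal sketch of the DataFileReader loop being discussed, assuming the Apache Avro Java library is on the classpath. The file name "data.avro" is a placeholder. One relevant detail of the Java API: next(reuse) lets Avro overwrite a previously returned record instead of allocating a new object per row, which can reduce GC pressure on multi-million-record files.]

```java
// Sketch: iterate a large Avro container file one record at a time,
// reusing the record object across iterations to cut per-row allocation.
// Assumes org.apache.avro is on the classpath; "data.avro" is a placeholder path.
import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroScan {
    public static void main(String[] args) throws IOException {
        File file = new File("data.avro");
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        // DataFileReader buffers the underlying file internally, so reading
        // record-by-record does not mean one I/O call per record.
        try (DataFileReader<GenericRecord> fileReader =
                 new DataFileReader<>(file, datumReader)) {
            GenericRecord record = null;
            long count = 0;
            while (fileReader.hasNext()) {
                // Pass the previous record back in so Avro can reuse it.
                record = fileReader.next(record);
                count++;
            }
            System.out.println("read " + count + " records");
        }
    }
}
```

As Martin notes above, the reader buffers internally, so if this loop is slow the first step is a profiler rather than batching reads.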