> I have the following C code - https://gist.github.com/967968
> When I ran it on a 100000 records file, it says 100030. (Both C and
> Python implementation count 10000).
>
> What am I doing wrong?
You found a bug in the C library's file reader code; I've opened a bug report for it:

https://issues.apache.org/jira/browse/AVRO-819

The problem is that the file reader code isn't propagating errors correctly up through the call stack. As a result, avro_file_reader_read doesn't detect EOF, so you loop through the final block of the file twice. That's where the extra 30 records in your count come from: in the file you're reading, the final block must contain 30 records.

I've got a patch ready for this; I'll test it on a couple of platforms and then commit it to Subversion.

cheers
–doug