Hello, I'm writing some code to split Avro datafiles into smaller files. One approach I tried is to read blocks from a DataFileStream and, for each block, call appendEncoded on a DataFileWriter until a certain number of blocks have been written, then start a new writer, continuing until every block has been transferred to one of the smaller files.
In a test case it appears to append all of the blocks with no exceptions, but when I try to read the resulting data back, the first record reads fine and then hasNext() throws:

org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)

This particular test case has an input datafile containing 1 block of 100 records, and the splitter is configured to roll to a new file once it has seen 200 records, so it ends up producing one new datafile which should be equivalent to the input. I'm not that familiar with the low-level internals of Avro, so I'm wondering: is there anything I'm missing that I should be doing when appending the blocks? This ticket looks like a similar issue: https://issues.apache.org/jira/browse/AVRO-1093, but after reading through it I still couldn't see anything wrong with my approach. Any pointers would be appreciated, thanks. I can provide a rough outline of the code if it helps.

-Bryan
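In case it helps, here is a rough outline of what the loop is doing. This is a simplified reconstruction, not the exact code: the helper name newWriter, the variable maxRecordsPerFile, and the file naming are made up for the sketch, and error handling and codec/metadata setup are omitted.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroSplitSketch {

  // Hypothetical helper: create a writer for the next output part,
  // using the schema taken from the input stream.
  private static DataFileWriter<GenericRecord> newWriter(Schema schema, int part)
      throws IOException {
    DataFileWriter<GenericRecord> w =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
    w.create(schema, new File("part-" + part + ".avro"));
    return w;
  }

  public static void split(InputStream input, long maxRecordsPerFile) throws IOException {
    DataFileStream<GenericRecord> stream =
        new DataFileStream<>(input, new GenericDatumReader<GenericRecord>());

    int part = 0;
    long recordsInCurrentFile = 0;
    DataFileWriter<GenericRecord> writer = newWriter(stream.getSchema(), part++);

    // hasNext() positions the stream at the next block; nextBlock() then
    // returns that block's still-encoded bytes without decoding the records.
    while (stream.hasNext()) {
      long blockRecords = stream.getBlockCount(); // records in the current block
      ByteBuffer block = stream.nextBlock();
      writer.appendEncoded(block); // append the raw encoded bytes
      recordsInCurrentFile += blockRecords;

      if (recordsInCurrentFile >= maxRecordsPerFile) { // e.g. 200 in the test
        writer.close();
        writer = newWriter(stream.getSchema(), part++);
        recordsInCurrentFile = 0;
      }
    }
    writer.close();
    stream.close();
  }
}
```

In the failing test, maxRecordsPerFile is 200 and the input has a single 100-record block, so the loop runs once and produces one output file.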