On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
>Yep, I saw that method as well as the stackoverflow post. However, I'm
>interested how to append to a file on the arbitrary file system, not only
>on the local one.
>
>I want to get an OutputStream based on the Path and the FileSystem
>implementation and then pass it for appending to avro methods.
>
>Is that possible?

It is not possible without modifying DataFileWriter. Please open a JIRA
ticket.

DataFileWriter cannot simply append to an OutputStream, since it must either:
* Seek to the start to validate that the schemas match and find the sync
  marker, or
* Trust that the schemas match and find the sync marker from the last block

DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
could add something to the mapred module that takes a Path and FileSystem
and returns something that implements an interface that DataFileWriter can
append to. This would be something that is both a
http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
and an OutputStream, or has both an InputStream from the start of the
existing file and an OutputStream at the end.

>
>Thanks,
>Vyacheslav
>
>On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>
>> Hi,
>>
>> Use the appendTo feature of the DataFileWriter. See
>>
>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>
>> For a quick setup example, read also:
>>
>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>
>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> <vyacheslav.zholu...@gmail.com> wrote:
>>> Hi,
>>>
>>> is it possible to append to an already existing avro file when it was
>>> written and closed before?
>>>
>>> If I use
>>> outputStream = fs.append(avroFilePath);
>>>
>>> then later on I get: java.io.IOException: Invalid sync!
>>>
>>> Probably because the schema is written twice and some other issues.
>>>
>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>> gets overwritten.
>>>
>>> Thanks,
>>> Vyacheslav
>>
>> --
>> Harsh J
>> Customer Ops. Engineer
>> Cloudera | http://tiny.cloudera.com/about
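
To illustrate why writing through a plain append stream ends in "Invalid sync!", here is a minimal sketch that uses only the JDK. It does not use Avro and does not reproduce Avro's real file layout. ToyContainer and all of its methods are hypothetical names invented for this example, and the format is a deliberate simplification: a header holding a schema string and a 16-byte sync marker, followed by blocks that each end with that marker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Toy container format (hypothetical, NOT Avro's real layout). */
final class ToyContainer {
    static final int SYNC_SIZE = 16;
    private static final Random RNG = new Random();

    /** Writes a header (schema + fresh random sync marker) and returns the marker. */
    static byte[] writeHeader(DataOutputStream out, String schema) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        RNG.nextBytes(sync);
        out.writeUTF(schema);
        out.write(sync);
        return sync;
    }

    /** Writes one record followed by the file's sync marker. */
    static void writeBlock(DataOutputStream out, String record, byte[] sync) throws IOException {
        out.writeUTF(record);
        out.write(sync);
    }

    /** Creates a new file image containing a header and one block. */
    static byte[] create(String schema, String record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] sync = writeHeader(out, schema);
        writeBlock(out, record, sync);
        return bytes.toByteArray();
    }

    /** Append that first reads the existing header: checks the schema, reuses the marker. */
    static byte[] append(byte[] existing, String schema, String record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(existing));
        if (!in.readUTF().equals(schema)) {
            throw new IOException("Schema mismatch");
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(existing);
        writeBlock(new DataOutputStream(bytes), record, sync);
        return bytes.toByteArray();
    }

    /** Naive append: writes a second header with a new marker after the old data. */
    static byte[] naiveAppend(byte[] existing, String schema, String record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(existing);
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] sync = writeHeader(out, schema);
        writeBlock(out, record, sync);
        return bytes.toByteArray();
    }

    /** Reads every record, verifying each block ends with the header's sync marker. */
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        in.readUTF(); // skip the schema
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> records = new ArrayList<>();
        byte[] marker = new byte[SYNC_SIZE];
        while (in.available() > 0) {
            records.add(in.readUTF());
            in.readFully(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IOException("Invalid sync!");
            }
        }
        return records;
    }
}
```

In this toy model, reading the result of naiveAppend fails because the reader takes the second header for a block and finds a marker that differs from the one at the start of the file. append works only because it can read the start of the existing data before writing at the end, which is the same pair of capabilities (seekable input plus output stream) described in the reply above.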