My understanding is that appendTo(java.io.File) will append to a file on the local filesystem, but not to a file on HDFS.
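
To illustrate why a plain fs.append() stream is not enough (the "Invalid sync!" error in the original question quoted below), here is a minimal sketch in plain Java. It assumes a made-up toy format, not Avro's actual file layout: a header with magic bytes and a random 16-byte sync marker, followed by blocks that each end with that marker. ToyContainer and all of its method names are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy container format (NOT Avro's): magic + sync marker, then blocks of [length][payload][sync]. */
final class ToyContainer {
    static final byte[] MAGIC = {'T', 'O', 'Y', '1'};
    static final int SYNC_SIZE = 16;

    /** Writes a brand-new file: header with a fresh random sync marker, then one block per record. */
    static byte[] create(String... records) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(MAGIC);
        out.write(sync);
        writeBlocks(out, sync, records);
        return bytes.toByteArray();
    }

    /** Models a fresh writer on an append stream: a second header lands in the middle of the file. */
    static byte[] naiveAppend(byte[] existing, String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(existing);
        bytes.write(create(records));
        return bytes.toByteArray();
    }

    /** Header-aware append: read the sync marker from the existing header, then write blocks only. */
    static byte[] appendTo(byte[] existing, String... records) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(existing));
        byte[] sync = readHeader(in);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(existing);
        writeBlocks(new DataOutputStream(bytes), sync, records);
        return bytes.toByteArray();
    }

    /** Reads every record, checking each block's trailing sync marker against the header's. */
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] sync = readHeader(in);
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            int length = in.readInt();
            if (length < 0 || length > in.available()) {
                throw new IOException("Invalid block length");
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            byte[] marker = new byte[SYNC_SIZE];
            in.readFully(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IOException("Invalid sync!");
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    private static byte[] readHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not a toy container file");
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        return sync;
    }

    private static void writeBlocks(DataOutputStream out, byte[] sync, String... records)
            throws IOException {
        for (String record : records) {
            byte[] payload = record.getBytes(StandardCharsets.UTF_8);
            out.writeInt(payload.length);
            out.write(payload);
            out.write(sync);
        }
    }
}
```

In this sketch readAll accepts the output of appendTo but rejects the output of naiveAppend, because a second header sits where a block is expected. The toy reader reports that as an invalid block length, whereas the real reader in the original question reported "Invalid sync!". Either way the point is the one Scott makes in the quoted thread: an appender needs read access to the start of the existing file, which a bare OutputStream does not give it.
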

--- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:

> From: Doug Cutting <cutt...@apache.org>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Tuesday, February 5, 2013, 5:08 PM
> The Jira is:
>
> https://issues.apache.org/jira/browse/AVRO-1035
>
> It is possible to append to an existing Avro file:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>
> Should we close that issue as "fixed"?
>
> Doug
>
> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
> >
> > What is the status of such a capability, a year out from when the issue below was raised?
> >
> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> >
> >> Thanks for your reply, I suspected this.
> >>
> >> I will create a JIRA ticket.
> >>
> >> Vyacheslav
> >>
> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >>
> >>>
> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
> >>> wrote:
> >>>
> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
> >>>> interested how to append to a file on the arbitrary file system, not
> >>>> only on the local one.
> >>>>
> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >>>> implementation and then pass it for appending to avro methods.
> >>>>
> >>>> Is that possible?
> >>>
> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >>> ticket.
> >>>
> >>> It could not simply append to an OutputStream, since it must either:
> >>> * Seek to the start to validate the schemas match and find the sync
> >>> marker, or
> >>> * Trust that the schemas match and find the sync marker from the last
> >>> block
> >>>
> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >>> could add something to the mapred module that takes a Path and
> >>> FileSystem and returns something that implements an interface that
> >>> DataFileWriter can append to. This would be something that is both a
> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >>> and an OutputStream, or has both an InputStream from the start of the
> >>> existing file and an OutputStream at the end.
> >>>
> >>>> Thanks,
> >>>> Vyacheslav
> >>>>
> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Use the appendTo feature of the DataFileWriter. See
> >>>>>
> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >>>>>
> >>>>> For a quick setup example, read also:
> >>>>>
> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >>>>>
> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> is it possible to append to an already existing avro file when it was
> >>>>>> written and closed before?
> >>>>>>
> >>>>>> If I use
> >>>>>> outputStream = fs.append(avroFilePath);
> >>>>>>
> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >>>>>>
> >>>>>> Probably because the schema is written twice and some other issues.
> >>>>>>
> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
> >>>>>> gets overwritten.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vyacheslav
> >>>>>
> >>>>> --
> >>>>> Harsh J
> >>>>> Customer Ops. Engineer
> >>>>> Cloudera | http://tiny.cloudera.com/about