Thanks so much for the code -- it works great! Since a non-trivial amount of code is required to achieve append, I suggest attaching it to AVRO-1035, in the hope that someone will come up with an interface that requires just one line of user code to append.
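Scott Carey's proposal in the quoted thread, a single object that is both a SeekableInput and an OutputStream, is one possible shape for such an interface. The following is a minimal JDK-only sketch, not Avro's API. SeekableReader is a hypothetical interface intended to mirror the method shapes of org.apache.avro.file.SeekableInput, AppendableFile is a hypothetical class name, and a local RandomAccessFile stands in for an HDFS stream. Reads can seek anywhere in the existing bytes (for example, to read the header and sync marker), while every write lands at the current end of the file.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical interface intended to mirror the method shapes of Avro's
// SeekableInput (seek, tell, length, read); redeclared so this sketch
// depends only on the JDK.
interface SeekableReader extends Closeable {
    void seek(long p) throws IOException;
    long tell() throws IOException;
    long length() throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
}

// Hypothetical local-file stand-in for "something that is both a
// SeekableInput and an OutputStream": reads use their own position and can
// revisit existing bytes, while every write is appended at the end of file.
final class AppendableFile extends OutputStream implements SeekableReader {
    private final RandomAccessFile raf;
    private long readPos = 0; // read cursor, independent of where writes go

    AppendableFile(File file) throws IOException {
        this.raf = new RandomAccessFile(file, "rw");
    }

    @Override public void seek(long p) { readPos = p; }
    @Override public long tell() { return readPos; }
    @Override public long length() throws IOException { return raf.length(); }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        raf.seek(readPos);
        int n = raf.read(b, off, len);
        if (n > 0) readPos += n;
        return n; // -1 at end of file, as with InputStream
    }

    @Override public void write(int b) throws IOException {
        raf.seek(raf.length()); // append-only: never overwrite existing bytes
        raf.write(b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        raf.seek(raf.length());
        raf.write(b, off, len);
    }

    @Override public void close() throws IOException { raf.close(); }
}
```

In real code, the closest existing entry point mentioned in the thread is DataFileWriter#appendTo(SeekableInput, OutputStream), which takes the reader and the appending stream as two separate arguments.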
--- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:

> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Wednesday, February 6, 2013, 11:17 AM
> Hey Michael,
>
> It does implement the regular Java OutputStream interface, as seen in the API:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>
> Here's a sample program that works on Hadoop 2.x in my tests:
> https://gist.github.com/QwertyManiac/4724582
>
> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> > I don't believe a Hadoop FileSystem is a Java OutputStream?
> >
> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
> >
> >> From: Doug Cutting <cutt...@apache.org>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Tuesday, February 5, 2013, 5:27 PM
> >> It will work on an OutputStream that supports append.
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
> >>
> >> So it depends on how well HDFS implements FileSystem#append(), not on any changes in Avro.
> >>
> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
> >>
> >> I have no recent personal experience with append in HDFS. Does anyone else here?
> >>
> >> Doug
> >>
> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
> >> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
> >> >
> >> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
> >> >
> >> >> From: Doug Cutting <cutt...@apache.org>
> >> >> Subject: Re: Is it possible to append to an already existing avro file
> >> >> To: user@avro.apache.org
> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >> >> The Jira is:
> >> >>
> >> >> https://issues.apache.org/jira/browse/AVRO-1035
> >> >>
> >> >> It is possible to append to an existing Avro file:
> >> >>
> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >>
> >> >> Should we close that issue as "fixed"?
> >> >>
> >> >> Doug
> >> >>
> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> >> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
> >> >> >
> >> >> > What is the status of such a capability, a year out from when the issue below was raised?
> >> >> >
> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> >> >> >
> >> >> >> Thanks for your reply, I suspected this.
> >> >> >>
> >> >> >> I will create a JIRA ticket.
> >> >> >>
> >> >> >> Vyacheslav
> >> >> >>
> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >> >> >>
> >> >> >>>
> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> >> >> >>>
> >> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm interested in how to append to a file on an arbitrary file system, not only on the local one.
> >> >> >>>>
> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem implementation and then pass it for appending to avro methods.
> >> >> >>>>
> >> >> >>>> Is that possible?
> >> >> >>>
> >> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA ticket.
> >> >> >>>
> >> >> >>> It could not simply append to an OutputStream, since it must either:
> >> >> >>> * Seek to the start to validate the schemas match and find the sync marker, or
> >> >> >>> * Trust that the schemas match and find the sync marker from the last block
> >> >> >>>
> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we could add something to the mapred module that takes a Path and FileSystem and returns something that implements an interface that DataFileWriter can append to. This would be something that is both a
> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >> >>> and an OutputStream, or has both an InputStream from the start of the existing file and an OutputStream at the end.
> >> >> >>>
> >> >> >>>> Thanks,
> >> >> >>>> Vyacheslav
> >> >> >>>>
> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >> >> >>>>
> >> >> >>>>> Hi,
> >> >> >>>>>
> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
> >> >> >>>>>
> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >> >>>>>
> >> >> >>>>> For a quick setup example, read also:
> >> >> >>>>>
> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >> >> >>>>>
> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com> wrote:
> >> >> >>>>>> Hi,
> >> >> >>>>>>
> >> >> >>>>>> Is it possible to append to an already existing avro file when it was written and closed before?
> >> >> >>>>>>
> >> >> >>>>>> If I use
> >> >> >>>>>> outputStream = fs.append(avroFilePath);
> >> >> >>>>>>
> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >> >> >>>>>>
> >> >> >>>>>> Probably because the schema is written twice and some other issues.
> >> >> >>>>>>
> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file gets overwritten.
> >> >> >>>>>>
> >> >> >>>>>> Thanks,
> >> >> >>>>>> Vyacheslav
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Harsh J
> >> >> >>>>> Customer Ops. Engineer
> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
> >> >> >
> >> >>
> >> >
> >
> --
> Harsh J
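Scott's two options (seek to the start for the schema and sync marker, or trust the schema and take the marker from the last block) can be sketched with the JDK alone. This sketch assumes the object container layout in the Avro specification: the magic bytes 'O', 'b', 'j', 1, a metadata map holding entries such as avro.schema and avro.codec, and a 16-byte sync marker that is also written after every data block. AvroContainerHeader is a hypothetical helper operating on an in-memory copy of the file; it is not part of Avro. The layout also suggests why the original fs.append() attempt most likely failed with "Invalid sync!": if a new header is written through the appended stream, the file gains a second header with a different random sync marker partway through, which a reader expecting the first marker rejects.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: finds what an appender needs from an existing Avro
// object container file (header metadata and the 16-byte sync marker).
final class AvroContainerHeader {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    static final int SYNC_SIZE = 16;

    final Map<String, byte[]> metadata; // e.g. "avro.schema", "avro.codec"
    final byte[] sync;                  // marker repeated after every data block

    private AvroContainerHeader(Map<String, byte[]> metadata, byte[] sync) {
        this.metadata = metadata;
        this.sync = sync;
    }

    // Avro "long": variable-length, zig-zag encoded.
    static long readLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get(); // BufferUnderflowException on truncated input
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    // Avro "bytes"/"string": a long length followed by that many bytes.
    static byte[] readBytes(ByteBuffer buf) {
        byte[] out = new byte[(int) readLong(buf)];
        buf.get(out);
        return out;
    }

    // Option 1: read from the start of the file.
    static AvroContainerHeader read(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("not an Avro object container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // The metadata map is written as blocks of entries; a count of 0 ends it.
        long count;
        while ((count = readLong(buf)) != 0) {
            if (count < 0) {   // negative count: a block byte size follows
                count = -count;
                readLong(buf); // skip the byte size; not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(buf), StandardCharsets.UTF_8);
                meta.put(key, readBytes(buf));
            }
        }
        byte[] sync = new byte[SYNC_SIZE];
        buf.get(sync);
        return new AvroContainerHeader(meta, sync);
    }

    // Option 2: trust the schema and take the marker from the tail. Valid only
    // when the file ends on a complete block (or contains just the header).
    static byte[] syncFromTail(byte[] file) {
        return Arrays.copyOfRange(file, file.length - SYNC_SIZE, file.length);
    }
}
```

Option 2 avoids parsing the header but, as Scott notes, it trusts that the schemas match, and it is only valid when the file ends on a complete block.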