I assume by "non-trivial" you meant the extra Seekable wrapping I needed
around the DFS output streams to let Avro treat them as appendable? I
don't think it's possible for Avro to carry that code, since Avro (core)
does not reverse-depend on Hadoop. Should we document it somewhere,
though? Do you have any ideas on the best place to do that?

On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> Thanks so much for the code -- it works great!
>
> Since achieving append takes a non-trivial amount of code, I suggest
> attaching that code to AVRO-1035, in the hope that someone will come up
> with an interface that requires just one line of user code to append.
>
> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>
>> From: Harsh J <ha...@cloudera.com>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Wednesday, February 6, 2013, 11:17 AM
>> Hey Michael,
>>
>> It does implement the regular Java OutputStream interface, as seen in
>> the API:
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>
>> Here's a sample program that works on Hadoop 2.x in my tests:
>> https://gist.github.com/QwertyManiac/4724582
>>
>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>> >
>> >> From: Doug Cutting <cutt...@apache.org>
>> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> To: user@avro.apache.org
>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>> >> It will work on an OutputStream that supports append.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,%20java.io.OutputStream)
>> >>
>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>> >> any changes in Avro.
>> >>
>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>> >>
>> >> I have no recent personal experience with append in HDFS. Does anyone
>> >> else here?
>> >>
>> >> Doug
>> >>
>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
>> >> > My understanding is that this will append to a file on the local
>> >> > filesystem, but not to a file on HDFS.
>> >> >
>> >> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>> >> >
>> >> >> From: Doug Cutting <cutt...@apache.org>
>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> >> To: user@avro.apache.org
>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> >> The Jira is:
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >> >>
>> >> >> It is possible to append to an existing Avro file:
>> >> >>
>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>
>> >> >> Should we close that issue as "fixed"?
>> >> >>
>> >> >> Doug
>> >> >>
>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> >> >> > Was a JIRA ticket ever created regarding appending to an existing
>> >> >> > Avro file on HDFS?
>> >> >> >
>> >> >> > What is the status of such a capability, a year out from when the
>> >> >> > issue below was raised?
>> >> >> >
>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
>> >> >> >
>> >> >> >> Thanks for your reply, I suspected this.
>> >> >> >>
>> >> >> >> I will create a JIRA ticket.
>> >> >> >>
>> >> >> >> Vyacheslav
>> >> >> >>
>> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
>> >> >> >>>
>> >> >> >>>> Yep, I saw that method as well as the Stack Overflow post. However,
>> >> >> >>>> I'm interested in how to append to a file on an arbitrary file
>> >> >> >>>> system, not only on the local one.
>> >> >> >>>>
>> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >> >> >>>> implementation and then pass it to the Avro methods for appending.
>> >> >> >>>>
>> >> >> >>>> Is that possible?
>> >> >> >>>
>> >> >> >>> It is not possible without modifying DataFileWriter. Please open a
>> >> >> >>> JIRA ticket.
>> >> >> >>>
>> >> >> >>> It could not simply append to an OutputStream, since it must either:
>> >> >> >>> * Seek to the start to validate that the schemas match and to find
>> >> >> >>>   the sync marker, or
>> >> >> >>> * Trust that the schemas match and find the sync marker from the
>> >> >> >>>   last block.
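The two options above can be sketched in plain Java. This is a minimal sketch under stated assumptions: it uses a made-up toy layout (magic bytes, an int-prefixed schema string, a 16-byte sync marker, and blocks that each end with that same marker) rather than Avro's real encoding, which uses variable-length longs and a metadata map. The names SyncLookup, syncFromHeader, syncFromTail and writeSample are hypothetical. Option 2 works because the header and every block end with the sync marker, so the last 16 bytes of a well-formed file are always that marker.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.Arrays;

// Toy container layout (an illustrative assumption, NOT Avro's real encoding):
//   header = "Obj\1" | int schemaLen | schema (UTF-8) | 16-byte sync marker
//   block  = int count | int size | payload | the same 16-byte sync marker
final class SyncLookup {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    static final int SYNC_SIZE = 16;

    // Option 1: seek to the start, validate magic and schema, then read the marker.
    static byte[] syncFromHeader(RandomAccessFile f, String expectedSchema) throws IOException {
        f.seek(0);
        byte[] magic = new byte[MAGIC.length];
        f.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
        byte[] schema = new byte[f.readInt()];
        f.readFully(schema);
        if (!new String(schema, StandardCharsets.UTF_8).equals(expectedSchema)) {
            throw new IOException("Schema mismatch");
        }
        byte[] sync = new byte[SYNC_SIZE];
        f.readFully(sync);
        return sync;
    }

    // Option 2: trust the schema; the header and every block end with the marker,
    // so the last 16 bytes of a well-formed file are the marker.
    static byte[] syncFromTail(RandomAccessFile f) throws IOException {
        if (f.length() < SYNC_SIZE) throw new IOException("File too short");
        f.seek(f.length() - SYNC_SIZE);
        byte[] sync = new byte[SYNC_SIZE];
        f.readFully(sync);
        return sync;
    }

    // Writes a one-block toy file with a random marker, for trying both options.
    static Path writeSample(String schema, int count, byte[] payload) throws IOException {
        Path p = Files.createTempFile("toy", ".bin");
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        byte[] s = schema.getBytes(StandardCharsets.UTF_8);
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(p))) {
            out.write(MAGIC);
            out.writeInt(s.length);
            out.write(s);
            out.write(sync);
            out.writeInt(count);
            out.writeInt(payload.length);
            out.write(payload);
            out.write(sync);
        }
        return p;
    }
}
```

Option 1 costs an extra read of the file's start but catches a schema mismatch; option 2 reads only the tail and simply trusts the caller.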
>> >> >> >>>
>> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but
>> >> >> >>> we could add something to the mapred module that takes a Path and
>> >> >> >>> FileSystem and returns something that implements an interface that
>> >> >> >>> DataFileWriter can append to. This would be something that is both a
>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
>> >> >> >>> existing file and an OutputStream at the end.
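The second shape Scott describes, an InputStream from the start of the existing file plus an OutputStream at its end, might look roughly like the sketch below. AppendableFile and LocalAppendableFile are hypothetical names, not Avro or Hadoop API, and the local version uses only java.nio.file. An HDFS-backed version could presumably wrap FileSystem#open(Path) and FileSystem#append(Path) the same way, but that is an untested assumption here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical interface (not part of Avro or Hadoop): one handle giving the
// existing bytes from the start plus a stream that writes after them.
interface AppendableFile {
    long length() throws IOException;                // size of the existing file
    InputStream openFromStart() throws IOException;  // e.g. to read header and sync marker
    OutputStream openAtEnd() throws IOException;     // new blocks are written after existing data
}

// Local-filesystem version using java.nio.file only.
final class LocalAppendableFile implements AppendableFile {
    private final Path path;

    LocalAppendableFile(Path path) {
        this.path = path;
    }

    @Override
    public long length() throws IOException {
        return Files.size(path);
    }

    @Override
    public InputStream openFromStart() throws IOException {
        return Files.newInputStream(path, StandardOpenOption.READ);
    }

    @Override
    public OutputStream openAtEnd() throws IOException {
        // No CREATE option: appending to a missing file fails instead of silently
        // starting a new one; APPEND makes every write land at the current end.
        return Files.newOutputStream(path, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }
}
```

Keeping the two streams separate means the reader side never needs to be writable, which fits filesystems that only support append-style writes.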
>> >> >> >>>
>> >> >> >>>> Thanks,
>> >> >> >>>> Vyacheslav
>> >> >> >>>>
>> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >> >> >>>>
>> >> >> >>>>> Hi,
>> >> >> >>>>>
>> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >> >> >>>>>
>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >> >>>>>
>> >> >> >>>>> For a quick setup example, read also:
>> >> >> >>>>>
>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >> >>>>>
>> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com> wrote:
>> >> >> >>>>>> Hi,
>> >> >> >>>>>>
>> >> >> >>>>>> Is it possible to append to an already existing Avro file that
>> >> >> >>>>>> was written and closed before?
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> >> >> >>>>>> outputStream = fs.append(avroFilePath);
>> >> >> >>>>>>
>> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >> >> >>>>>>
>> >> >> >>>>>> Probably because the schema is written twice, among other issues.
>> >> >> >>>>>>
>> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the Avro
>> >> >> >>>>>> file gets overwritten.
>> >> >> >>>>>>
>> >> >> >>>>>> Thanks,
>> >> >> >>>>>> Vyacheslav
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Harsh J
>> >> >> >>>>> Customer Ops. Engineer
>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
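To illustrate why the fs.append() attempt above ends in "Invalid sync!", here is a self-contained toy model. It assumes the appended stream was handed to a fresh writer, which starts by writing a new header with a new random sync marker. The toy layout and the names ToyAppend, create, naiveAppend, appendBlock and countRecords are made up for illustration and are not Avro's actual byte format or API. Like Avro's reader, the toy reader checks the marker after every block against the one in the header, so a second header found mid-file trips that check, while reusing the existing marker keeps the file readable.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;
import java.util.Arrays;

// Toy container layout (an illustrative assumption, NOT Avro's real encoding):
//   header = "Obj\1" | int schemaLen | schema (UTF-8) | 16-byte sync marker
//   block  = int count | int size | payload | the same 16-byte sync marker
final class ToyAppend {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    static final int SYNC = 16;

    // Starts a container: writes a header with a NEW random sync marker.
    static byte[] writeHeader(DataOutputStream out, String schema) throws IOException {
        byte[] sync = new byte[SYNC];
        new SecureRandom().nextBytes(sync);
        byte[] s = schema.getBytes(StandardCharsets.UTF_8);
        out.write(MAGIC);
        out.writeInt(s.length);
        out.write(s);
        out.write(sync);
        return sync;
    }

    static void writeBlock(DataOutputStream out, byte[] sync, int count, byte[] payload)
            throws IOException {
        out.writeInt(count);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync);
    }

    static Path create(String schema, int count, byte[] payload) throws IOException {
        Path p = Files.createTempFile("toy", ".bin");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(p))) {
            writeBlock(out, writeHeader(out, schema), count, payload);
        }
        return p;
    }

    // Roughly what an appending stream handed to a fresh writer does:
    // a second header, with a different marker, lands in the middle of the file.
    static void naiveAppend(Path p, String schema, int count, byte[] payload) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                Files.newOutputStream(p, StandardOpenOption.WRITE, StandardOpenOption.APPEND))) {
            writeBlock(out, writeHeader(out, schema), count, payload);
        }
    }

    // Append-aware version: no new header; reuse the existing marker (the last 16 bytes).
    static void appendBlock(Path p, int count, byte[] payload) throws IOException {
        byte[] sync = new byte[SYNC];
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
            raf.seek(raf.length() - SYNC);
            raf.readFully(sync);
        }
        try (DataOutputStream out = new DataOutputStream(
                Files.newOutputStream(p, StandardOpenOption.WRITE, StandardOpenOption.APPEND))) {
            writeBlock(out, sync, count, payload);
        }
    }

    // Reads every block, checking each trailing marker against the header's marker.
    static int countRecords(Path p) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(Files.readAllBytes(p)));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
        in.readFully(new byte[in.readInt()]); // skip the schema
        byte[] sync = new byte[SYNC];
        in.readFully(sync);
        byte[] seen = new byte[SYNC];
        int total = 0;
        while (in.available() > 0) {
            int count = in.readInt();
            int size = in.readInt();
            if (size < 0 || size > in.available()) throw new IOException("Truncated block");
            in.readFully(new byte[size]);
            in.readFully(seen);
            if (!Arrays.equals(seen, sync)) throw new IOException("Invalid sync!");
            total += count;
        }
        return total;
    }
}
```

In this toy, appendBlock trusts that the schema matches (the second of Scott's two options); a real implementation would probably want to validate it as well.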
>> >> >> >
>> >> >>
>> >> >>
>> >>
>>
>>
>>
>> --
>> Harsh J
>>



--
Harsh J
