I *completely* missed that, although I've worked with it in the past. Thanks, Doug!
I updated my example: https://gist.github.com/QwertyManiac/4724582.

On Thu, Feb 7, 2013 at 10:21 PM, Doug Cutting <cutt...@apache.org> wrote:
> The avro-mapred module includes a Seekable implementation that works
> with HDFS called FsInput:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html
>
> With this, your example can be made considerably smaller.
>
> Doug
>
>
>
> On Thu, Feb 7, 2013 at 8:28 AM, Harsh J <ha...@cloudera.com> wrote:
>> I assume by non-trivial you meant the extra Seekable stuff I needed to
>> wrap around the DFS output streams to let Avro take it as append-able?
>> I don't think it's possible for Avro to carry it since Avro (core) does
>> not reverse-depend on Hadoop. Should we document it somewhere though?
>> Do you have any ideas on the best place to do that?
>>
>> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>>> Thanks so much for the code -- it works great!
>>>
>>> Since it is a non-trivial amount of code required to achieve append, I
>>> suggest attaching that code to AVRO-1035, in the hopes that someone will
>>> come up with an interface that requires just one line of user code to
>>> achieve append.
>>>
>>> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> From: Harsh J <ha...@cloudera.com>
>>>> Subject: Re: Is it possible to append to an already existing avro file
>>>> To: user@avro.apache.org
>>>> Date: Wednesday, February 6, 2013, 11:17 AM
>>>> Hey Michael,
>>>>
>>>> It does implement the regular Java OutputStream interface, as seen in
>>>> the API:
>>>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>>>
>>>> Here's a sample program that works on Hadoop 2.x in my tests:
>>>> https://gist.github.com/QwertyManiac/4724582
>>>>
>>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>>>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>>>> >
>>>> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>>>> >
>>>> >> From: Doug Cutting <cutt...@apache.org>
>>>> >> Subject: Re: Is it possible to append to an already existing avro file
>>>> >> To: user@avro.apache.org
>>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>>> >> It will work on an OutputStream that supports append.
>>>> >>
>>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>>> >>
>>>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>>>> >> any changes in Avro.
>>>> >>
>>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>>> >>
>>>> >> I have no recent personal experience with append in HDFS. Does anyone
>>>> >> else here?
>>>> >>
>>>> >> Doug
>>>> >>
>>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
>>>> >> > My understanding is that will append to a file on the local
>>>> >> > filesystem, but not to a file on HDFS.
>>>> >> >
>>>> >> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>>>> >> >
>>>> >> >> From: Doug Cutting <cutt...@apache.org>
>>>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>>>> >> >> To: user@avro.apache.org
>>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>>> >> >> The Jira is:
>>>> >> >>
>>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>>> >> >>
>>>> >> >> It is possible to append to an existing Avro file:
>>>> >> >>
>>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >>
>>>> >> >> Should we close that issue as "fixed"?
>>>> >> >>
>>>> >> >> Doug
>>>> >> >>
>>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>>>> >> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
>>>> >> >> >
>>>> >> >> > What is the status of such a capability, a year out from when the issue below was raised?
>>>> >> >> >
>>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
>>>> >> >> >
>>>> >> >> >> Thanks for your reply, I suspected this.
>>>> >> >> >>
>>>> >> >> >> I will create a JIRA ticket.
>>>> >> >> >>
>>>> >> >> >> Vyacheslav
>>>> >> >> >>
>>>> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
>>>> >> >> >>> wrote:
>>>> >> >> >>>
>>>> >> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>>> >> >> >>>> interested how to append to a file on the arbitrary file system, not
>>>> >> >> >>>> only on the local one.
>>>> >> >> >>>>
>>>> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>>>> >> >> >>>> implementation and then pass it for appending to avro methods.
>>>> >> >> >>>>
>>>> >> >> >>>> Is that possible?
>>>> >> >> >>>
>>>> >> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>>>> >> >> >>> ticket.
>>>> >> >> >>>
>>>> >> >> >>> It could not simply append to an OutputStream, since it must either:
>>>> >> >> >>> * Seek to the start to validate the schemas match and find the sync
>>>> >> >> >>> marker, or
>>>> >> >> >>> * Trust that the schemas match and find the sync marker from the last
>>>> >> >> >>> block
>>>> >> >> >>>
>>>> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>>>> >> >> >>> could add something to the mapred module that takes a Path and
>>>> >> >> >>> FileSystem and returns something that implements an interface that
>>>> >> >> >>> DataFileWriter can append to. This would be something that is both a
>>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>>> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
>>>> >> >> >>> existing file and an OutputStream at the end.
>>>> >> >> >>>
>>>> >> >> >>>> Thanks,
>>>> >> >> >>>> Vyacheslav
>>>> >> >> >>>>
>>>> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>> >> >> >>>>
>>>> >> >> >>>>> Hi,
>>>> >> >> >>>>>
>>>> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >> >>>>>
>>>> >> >> >>>>> For a quick setup example, read also:
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>> >> >> >>>>>
>>>> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>> >> >> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
>>>> >> >> >>>>>> Hi,
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> is it possible to append to an already existing avro file when it was
>>>> >> >> >>>>>> written and closed before?
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> >> >> >>>>>> outputStream = fs.append(avroFilePath);
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Probably because the schema is written twice and some other issues.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>> >> >> >>>>>> gets
>>>> >> >> >>>>>> overwritten.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Thanks,
>>>> >> >> >>>>>> Vyacheslav
>>>> >> >> >>>>>
>>>> >> >> >>>>> --
>>>> >> >> >>>>> Harsh J
>>>> >> >> >>>>> Customer Ops. Engineer
>>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>>> >> >> >
>>>> >> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>
>>
>>
>> --
>> Harsh J

--
Harsh J
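
Scott Carey's point in the quoted thread, that an appender must first recover the sync marker from the existing file, can be illustrated with a toy model. The sketch below is not Avro's real file format and uses no Avro or Hadoop classes; `ToySyncFile`, its 4-byte marker, and all of its methods are hypothetical names made up for illustration. It only shows why writing a fresh header with a new marker onto the end of a file leads to "Invalid sync!", while reusing the marker read from the start of the file does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy layout (NOT the real Avro format):
//   [sync marker (4 bytes)] then zero or more blocks of [length (1 byte)][payload][sync marker].
final class ToySyncFile {
    static final int SYNC = 4;

    // A fresh file is only a header, which in this toy holds nothing but the sync marker.
    static byte[] header(byte[] sync) {
        return sync.clone();
    }

    // One data block: payload length, payload bytes, then the file's sync marker.
    static byte[] block(byte[] sync, String payload) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data.length); // assumes payloads shorter than 128 bytes
        out.write(data, 0, data.length);
        out.write(sync, 0, sync.length);
        return out.toByteArray();
    }

    // Stands in for "append these bytes to the end of the file".
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    // What a seekable input makes possible: re-read the marker from the start of the file.
    static byte[] readSync(byte[] file) {
        return Arrays.copyOfRange(file, 0, SYNC);
    }

    // Reads every payload, checking that each block ends with the marker from the header.
    static List<String> readAll(byte[] file) throws IOException {
        byte[] sync = readSync(file);
        List<String> payloads = new ArrayList<>();
        int pos = SYNC;
        while (pos < file.length) {
            int len = file[pos++];
            int end = pos + len;
            if (len < 0 || end + SYNC > file.length
                    || !Arrays.equals(sync, Arrays.copyOfRange(file, end, end + SYNC))) {
                throw new IOException("Invalid sync!");
            }
            payloads.add(new String(file, pos, len, StandardCharsets.UTF_8));
            pos = end + SYNC;
        }
        return payloads;
    }
}
```

Appending `block(readSync(file), "second")` to a file that already holds one block reads back as two records, whereas appending `header(newSync)` followed by `block(newSync, "second")` makes `readAll` throw. In real code, the thread's answer is `DataFileWriter.appendTo(SeekableInput, OutputStream)`, with `FsInput` supplying the seekable view of the HDFS file and `FileSystem#append()` supplying the output stream.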