I assume by non-trivial you meant the extra Seekable stuff I needed to wrap around the DFS output streams to let Avro take it as appendable? I don't think it's possible for Avro to carry it, since Avro (core) does not reverse-depend on Hadoop. Should we document it somewhere, though? Do you have any ideas on the best place to do that?
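For anyone reading along, here is a minimal, self-contained sketch of why a bare fs.append() ends in "Invalid sync!" and why the appender needs both a seekable view of the existing file and an output stream positioned at its end. It is a toy container, not Avro's real file layout, and it does not touch Hadoop at all. Every name in it (ToyAppend, Seekable, BytesSeekable, readSync, writeBlock and so on) is made up for illustration; with real Avro the corresponding entry point is DataFileWriter.appendTo(SeekableInput, OutputStream).

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy container, NOT Avro's real layout:
//   header = MAGIC + 16-byte sync marker
//   block  = sync marker + 4-byte length + UTF-8 payload
final class ToyAppend {
    static final byte[] MAGIC = {'T', 'O', 'Y', 1};
    static final int SYNC_LEN = 16;

    // Hypothetical stand-in for a seekable view of the existing file.
    interface Seekable {
        void seek(long pos) throws IOException;
        int read(byte[] buf, int off, int len) throws IOException;
    }

    // In-memory Seekable so the sketch runs without any file system.
    static final class BytesSeekable implements Seekable {
        private final byte[] data;
        private int pos;
        BytesSeekable(byte[] data) { this.data = data; }
        public void seek(long p) { pos = (int) p; }
        public int read(byte[] buf, int off, int len) {
            if (pos >= data.length) return -1;
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, buf, off, n);
            pos += n;
            return n;
        }
    }

    // Fills buf completely; returns false on a clean end of file before any byte is read.
    static boolean readExact(Seekable in, byte[] buf) throws IOException {
        int done = 0;
        while (done < buf.length) {
            int n = in.read(buf, done, buf.length - done);
            if (n < 0) {
                if (done == 0) return false;
                throw new EOFException("truncated file");
            }
            done += n;
        }
        return true;
    }

    // Step 1 of an append: seek to the start and recover the file's own sync marker.
    static byte[] readSync(Seekable existing) throws IOException {
        existing.seek(0);
        byte[] magic = new byte[MAGIC.length];
        byte[] sync = new byte[SYNC_LEN];
        if (!readExact(existing, magic) || !Arrays.equals(magic, MAGIC) || !readExact(existing, sync)) {
            throw new IOException("Not a toy container file");
        }
        return sync;
    }

    static void writeHeader(OutputStream out, byte[] sync) throws IOException {
        out.write(MAGIC);
        out.write(sync);
    }

    static void writeBlock(OutputStream out, byte[] sync, String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        out.write(sync);
        out.write(ByteBuffer.allocate(4).putInt(payload.length).array());
        out.write(payload);
    }

    // Step 2: write only new blocks (no second header) to a stream positioned at the end.
    static void appendTo(Seekable existing, OutputStream atEnd, String record) throws IOException {
        writeBlock(atEnd, readSync(existing), record);
    }

    // Reads every record, checking each block's marker against the one in the header.
    static List<String> readAll(Seekable in) throws IOException {
        byte[] sync = readSync(in);
        List<String> records = new ArrayList<>();
        byte[] marker = new byte[SYNC_LEN];
        while (readExact(in, marker)) {
            if (!Arrays.equals(marker, sync)) throw new IOException("Invalid sync!");
            byte[] len = new byte[4];
            if (!readExact(in, len)) throw new EOFException("truncated block");
            byte[] payload = new byte[ByteBuffer.wrap(len).getInt()];
            if (!readExact(in, payload)) throw new EOFException("truncated block");
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[SYNC_LEN];
        Arrays.fill(sync, (byte) 7); // fixed marker for a repeatable demo; a real writer would randomize it
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeHeader(file, sync);
        writeBlock(file, sync, "first");

        // Correct append: reuse the existing marker and add blocks only.
        appendTo(new BytesSeekable(file.toByteArray()), file, "second");
        System.out.println(readAll(new BytesSeekable(file.toByteArray())));

        // Naive append: a fresh header with a different marker lands mid-file.
        byte[] other = new byte[SYNC_LEN];
        Arrays.fill(other, (byte) 9);
        writeHeader(file, other);
        writeBlock(file, other, "third");
        try {
            readAll(new BytesSeekable(file.toByteArray()));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints [first, second] for the correct append and then Invalid sync! for the naive one, because the reader meets a second header where it expects the original marker. On HDFS the two halves would come from different objects (a seekable input opened on the path, plus the stream returned by FileSystem#append()), which is the wrapping the thread is about.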

On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> Thanks so much for the code -- it works great!
>
> Since it is a non-trivial amount of code required to achieve append, I
> suggest attaching that code to AVRO-1035, in the hopes that someone will come
> up with an interface that requires just one line of user code to achieve
> append.
>
> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>
>> From: Harsh J <ha...@cloudera.com>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Wednesday, February 6, 2013, 11:17 AM
>> Hey Michael,
>>
>> It does implement the regular Java OutputStream interface, as seen in
>> the API:
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>
>> Here's a sample program that works on Hadoop 2.x in my tests:
>> https://gist.github.com/QwertyManiac/4724582
>>
>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>> >
>> >> From: Doug Cutting <cutt...@apache.org>
>> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> To: user@avro.apache.org
>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>> >> It will work on an OutputStream that supports append.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>> >>
>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>> >> any changes in Avro.
>> >>
>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>> >>
>> >> I have no recent personal experience with append in HDFS. Does anyone
>> >> else here?
>> >>
>> >> Doug
>> >>
>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
>> >> > My understanding is that will append to a file on the local
>> >> > filesystem, but not to a file on HDFS.
>> >> >
>> >> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>> >> >
>> >> >> From: Doug Cutting <cutt...@apache.org>
>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> >> To: user@avro.apache.org
>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> >> The Jira is:
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >> >>
>> >> >> It is possible to append to an existing Avro file:
>> >> >>
>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>
>> >> >> Should we close that issue as "fixed"?
>> >> >>
>> >> >> Doug
>> >> >>
>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> >> >> > Was a JIRA ticket ever created regarding appending to an
>> >> >> > existing Avro file on HDFS?
>> >> >> >
>> >> >> > What is the status of such a capability, a year out from when
>> >> >> > the issue below was raised?
>> >> >> >
>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
>> >> >> > <vyacheslav.zholu...@gmail.com> wrote:
>> >> >> >
>> >> >> >> Thanks for your reply, I suspected this.
>> >> >> >>
>> >> >> >> I will create a JIRA ticket.
>> >> >> >>
>> >> >> >> Vyacheslav
>> >> >> >>
>> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>> >> >> >>>> interested how to append to a file on the arbitrary file system, not
>> >> >> >>>> only on the local one.
>> >> >> >>>>
>> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >> >> >>>> implementation and then pass it for appending to avro methods.
>> >> >> >>>>
>> >> >> >>>> Is that possible?
>> >> >> >>>
>> >> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> >> >> >>> ticket.
>> >> >> >>>
>> >> >> >>> It could not simply append to an OutputStream, since it must either:
>> >> >> >>> * Seek to the start to validate the schemas match and find the sync
>> >> >> >>> marker, or
>> >> >> >>> * Trust that the schemas match and find the sync marker from the last
>> >> >> >>> block
>> >> >> >>>
>> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> >> >> >>> could add something to the mapred module that takes a Path and
>> >> >> >>> FileSystem and returns something that implements an interface that
>> >> >> >>> DataFileWriter can append to. This would be something that is both a
>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
>> >> >> >>> existing file and an OutputStream at the end.
>> >> >> >>>
>> >> >> >>>> Thanks,
>> >> >> >>>> Vyacheslav
>> >> >> >>>>
>> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >> >> >>>>
>> >> >> >>>>> Hi,
>> >> >> >>>>>
>> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >> >> >>>>>
>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >> >>>>>
>> >> >> >>>>> For a quick setup example, read also:
>> >> >> >>>>>
>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >> >>>>>
>> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> >> >> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
>> >> >> >>>>>> Hi,
>> >> >> >>>>>>
>> >> >> >>>>>> is it possible to append to an already existing avro file when it was
>> >> >> >>>>>> written and closed before?
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> >> >> >>>>>> outputStream = fs.append(avroFilePath);
>> >> >> >>>>>>
>> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >> >> >>>>>>
>> >> >> >>>>>> Probably because the schema is written twice and some other issues.
>> >> >> >>>>>>
>> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>> >> >> >>>>>> gets overwritten.
>> >> >> >>>>>>
>> >> >> >>>>>> Thanks,
>> >> >> >>>>>> Vyacheslav
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Harsh J
>> >> >> >>>>> Customer Ops. Engineer
>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>
>> --
>> Harsh J

--
Harsh J