Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
What is the status of such a capability, a year out from when the issue below was raised?

On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> Thanks for your reply, I suspected this.
>
> I will create a JIRA ticket.
>
> Vyacheslav
>
> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>
>>
>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
>> wrote:
>>
>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>> interested how to append to a file on the arbitrary file system, not
>>> only on the local one.
>>>
>>> I want to get an OutputStream based on the Path and the FileSystem
>>> implementation and then pass it for appending to avro methods.
>>>
>>> Is that possible?
>>
>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> ticket.
>>
>> It could not simply append to an OutputStream, since it must either:
>> * Seek to the start to validate the schemas match and find the sync
>> marker, or
>> * Trust that the schemas match and find the sync marker from the last
>> block
>>
>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> could add something to the mapred module that takes a Path and
>> FileSystem and returns something that implements an interface that
>> DataFileWriter can append to. This would be something that is both a
>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> and an OutputStream, or has both an InputStream from the start of the
>> existing file and an OutputStream at the end.
>>
>>> Thanks,
>>> Vyacheslav
>>>
>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>
>>>> Hi,
>>>>
>>>> Use the appendTo feature of the DataFileWriter. See
>>>>
>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>
>>>> For a quick setup example, read also:
>>>>
>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>>
>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>> <vyacheslav.zholu...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> is it possible to append to an already existing avro file when it was
>>>>> written and closed before?
>>>>>
>>>>> If I use
>>>>> outputStream = fs.append(avroFilePath);
>>>>>
>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>>
>>>>> Probably because the schema is written twice and some other issues.
>>>>>
>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>>> gets overwritten.
>>>>>
>>>>> Thanks,
>>>>> Vyacheslav
>>>>
>>>> --
>>>> Harsh J
>>>> Customer Ops. Engineer
>>>> Cloudera | http://tiny.cloudera.com/about
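
To make the quoted explanation concrete, here is a rough sketch of the first option Scott describes: read the header from the start of the existing file, check that the schema matches, recover the sync marker, and only then write new blocks at the end. It uses a made-up toy layout and plain java.nio on the local file system, not Avro's real container format and not any Avro or Hadoop API. The class name ToyContainer, its methods, and the "TOY1" magic are all hypothetical and exist only for illustration. It also shows why blindly appending a fresh header (as fs.append followed by a new writer would do) corrupts the file: the appended data must reuse the existing sync marker and must not repeat the header.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Toy container format (hypothetical, NOT Avro's real layout):
//   header = magic "TOY1" | int schemaLen | schema bytes | 16-byte sync marker
//   block  = int payloadLen | payload bytes | the same 16-byte sync marker
final class ToyContainer {
    static final byte[] MAGIC = {'T', 'O', 'Y', '1'};
    static final int SYNC_SIZE = 16;

    /** Writes a fresh file containing only the header. */
    static void create(Path file, String schema, byte[] sync) throws IOException {
        if (sync.length != SYNC_SIZE) {
            throw new IllegalArgumentException("sync marker must be 16 bytes");
        }
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            byte[] s = schema.getBytes(StandardCharsets.UTF_8);
            out.write(MAGIC);
            out.writeInt(s.length);
            out.write(s);
            out.write(sync);
        }
    }

    /** Reads the header from the START of the file, validates the schema, returns the sync marker. */
    static byte[] readSync(InputStream start, String expectedSchema) throws IOException {
        DataInputStream in = new DataInputStream(start);
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not a toy container file");
        }
        byte[] s = new byte[in.readInt()];
        in.readFully(s);
        if (!expectedSchema.equals(new String(s, StandardCharsets.UTF_8))) {
            throw new IOException("Schema mismatch");
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        return sync;
    }

    /** Writes one block to a stream that is positioned at the END of the file. */
    static void appendBlock(OutputStream end, byte[] payload, byte[] sync) throws IOException {
        DataOutputStream out = new DataOutputStream(end);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync);
        out.flush();
    }

    /** Append = read the header from the start, then write at the end; no second header. */
    static void append(Path file, String schema, byte[] payload) throws IOException {
        byte[] sync;
        try (InputStream start = Files.newInputStream(file)) {
            sync = readSync(start, schema);
        }
        try (OutputStream end = Files.newOutputStream(file, StandardOpenOption.APPEND)) {
            appendBlock(end, payload, sync);
        }
    }
}
```

Note that append needs two handles on the same file, an input positioned at the start and an output positioned at the end, which is the shape Scott suggests an HDFS-aware helper would have to provide.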