OK, I just read the JIRA in detail (it pays to read these things before 
posting). It says:

Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need 
append. If you enabled dfs.support.append for HBase, you're OK, as durable sync 
(why HBase required dfs.support.append) is now enabled by default. If you 
really need the previous functionality, to turn on the append functionality set 
the flag "dfs.support.broken.append" to true.

That says to me that append can be made to work if you set 
dfs.support.broken.append to true. So append appears to be available in 1.x, 
but it is hardly recommended.
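For reference, a sketch of how that flag would be set in hdfs-site.xml (this 
re-enables a code path the JIRA itself describes as broken, so treat it as a 
last resort):

```xml
<!-- hdfs-site.xml: re-enables the old, known-broken append path on Hadoop 1.x -->
<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
```

The DataNodes and NameNode would need a restart to pick this up.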

Robi


On 5 Jul 2013, at 08:45, Robin East <robin.e...@xense.co.uk> wrote:

> The API for 1.1.2 FileSystem seems to include append().
> Robin 
> On 5 Jul 2013, at 01:50, Mohammad Tariq <donta...@gmail.com> wrote:
> 
>> The current stable release doesn't support append, not even through the API. 
>> If you really want this you have to switch to Hadoop 2.x.
>> See this JIRA.
>> 
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>> 
>> 
>> On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <john.lil...@redpoint.net> wrote:
>> Manickam,
>> 
>>  
>> 
>> HDFS supports append; it is the command-line client that does not. 
>> 
>> You can write a Java application that opens an HDFS-based file for append, 
>> and use that instead of the hadoop command line.
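For instance, a minimal sketch of such a client (the file path and record text 
are made up for illustration; this assumes the cluster actually has append 
enabled, and it needs the Hadoop jars and config files on the classpath, so it 
cannot run standalone):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/input.txt");  // hypothetical path
        FSDataOutputStream out = fs.append(file); // throws if append is unsupported
        try {
            out.writeBytes("new delta record\n");
        } finally {
            out.close();
        }
        fs.close();
    }
}
```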
>> 
>> However, this doesn’t completely answer your original question: “How do I 
>> move only the delta part”?  This can be more complex than simply doing an 
>> append.  Have records in the original file changed in addition to new 
>> records becoming available?  If that is the case, you will need to 
>> completely rewrite the file, as there is no overwriting of existing file 
>> sections, even directly using HDFS.  There are clever strategies for working 
>> around this, like splitting the file into multiple parts on HDFS so that the 
>> overwrite can proceed in parallel on the cluster; however, that may be more 
>> work than you are looking for.  Even if the delta is limited to new records, 
>> the problem may not be trivial.  How do you know which records are new?  Are 
>> all of the new records at the end of the file?  Or can they be anywhere in 
>> the file?  If the latter, you will need more complex logic.
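To make the easy case concrete: if new records only ever appear at the end, 
the delta is simply everything past the record count already in HDFS. A 
hedged sketch (class and method names are my own invention, not part of any 
Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaExtract {
    /**
     * Returns the records in newFile past the first oldCount entries.
     * This only works when new records are strictly appended at the end;
     * if existing records can change, the files must be compared in full.
     */
    public static List<String> tailDelta(List<String> newFile, int oldCount) {
        if (newFile.size() <= oldCount) {
            return new ArrayList<String>(); // nothing new
        }
        return new ArrayList<String>(newFile.subList(oldCount, newFile.size()));
    }

    public static void main(String[] args) {
        List<String> newFile = new ArrayList<String>();
        newFile.add("r1");
        newFile.add("r2");
        newFile.add("r3");
        // Suppose the copy already in HDFS had 2 records:
        System.out.println(tailDelta(newFile, 2)); // prints [r3]
    }
}
```

The extracted tail could then be written to HDFS via the append API instead of 
re-uploading the whole file.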
>> 
>>  
>> 
>> John
>> 
>> From: Mohammad Tariq [mailto:donta...@gmail.com] 
>> Sent: Thursday, July 04, 2013 5:47 AM
>> To: user@hadoop.apache.org
>> Subject: Re: How to update a file which is in HDFS
>> 
>>  
>> 
>> Hello Manickam,
>> 
>>  
>> 
>> Append is currently not possible.
>> 
>> 
>> 
>> Warm Regards,
>> 
>> Tariq
>> 
>> cloudfront.blogspot.com
>> 
>>  
>> 
>> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manicka...@outlook.com> wrote:
>> 
>> Hi,
>> 
>>  
>> 
>> I have moved my input file into the HDFS location in the cluster setup. 
>> 
>> Now I have a new version of the file, which has some new records along with 
>> the old ones. 
>> 
>> I want to move only the delta part into HDFS, because moving the whole file 
>> from my local machine to the HDFS location takes a long time. 
>> 
>> Is this possible, or do I need to move the entire file into HDFS again? 
>> 
>> Thanks,
>> Manickam P
>> 
> 
