RE: How to update a file which is in HDFS

John Lilley Thu, 04 Jul 2013 14:36:38 -0700

Manickam,

HDFS supports append; it is the command-line client that does not.
You can write a Java application that opens an HDFS-based file for append, and 
use that instead of the hadoop command line.
However, this doesn't completely answer your original question: "How do I move 
only the delta part"?  This can be more complex than simply doing an append.  
Have records in the original file changed in addition to new records becoming 
available?  If that is the case, you will need to completely rewrite the file, 
as there is no overwriting of existing file sections, even directly using HDFS. 
 There are clever strategies for working around this, like splitting the file 
into multiple parts on HDFS so that the overwrite can proceed in parallel on 
the cluster; however, that may be more work than you are looking for.  Even if 
the delta is limited to new records, the problem may not be trivial.  How do 
you know which records are new?  Are all of the new records at the end of the 
file?  Or can they be anywhere in the file?  If the latter, you will need more 
complex logic.
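
For the simplest case (every new record sits at the end of the file), here is 
a rough sketch of the idea, not a tested tool. It assumes you still have a 
local copy of the file exactly as it was uploaded; the class and method names 
(DeltaAppend, isPureAppend, copyTail) and the HDFS path in the comment are 
made up for illustration. It first checks that the new file is the old file 
plus extra bytes at the end, then copies only those extra bytes to an output 
stream. The code below uses only the JDK; the trailing comment shows where the 
stream would come from with the Hadoop client library on the classpath and 
append enabled on the cluster.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; names are illustrative, not part of Hadoop. */
public class DeltaAppend {

    /**
     * Returns true when newFile begins with every byte of oldFile and is
     * strictly longer, i.e. the only change is data added at the end.
     */
    public static boolean isPureAppend(Path oldFile, Path newFile) throws IOException {
        if (Files.size(newFile) <= Files.size(oldFile)) {
            return false;
        }
        try (InputStream a = new BufferedInputStream(Files.newInputStream(oldFile));
             InputStream b = new BufferedInputStream(Files.newInputStream(newFile))) {
            int x;
            while ((x = a.read()) != -1) {
                if (x != b.read()) {
                    return false; // an existing byte changed: append alone is not enough
                }
            }
            return true;
        }
    }

    /**
     * Copies the bytes of newFile from offset to end-of-file into out and
     * returns the number of bytes copied. Requires Java 9+ for transferTo.
     */
    public static long copyTail(Path newFile, long offset, OutputStream out) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(newFile)) {
            ch.position(offset);
            return Channels.newInputStream(ch).transferTo(out);
        }
    }
}

// Assumed usage with Hadoop (hypothetical path, not compiled here):
//   FileSystem fs = FileSystem.get(new Configuration());
//   try (OutputStream out = fs.append(new org.apache.hadoop.fs.Path("/data/input.txt"))) {
//       if (DeltaAppend.isPureAppend(oldLocal, newLocal)) {
//           DeltaAppend.copyTail(newLocal, Files.size(oldLocal), out);
//       }
//   }
```

If isPureAppend returns false, some existing record changed or moved, and you 
are back to rewriting the file as described above.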

John

From: Mohammad Tariq [mailto:donta...@gmail.com]
Sent: Thursday, July 04, 2013 5:47 AM
To: user@hadoop.apache.org
Subject: Re: How to update a file which is in HDFS

Hello Manickam,

        Append is currently not possible.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manicka...@outlook.com> wrote:
Hi,

I have moved my input file into an HDFS location on the cluster.
Now I have a new version of the file, which has some new records along with 
the old ones.
I want to move only the delta part into HDFS, because moving the whole file 
from my local machine to HDFS takes a long time.
Is that possible, or do I need to move the entire file into HDFS again?

Thanks,
Manickam P
