Hi Mohit
If you are using a standalone client application to do this, there is
definitely just one instance of it running, and you'd be writing the
sequence file to one HDFS block at a time. Once it reaches the HDFS
block size, writing continues to the next block; in the meantime the
Thanks! That helps. I am reading small XML files from an external file system
and then writing them to the SequenceFile. I made it a standalone client,
thinking that MapReduce may not be the best way to do this type of writing. My
understanding was that MapReduce is best suited for processing data within
Hi Mohit
You are right. If your smaller XML files are in HDFS, then MR would be
the best approach to combine them into a sequence file. It'd do the job
in parallel.
Regards
Bejoy.K.S
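
To make the "one block at a time" point in this thread concrete, here is a
minimal sketch in plain Java of how a single writer's byte stream maps onto
fixed-size blocks. It is only arithmetic and does not call any Hadoop API.
The class name BlockLayout, its method names, the 64 MB block size and the
40 MB record sizes are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models how bytes appended by one writer fall into
// fixed-size blocks. Hypothetical helper, not part of Hadoop.
final class BlockLayout {

    // Index of the block that holds the byte at the given file offset.
    static long blockIndex(long offset, long blockSize) {
        if (blockSize <= 0 || offset < 0) {
            throw new IllegalArgumentException("blockSize must be > 0 and offset >= 0");
        }
        return offset / blockSize;
    }

    // Number of blocks needed to hold fileLength bytes (0 for an empty file).
    static long blockCount(long fileLength, long blockSize) {
        if (blockSize <= 0 || fileLength < 0) {
            throw new IllegalArgumentException("blockSize must be > 0 and fileLength >= 0");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    // For records appended back to back, the block in which each record starts.
    // A record may begin in one block and end in the next.
    static List<Long> startingBlocks(long[] recordSizes, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        long offset = 0;
        for (long size : recordSizes) {
            blocks.add(blockIndex(offset, blockSize));
            offset += size;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;              // assumed block size
        long[] sizes = {40 * mb, 40 * mb, 40 * mb}; // assumed record sizes
        System.out.println("starting blocks: " + startingBlocks(sizes, blockSize));
        System.out.println("blocks used: " + blockCount(120 * mb, blockSize));
    }
}
```

With these assumed numbers the second record starts in the first block and
ends in the second one, so an empty file only begins occupying a new block
once the bytes written so far cross a block-size boundary.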
On Thu, Mar 15, 2012 at 8:17 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
Thanks! that helps. I am reading
I have a client program that creates a SequenceFile, which essentially merges
small files into one big file. I was wondering how the sequence file splits
the data across nodes. When I start, the sequence file is empty. Does it
get split when it reaches the dfs.block.size? If so, then does it mean