
Re: SequenceFile split question

2012-03-15 Thread Bejoy Ks

Hi Mohit, if you are using a standalone client application to do this, there is definitely just one instance of it running, and you'd be writing the sequence file to one HDFS block at a time. Once it reaches the HDFS block size, the writing continues to the next block; in the meantime the
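
To make the block-by-block behaviour described above concrete, here is a rough sketch of the arithmetic only. It does not call Hadoop at all; the class name, method names and the 64 MB block size are assumptions made for illustration, and the real block size comes from the cluster configuration.

```java
// Sketch only: models how one sequential writer fills HDFS blocks in order.
// Hypothetical names; this is not part of the Hadoop API.
final class BlockLayoutSketch {
    // Assumed block size of 64 MB; the actual value is set in the cluster config.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Zero-based index of the block that holds the byte at the given offset.
    static long blockIndex(long offset, long blockSize) {
        return offset / blockSize;
    }

    // Number of blocks needed for fileLength bytes; the last block may be partial.
    static long blockCount(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 0; // nothing written yet, so no block is in use
        }
        return (fileLength + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long written = 150L * 1024 * 1024; // suppose 150 MB has been written so far
        System.out.println("blocks used: " + blockCount(written, BLOCK_SIZE));
        System.out.println("block now being written: " + blockIndex(written, BLOCK_SIZE));
    }
}
```

With these assumed numbers, 150 MB occupies two full blocks plus a partial third, so the single writer is working on the block with index 2 while the earlier blocks are already complete.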

Re: SequenceFile split question

2012-03-15 Thread Mohit Anchlia

Thanks! That helps. I am reading small XML files from an external file system and then writing them to the SequenceFile. I made it a standalone client, thinking that MapReduce may not be the best way to do this type of writing. My understanding was that MapReduce is best suited for processing data within

Re: SequenceFile split question

2012-03-15 Thread Bejoy Ks

Hi Mohit, you are right. If your smaller XML files are in HDFS, then MR would be the best approach to combine them into a sequence file. It'd do the job in parallel. Regards, Bejoy.K.S On Thu, Mar 15, 2012 at 8:17 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks! That helps. I am reading

SequenceFile split question

2012-03-14 Thread Mohit Anchlia

I have a client program that creates a SequenceFile, which essentially merges small files into one big file. I was wondering how the sequence file splits the data across nodes. When I start, the sequence file is empty. Does it get split when it reaches the dfs.block.size? If so, then does it mean