Re: Is a Block compressed (GZIP) SequenceFile splittable in MR operation?

2011-01-31 Thread Niels Basjes
Hi, 2011/1/31 Sean Bigdatafun sean.bigdata...@gmail.com: GZIP is not splittable. Correct, gzip is a stream compression system, which effectively means you can only start decompressing at the beginning of the data. Does that mean a GZIP block compressed SequenceFile can't take advantage of

Re: Is a Block compressed (GZIP) SequenceFile splittable in MR operation?

2011-01-31 Thread Harsh J
On Mon, Jan 31, 2011 at 1:56 PM, Sean Bigdatafun sean.bigdata...@gmail.com wrote: How to control the size of the block to be compressed in a SequenceFile? It is specified when creating a SequenceFile.Writer object. See the various SequenceFile.createWriter() methods. -- Harsh J www.harshj.com
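
As a rough illustration of what a "compression block size" means here, the sketch below buffers records until a byte threshold is reached, gzips each buffer on its own, and writes a sync marker in front of every block. It is a toy container in Python, not Hadoop's on-disk SequenceFile format, and write_blocks, read_from, block_size and SYNC are hypothetical names made up for the example. In Hadoop itself the writer comes from SequenceFile.createWriter() with CompressionType.BLOCK; the buffering threshold appears to be read from the io.seqfile.compress.blocksize configuration property, which is an assumption worth checking against your Hadoop version.

```python
import gzip
import os
import struct

# Hypothetical 16-byte marker written before every block (random per "file").
SYNC = os.urandom(16)


def write_blocks(records, block_size, sync):
    """Pack records into independently gzip-compressed blocks.

    block_size is the number of uncompressed bytes to buffer before the
    buffer is compressed and written out as one block.
    """
    out = bytearray()
    buf = bytearray()

    def flush():
        if buf:
            payload = gzip.compress(bytes(buf))
            out.extend(sync)                             # block boundary marker
            out.extend(struct.pack(">I", len(payload)))  # compressed length
            out.extend(payload)
            buf.clear()

    for rec in records:
        buf.extend(rec)
        if len(buf) >= block_size:
            flush()
    flush()  # write whatever is left as a final, smaller block
    return bytes(out)


def read_from(data, offset, sync):
    """Decode every block whose sync marker starts at or after offset."""
    result = bytearray()
    pos = data.find(sync, offset)
    while pos != -1:
        pos += len(sync)
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        result.extend(gzip.decompress(data[pos:pos + length]))
        pos = data.find(sync, pos + length)
    return bytes(result)


# Hypothetical sample data: 10,000 short text records.
records = [b"record-%06d\n" % i for i in range(10_000)]
data = write_blocks(records, block_size=4096, sync=SYNC)

everything = read_from(data, 0, SYNC)                # reader starting at byte 0
second_half = read_from(data, len(data) // 2, SYNC)  # reader starting mid-file
```

A reader that starts in the middle of the file skips forward to the next marker and decodes whole blocks from there, which is the property a split-based job needs. A smaller block_size gives more, smaller blocks (finer split points, somewhat worse compression); a larger one gives the reverse.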

Re: Is a Block compressed (GZIP) SequenceFile splittable in MR operation?

2011-01-31 Thread Sean Bigdatafun
On Mon, Jan 31, 2011 at 12:36 AM, Niels Basjes ni...@basjes.nl wrote: Hi, 2011/1/31 Sean Bigdatafun sean.bigdata...@gmail.com: GZIP is not splittable. Correct, gzip is a stream compression system, which effectively means you can only start decompressing at the beginning of the data.
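
The point about gzip being a stream format can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the sample records and the helper name try_decompress_from are made up for the illustration, and it says nothing about how Hadoop's own codecs are implemented.

```python
import gzip
import zlib

# Hypothetical sample data: 10,000 short text records in one gzip stream.
records = b"".join(b"record-%06d\n" % i for i in range(10_000))
compressed = gzip.compress(records)


def try_decompress_from(data, offset):
    """Try to gunzip starting at an arbitrary byte offset.

    Returns the decompressed bytes, or None if decoding fails.
    """
    try:
        return gzip.decompress(data[offset:])
    except (OSError, EOFError, zlib.error):
        return None


from_start = try_decompress_from(compressed, 0)
from_middle = try_decompress_from(compressed, len(compressed) // 2)

print("from start: ", "ok" if from_start == records else "failed")
print("from middle:", "ok" if from_middle is not None else "failed")
```

Decoding from offset 0 recovers the records, while decoding from the middle fails because the bytes there carry no gzip header and depend on everything that came before them. A whole-file .gz input therefore cannot be handed to several readers at arbitrary offsets.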

Re: Is a Block compressed (GZIP) SequenceFile splittable in MR operation?

2011-01-31 Thread Harsh J
Hello, On Mon, Jan 31, 2011 at 10:41 PM, Sean Bigdatafun sean.bigdata...@gmail.com wrote: On Mon, Jan 31, 2011 at 12:36 AM, Niels Basjes ni...@basjes.nl wrote: Hi, 2011/1/31 Sean Bigdatafun sean.bigdata...@gmail.com: GZIP is not splittable. Correct, gzip is a stream compression system