Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-24 Thread pig
Hello, we just recently switched to using LZO-compressed file input for our Hadoop cluster, using Kevin Weil's LZO library. The files are pretty uniform in size at around 200 MB compressed. Our block size is 256 MB. Decompressed, the average LZO input file is around 1.0 GB. I noticed lots of our jobs …
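For concreteness, the two knobs named in the subject are ordinary Hadoop configuration properties, and the input format comes from the hadoop-lzo library mentioned above. Below is a minimal job-setup sketch, not the poster's actual job: it assumes the 0.20-era property name io.sort.mb and the class com.hadoop.mapreduce.LzoTextInputFormat from hadoop-lzo, the class name LzoJobSetup is made up for the example, and the values shown are placeholders rather than recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    import com.hadoop.mapreduce.LzoTextInputFormat; // from Kevin Weil's hadoop-lzo

    public class LzoJobSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Map-side sort buffer, in MB. If one map decompresses ~1 GB of text,
        // a small buffer means many spills and merges. 400 is only an example
        // value, not a recommendation.
        conf.setInt("io.sort.mb", 400);

        Job job = new Job(conf, "lzo-input-example");
        job.setJarByClass(LzoJobSetup.class);

        // Reads .lzo files; when matching .lzo.index files are present, a
        // single compressed file can be divided into several input splits
        // instead of going to one map task per whole file.
        job.setInputFormatClass(LzoTextInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Note that the HDFS block size (256 MB here) is fixed when the files are written; a job cannot re-block existing files, it can only control how they are split into map tasks.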

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread pig
…ance for miscellaneous applications. > Your other option of running a map per 32 MB or 64 MB of input should give you better performance if your map task execution time is significant (i.e., much larger than a few seconds) compared to the overhead of launching map tasks an…
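To make the "map per 32 MB or 64 MB of input" option concrete: with the new-API FileInputFormat the split size can be capped per job. A short sketch follows; it assumes FileInputFormat.setMaxInputSplitSize from org.apache.hadoop.mapreduce.lib.input and the same indexed-LZO input format as in the earlier sketch, and the class name SmallSplitSetup is made up for the example. Since the cap applies to the compressed stream, at the roughly 5:1 ratio described above a 64 MB split corresponds to around 300 MB of decompressed text.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SmallSplitSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "small-split-lzo-example");
        job.setJarByClass(SmallSplitSetup.class);

        // Ask for one map task per ~64 MB of input instead of one per file or
        // HDFS block. The limit is applied to the compressed bytes, and it only
        // takes effect if the .lzo files have been indexed; an unindexed LZO
        // file is not splittable and still goes to a single map.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        // Input format, paths, and mapper/reducer would be configured as in
        // the earlier sketch.
      }
    }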