bz2 decompress in place

2013-08-21 Thread Zac Shepherd
Hello, I'm using an ancient version of Hadoop (0.20.2+228) and trying to run a MapReduce job over an 18 GB bz2-compressed file. Since bz2 splitting support wasn't added until 0.21.0, only a single mapper gets allocated, and the job will take far too long to complete. Is there a way that I can decompress the f

Re: bz2 decompress in place

2013-08-22 Thread Zac Shepherd
Just because I always appreciate it when someone posts the answer to their own question: we have some Java that does BZip2Codec bz2 = new BZip2Codec(); CompressionOutputStream cout = bz2.createOutputStream(out); for compression. We just wrote another version that does BZ
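
For anyone landing here with the same problem, the following is a minimal sketch of what a decompression counterpart could look like. It is not the poster's actual code, which is cut off above. The class name Bz2Decompress, the decompress helper, and the two command-line path arguments are made up for illustration. It also swaps the `new BZip2Codec()` from the post for `ReflectionUtils.newInstance`, on the assumption that this hands the configuration to the codec on releases where the codec is Configurable. The program streams the file through `createInputStream`, so the 18G input is never held in memory, and it writes plain output that later jobs can split across mappers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper class; the name and argument layout are illustrative only.
public class Bz2Decompress {

    // Reads bz2-compressed bytes from `in` and writes the decompressed bytes to `out`.
    // Both streams are closed when the copy finishes or fails.
    static void decompress(InputStream in, OutputStream out, Configuration conf)
            throws IOException {
        // Assumption: instantiate via ReflectionUtils rather than `new BZip2Codec()`
        // so the codec receives `conf` if it implements Configurable.
        BZip2Codec bz2 = ReflectionUtils.newInstance(BZip2Codec.class, conf);
        InputStream cin = bz2.createInputStream(in);
        // Copy in 64 KB chunks; `true` closes both streams afterwards.
        IOUtils.copyBytes(cin, out, 64 * 1024, true);
    }

    // args[0]: compressed source path, args[1]: destination path for the plain output.
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        InputStream in = fs.open(new Path(args[0]));
        OutputStream out = fs.create(new Path(args[1]));
        decompress(in, out, conf);
    }
}
```

Note that this runs as a single client process, so the decompression itself is still serial; the gain comes afterwards, when the uncompressed output can be split across many mappers.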