Hello,
I'm using an ancient version of Hadoop (0.20.2+228) and trying to run an
m/r job over a bz2 compressed file (18G). Since bz2 splitting support
wasn't added until 0.21.0, a single mapper is getting allocated and will
take far too long to complete. Is there a way that I can decompress the
file so the work can be spread across more than one mapper?
Just because I always appreciate it when someone posts the answer to
their own question:
We have some Java that does
BZip2Codec bz2 = new BZip2Codec();
CompressionOutputStream cout = bz2.createOutputStream(out);
for compression.
We just wrote another version that does
BZip2Codec bz2 = new BZip2Codec();
CompressionInputStream cin = bz2.createInputStream(in);
for decompression.
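For anyone who finds this later, here is a minimal, self-contained sketch of the
decompression side, assuming the bz2 file sits on HDFS; the class name, paths, and
buffer size are placeholders, not what we actually run:

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionInputStream;

public class Bunzip2ToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical paths -- substitute your own.
    Path src = new Path("/data/input.bz2");
    Path dst = new Path("/data/input.txt");

    BZip2Codec bz2 = new BZip2Codec();

    InputStream in = null;
    OutputStream out = null;
    try {
      // Wrap the raw HDFS stream so reads come back decompressed.
      CompressionInputStream cin = bz2.createInputStream(fs.open(src));
      in = cin;
      out = fs.create(dst);
      // Copy the decompressed bytes into a plain (splittable) file.
      IOUtils.copyBytes(in, out, 64 * 1024, false);
    } finally {
      IOUtils.closeStream(in);
      IOUtils.closeStream(out);
    }
  }
}

Once the uncompressed copy is on HDFS, the input format can split it at block
boundaries and the job gets more than one mapper instead of a single one for the
whole 18G file.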