Hi there,
The current documentation says:
> By default, data is not compressed. You can compress your data by using the
> deflate (gzip) algorithm with the -z or --compress argument, or specify any
> Hadoop compression codec using the --compression-codec argument. This applies
> to both SequenceFiles or text files.
>
But I think this is a bit misleading.
Currently, if output compression is enabled cluster-wide, the Sqooped data
is always compressed, regardless of whether -z/--compress is passed.
It seems better to make compression actually controllable via --compress, which
means changing ImportJobBase.configureOutputFormat() to something like:
    if (options.shouldUseCompression()) {
      FileOutputFormat.setCompressOutput(job, true);
      FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
      SequenceFileOutputFormat.setOutputCompressionType(job,
          CompressionType.BLOCK);
    } else {
      // New: explicitly turn compression off so that a cluster-wide
      // default can't silently compress the imported data.
      FileOutputFormat.setCompressOutput(job, false);
    }
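To make the problem concrete, here's a toy model (plain Java maps, not Hadoop's actual Configuration class) of why the missing else branch matters. The assumption is that the job config starts out as a copy of the cluster defaults, so if configureOutputFormat() only ever sets the compress property to true, a cluster-wide "true" leaks through when --compress isn't given. The key mirrors Hadoop 0.20's mapred.output.compress property; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Hadoop code. Shows how a job-level setting that is
// never written falls back to whatever the cluster defaults contain.
public class CompressionOverrideSketch {
    // Mirrors the Hadoop 0.20-era property name; everything else is hypothetical.
    public static final String COMPRESS_KEY = "mapred.output.compress";

    // Assumption: a job's config begins as a copy of the cluster defaults.
    public static Map<String, String> newJobConf(Map<String, String> clusterDefaults) {
        return new HashMap<>(clusterDefaults);
    }

    // Current behaviour: the key is only touched when --compress is given.
    public static void configureCurrent(Map<String, String> job, boolean useCompression) {
        if (useCompression) {
            job.put(COMPRESS_KEY, "true");
        }
    }

    // Proposed behaviour: the key is always set explicitly, mirroring the new else branch.
    public static void configureProposed(Map<String, String> job, boolean useCompression) {
        if (useCompression) {
            job.put(COMPRESS_KEY, "true");
        } else {
            job.put(COMPRESS_KEY, "false");
        }
    }

    // Unset means "not compressed", as in the stock Hadoop default.
    public static boolean isCompressed(Map<String, String> job) {
        return Boolean.parseBoolean(job.getOrDefault(COMPRESS_KEY, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new HashMap<>();
        cluster.put(COMPRESS_KEY, "true"); // cluster-wide output compression enabled

        Map<String, String> current = newJobConf(cluster);
        configureCurrent(current, false); // user did not pass --compress
        System.out.println("current, no --compress:  compressed=" + isCompressed(current));

        Map<String, String> proposed = newJobConf(cluster);
        configureProposed(proposed, false); // user did not pass --compress
        System.out.println("proposed, no --compress: compressed=" + isCompressed(proposed));
    }
}
```

With the cluster default set to true, the current logic reports compressed=true even without --compress, while the proposed logic reports compressed=false.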
Thoughts?
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr