Folks,

Still stuck on this, so I would greatly appreciate any pointers on how to force Spark to recognize the mapred.output.compression.type Hadoop parameter.
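For reference, here is a minimal, self-contained sketch of one variant I am considering: bypassing saveAsSequenceFile and passing an explicit JobConf to saveAsHadoopFile, so the compression settings travel with the write itself rather than through system properties. The output path and sample data below are placeholders, and I have not yet verified that this actually forces RECORD compression:

import org.apache.hadoop.io.{IntWritable, SequenceFile, Text}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, SequenceFileOutputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val sc = new SparkContext("local", "seq-compress-test")

// Tiny sample data; the real job's values are much larger.
val pairs = sc.parallelize(Seq((1, "one"), (2, "two")))
  .map { case (k, v) => (new IntWritable(k), new Text(v)) }

// Build an explicit JobConf carrying the compression settings,
// seeded from the SparkContext's Hadoop configuration.
val jobConf = new JobConf(sc.hadoopConfiguration)
FileOutputFormat.setCompressOutput(jobConf, true)
SequenceFileOutputFormat.setOutputCompressionType(
  jobConf, SequenceFile.CompressionType.RECORD)

pairs.saveAsHadoopFile(
  "/tmp/compressed-seq",   // placeholder output path
  classOf[IntWritable],
  classOf[Text],
  classOf[SequenceFileOutputFormat[IntWritable, Text]],
  jobConf)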
Thanks,
Ameet

On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini <[email protected]> wrote:

> I'm trying to use saveAsSequenceFile to output compressed sequence files
> where the "value" in each key, value pair is compressed. In Hadoop, I would
> set this job configuration parameter:
> "mapred.output.compression.type=RECORD" for record-level compression.
> Previous posts have suggested that this is possible by simply setting this
> parameter in core-site.xml. I tried doing just that, and the sequence
> file doesn't seem to be compressed.
>
> I've also tried setting
> spark.hadoop.mapred.output.compression.type as a system property just
> before initializing the Spark context:
> System.setProperty("spark.hadoop.mapred.output.compression.type", "RECORD")
>
> In both cases, I can see that the resulting configuration, as per
> SparkContext.hadoopConfiguration, has the property set to RECORD, but the
> resulting sequence file still has its value uncompressed.
>
> At first, I thought this was because io.compression.codecs was set to
> null, so I set io.compression.codecs to the long list of codecs that is its
> normal default value in a Hadoop environment, but still to no avail. Am I
> missing a crucial step?
>
> Thanks,
> Ameet
