I don't think it's a system property. There is support for adding compression to the save function in the latest 0.8 code: https://github.com/mesos/spark/blob/master/core/src/main/scala/spark/PairRDDFunctions.scala#L609
You can take a look at how that is done.

--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org


On Wed, Aug 28, 2013 at 6:56 AM, Ameet Kini <[email protected]> wrote:

> Folks,
>
> Still stuck on this, so would greatly appreciate any pointers as to how to
> force Spark to recognize the mapred.output.compression.type Hadoop
> parameter.
>
> Thanks,
> Ameet
>
>
> On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini <[email protected]> wrote:
>
>> I'm trying to use saveAsSequenceFile to output compressed sequence files
>> where the "value" in each key/value pair is compressed. In Hadoop, I would
>> set this job configuration parameter:
>> "mapred.output.compression.type=RECORD" for record-level compression.
>> Previous posts have suggested that this is possible by simply setting this
>> parameter in core-site.xml. I tried doing just that, and the sequence
>> file doesn't seem to be compressed.
>>
>> I've also tried setting
>> spark.hadoop.mapred.output.compression.type as a system property just
>> before initializing the Spark context:
>> System.setProperty("spark.hadoop.mapred.output.compression.type",
>> "RECORD")
>>
>> In both cases, I can see that the resulting configuration, as per
>> SparkContext.hadoopConfiguration, has the property set to RECORD, but the
>> resulting sequence file still has its value uncompressed.
>>
>> At first, I thought this was because io.compression.codecs was set to
>> null, so I set io.compression.codecs to the long list of codecs that is its
>> normal default value in a Hadoop environment, but still to no avail. Am I
>> missing a crucial step?
>>
>> Thanks,
>> Ameet
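
[For readers finding this thread later, a rough sketch of the two approaches discussed above. The codec overload is the one Reynold links to in the 0.8 code; the explicit-JobConf route is one way to pass mapred.output.compression.type without relying on core-site.xml or system properties being picked up. Paths and the choice of codec/output format here are illustrative, not from the thread.]

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.hadoop.mapred.{JobConf, SequenceFileOutputFormat}
import spark.SparkContext
import spark.SparkContext._ // brings PairRDDFunctions / SequenceFileRDDFunctions into scope

object CompressedSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "compressed-save")
    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
      .map { case (k, v) => (new Text(k), new IntWritable(v)) }

    // Option 1 (Spark 0.8 code Reynold points to): pass a compression codec
    // directly to the save call.
    rdd.saveAsSequenceFile("/tmp/out-seq", Some(classOf[DefaultCodec]))

    // Option 2: control the compression *type* (RECORD vs BLOCK) by handing
    // saveAsHadoopFile an explicit JobConf, so the setting reaches the
    // Hadoop output format for this job.
    val conf = new JobConf(sc.hadoopConfiguration)
    conf.setBoolean("mapred.output.compress", true)
    conf.set("mapred.output.compression.type", "RECORD")
    rdd.saveAsHadoopFile(
      "/tmp/out-record", // illustrative output path
      classOf[Text], classOf[IntWritable],
      classOf[SequenceFileOutputFormat[Text, IntWritable]],
      conf)

    sc.stop()
  }
}
```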
