I don't think it's a system property. There is support for adding compression to the save function in the latest 0.8 code: https://github.com/mesos/spark/blob/master/core/src/main/scala/spark/PairRDDFunctions.scala#L609
You can take a look at how that is done.

--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org


On Wed, Aug 28, 2013 at 6:56 AM, Ameet Kini <[email protected]> wrote:

> Folks,
>
> Still stuck on this, so would greatly appreciate any pointers as to how to
> force Spark to recognize the mapred.output.compression.type Hadoop
> parameter.
>
> Thanks,
> Ameet
>
>
> On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini <[email protected]> wrote:
>
>> I'm trying to use saveAsSequenceFile to output compressed sequence files
>> where the "value" in each key/value pair is compressed. In Hadoop, I would
>> set this job configuration parameter:
>> "mapred.output.compression.type=RECORD" for record-level compression.
>> Previous posts have suggested that this is possible by simply setting this
>> parameter in core-site.xml. I tried doing just that, and the sequence
>> file doesn't seem to be compressed.
>>
>> I've also tried setting
>> spark.hadoop.mapred.output.compression.type as a system property just
>> before initializing the Spark context:
>> System.setProperty("spark.hadoop.mapred.output.compression.type",
>> "RECORD")
>>
>> In both cases, I can see that the resulting configuration, as per
>> SparkContext.hadoopConfiguration, has the property set to RECORD, but the
>> resulting sequence file still has its value uncompressed.
>>
>> At first, I thought this was because io.compression.codecs was set to
>> null, so I set io.compression.codecs to the long list of codecs that is its
>> normal default value in a Hadoop environment, but still to no avail. Am I
>> missing a crucial step?
>>
>> Thanks,
>> Ameet
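
[For readers finding this thread later, a rough sketch of the two approaches discussed above. The codec overload is the one Reynold links to in the 0.8 code; the explicit-JobConf route is one way to pass mapred.output.compression.type without relying on core-site.xml or system properties being picked up. Paths and the choice of codec/output format here are illustrative, not from the thread.]

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.hadoop.mapred.{JobConf, SequenceFileOutputFormat}
import spark.SparkContext
import spark.SparkContext._ // brings PairRDDFunctions / SequenceFileRDDFunctions into scope

object CompressedSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "compressed-save")
    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
      .map { case (k, v) => (new Text(k), new IntWritable(v)) }

    // Option 1 (Spark 0.8 code Reynold points to): pass a compression codec
    // directly to the save call.
    rdd.saveAsSequenceFile("/tmp/out-seq", Some(classOf[DefaultCodec]))

    // Option 2: control the compression *type* (RECORD vs BLOCK) by handing
    // saveAsHadoopFile an explicit JobConf, so the setting reaches the
    // Hadoop output format for this job.
    val conf = new JobConf(sc.hadoopConfiguration)
    conf.setBoolean("mapred.output.compress", true)
    conf.set("mapred.output.compression.type", "RECORD")
    rdd.saveAsHadoopFile(
      "/tmp/out-record", // illustrative output path
      classOf[Text], classOf[IntWritable],
      classOf[SequenceFileOutputFormat[Text, IntWritable]],
      conf)

    sc.stop()
  }
}
```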
