Folks,

Still stuck on this, so I would greatly appreciate any pointers on how to force Spark to recognize the mapred.output.compression.type Hadoop parameter.
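For reference, here is a minimal, self-contained sketch of one variant I am considering: bypassing saveAsSequenceFile and passing an explicit JobConf to saveAsHadoopFile, so the compression settings travel with the write itself rather than through system properties. The output path and sample data below are placeholders, and I have not yet verified that this actually forces RECORD compression:

import org.apache.hadoop.io.{IntWritable, SequenceFile, Text}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, SequenceFileOutputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val sc = new SparkContext("local", "seq-compress-test")

// Tiny sample data; the real job's values are much larger.
val pairs = sc.parallelize(Seq((1, "one"), (2, "two")))
  .map { case (k, v) => (new IntWritable(k), new Text(v)) }

// Build an explicit JobConf carrying the compression settings,
// seeded from the SparkContext's Hadoop configuration.
val jobConf = new JobConf(sc.hadoopConfiguration)
FileOutputFormat.setCompressOutput(jobConf, true)
SequenceFileOutputFormat.setOutputCompressionType(
  jobConf, SequenceFile.CompressionType.RECORD)

pairs.saveAsHadoopFile(
  "/tmp/compressed-seq",   // placeholder output path
  classOf[IntWritable],
  classOf[Text],
  classOf[SequenceFileOutputFormat[IntWritable, Text]],
  jobConf)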
Thanks,
Ameet

On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini <[email protected]> wrote:

> I'm trying to use saveAsSequenceFile to output compressed sequence files
> where the "value" in each key, value pair is compressed. In Hadoop, I would
> set this job configuration parameter:
> "mapred.output.compression.type=RECORD" for record-level compression.
> Previous posts have suggested that this is possible by simply setting this
> parameter in core-site.xml. I tried doing just that, and the sequence
> file doesn't seem to be compressed.
>
> I've also tried setting
> spark.hadoop.mapred.output.compression.type as a system property just
> before initializing the Spark context:
> System.setProperty("spark.hadoop.mapred.output.compression.type", "RECORD")
>
> In both cases, I can see that the resulting configuration, as per
> SparkContext.hadoopConfiguration, has the property set to RECORD, but the
> resulting sequence file still has its value uncompressed.
>
> At first, I thought this was because io.compression.codecs was set to
> null, so I set io.compression.codecs to the long list of codecs that is its
> normal default value in a Hadoop environment, but still to no avail. Am I
> missing a crucial step?
>
> Thanks,
> Ameet
