For saveAsTextFile, I believe we overload it so that you can pass a codec class directly: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L59
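
A minimal sketch of that overload, assuming the codec-taking variant is `saveAsTextFile(path, codec)` as exercised in the linked FileSuite. The object name, app name, sample data and the `local[2]` master are made up for illustration. DefaultCodec is used here because, as far as I know, it does not depend on native Hadoop libraries:

```scala
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver: writes a small RDD as compressed text, then reads it back.
object CompressedTextFileExample {
  def main(args: Array[String]): Unit = {
    val outputDir = args(0) // must not exist yet
    // local[2] is only for trying this out; drop setMaster when using spark-submit.
    val conf = new SparkConf().setAppName("compressed-text-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("alpha", "beta", "gamma"))
      // The codec is passed as a plain Class here, not wrapped in an Option.
      lines.saveAsTextFile(outputDir, classOf[DefaultCodec])
      // textFile picks the decompressor from the file extension.
      val readBack = sc.textFile(outputDir).collect().sorted
      println(readBack.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Note the asymmetry with saveAsSequenceFile: here the second argument is the class itself, which is probably why passing a bare `classOf[...]` looked like it should work there too.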
For saveAsSequenceFile, yep, I think Mark is right: you need to pass an Option.

On Wed, Apr 2, 2014 at 12:36 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> http://www.scala-lang.org/api/2.10.3/index.html#scala.Option
>
> The signature is 'def saveAsSequenceFile(path: String, codec:
> Option[Class[_ <: CompressionCodec]] = None)', but you are providing a
> Class, not an Option[Class].
>
> Try counts.saveAsSequenceFile(output,
> Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))
>
> On Wed, Apr 2, 2014 at 12:18 PM, Kostiantyn Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Hi there,
>>
>> I've started using Spark recently and am evaluating possible use cases in
>> our company.
>>
>> I'm trying to save an RDD as a compressed Sequence file. I'm able to save
>> a non-compressed file by calling:
>>
>> counts.saveAsSequenceFile(output)
>>
>> where counts is my RDD (IntWritable, Text). However, I didn't manage to
>> compress the output. I tried several configurations and always got an
>> exception:
>>
>> counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
>> <console>:21: error: type mismatch;
>>  found   : Class[org.apache.hadoop.io.compress.SnappyCodec](classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>        counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>
>> counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
>> <console>:21: error: type mismatch;
>>  found   : Class[org.apache.spark.io.SnappyCompressionCodec](classOf[org.apache.spark.io.SnappyCompressionCodec])
>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>        counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
>>
>> and it doesn't work even for Gzip:
>>
>> counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
>> <console>:21: error: type mismatch;
>>  found   : Class[org.apache.hadoop.io.compress.GzipCodec](classOf[org.apache.hadoop.io.compress.GzipCodec])
>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>        counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
>>
>> Could you please suggest a solution? Also, I didn't find how to specify
>> compression parameters (i.e. compression type for Snappy). I wondered if
>> you could share code snippets for writing/reading an RDD with
>> compression?
>>
>> Thank you in advance,
>> Konstantin Kudryavtsev
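
Since the original message asks for a write/read snippet, here is a rough sketch built on the signature Mark quoted. The object name, sample data and `local[2]` master are made up for illustration. The codec has to be a Hadoop one (a subclass of org.apache.hadoop.io.compress.CompressionCodec, per the "required:" line in the error), so org.apache.spark.io.SnappyCompressionCodec will not type-check even when wrapped in Some. I use DefaultCodec because SnappyCodec and GzipCodec may need the native Hadoop libraries. The compression-type setting is an assumption on my part about which Hadoop key is consulted, so please verify it against your Hadoop version:

```scala
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.spark.{SparkConf, SparkContext}
// Needed on Spark 1.x for the implicit conversion that adds saveAsSequenceFile.
import org.apache.spark.SparkContext._

// Hypothetical driver: writes a compressed SequenceFile and reads it back.
object CompressedSequenceFileExample {
  def main(args: Array[String]): Unit = {
    val output = args(0) // must not exist yet
    // local[2] is only for trying this out; drop setMaster when using spark-submit.
    val conf = new SparkConf().setAppName("compressed-seqfile-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: RECORD vs BLOCK compression is read from this Hadoop setting.
      sc.hadoopConfiguration.set("mapred.output.compression.type", "BLOCK")

      // Int and String are converted to IntWritable and Text implicitly.
      val counts = sc.parallelize(Seq((1, "one"), (2, "two"), (3, "three")))

      // The codec class is wrapped in Some(...) to match Option[Class[_ <: CompressionCodec]].
      counts.saveAsSequenceFile(output, Some(classOf[DefaultCodec]))

      // No codec is given on read: the SequenceFile header records which one was used.
      val readBack = sc.sequenceFile[Int, String](output).collect().sortBy(_._1)
      readBack.foreach { case (k, v) => println(s"$k\t$v") }
    } finally {
      sc.stop()
    }
  }
}
```

Swapping in `Some(classOf[org.apache.hadoop.io.compress.SnappyCodec])` should be the only change needed for Snappy, provided the native libraries are available on the cluster.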