My Code:
val dwsite = sc.sequenceFile(
  "/sys/edw/dw_sites/snapshot/2015/10/18/00/part-r-00000",
  classOf[Text], classOf[Text])

// keep only records whose value mentions "Bhutan"
val records = dwsite.filter { case (k, v) => v.toString.contains("Bhutan") }

records.saveAsNewAPIHadoopFile("dw_output12", classOf[Text], classOf[Text],
  classOf[org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat[Text, Text]])
Error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage
2.0 (TID 4, localhost): java.lang.IllegalArgumentException: SequenceFile
doesn't work with GzipCodec without native-hadoop code!
I cannot install any libraries on this machine or on this cluster, since I do
not have any kind of write access.
I am thinking of switching to a different compression codec and re-running the
same program in the hope that it works. Hence I included

sc.getConf.set("mapred.output.compression.codec","org.apache.hadoop.io.compress.SnappyCodec")

but I still get the same error, which implies the line above did not affect the
compression codec of the sequence-file output format.

What is the fix? Any suggestions?

Appreciate your time.
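In case it helps, here is a sketch of what I may try next (untested; the property names are my assumption from the Hadoop docs): pass the codec through a Hadoop Configuration handed directly to saveAsNewAPIHadoopFile, instead of mutating sc.getConf after the SparkContext already exists.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

// Sketch (untested): build a job-level Configuration and pass it to the
// save call, which accepts a conf argument in PairRDDFunctions.
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true")
// DefaultCodec (pure-Java zlib) rather than Snappy/Gzip, since those may
// also need the native hadoop libraries I cannot install.
hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
  classOf[DefaultCodec].getName)

records.saveAsNewAPIHadoopFile("dw_output12", classOf[Text], classOf[Text],
  classOf[SequenceFileOutputFormat[Text, Text]], hadoopConf)

Would this be the right way to override the codec, or is something else pinning GzipCodec?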
--
Deepak