The jar alone is not enough; you also need the native library (*.so). Check whether your "native" directory actually contains it:

drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 Oct  4  2017 native
Also check whether java.library.path or LD_LIBRARY_PATH points to (or includes) the directory where your *.so library resides.

On Thursday, May 3, 2018, 5:06:35 AM PDT, Fawze Abujaber <fawz...@gmail.com> wrote:

Hi Guys,

I'm running into an issue where my Spark jobs are failing with the error below. I'm using Spark 1.6.0 with CDH 5.13.0. I tried to figure it out with no success. I will appreciate any help, or a direction on how to attack this issue.

User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, xxxxxx, executor 1): java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getDecompressorType(LzoCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1995)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1881)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
    at com.liveperson.dallas.lp.utils.incremental.DallasGenericTextFileRecordReader.initialize(DallasGenericTextFileRecordReader.java:64)
    at com.liveperson.hadoop.fs.inputs.LPCombineFileRecordReaderWrapper.initialize(LPCombineFileRecordReaderWrapper.java:38)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initialize(CombineFileRecordReader.java:63)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:168)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

I see the LZO at GPextras:

ll
total 104
-rw-r--r-- 1 cloudera-scm cloudera-scm 35308 Oct  4  2017 COPYING.hadoop-lzo
-rw-r--r-- 1 cloudera-scm cloudera-scm 62268 Oct  4  2017 hadoop-lzo-0.4.15-cdh5.13.0.jar
lrwxrwxrwx 1 cloudera-scm cloudera-scm    31 May  3 07:23 hadoop-lzo.jar -> hadoop-lzo-0.4.15-cdh5.13.0.jar
drwxr-xr-x 2 cloudera-scm cloudera-scm  4096 Oct  4  2017 native

--
Take Care
Fawze Abujaber
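
The checks above can be scripted roughly as follows. This is a sketch, not a definitive fix: the GPL Extras parcel path is an assumption based on a typical CDH parcel layout, so adjust it to wherever your hadoop-lzo.jar and native directory actually live.

```shell
# Assumed location of the GPL Extras parcel on a CDH node -- verify on your cluster.
GPLEXTRAS=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib

# 1. Confirm the native shared object exists (the jar alone is not enough).
ls -l "$GPLEXTRAS/native/"libgplcompression*

# 2. Make both the driver and the executors see the jar and the native directory.
#    spark.{driver,executor}.extraLibraryPath are standard Spark properties that
#    prepend entries to the JVM's library search path on each side.
spark-submit \
  --jars "$GPLEXTRAS/hadoop-lzo.jar" \
  --conf spark.driver.extraLibraryPath="$GPLEXTRAS/native" \
  --conf spark.executor.extraLibraryPath="$GPLEXTRAS/native" \
  ...
```

If step 1 shows no libgplcompression*.so files, the native half of the LZO package is missing on that host and installing/distributing it (not classpath changes) is what resolves "native-lzo library not available".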