I am running a Hadoop cluster with Spark on YARN. The cluster is running the CDH5.2 distribution. When I try to run Spark jobs against Snappy-compressed files, I receive the following error:
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
        org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)

I have tried setting JAVA_LIBRARY_PATH, LD_LIBRARY_PATH, spark.executor.extraLibraryPath, spark.executor.extraClassPath and more, with absolutely no luck.
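For reference, the attempts looked roughly like the following (the /usr/lib/hadoop/lib/native path is taken from my checknative output below; treat it as an example, not a prescription):

    # environment variables, exported before launching the job
    export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH
    export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native

    # spark-defaults.conf (or equivalent --conf flags on spark-submit)
    spark.executor.extraLibraryPath  /usr/lib/hadoop/lib/native
    spark.executor.extraClassPath    /usr/lib/hadoop/lib/*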
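In case it helps with diagnosis, here is a sketch of how I have been checking whether the library path settings actually reach the executor JVMs (run from spark-shell on the cluster; if the Hadoop native directory does not appear in the output, the settings are not propagating):

```scala
// Runs one task on an executor and prints the java.library.path
// that executor JVM was started with.
sc.parallelize(1 to 1, 1)
  .map(_ => System.getProperty("java.library.path"))
  .collect()
  .foreach(println)
```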
Additionally, I have confirmed that I can run MapReduce jobs against Snappy files without any problem, and hadoop checknative looks good:

$ hadoop checknative -a
14/12/11 13:51:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/11 13:51:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0

Can anyone give me any suggestions as to why this is not working, or better yet, how I can fix this problem? Thanks!!!

Rich Haase | Sr. Software Engineer | Pandora
m 303.887.1146 | rha...@pandora.com