I am running a Hadoop cluster with Spark on YARN. The cluster is running the
CDH 5.2 distribution. When I try to run Spark jobs against Snappy-compressed
files I receive the following error:

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
        org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
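
For reference, this is the shape of the job that hits it. A minimal sketch, assuming a hypothetical Snappy-compressed text file on HDFS (the path, app name, and class name below are illustrative, not my actual job):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal read of a Snappy-compressed text file; any .snappy text input
// goes through the same codec path (TextInputFormat -> CodecPool -> SnappyCodec),
// which is where buildSupportsSnappy() blows up on the executors.
object SnappyReadRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("snappy-read-repro")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("hdfs:///data/example/part-00000.snappy")  // hypothetical path
    println(lines.count())
    sc.stop()
  }
}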

I have tried setting JAVA_LIBRARY_PATH, LD_LIBRARY_PATH,
spark.executor.extraLibraryPath, spark.executor.extraClassPath, and more, with
absolutely no luck.
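
Concretely, this is roughly what I attempted, shown here on the SparkConf (equivalently passed as --conf options to spark-submit). A sketch only: the library path is the CDH native dir reported by checknative below, and the extraClassPath entry is just an illustrative placeholder:

// Settings I tried; /usr/lib/hadoop/lib/native is where the CDH native libs
// live on my nodes, per the checknative output further down.
val conf = new SparkConf()
  .setAppName("snappy-read-repro")
  .set("spark.executor.extraLibraryPath", "/usr/lib/hadoop/lib/native")
  .set("spark.executor.extraClassPath", "/etc/hadoop/conf")         // illustrative entry
  .setExecutorEnv("LD_LIBRARY_PATH", "/usr/lib/hadoop/lib/native")  // also tried as a plain env var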

Additionally, I have confirmed that I can run MapReduce jobs against Snappy
files without any problem, and hadoop checknative looks good:

$ hadoop checknative -a
14/12/11 13:51:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/11 13:51:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0

Can anyone give me any suggestions as to why this is not working or, better
yet, how I can fix this problem?

Thanks!!!

Rich Haase | Sr. Software Engineer | Pandora
m 303.887.1146 | rha...@pandora.com
