I'm actually running this in a separate environment from our HDFS cluster. I think I've been able to sort out the issue by copying /opt/cloudera/parcels/CDH/lib to the machine I'm running this on (I'm using a one-worker setup at present) and adding the following to spark-env.sh:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar

With these in place I can get past the previous error. The issue now seems to be with what is being returned:

import org.apache.hadoop.io._
val hdfsPath = "hdfs://nost.name/path/to/folder"
val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
file.count()

returns the following error:

java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text

On Wed, Apr 1, 2015 at 7:34 PM, Xianjin YE <advance...@gmail.com> wrote:

> Do you have the same hadoop config for all nodes in your cluster (you run
> it in a cluster, right?)?
> Check the node (usually the executor) that gives the
> java.lang.UnsatisfiedLinkError to see whether libsnappy.so is in the
> hadoop native lib path.
>
> On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote:
>
> Thanks for the super quick response!
>
> I can read the file just fine in Hadoop; it's only when I point Spark at
> this file that it can't seem to read it, due to the missing snappy jars /
> .so files.
>
> I'm playing around with adding some things to the spark-env.sh file, but
> still nothing.
>
> On Wed, Apr 1, 2015 at 7:19 PM, Xianjin YE <advance...@gmail.com> wrote:
>
> Can you read a snappy-compressed file in HDFS? It looks like libsnappy.so
> is not in the hadoop native lib path.
>
> On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
>
> Has anyone else encountered the following error when trying to read a
> snappy-compressed sequence file from HDFS?
>
> java.lang.UnsatisfiedLinkError:
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>
> The following works for me when the file is uncompressed:
>
> import org.apache.hadoop.io._
> val hdfsPath = "hdfs://nost.name/path/to/folder"
> val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> file.count()
>
> but fails when the encoding is Snappy.
>
> I've seen some stuff floating around on the web about having to explicitly
> enable support for Snappy in Spark, but it doesn't seem to work for me:
> http://www.ericlin.me/enabling-snappy-support-for-sharkspark
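For what it's worth, the ClassCastException suggests the file's values are stored on disk as BytesWritable, not Text, so declaring the value type as String (which Spark converts through Text) doesn't line up with the file's header. A minimal sketch of reading it with matching Writable types and decoding the bytes by hand, assuming the values really are BytesWritable and hold UTF-8 text (the path is just the placeholder from the example above):

import org.apache.hadoop.io._

val hdfsPath = "hdfs://nost.name/path/to/folder"

// Declare both key and value as BytesWritable so Spark's WritableConverter
// is not asked to cast the on-disk BytesWritable values to Text.
val file = sc.sequenceFile[BytesWritable, BytesWritable](hdfsPath)

// Hadoop reuses Writable instances between records, so copy the bytes out
// before holding on to them (caching, collect, etc.).
val decoded = file.map { case (k, v) =>
  (new String(k.copyBytes(), "UTF-8"), new String(v.copyBytes(), "UTF-8"))
}
decoded.count()

The actual key/value class names are recorded as plain strings in the sequence file's header, so a quick look at the first few hundred bytes of the file should confirm what types to declare.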
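And for the earlier UnsatisfiedLinkError, here is a quick probe one can run from spark-shell on the node in question to check whether the native Hadoop library (and its snappy support) is actually being picked up; it only assumes the stock Hadoop classes are on the classpath:

import org.apache.hadoop.util.NativeCodeLoader

// True only if libhadoop.so was found on java.library.path and loaded.
println(s"native hadoop loaded: ${NativeCodeLoader.isNativeCodeLoaded}")

// This is the exact native method from the stack trace: it throws
// UnsatisfiedLinkError if libhadoop.so is absent, and returns true only
// if that library was compiled with snappy support.
println(s"snappy supported: ${NativeCodeLoader.buildSupportsSnappy()}")

If the second call still throws, the LD_LIBRARY_PATH / JAVA_LIBRARY_PATH exports above aren't reaching the JVM that runs the task.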