Thanks all. I was able to get the decompression working by adding the following to my spark-env.sh script:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar

On Thu, Apr 2, 2015 at 12:51 AM, Sean Owen <so...@cloudera.com> wrote:
> Yes, any Hadoop-related process that asks for Snappy compression or
> needs to read it will have to have the Snappy libs available on the
> library path. That's usually set up for you in a distro, or you can do
> it manually like this. This is not Spark-specific.
>
> The second question also isn't Spark-specific; you do not have a
> SequenceFile of byte[] / String, but of byte[] / byte[]. Review what
> you are writing, since it is not BytesWritable / Text.
>
> On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers <n.e.trav...@gmail.com> wrote:
> > I'm actually running this in a separate environment to our HDFS cluster.
> >
> > I think I've been able to sort out the issue by copying
> > /opt/cloudera/parcels/CDH/lib to the machine I'm running this on (I'm
> > just using a one-worker setup at present) and adding the following to
> > spark-env.sh:
> >
> > export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> > export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> > export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar
> >
> > I can get past the previous error. The issue now seems to be with what is
> > being returned.
> >
> > import org.apache.hadoop.io._
> > val hdfsPath = "hdfs://nost.name/path/to/folder"
> > val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> > file.count()
> >
> > returns the following error:
> >
> > java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot
> > be cast to org.apache.hadoop.io.Text
> >
> >
> > On Wed, Apr 1, 2015 at 7:34 PM, Xianjin YE <advance...@gmail.com> wrote:
> >>
> >> Do you have the same hadoop config for all nodes in your cluster (you run
> >> it in a cluster, right?)?
> >> Check the node (usually the executor) which gives the
> >> java.lang.UnsatisfiedLinkError to see whether the libsnappy.so is in the
> >> hadoop native lib path.
> >>
> >> On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote:
> >>
> >> Thanks for the super quick response!
> >>
> >> I can read the file just fine in hadoop; it's just when I point Spark at
> >> this file that it can't seem to read it, due to the missing snappy jars /
> >> .so's.
> >>
> >> I'm playing around with adding some things to the spark-env.sh file, but
> >> still nothing.
> >>
> >> On Wed, Apr 1, 2015 at 7:19 PM, Xianjin YE <advance...@gmail.com> wrote:
> >>
> >> Can you read the snappy compressed file in hdfs? Looks like the
> >> libsnappy.so is not in the hadoop native lib path.
> >>
> >> On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
> >>
> >> Has anyone else encountered the following error when trying to read a
> >> snappy compressed sequence file from HDFS?
> >>
> >> *java.lang.UnsatisfiedLinkError:
> >> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z*
> >>
> >> The following works for me when the file is uncompressed:
> >>
> >> import org.apache.hadoop.io._
> >> val hdfsPath = "hdfs://nost.name/path/to/folder"
> >> val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> >> file.count()
> >>
> >> but fails when the encoding is Snappy.
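[Editor's note: Sean's point about the key/value types can be sketched as follows. This is a hypothetical fix, assuming the SequenceFile was in fact written with byte[] keys and byte[] values, in which case both type parameters should be BytesWritable and any String conversion done explicitly:]

```scala
import org.apache.hadoop.io.BytesWritable

// Declare both key and value as BytesWritable, matching what was written.
val hdfsPath = "hdfs://nost.name/path/to/folder"
val file = sc.sequenceFile[BytesWritable, BytesWritable](hdfsPath)

// Hadoop may reuse Writable instances, so copy the bytes out before use;
// decode values as UTF-8 only if that is how they were produced.
val values = file.map { case (_, v) => new String(v.copyBytes(), "UTF-8") }
values.count()
```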
> >>
> >> I've seen some stuff floating around on the web about having to
> >> explicitly enable support for Snappy in Spark, but it doesn't seem to
> >> work for me:
> >> http://www.ericlin.me/enabling-snappy-support-for-sharkspark
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-snappy-and-HDFS-tp22349.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
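[Editor's note: a quick way to check the original UnsatisfiedLinkError is to query Hadoop's NativeCodeLoader directly from the spark-shell. This is a sketch, assuming the Hadoop jars and native libraries are on the classpath/library path of the shell; buildSupportsSnappy is the very call that throws when libsnappy.so is missing:]

```scala
// Ask Hadoop whether its native library, and native snappy support,
// were actually loaded on this JVM.
import org.apache.hadoop.util.NativeCodeLoader

println(NativeCodeLoader.isNativeCodeLoaded)     // false => libhadoop.so not found
println(NativeCodeLoader.buildSupportsSnappy())  // throws UnsatisfiedLinkError if natives are absent
```

If both print true on the driver but the job still fails, run the same check on each worker, as Xianjin suggests, since the library path must be set on every executor node.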