I'm actually running this in a separate environment from our HDFS cluster. I think I've been able to sort out the issue by copying /opt/cloudera/parcels/CDH/lib to the machine I'm running this on (I'm using a one-worker setup at present) and adding the following to spark-env.sh:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar

With these in place I can get past the previous error. The issue now seems to be with what is being returned:

import org.apache.hadoop.io._
val hdfsPath = "hdfs://nost.name/path/to/folder"
val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
file.count()

returns the following error:

java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text

On Wed, Apr 1, 2015 at 7:34 PM, Xianjin YE <advance...@gmail.com> wrote:

> Do you have the same hadoop config for all nodes in your cluster (you run
> it in a cluster, right?)?
> Check the node (usually the executor) that gives the
> java.lang.UnsatisfiedLinkError to see whether libsnappy.so is in the
> hadoop native lib path.
>
> On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote:
>
> Thanks for the super quick response!
>
> I can read the file just fine in Hadoop; it's only when I point Spark at
> this file that it can't seem to read it, due to the missing snappy jars /
> .so files.
>
> I'm playing around with adding some things to the spark-env.sh file, but
> still nothing.
>
> On Wed, Apr 1, 2015 at 7:19 PM, Xianjin YE <advance...@gmail.com> wrote:
>
> Can you read a snappy-compressed file in HDFS? It looks like libsnappy.so
> is not in the hadoop native lib path.
>
> On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
>
> Has anyone else encountered the following error when trying to read a
> snappy-compressed sequence file from HDFS?
>
> java.lang.UnsatisfiedLinkError:
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>
> The following works for me when the file is uncompressed:
>
> import org.apache.hadoop.io._
> val hdfsPath = "hdfs://nost.name/path/to/folder"
> val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> file.count()
>
> but fails when the encoding is Snappy.
>
> I've seen some stuff floating around on the web about having to explicitly
> enable support for Snappy in Spark, but it doesn't seem to work for me:
> http://www.ericlin.me/enabling-snappy-support-for-sharkspark
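For what it's worth, the ClassCastException suggests the file's values are stored on disk as BytesWritable, not Text, so declaring the value type as String (which Spark converts through Text) doesn't line up with the file's header. A minimal sketch of reading it with matching Writable types and decoding the bytes by hand, assuming the values really are BytesWritable and hold UTF-8 text (the path is just the placeholder from the example above):

import org.apache.hadoop.io._

val hdfsPath = "hdfs://nost.name/path/to/folder"

// Declare both key and value as BytesWritable so Spark's WritableConverter
// is not asked to cast the on-disk BytesWritable values to Text.
val file = sc.sequenceFile[BytesWritable, BytesWritable](hdfsPath)

// Hadoop reuses Writable instances between records, so copy the bytes out
// before holding on to them (caching, collect, etc.).
val decoded = file.map { case (k, v) =>
  (new String(k.copyBytes(), "UTF-8"), new String(v.copyBytes(), "UTF-8"))
}
decoded.count()

The actual key/value class names are recorded as plain strings in the sequence file's header, so a quick look at the first few hundred bytes of the file should confirm what types to declare.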
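And for the earlier UnsatisfiedLinkError, here is a quick probe one can run from spark-shell on the node in question to check whether the native Hadoop library (and its snappy support) is actually being picked up; it only assumes the stock Hadoop classes are on the classpath:

import org.apache.hadoop.util.NativeCodeLoader

// True only if libhadoop.so was found on java.library.path and loaded.
println(s"native hadoop loaded: ${NativeCodeLoader.isNativeCodeLoaded}")

// This is the exact native method from the stack trace: it throws
// UnsatisfiedLinkError if libhadoop.so is absent, and returns true only
// if that library was compiled with snappy support.
println(s"snappy supported: ${NativeCodeLoader.buildSupportsSnappy()}")

If the second call still throws, the LD_LIBRARY_PATH / JAVA_LIBRARY_PATH exports above aren't reaching the JVM that runs the task.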