Thanks all. I was able to get the decompression working by adding the following to my spark-env.sh script:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar

On Thu, Apr 2, 2015 at 12:51 AM, Sean Owen <so...@cloudera.com> wrote:
> Yes, any Hadoop-related process that asks for Snappy compression or
> needs to read it will have to have the Snappy libs available on the
> library path. That's usually set up for you in a distro, or you can do
> it manually like this. This is not Spark-specific.
>
> The second question also isn't Spark-specific; you do not have a
> SequenceFile of byte[] / String, but of byte[] / byte[]. Review what
> you are writing, since it is not BytesWritable / Text.
>
> On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers <n.e.trav...@gmail.com> wrote:
> > I'm actually running this in a separate environment to our HDFS cluster.
> >
> > I think I've been able to sort out the issue by copying
> > /opt/cloudera/parcels/CDH/lib to the machine I'm running this on (I'm
> > just using a one-worker setup at present) and adding the following to
> > spark-env.sh:
> >
> > export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> > export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> > export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar
> >
> > I can get past the previous error. The issue now seems to be with what is
> > being returned.
> >
> > import org.apache.hadoop.io._
> > val hdfsPath = "hdfs://nost.name/path/to/folder"
> > val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> > file.count()
> >
> > returns the following error:
> >
> > java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot
> > be cast to org.apache.hadoop.io.Text
> >
> >
> > On Wed, Apr 1, 2015 at 7:34 PM, Xianjin YE <advance...@gmail.com> wrote:
> >>
> >> Do you have the same hadoop config for all nodes in your cluster (you run
> >> it in a cluster, right?)?
> >> Check the node (usually the executor) which gives the
> >> java.lang.UnsatisfiedLinkError to see whether the libsnappy.so is in the
> >> hadoop native lib path.
> >>
> >> On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote:
> >>
> >> Thanks for the super quick response!
> >>
> >> I can read the file just fine in hadoop; it's just when I point Spark at
> >> this file that it can't seem to read it, due to the missing snappy jars /
> >> .so's.
> >>
> >> I'm playing around with adding some things to the spark-env.sh file, but
> >> still nothing.
> >>
> >> On Wed, Apr 1, 2015 at 7:19 PM, Xianjin YE <advance...@gmail.com> wrote:
> >>
> >> Can you read the snappy compressed file in hdfs? Looks like the
> >> libsnappy.so is not in the hadoop native lib path.
> >>
> >> On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
> >>
> >> Has anyone else encountered the following error when trying to read a
> >> snappy compressed sequence file from HDFS?
> >>
> >> *java.lang.UnsatisfiedLinkError:
> >> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z*
> >>
> >> The following works for me when the file is uncompressed:
> >>
> >> import org.apache.hadoop.io._
> >> val hdfsPath = "hdfs://nost.name/path/to/folder"
> >> val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> >> file.count()
> >>
> >> but fails when the encoding is Snappy.
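[Editor's note: Sean's point about the key/value types can be sketched as follows. This is a hypothetical fix, assuming the SequenceFile was in fact written with byte[] keys and byte[] values, in which case both type parameters should be BytesWritable and any String conversion done explicitly:]

```scala
import org.apache.hadoop.io.BytesWritable

// Declare both key and value as BytesWritable, matching what was written.
val hdfsPath = "hdfs://nost.name/path/to/folder"
val file = sc.sequenceFile[BytesWritable, BytesWritable](hdfsPath)

// Hadoop may reuse Writable instances, so copy the bytes out before use;
// decode values as UTF-8 only if that is how they were produced.
val values = file.map { case (_, v) => new String(v.copyBytes(), "UTF-8") }
values.count()
```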
> >>
> >> I've seen some stuff floating around on the web about having to
> >> explicitly enable support for Snappy in Spark, but it doesn't seem to
> >> work for me:
> >> http://www.ericlin.me/enabling-snappy-support-for-sharkspark
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-snappy-and-HDFS-tp22349.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
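[Editor's note: a quick way to check the original UnsatisfiedLinkError is to query Hadoop's NativeCodeLoader directly from the spark-shell. This is a sketch, assuming the Hadoop jars and native libraries are on the classpath/library path of the shell; buildSupportsSnappy is the very call that throws when libsnappy.so is missing:]

```scala
// Ask Hadoop whether its native library, and native snappy support,
// were actually loaded on this JVM.
import org.apache.hadoop.util.NativeCodeLoader

println(NativeCodeLoader.isNativeCodeLoaded)     // false => libhadoop.so not found
println(NativeCodeLoader.buildSupportsSnappy())  // throws UnsatisfiedLinkError if natives are absent
```

If both print true on the driver but the job still fails, run the same check on each worker, as Xianjin suggests, since the library path must be set on every executor node.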