Yes, any Hadoop-related process that writes Snappy-compressed data or
needs to read it has to have the Snappy native libraries available on its
library path. A distro usually sets that up for you, or you can do it
manually, as you did below. This is not Spark-specific.
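
As a minimal sketch of one alternative (not the only way to do it), the
library path can also be set through the Spark configuration instead of
spark-env.sh; the path below is just the one from your exports, so adjust
it to wherever libsnappy.so and the Hadoop native libs actually live:

import org.apache.spark.{SparkConf, SparkContext}

// Location of the Hadoop native libs (libsnappy.so and friends); adjust to your install.
val nativeLibs = "/home/nickt/lib/hadoop/lib/native"

val conf = new SparkConf()
  .setAppName("snappy-sequencefile")
  // Make the native libraries visible to the driver and to every executor.
  .set("spark.driver.extraLibraryPath", nativeLibs)
  .set("spark.executor.extraLibraryPath", nativeLibs)

val sc = new SparkContext(conf)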

The second question isn't Spark-specific either: the ClassCastException
means the values in the file are BytesWritable, not Text, so you have a
SequenceFile of byte[] / byte[], not byte[] / String. Check how the file
is being written, because it is not BytesWritable / Text.
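
For example, a minimal sketch that reads both sides as raw bytes (it reuses
the placeholder path from your snippet, and copies the bytes out because
Hadoop reuses Writable instances across records):

import org.apache.hadoop.io.BytesWritable

val hdfsPath = "hdfs://nost.name/path/to/folder"

// Read key and value both as BytesWritable instead of mapping the value to Text.
val file = sc.sequenceFile[BytesWritable, BytesWritable](hdfsPath)

// copyBytes() returns a correctly sized byte[] copy, safe to hold on to.
val pairs = file.map { case (k, v) => (k.copyBytes(), v.copyBytes()) }
pairs.count()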

On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers <n.e.trav...@gmail.com> wrote:
> I'm actually running this in a separate environment to our HDFS cluster.
>
> I think I've been able to sort out the issue by copying
> /opt/cloudera/parcels/CDH/lib to the machine I'm running this on (I'm just
> using a one-worker setup at present) and adding the following to
> spark-env.sh:
>
> export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/nickt/lib/hadoop/lib/native
> export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/nickt/lib/hadoop/lib/snappy-java-1.0.4.1.jar
>
> I can get past the previous error. The issue now seems to be with what is
> being returned.
>
> import org.apache.hadoop.io._
> val hdfsPath = "hdfs://nost.name/path/to/folder"
> val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
> file.count()
>
> returns the following error:
>
> java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be
> cast to org.apache.hadoop.io.Text
>
>
> On Wed, Apr 1, 2015 at 7:34 PM, Xianjin YE <advance...@gmail.com> wrote:
>>
>> Do you have the same Hadoop config for all nodes in your cluster (you run
>> it in a cluster, right?)?
>> Check the node (usually the executor) that gives the
>> java.lang.UnsatisfiedLinkError to see whether libsnappy.so is in the
>> Hadoop native lib path.
>>
>> On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote:
>>
>> Thanks for the super quick response!
>>
>> I can read the file just fine in Hadoop; it's just that when I point Spark
>> at this file, it can't seem to read it due to the missing Snappy jars / .so's.
>>
>> I'm playing around with adding some things to the spark-env.sh file, but
>> still nothing.
>>
>> On Wed, Apr 1, 2015 at 7:19 PM, Xianjin YE <advance...@gmail.com> wrote:
>>
>> Can you read the Snappy-compressed file in HDFS? Looks like libsnappy.so
>> is not in the Hadoop native lib path.
>>
>> On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
>>
>> Has anyone else encountered the following error when trying to read a
>> Snappy-compressed sequence file from HDFS?
>>
>> java.lang.UnsatisfiedLinkError:
>> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>>
>> The following works for me when the file is uncompressed:
>>
>> import org.apache.hadoop.io._
>> val hdfsPath = "hdfs://nost.name/path/to/folder"
>> val file = sc.sequenceFile[BytesWritable,String](hdfsPath)
>> file.count()
>>
>> but fails when the encoding is Snappy.
>>
>> I've seen some stuff floating around on the web about having to explicitly
>> enable support for Snappy in Spark, but it doesn't seem to work for me:
>> http://www.ericlin.me/enabling-snappy-support-for-sharkspark
>>
>>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
