Hi Jikai,

The reason I ask is that your stack trace has this section in it:

com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1659)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1624)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)

Maybe you have the LZO codec defined in the io.compression.codecs setting of
your core-site.xml?  In the short run you could disable it there.
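
For reference, here's a rough sketch of what that property often looks like
(illustrative only -- the actual codec list on your cluster will differ).
Dropping the two com.hadoop.compression.lzo entries should stop Hadoop from
trying to load the native GPL library when it builds the codec factory:

  <property>
    <name>io.compression.codecs</name>
    <!-- illustrative value; remove the two lzo codecs to disable LZO -->
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>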

In the long run, I wonder if this is an issue with YARN not propagating the
setting through to the executors.  Have you tried other cluster deployment
modes?
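
As a quick experiment (a sketch only -- the flags and properties below assume
Spark 1.0.x's spark-submit and the paths from your earlier mail, not a known
fix), you could run the same job in yarn-client mode and point the driver and
executors at the native libs explicitly:

  # yarn-client keeps the driver on the gateway host, which helps isolate
  # whether YARN is dropping the library path on the way to the containers
  ~/spark/spark-1.0.0-bin-hadoop2/bin/spark-submit \
    --class com.jk.sparktest.Test \
    --master yarn-client \
    --num-executors 40 \
    --driver-library-path /apache/hadoop/lib/native/ \
    ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar

  # and in conf/spark-defaults.conf on the submitting machine:
  spark.executor.extraLibraryPath  /apache/hadoop/lib/native/
  spark.executor.extraClassPath    /apache/hadoop/lib/hadoop-lzo-0.6.0.jar

If it works in yarn-client or local mode but not in yarn-cluster, that would
point at the settings not reaching the YARN containers.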




On Fri, Aug 8, 2014 at 7:38 AM, Jikai Lei <hangel...@gmail.com> wrote:

> Thanks Andrew.  Actually my job did not use any data in .lzo format. Here
> is
> the program itself:
>
> import org.apache.spark._
> import org.apache.spark.mllib.util.MLUtils
> import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
>
> object Test {
>   def main(args: Array[String]) {
>     val sparkConf = new SparkConf().setAppName("SparkMLTest")
>     val sc = new SparkContext(sparkConf)
>     val training = MLUtils.loadLibSVMFile(sc,
>       "hdfs://url:8020/user/jilei/sparktesttraining_libsvmfmt_10k.txt")
>     val model = LogisticRegressionWithSGD.train(training, numIterations = 20)
>   }
> }
>
> I copied this from a GitHub gist and wanted to give it a try. The file is in
> libsvm format and lives in HDFS (I removed the actual HDFS URL from the code
> above).
>
> And in the spark-env.sh file, I set these environment variables:
> export SPARK_LIBRARY_PATH=/apache/hadoop/lib/native/
> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.2.0.2.0.6.0-61.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar
>
> Here is the content of the /apache/hadoop/lib/native/ folder:
> ls /apache/hadoop/lib/native/
> libgplcompression.a   libgplcompression.so    libgplcompression.so.0.0.0
> libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a   libhdfs.so.0.0.0
> libsnappy.so.1
> libgplcompression.la  libgplcompression.so.0  libhadoop.a
> libhadoop.so      libhadooputils.a    libhdfs.so  libsnappy.so
> libsnappy.so.1.1.4
>
>
>
>
> Andrew Ash wrote
> > Hi Jikai,
> >
> > It looks like you're trying to run a Spark job on data that's stored in
> > HDFS in .lzo format.  Spark can handle this (I do it all the time), but
> > you
> > need to configure your Spark installation to know about the .lzo format.
> >
> > There are two parts to the hadoop lzo library -- the first is the jar
> > (hadoop-lzo.jar) and the second is the native library
> > (libgplcompression.{a,so,la} and liblzo2.{a,so,la}).  You need the jar on
> > the classpath across your cluster, but also the native libraries exposed
> > as
> > well.
> >
> > In Spark 1.0.1 I modify entries in spark-env.sh: set SPARK_LIBRARY_PATH
> to
> > include the path to the native library directory
> > (e.g. /path/to/hadoop/lib/native/Linux-amd64-64) and SPARK_CLASSPATH to
> > include the hadoop-lzo jar.
> >
> > Hope that helps,
> > Andrew
> >
> >
> > On Thu, Aug 7, 2014 at 7:19 PM, Xiangrui Meng <mengxr@...> wrote:
> >
> >> Is the GPL library only available on the driver node? If that is the
> >> case, you need to add them to `--jars` option of spark-submit.
> >> -Xiangrui
> >>
> >> On Thu, Aug 7, 2014 at 6:59 PM, Jikai Lei <hangelwen@...> wrote:
> >> > I had the following error when trying to run a very simple spark job
> >> (which
> >> > uses logistic regression with SGD in mllib):
> >> >
> >> > ERROR GPLNativeCodeLoader: Could not load native gpl library
> >> > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> >> >     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
> >> >     at java.lang.Runtime.loadLibrary0(Runtime.java:823)
> >> >     at java.lang.System.loadLibrary(System.java:1028)
> >> >     at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
> >> >     at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
> >> >     at java.lang.Class.forName0(Native Method)
> >> >     at java.lang.Class.forName(Class.java:247)
> >> >     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1659)
> >> >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1624)
> >> >     at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
> >> >     at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
> >> >     at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> >> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> >> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> >> >     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
> >> >     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:187)
> >> >     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
> >> >     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> >> >     at org.apache.spark.scheduler.Task.run(Task.scala:51)
> >> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >     at java.lang.Thread.run(Thread.java:662)
> >> > 14/08/06 20:32:11 ERROR LzoCodec: Cannot load native-lzo without
> >> > native-hadoop
> >> >
> >> >
> >> > This is the command I used to submit the job:
> >> >
> >> > ~/spark/spark-1.0.0-bin-hadoop2/bin/spark-submit \
> >> > --class com.jk.sparktest.Test \
> >> > --master yarn-cluster \
> >> > --num-executors 40 \
> >> > ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> >> >
> >> >
> >> > The actual java command is :
> >> >
> >> > /usr/java/latest/bin/java -cp
> >> >
> >>
> /apache/hadoop/share/hadoop/common/hadoop-common-2.2.0.2.0.6.0-61.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar::/home/jilei/spark/spark-1.0.0-bin-hadoop2/conf:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/apache/hadoop/conf:/apache/hadoop/conf
> >> > \
> >> > -XX:MaxPermSize=128m \
> >> > -Djava.library.path=
> >> > -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit \
> >> > --class com.jk.sparktest.Test  \
> >> > --master yarn-cluster  \
> >> > --num-executors 40   \
> >> > ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> >> >
> >> >
> >> > It seems -Djava.library.path is not set.  I also tried the java command
> >> > above with the native lib directory supplied to java.library.path, but
> >> > still got the same errors.
> >> >
> >> > Any idea on what's wrong? Thanks.
> >> >
> >> >
> >> >
> >> >
>
>
>
>
