Hi Jikai,

The reason I ask is that your stacktrace has this section in it:
    com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
        at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1659)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1624)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)

Maybe you have the Lzo codec defined in the io.compression.codecs setting of your core-site.xml? In the short run you could disable it there. In the long run, I wonder if this is an issue with YARN not propagating the setting through to the executors. Have you tried other cluster deployment modes? Two sketches of what I mean are below.
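For concreteness, the codec registration in core-site.xml usually looks something like this (an illustrative codec list -- the exact entries on your cluster may differ). Dropping the two lzo entries from the value is the short-run workaround:

    <!-- core-site.xml: if the lzo codecs are listed here, every job that
         builds a CompressionCodecFactory tries to load the native GPL lib -->
    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>

And for the YARN theory, an untested sketch: instead of SPARK_LIBRARY_PATH / SPARK_CLASSPATH (which are read on the machine you submit from and, I suspect, may not be forwarded to the YARN containers), you could try the per-executor properties in conf/spark-defaults.conf, reusing the paths from your mail:

    # conf/spark-defaults.conf -- untested sketch; paths taken from your message
    spark.executor.extraLibraryPath   /apache/hadoop/lib/native
    spark.executor.extraClassPath     /apache/hadoop/lib/hadoop-lzo-0.6.0.jar

If the UnsatisfiedLinkError goes away with those set, we'll know the native lib path simply wasn't reaching the executors.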
On Fri, Aug 8, 2014 at 7:38 AM, Jikai Lei <hangel...@gmail.com> wrote:

> Thanks Andrew. Actually my job did not use any data in .lzo format. Here
> is the program itself:
>
> import org.apache.spark._
> import org.apache.spark.mllib.util.MLUtils
> import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
>
> object Test {
>   def main(args: Array[String]) {
>     val sparkConf = new SparkConf().setAppName("SparkMLTest")
>     val sc = new SparkContext(sparkConf)
>     val training = MLUtils.loadLibSVMFile(sc,
>       "hdfs://url:8020/user/jilei/sparktesttraining_libsvmfmt_10k.txt")
>     val model = LogisticRegressionWithSGD.train(training, numIterations = 20)
>   }
> }
>
> I copied this from a github gist and wanted to try it out. The file is a
> libsvm format file and is in HDFS (I removed the actual hdfs url here in
> the code.)
>
> And in the spark-env.sh file, I set these envs:
> export SPARK_LIBRARY_PATH=/apache/hadoop/lib/native/
> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.2.0.2.0.6.0-61.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar
>
> Here is the content of the /apache/hadoop/lib/native/ folder:
> ls /apache/hadoop/lib/native/
> libgplcompression.a     libgplcompression.so     libgplcompression.so.0.0.0
> libhadooppipes.a        libhadoop.so.1.0.0       libhdfs.a
> libhdfs.so.0.0.0        libsnappy.so.1           libgplcompression.la
> libgplcompression.so.0  libhadoop.a              libhadoop.so
> libhadooputils.a        libhdfs.so               libsnappy.so
> libsnappy.so.1.1.4
>
> Andrew Ash wrote
> > Hi Jikai,
> >
> > It looks like you're trying to run a Spark job on data that's stored in
> > HDFS in .lzo format. Spark can handle this (I do it all the time), but
> > you need to configure your Spark installation to know about the .lzo
> > format.
> >
> > There are two parts to the hadoop lzo library -- the first is the jar
> > (hadoop-lzo.jar) and the second is the native library
> > (libgplcompression.{a,so,la} and liblzo2.{a,so,la}). You need the jar on
> > the classpath across your cluster, but also the native libraries exposed
> > as well.
> >
> > In Spark 1.0.1 I modify entries in spark-env.sh: set SPARK_LIBRARY_PATH
> > to include the path to the native library directory
> > (e.g. /path/to/hadoop/lib/native/Linux-amd64-64) and SPARK_CLASSPATH to
> > include the hadoop-lzo jar.
> >
> > Hope that helps,
> > Andrew
> >
> > On Thu, Aug 7, 2014 at 7:19 PM, Xiangrui Meng <mengxr@...> wrote:
> >
> >> Is the GPL library only available on the driver node? If that is the
> >> case, you need to add them to the `--jars` option of spark-submit.
> >> -Xiangrui
> >>
> >> On Thu, Aug 7, 2014 at 6:59 PM, Jikai Lei <hangelwen@...> wrote:
> >> > I had the following error when trying to run a very simple spark job
> >> > (which uses logistic regression with SGD in mllib):
> >> >
> >> > ERROR GPLNativeCodeLoader: Could not load native gpl library
> >> > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> >> >     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
> >> >     at java.lang.Runtime.loadLibrary0(Runtime.java:823)
> >> >     at java.lang.System.loadLibrary(System.java:1028)
> >> >     at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
> >> >     at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
> >> >     at java.lang.Class.forName0(Native Method)
> >> >     at java.lang.Class.forName(Class.java:247)
> >> >     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1659)
> >> >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1624)
> >> >     at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
> >> >     at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
> >> >     at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> >> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> >> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> >> >     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
> >> >     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:187)
> >> >     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
> >> >     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> >> >     at org.apache.spark.scheduler.Task.run(Task.scala:51)
> >> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >     at java.lang.Thread.run(Thread.java:662)
> >> > 14/08/06 20:32:11 ERROR LzoCodec: Cannot load native-lzo without
> >> > native-hadoop
> >> >
> >> > This is the command I used to submit the job:
> >> >
> >> > ~/spark/spark-1.0.0-bin-hadoop2/bin/spark-submit \
> >> >   --class com.jk.sparktest.Test \
> >> >   --master yarn-cluster \
> >> >   --num-executors 40 \
> >> >   ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> >> >
> >> > The actual java command is:
> >> >
> >> > /usr/java/latest/bin/java -cp
> >> > /apache/hadoop/share/hadoop/common/hadoop-common-2.2.0.2.0.6.0-61.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar::/home/jilei/spark/spark-1.0.0-bin-hadoop2/conf:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/apache/hadoop/conf:/apache/hadoop/conf \
> >> >   -XX:MaxPermSize=128m \
> >> >   -Djava.library.path=
> >> >   -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit \
> >> >   --class com.jk.sparktest.Test \
> >> >   --master yarn-cluster \
> >> >   --num-executors 40 \
> >> >   ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> >> >
> >> > Seems the -Djava.library.path is not set. I also tried the java
> >> > command above and supplied the native lib directory to the
> >> > java.library.path, but still got the same errors.
> >> >
> >> > Any idea on what's wrong? Thanks.
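P.S. If you want to check directly what the executors see, here's a tiny debugging sketch -- it assumes you have a SparkContext `sc` handy, e.g. in a spark-shell launched against the same cluster:

    // Print the java.library.path each executor JVM actually received.
    // If /apache/hadoop/lib/native is missing from the output, the
    // setting isn't reaching the YARN containers.
    sc.parallelize(1 to 40, 40)
      .map(_ => System.getProperty("java.library.path"))
      .distinct()
      .collect()
      .foreach(println)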