Is the GPL library only available on the driver node? If that is the case, you need to add it to the `--jars` option of spark-submit.

-Xiangrui
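For example, something like this (a sketch, not a verified fix: the hadoop-lzo jar path is taken from the classpath in the java command quoted below, and /apache/hadoop/lib/native is only a guess at where libgplcompression.so lives on your nodes):

    # hedged sketch: --jars ships the hadoop-lzo jar to the executors;
    # --driver-library-path points the driver at the native library directory
    ~/spark/spark-1.0.0-bin-hadoop2/bin/spark-submit \
      --class com.jk.sparktest.Test \
      --master yarn-cluster \
      --num-executors 40 \
      --jars /apache/hadoop/lib/hadoop-lzo-0.6.0.jar \
      --driver-library-path /apache/hadoop/lib/native \
      ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Note that --jars only ships the jar; the native libgplcompression library itself must also be on each executor's java.library.path (see the sketch after the quoted message).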
On Thu, Aug 7, 2014 at 6:59 PM, Jikai Lei <hangel...@gmail.com> wrote:
> I had the following error when trying to run a very simple spark job (which
> uses logistic regression with SGD in mllib):
>
> ERROR GPLNativeCodeLoader: Could not load native gpl library
> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>         at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
>         at java.lang.Runtime.loadLibrary0(Runtime.java:823)
>         at java.lang.System.loadLibrary(System.java:1028)
>         at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
>         at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:247)
>         at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1659)
>         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1624)
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
>         at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>         at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:187)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 14/08/06 20:32:11 ERROR LzoCodec: Cannot load native-lzo without native-hadoop
>
> This is the command I used to submit the job:
>
> ~/spark/spark-1.0.0-bin-hadoop2/bin/spark-submit \
>   --class com.jk.sparktest.Test \
>   --master yarn-cluster \
>   --num-executors 40 \
>   ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
>
> The actual java command is:
>
> /usr/java/latest/bin/java -cp /apache/hadoop/share/hadoop/common/hadoop-common-2.2.0.2.0.6.0-61.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar::/home/jilei/spark/spark-1.0.0-bin-hadoop2/conf:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/apache/hadoop/conf:/apache/hadoop/conf \
>   -XX:MaxPermSize=128m \
>   -Djava.library.path= \
>   -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit \
>   --class com.jk.sparktest.Test \
>   --master yarn-cluster \
>   --num-executors 40 \
>   ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
>
> It seems -Djava.library.path is not set. I also tried the java command above
> and supplied the native lib directory to java.library.path, but still got the
> same errors.
>
> Any idea on what's wrong? Thanks.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Could-not-load-native-gpl-library-tp11743.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
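For the executor side, Spark 1.0's spark-submit reads conf/spark-defaults.conf, and spark.executor.extraLibraryPath / spark.driver.extraLibraryPath prepend entries to java.library.path. A minimal sketch, assuming the native library sits in /apache/hadoop/lib/native on every node (an assumption; adjust to your layout):

    # conf/spark-defaults.conf
    # The directory below is a placeholder for wherever libgplcompression.so
    # is actually installed on the cluster nodes.
    spark.executor.extraLibraryPath   /apache/hadoop/lib/native
    spark.driver.extraLibraryPath     /apache/hadoop/lib/native

In yarn-cluster mode the driver itself runs inside the cluster, so a -Djava.library.path set on the local java command that launches SparkSubmit never reaches the driver or the executors, which is consistent with the empty -Djava.library.path= in the command above.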