That sounds like a question for the Hadoop list. All I can say is that there is a job.jar in mrlegacy/target with all dependencies packaged. It should contain everything needed for LDA.
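If it helps, one way to confirm that a given jar really contains the class named in the stack trace is to look for its entry directly. This is only a minimal sketch using java.util.jar from the JDK; the class name JarClassCheck and its two methods are hypothetical names made up for illustration, and are not part of Mahout or Hadoop.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

/** Hypothetical helper (not part of Mahout or Hadoop): checks a jar for a class entry. */
class JarClassCheck {

    /** Converts "org.apache.mahout.math.Vector" to "org/apache/mahout/math/Vector.class". */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at the given path has an entry for the class. */
    static boolean containsClass(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <path-to-jar> <fully.qualified.ClassName>");
            System.exit(2);
        }
        boolean found = containsClass(Paths.get(args[0]), args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

Run it with the path to the job jar from mrlegacy/target as the first argument and org.apache.mahout.math.Vector as the second. If the class is reported as found there but the tasks still fail, the jar that is actually submitted with the job is probably a different one, which again is more of a Hadoop question.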

On Jan 8, 2015, at 5:50 AM, mw wrote:

Hello again,

maybe my question was misleading. I am asking whether the intended usage is to provide the job with the required libraries and send them together with the job to YARN (if so, how can this be done?), or to add the required classes to the classpath of every node in the cluster. What is the best practice?

Best,
Max

On 01/07/2015 06:13 PM, mw wrote:
> Hello,
>
> the first error was due to a missing property in yarn.xml. However, now I
> have a different problem.
>
> I am working on a web application that should execute LDA on an external
> YARN cluster.
>
> I am uploading all the relevant sequence files onto the YARN cluster.
> This is how I try to remotely execute LDA on the cluster:
>
> try {
>     ugi.doAs(new PrivilegedExceptionAction<Void>() {
>         public Void run() throws Exception {
>             Configuration hdoopConf = new Configuration();
>             hdoopConf.set("fs.defaultFS", "hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
>             hdoopConf.set("yarn.resourcemanager.hostname", "xxx.xxx.xxx.xxx");
>             hdoopConf.set("mapreduce.framework.name", "yarn");
>             hdoopConf.set("mapred.framework.name", "yarn");
>             hdoopConf.set("mapred.job.tracker", "xxx.xxx.xxx.xxx");
>             hdoopConf.set("dfs.permissions.enabled", "false");
>             hdoopConf.set("hadoop.job.ugi", "xx");
>             hdoopConf.set("mapreduce.jobhistory.address", "xxx.xxx.xxx.xxx:10020");
>             CVB0Driver driver = new CVB0Driver();
>             try {
>                 driver.run(hdoopConf, sparseVectorIn.suffix("/matrix"),
>                     topicsOut, k, numTerms, doc_topic_smoothening,
>                     term_topic_smoothening,
>                     maxIter, iteration_block_size,
>                     convergenceDelta,
>                     sparseVectorIn.suffix("/dictionary.file-0"),
>                     topicsOut.suffix("/DocumentTopics/"), sparseVectorIn,
>                     seed, testFraction, numTrainThreads,
>                     numUpdateThreads, maxItersPerDoc,
>                     numReduceTasks, backfillPerplexity);
>             } catch (ClassNotFoundException e) {
>                 e.printStackTrace();
>             } catch (InterruptedException e) {
>                 e.printStackTrace();
>             }
>             return null;
>         }
>     });
> } catch (InterruptedException e) {
>     e.printStackTrace();
> }
>
> I am getting the following error message:
>
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:344)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> [the identical ClassNotFoundException trace is printed three more times]
>
> java.lang.InterruptedException: Failed to complete iteration 1 stage 1
>     at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
>     at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
>     ...
>
> So apparently the job is missing some Mahout classes. How can I provide the
> required classes to YARN?
>
> Best,
>
> Max
