I suggested starting with -k 10, not 100, to iron things out. That will shorten the debug cycle quite a bit.
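
To put rough numbers on that, here is a small sketch that only reproduces the arithmetic from the invocation quoted below; it does not call Mahout. The class and method names are made up for illustration, and the matrix dimensions are assumed from the "48k words X 53k docs" description.

```java
// Hypothetical helper, not part of Mahout: it only reproduces the arithmetic
// from the quoted invocation so two parameter sets can be compared.
class SsvdDebugParams {

    // Block height as computed in the quoted call:
    // min(200000, (int) (3 * k * 0.01 * max(docs, words)))
    static int blockHeight(int k, int docs, int words) {
        return Math.min(200000, (int) (3 * k * 0.01 * Math.max(docs, words)));
    }

    // k + p, the quantity the thread says drives single-task runtime.
    static int sketchWidth(int k, int p) {
        return k + p;
    }

    public static void main(String[] args) {
        int docs = 53000;   // assumed from "48k words X 53k docs"
        int words = 48000;  // assumed, same source

        // Original call pattern: p = 2 * k, q = 2 (shown here with k = 100).
        int k = 100;
        System.out.println("original: k+p=" + sketchWidth(k, 2 * k)
                + " q=2 blockHeight=" + blockHeight(k, docs, words));

        // Debugging setup suggested in the thread: k = 10, default p = 15, q = 0.
        int kDebug = 10;
        System.out.println("debug:    k+p=" + sketchWidth(kDebug, 15)
                + " q=0 blockHeight=" + blockHeight(kDebug, docs, words));
    }
}
```

The point is only that the debugging setup shrinks k+p by roughly an order of magnitude, so a classpath failure at the end of the Q job should show up much sooner.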
The classpath problem is definitely reproducible and functional, but it is not a known problem to me. Then again, there are so many flavors of Hadoop that issues like this pop up all the time.

On Tue, Apr 28, 2015 at 12:30 PM, Mihai Dascalu <mihai.dasc...@cs.pub.ro> wrote:

> I got the same problem with k=100 & p=15, aBlockRows=200000; it's faster now
> (around 20 minutes).
>
> I just realized that it’s at a final step in the processing (I’ve attached
> the end part of the log).
>
> Any suggestions? In my Eclipse project I have imported:
> httpclient-4.2.5.jar
> mahout-hdfs-0.10.0.jar
> mahout-integration-0.10.0.jar
> mahout-math-0.10.0.jar
> mahout-mr-0.10.0-job.jar
> mahout-mr-0.10.0.jar
>
> The strange part is that it works OK if I run it directly in the terminal.
>
> Thanks!
> Mihai
>
> 1415149 [Thread-13] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
> 1415151 [Thread-13] DEBUG org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Merging data from DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000; isDirectory=true; modification_time=1430245778000; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false} to file:/Users/mihaidascalu/Dropbox (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job
> 1415151 [Thread-13] DEBUG org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Merging data from DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000/part-m-00000.deflate; isDirectory=false; length=8; replication=1; blocksize=33554432; modification_time=1430247134000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/Users/mihaidascalu/Dropbox (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/part-m-00000.deflate
> 1415157 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1889167692_0001
> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 1 more
> 1415159 [Thread-13] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:mihaidascalu (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
> Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 1 more
> 1415709 [SwingWorker-pool-1-thread-1] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:mihaidascalu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:mihaidascalu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:mihaidascalu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] ERROR view.widgets.semanticModels.SemanticModelsTraining - Error procesing config/LDA directory: Q job unsuccessful.
>
> On 28 Apr 2015, at 10:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > If your run time gets too high, try to start with a low -k (like 10 or
> > something) and -q=0; that will significantly reduce the complexity of the
> > problem.
> >
> > If this works, you need to find the optimal levers that suit your
> > hardware/input size/runtime requirements. (I can tell you right away that
> > the (k+p) value influences single-task runtime according to a power law.)
> > Something like -k 500 will probably never yield a satisfactory time. The
> > performance study in Nathan Halko's dissertation computed the first 100
> > singular values/vectors iirc, i.e. about k=100, p=15.
> >
> > Setting -q=1 boosts accuracy significantly, so if you can afford it at all
> > time-wise, I'd suggest using -q=1 instead of cranking the -p parameter up
> > from the default value. Values of -q > 1 are never practical.
> >
> > -d
> >
> > On Tue, Apr 28, 2015 at 10:03 AM, Mihai Dascalu <mihai.dasc...@cs.pub.ro> wrote:
> >
> >> I’ve created a Swing interface around the invocation, but it is not a
> >> classpath setting, as the SVD runs for more than 1h. Afterwards I get the
> >> runtime error in the HttpClient, which is really strange. Also, I have a lot
> >> of map operations in the console, but no reduce operations are logged.
> >>
> >> Thanks!
> >> Mihai
> >>
> >>> On 28 Apr 2015, at 01:09, lastarsenal <lastarse...@163.com> wrote:
> >>>
> >>> What's your run command? I think it is because of your classpath setting.
> >>>
> >>> At 2015-04-28 15:25:01, "Mihai Dascalu" <mihai.dasc...@cs.pub.ro> wrote:
> >>>> Hi!
> >>>>
> >>>> I’ve been experimenting with the SSVDSolver and unfortunately, during
> >>>> runtime, I encounter this error:
> >>>>
> >>>> 10648576 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1958711697_0001
> >>>> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
> >>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>>     ... 1 more
> >>>>
> >>>> Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
> >>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>>     ... 1 more
> >>>>
> >>>> The actual invocation is:
> >>>>
> >>>> public static void runSSVDOnSparseVectors(String inputPath,
> >>>>         String outputPath, int rank, int oversampling, int blocks,
> >>>>         int reduceTasks, int powerIterations, boolean halfSigma)
> >>>>         throws IOException {
> >>>>     Configuration conf = new Configuration();
> >>>>     SSVDSolver solver = new SSVDSolver(conf,
> >>>>             new Path[] { new Path(inputPath) }, new Path(outputPath),
> >>>>             blocks, rank, oversampling, reduceTasks);
> >>>>     solver.setQ(powerIterations);
> >>>>     if (halfSigma) {
> >>>>         solver.setcUHalfSigma(true);
> >>>>         solver.setcVHalfSigma(true);
> >>>>     }
> >>>>     solver.run();
> >>>> }
> >>>>
> >>>> while being invoked with (input.getParent() + "/" + TERM_DOC_MATRIX_NAME,
> >>>> input.getParent() + "/" + SVD_FOLDER_NAME, k, 2 * k,
> >>>> Math.min(200000, (int) (3 * k * 0.01 *
> >>>> Math.max(lsaTraining.getNoDocuments(), lsaTraining.getNoWords()))), 5, 2,
> >>>> true);
> >>>>
> >>>> I’m using Mahout 0.10 with httpclient-4.4.1.jar (I also tried 4.2.5
> >>>> from the package archive) on a 48k words X 53k docs matrix.
> >>>>
> >>>> Any ideas? It works fine with similar variables if I run the job from
> >>>> the command line.
> >>>>
> >>>> Also, how should I tweak the input variables?
> >>>>
> >>>> Thanks in advance!
> >>>> Mihai
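
One thing worth checking about the stack traces above: the missing class, org.apache.commons.httpclient.HttpMethod, belongs to the older Commons HttpClient 3.x library (the commons-httpclient artifact). The httpclient-4.x jars listed in the thread use the org.apache.http package instead, so adding them would not be expected to satisfy this lookup. Below is a minimal sketch for confirming what the Eclipse launch classpath actually sees; the class and method names are made up for illustration, and it should be run from the same launch configuration as the Swing application.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Mahout or Hadoop: reports whether a
// class is visible to the application class loader and where it was found.
class ClasspathCheck {

    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> c = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            return src == null ? "found (JDK/bootstrap)" : "found in " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return "MISSING";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.commons.httpclient.HttpMethod", // Commons HttpClient 3.x
            "org.apache.http.client.HttpClient"         // HttpComponents 4.x
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the first entry prints MISSING in Eclipse but the job runs from the terminal, that would suggest the terminal launch picks up a commons-httpclient 3.x jar that the Eclipse project does not import.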