I suggested starting with -k 10 rather than 100 to iron things out. That will
shorten the debug cycle quite a bit.
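
To make the debug settings concrete, here is a minimal sketch of the numbers
involved. It is only an illustration: the class SsvdDebugParams and its method
names are hypothetical, the block-height formula is copied from the invocation
quoted further down the thread, p=15 is simply the value already used in this
thread, and the 53000 x 48000 matrix size is a rounded assumption.

```java
// Hypothetical helper, not part of Mahout: it only bundles the numbers
// one would pass to the solver for a quick debug run.
final class SsvdDebugParams {
    final int k;          // requested rank (-k)
    final int p;          // oversampling (-p)
    final int q;          // power iterations (-q)
    final int blockRows;  // block height handed to the solver

    SsvdDebugParams(int k, int p, int q, int blockRows) {
        this.k = k;
        this.p = p;
        this.q = q;
        this.blockRows = blockRows;
    }

    // Block-height formula as quoted in the original invocation, capped at 200000.
    static int blockRows(int k, int noDocuments, int noWords) {
        return Math.min(200000,
                (int) (3 * k * 0.01 * Math.max(noDocuments, noWords)));
    }

    // Small, cheap settings for ironing out problems: k=10, p=15, q=0.
    static SsvdDebugParams debugRun(int noDocuments, int noWords) {
        int k = 10;
        return new SsvdDebugParams(k, 15, 0, blockRows(k, noDocuments, noWords));
    }

    public static void main(String[] args) {
        // Assumed matrix size: roughly 53k documents by 48k words.
        SsvdDebugParams d = debugRun(53000, 48000);
        System.out.println("k=" + d.k + " p=" + d.p + " q=" + d.q
                + " blockRows=" + d.blockRows);
    }
}
```

Once a run with these settings completes cleanly, k and q can be raised one at
a time while watching the run time.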

The classpath problem is definitely reproducible and functional in nature,
but it is not a known problem to me. Then again, there are so many flavors
of Hadoop that issues like this pop up all the time.
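
One thing worth checking: the class named in the trace,
org.apache.commons.httpclient.HttpMethod, lives in the older Commons
HttpClient 3.x package, whereas the httpclient-4.x jars use the
org.apache.http package, so a 4.x jar on the build path would not provide
it. A minimal sketch of a startup check follows; the class name
ClasspathCheck and its method are hypothetical, and it only reports whether
the class is loadable from the classpath the application was launched with.

```java
// Hypothetical diagnostic, to be run from the same launch configuration
// (e.g. the Eclipse run configuration) that starts the SSVD job.
final class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static init.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String wanted = "org.apache.commons.httpclient.HttpMethod";
        System.out.println(wanted + " loadable: " + isOnClasspath(wanted));
        // Print the effective classpath to compare with the terminal run.
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
    }
}
```

Running this both from Eclipse and from the terminal, and comparing the two
printed classpaths, should show which jar the terminal run picks up that the
Eclipse project does not.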

On Tue, Apr 28, 2015 at 12:30 PM, Mihai Dascalu <mihai.dasc...@cs.pub.ro>
wrote:

> I got the same problem with k=100, p=15 and aBlockRows=200000, although it
> is faster now (around 20 minutes).
>
> I just realized that it happens at a final step in the processing (I’ve
> attached the end part of the log).
>
> Any suggestions? In my Eclipse project I have imported:
> httpclient-4.2.5.jar
> mahout-hdfs-0.10.0.jar
> mahout-integration-0.10.0.jar
> mahout-math-0.10.0.jar
> mahout-mr-0.10.0-job.jar
> mahout-mr-0.10.0.jar
>
> The strange part is that it works ok if I run it directly in the terminal.
>
>
> Thanks!
> Mihai
>
> 1415149 [Thread-13] INFO org.apache.hadoop.mapred.LocalJobRunner  - map
> task executor complete.
> 1415151 [Thread-13] DEBUG
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  - Merging data
> from DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox
> (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000;
> isDirectory=true; modification_time=1430245778000; access_time=0; owner=;
> group=; permission=rwxrwxrwx; isSymlink=false} to
> file:/Users/mihaidascalu/Dropbox
> (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job
> 1415151 [Thread-13] DEBUG
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  - Merging data
> from DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox
> (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000/part-m-00000.deflate;
> isDirectory=false; length=8; replication=1; blocksize=33554432;
> modification_time=1430247134000; access_time=0; owner=; group=;
> permission=rw-rw-rw-; isSymlink=false} to file:/Users/mihaidascalu/Dropbox
> (Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/part-m-00000.deflate
> 1415157 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner  -
> job_local1889167692_0001
> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.commons.httpclient.HttpMethod
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 1 more
> 1415159 [Thread-13] DEBUG org.apache.hadoop.security.UserGroupInformation
> - PrivilegedAction as:mihaidascalu (auth:SIMPLE)
> from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
> Exception in thread "Thread-13" java.lang.NoClassDefFoundError:
> org/apache/commons/httpclient/HttpMethod
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.commons.httpclient.HttpMethod
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 1 more
> 1415709 [SwingWorker-pool-1-thread-1] DEBUG
> org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction
> as:mihaidascalu (auth:SIMPLE)
> from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] DEBUG
> org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction
> as:mihaidascalu (auth:SIMPLE)
> from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] DEBUG
> org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction
> as:mihaidascalu (auth:SIMPLE)
> from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] ERROR
> view.widgets.semanticModels.SemanticModelsTraining  - Error procesing
> config/LDA directory: Q job unsuccessful.
>
> > On 28 Apr 2015, at 10:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > If your run time gets too high, try starting with a low -k (like 10 or
> > so) and -q=0; that will significantly reduce the complexity of the
> > problem.
> >
> > If this works, you need to find the optimal levers that suit your
> > hardware, input size, and runtime requirements. (I can tell you right
> > away that the (k+p) value influences single-task runtime according to a
> > power law.) Something like -k 500 will probably never yield a
> > satisfactory time. The performance study in Nathan Halko's dissertation
> > computed the first 100 singular values/vectors, iirc, i.e. about k=100,
> > p=15.
> >
> > Setting -q=1 boosts accuracy significantly, so if you can afford it at
> > all time-wise, I'd suggest using -q=1 instead of cranking the -p
> > parameter up from its default value. Values of -q > 1 are never practical.
> >
> >
> > -d
> >
> >
> >
> > On Tue, Apr 28, 2015 at 10:03 AM, Mihai Dascalu <mihai.dasc...@cs.pub.ro>
> > wrote:
> >
> >> I’ve created a Swing interface around the invocation, but it is not a
> >> classpath setting, as the SVD runs for more than 1h. Afterwards I get the
> >> runtime error in the HTTP client, which is really strange. Also, I see a
> >> lot of map operations in the console, but no reduce operations are logged.
> >>
> >> Thanks!
> >> Mihai
> >>
> >>> On 28 Apr 2015, at 01:09, lastarsenal <lastarse...@163.com> wrote:
> >>>
> >>> What's your run command? I think it is because of your classpath
> >>> setting.
> >>>
> >>>
> >>>
> >>>
> >>> At 2015-04-28 15:25:01, "Mihai Dascalu" <mihai.dasc...@cs.pub.ro> wrote:
> >>>> Hi!
> >>>>
> >>>>
> >>>> I’ve been experimenting with the SSVDSolver and unfortunately, during
> >>>> runtime, I encounter this error:
> >>>>
> >>>> 10648576 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner  - job_local1958711697_0001
> >>>> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
> >>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>>     ... 1 more
> >>>>
> >>>> Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
> >>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>>     ... 1 more
> >>>>
> >>>> The actual invocation is:
> >>>>
> >>>> public static void runSSVDOnSparseVectors(String inputPath,
> >>>>         String outputPath, int rank, int oversampling, int blocks,
> >>>>         int reduceTasks, int powerIterations, boolean halfSigma)
> >>>>         throws IOException {
> >>>>     Configuration conf = new Configuration();
> >>>>     SSVDSolver solver = new SSVDSolver(conf,
> >>>>             new Path[] { new Path(inputPath) }, new Path(outputPath),
> >>>>             blocks, rank, oversampling, reduceTasks);
> >>>>     solver.setQ(powerIterations);
> >>>>     if (halfSigma) {
> >>>>         solver.setcUHalfSigma(true);
> >>>>         solver.setcVHalfSigma(true);
> >>>>     }
> >>>>     solver.run();
> >>>> }
> >>>>
> >>>> while being invoked with (input.getParent() + "/" + TERM_DOC_MATRIX_NAME,
> >>>> input.getParent() + "/" + SVD_FOLDER_NAME, k, 2 * k,
> >>>> Math.min(200000, (int) (3 * k * 0.01 *
> >>>> Math.max(lsaTraining.getNoDocuments(), lsaTraining.getNoWords()))), 5, 2,
> >>>> true);
> >>>>
> >>>> I’m using Mahout 0.10 with httpclient-4.4.1.jar (I also tried 4.2.5
> >>>> from the package archive) on a 48k words x 53k docs matrix.
> >>>>
> >>>> Any ideas? It works fine with similar variables if I run the job from
> >>>> the command line.
> >>>>
> >>>> Also, how should I tweak the input variables?
> >>>>
> >>>>
> >>>> Thanks in advance!
> >>>> Mihai
> >>
> >>
>
>
