I got the same problem with k=100 & p=15 and aBlockRows=200000; it is faster now (around 20 minutes).
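For reference, the block-height expression from the invocation quoted further down can be sanity-checked in isolation. This is just an illustrative re-evaluation, assuming k=100 and the 48k words x 53k docs matrix from my first message (the class name and the hardcoded sizes are only for the sketch):

```java
public class BlockHeightCheck {
    public static void main(String[] args) {
        // Same expression as in the invocation quoted below, with this
        // thread's numbers plugged in: k = 100, 53k docs, 48k words.
        int k = 100;
        int noDocuments = 53_000;
        int noWords = 48_000;
        int blockHeight = Math.min(200_000,
                (int) (3 * k * 0.01 * Math.max(noDocuments, noWords)));
        System.out.println(blockHeight); // prints 159000
    }
}
```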

I just realized that it fails at a final step of the processing (I’ve attached the end part of the log).

Any suggestions? In my Eclipse project I have imported:
httpclient-4.2.5.jar
mahout-hdfs-0.10.0.jar
mahout-integration-0.10.0.jar
mahout-math-0.10.0.jar
mahout-mr-0.10.0-job.jar
mahout-mr-0.10.0.jar

The strange part is that it works fine if I run it directly from the terminal.
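One possible lead (a guess on my side, not something I've verified for this failure): org.apache.commons.httpclient.HttpMethod is a class from the legacy commons-httpclient 3.x jar; the httpclient-4.x jars use the org.apache.http package instead and do not contain it, so none of the jars listed above would provide it. A minimal probe class to run from inside the Eclipse project:

```java
// Minimal classpath probe: a NoClassDefFoundError for
// org/apache/commons/httpclient/HttpMethod means the legacy
// commons-httpclient 3.x jar is not on the runtime classpath;
// the httpclient-4.x jars use the org.apache.http package instead.
public class HttpClientProbe {
    public static void main(String[] args) {
        try {
            Class.forName("org.apache.commons.httpclient.HttpMethod");
            System.out.println("commons-httpclient 3.x: present");
        } catch (ClassNotFoundException e) {
            System.out.println("commons-httpclient 3.x: missing");
        }
    }
}
```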


Thanks!
Mihai

1415149 [Thread-13] INFO org.apache.hadoop.mapred.LocalJobRunner  - map task 
executor complete.
1415151 [Thread-13] DEBUG 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  - Merging data from 
DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox 
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000;
 isDirectory=true; modification_time=1430245778000; access_time=0; owner=; 
group=; permission=rwxrwxrwx; isSymlink=false} to 
file:/Users/mihaidascalu/Dropbox 
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job
1415151 [Thread-13] DEBUG 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  - Merging data from 
DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox 
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000/part-m-00000.deflate;
 isDirectory=false; length=8; replication=1; blocksize=33554432; 
modification_time=1430247134000; access_time=0; owner=; group=; 
permission=rw-rw-rw-; isSymlink=false} to file:/Users/mihaidascalu/Dropbox 
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/part-m-00000.deflate
1415157 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner  - 
job_local1889167692_0001
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.httpclient.HttpMethod
        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 1 more
1415159 [Thread-13] DEBUG org.apache.hadoop.security.UserGroupInformation  - 
PrivilegedAction as:mihaidascalu (auth:SIMPLE) 
from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
Exception in thread "Thread-13" java.lang.NoClassDefFoundError: 
org/apache/commons/httpclient/HttpMethod
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.httpclient.HttpMethod
        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 1 more
1415709 [SwingWorker-pool-1-thread-1] DEBUG 
org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction 
as:mihaidascalu (auth:SIMPLE) 
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
1415709 [SwingWorker-pool-1-thread-1] DEBUG 
org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction 
as:mihaidascalu (auth:SIMPLE) 
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
1415709 [SwingWorker-pool-1-thread-1] DEBUG 
org.apache.hadoop.security.UserGroupInformation  - PrivilegedAction 
as:mihaidascalu (auth:SIMPLE) 
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
1415709 [SwingWorker-pool-1-thread-1] ERROR 
view.widgets.semanticModels.SemanticModelsTraining  - Error procesing 
config/LDA directory: Q job unsuccessful.

> On 28 Apr 2015, at 10:32, Dmitriy Lyubimov <dlie...@gmail.com 
> <mailto:dlie...@gmail.com>> wrote:
> 
> If your runtime gets too high, try starting with a low -k (like 10 or
> so) and -q=0; that will significantly reduce the complexity of the
> problem.
> 
> If this works, you need to find the optimal levers that suit your
> hardware / input size / runtime requirements. (I can tell you right away
> that the (k+p) value influences single-task runtime according to a power
> law.) Something like -k 500 will probably never yield a satisfactory time.
> The performance study in Nathan Halko's dissertation computed the first
> 100 singular values/vectors IIRC, i.e. about k=100, p=15.
> 
> Setting -q=1 boosts accuracy significantly, so if you can afford it at
> all time-wise, I'd suggest using -q=1 instead of cranking the -p
> parameter up from its default value. Values of -q > 1 are never
> practical.
> 
> 
> -d
> 
> 
> 
> On Tue, Apr 28, 2015 at 10:03 AM, Mihai Dascalu <mihai.dasc...@cs.pub.ro 
> <mailto:mihai.dasc...@cs.pub.ro>>
> wrote:
> 
>> I’ve created a Swing interface around the invocation, but it is not a
>> classpath setting, as the SVD runs for more than 1h before I hit the
>> runtime error in HttpClient, which is really strange. Also, I see a lot
>> of map operations in the console, but no reduce operations are logged.
>> 
>> Thanks!
>> Mihai
>> 
>>> On 28 Apr 2015, at 01:09, lastarsenal <lastarse...@163.com 
>>> <mailto:lastarse...@163.com>> wrote:
>>> 
>>> What's your run command? I think it is because of your classpath setting.
>>> 
>>> 
>>> 
>>> 
>>> At 2015-04-28 15:25:01, "Mihai Dascalu" <mihai.dasc...@cs.pub.ro 
>>> <mailto:mihai.dasc...@cs.pub.ro>> wrote:
>>>> Hi!
>>>> 
>>>> 
>>>> I’ve been experimenting with the SSVDSolver and unfortunately, during
>> runtime, I encounter this error:
>>>> 
>>>> 10648576 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner  -
>> job_local1958711697_0001
>>>> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
>>>>     at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
>>>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.commons.httpclient.HttpMethod
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>     ... 1 more
>>>> 
>>>> Exception in thread "Thread-13" java.lang.NoClassDefFoundError:
>> org/apache/commons/httpclient/HttpMethod
>>>>     at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
>>>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.commons.httpclient.HttpMethod
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>     ... 1 more
>>>> 
>>>> The actual invocation is:
>>>> 
>>>> public static void runSSVDOnSparseVectors(String inputPath,
>>>>         String outputPath, int rank, int oversampling, int blocks,
>>>>         int reduceTasks, int powerIterations, boolean halfSigma)
>>>>         throws IOException {
>>>>     Configuration conf = new Configuration();
>>>>     SSVDSolver solver = new SSVDSolver(conf,
>>>>             new Path[] { new Path(inputPath) }, new Path(outputPath),
>>>>             blocks, rank, oversampling, reduceTasks);
>>>>     solver.setQ(powerIterations);
>>>>     if (halfSigma) {
>>>>         solver.setcUHalfSigma(true);
>>>>         solver.setcVHalfSigma(true);
>>>>     }
>>>>     solver.run();
>>>> }
>>>> 
>>>> while being invoked as:
>>>> 
>>>> runSSVDOnSparseVectors(input.getParent() + "/" + TERM_DOC_MATRIX_NAME,
>>>>         input.getParent() + "/" + SVD_FOLDER_NAME, k, 2 * k,
>>>>         Math.min(200000, (int) (3 * k * 0.01 * Math.max(
>>>>                 lsaTraining.getNoDocuments(), lsaTraining.getNoWords()))),
>>>>         5, 2, true);
>>>> 
>>>> I’m using Mahout 0.10 with httpclient-4.4.1.jar (I also tried 4.2.5
>> from the package archive) on a 48k words x 53k docs matrix.
>>>> 
>>>> Any ideas? It works fine with similar variables if I run the job from
>> the command line.
>>>> 
>>>> Also, how should I tweak the input variables?
>>>> 
>>>> 
>>>> Thanks in advance!
>>>> Mihai
>> 
>> 
