OK, that did work for Mahout, thanks! But now Hadoop cannot load the class, even though the jar containing it has been added to the Hadoop classpath:
hadoop@ubuntu:/home/camilo/mahout-distribution-0.4$ echo $HADOOP_CLASSPATH
/home/camilo/mahout-distribution-0.4/utils/target/dependency/lucene-core-3.0.2.jar:/home/camilo/mahout-distribution-0.4/utils/target/dependency/lucene-analyzers-3.0.2.jar:/home/hadoop/my_analyzer.jar

I get:

hadoop@ubuntu:/home/camilo/mahout-distribution-0.4$ bin/mahout seq2sparse -i /htmless_articles_seq -o /htmless_articles_vectors_2 -wt tfidf -a com.my.analyzers.MyAnalyzer
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
No HADOOP_CONF_DIR set, using /usr/local/hadoop/conf
11/04/21 13:39:33 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
11/04/21 13:39:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 3
11/04/21 13:39:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
11/04/21 13:39:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
11/04/21 13:39:33 INFO common.HadoopUtil: Deleting /htmless_articles_vectors_2
11/04/21 13:39:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/21 13:39:33 INFO input.FileInputFormat: Total input paths to process : 1
11/04/21 13:39:33 INFO mapred.JobClient: Running job: job_201104211109_0038
11/04/21 13:39:34 INFO mapred.JobClient:  map 0% reduce 0%
11/04/21 13:39:43 INFO mapred.JobClient: Task Id : attempt_201104211109_0038_m_000000_0, Status : FAILED
java.lang.IllegalStateException: java.lang.ClassNotFoundException: com.my.analyzers.MyAnalyzer
        at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.my.analyzers.MyAnalyzer
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:57)
        ... 4 more

Is there anything I'm missing there?

On 2011-04-20, at 1:32 PM, Ian Helmke wrote:

> Yes, if you make a subclass of StandardAnalyzer or your own Analyzer
> that has a constructor with no arguments (presumably one which calls a
> superclass constructor with the arguments you want), that should work
> nicely. (You could also just add a zero-argument constructor to your
> own custom analyzer.)
>
> On Wed, Apr 20, 2011 at 1:25 PM, Camilo Lopez <cam...@camilolopez.com> wrote:
>> Ian,
>>
>> I'm using 3.0.x (the one that comes by default in Mahout's trunk now).
>> By nullary constructor, do you mean I should overload the constructor to receive
>> no args in my own custom class?
>>
>> On 2011-04-20, at 1:23 PM, Ian Helmke wrote:
>>
>>> What version of Lucene are you using? If you use Lucene 3.0 or later,
>>> you can't use StandardAnalyzer as-is because it has no no-args
>>> constructor. You could try the Mahout DefaultAnalyzer (which wraps the
>>> Lucene analyzer in a no-argument constructor). I have gotten custom
>>> analyzers to work, but they need to have a nullary constructor.
>>>
>>> On Wed, Apr 20, 2011 at 12:58 PM, Camilo Lopez <cam...@camilolopez.com> wrote:
>>>> Hi List,
>>>>
>>>> Trying to run custom analyzer classes I always get an
>>>> InstantiationException. At first I suspected my own code, but trying with
>>>> what is supposed to be the default value,
>>>> 'org.apache.lucene.analysis.standard.StandardAnalyzer', I still get the
>>>> same exception.
>>>>
>>>> This is the command:
>>>>
>>>> bin/mahout seq2sparse -i /htmless_articles_seq -o
>>>> /htmless_articles_vectors_1 -ng 3 -x35 -wt tfidf -a
>>>> org.apache.lucene.analysis.standard.StandardAnalyzer -nv
>>>>
>>>> Looking a little deeper (i.e. catching the InstantiationException and
>>>> rethrowing getCause()), it turns out the problem is
>>>> caused by a NullPointerException:
>>>>
>>>> Exception in thread "main" java.lang.NullPointerException
>>>>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:211)
>>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:52)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>
>>>> Am I missing something, or is there another way to create/use custom
>>>> analyzers in seq2sparse?
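For readers landing on this thread later: both failures come down to seq2sparse loading the `-a` class reflectively. The class must be visible to the classloader doing the loading (on the worker, for the map task) and must expose a public no-argument (nullary) constructor, or reflective instantiation fails. A minimal, stdlib-only sketch of the nullary-constructor requirement; the class names here are illustrative, not Mahout's actual code:

```java
// Sketch: reflective instantiation, similar in spirit to how the -a class
// name is turned into an instance. (Illustrative classes, not Mahout code.)

// A class whose only constructor takes arguments cannot be created via
// Class.newInstance() -- this mirrors the StandardAnalyzer problem.
class NeedsArgs {
    NeedsArgs(String arg) { }
}

// A wrapper that supplies a public nullary constructor (and calls the
// argument-taking constructor with a default) instantiates fine.
class NoArgsWrapper {
    private final NeedsArgs inner;
    public NoArgsWrapper() { this.inner = new NeedsArgs("default"); }
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        // Works: NoArgsWrapper has a public no-argument constructor.
        Object ok = Class.forName("NoArgsWrapper").newInstance();
        System.out.println("created " + ok.getClass().getSimpleName());

        // Fails: NeedsArgs has no nullary constructor, so newInstance()
        // throws InstantiationException.
        try {
            Class.forName("NeedsArgs").newInstance();
        } catch (InstantiationException e) {
            System.out.println("InstantiationException for NeedsArgs");
        }
    }
}
```

Note that even a correct wrapper like this still has to reach the task JVMs: the jar needs to be shipped with the job (not just placed on the client's HADOOP_CLASSPATH, which only affects the JVM that submits the job), which is consistent with the ClassNotFoundException appearing inside the map attempt above.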