Not sure exactly what you are trying, but if you are not seeing any compilation
errors then you are definitely missing the Lucene jars at runtime: your project
compiles against them, but they are not on the classpath of the Hadoop task JVMs.
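One way to get the jars onto the task classpath is Hadoop's generic `-libjars` option, which ToolRunner parses and uses to ship the listed jars to the task JVMs. A minimal sketch (the jar paths and driver class name here are examples, not from the original thread; adjust them to your environment):

```shell
# Make the Lucene jar visible to the local client JVM (path is an example):
export HADOOP_CLASSPATH=/path/to/lucene-analyzers-common-4.6.1.jar:$HADOOP_CLASSPATH

# -libjars is a Hadoop generic option; ToolRunner/GenericOptionsParser will
# ship the listed jars to every map/reduce task's classpath.
hadoop jar myproject.jar com.example.MyDriver \
    -libjars /path/to/lucene-analyzers-common-4.6.1.jar \
    --input /input/index --output /output/vectors \
    --maxNGramSize 3 --namedVector --overwrite
```

Note that `-libjars` only works if the driver runs through ToolRunner (which SparseVectorsFromSequenceFiles does, via Mahout's AbstractJob), and the generic options must come before the tool's own arguments.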


On Wed, Jun 4, 2014 at 1:16 AM, Terry Blankers <te...@amritanet.com> wrote:

> Hi Suneel, can you please provide a little more detail since I still can't
> get this to work.
>
> Which classpath are the Lucene jars supposed to be added to? My Java
> project? Or the Hadoop instance?
>
> Thanks,
>
> Terry
>
>
> On 6/3/14, 5:35 PM, Terry Blankers wrote:
>
>> Thanks Suneel. I thought having the jar as a dependency and the class
>> imported was enough.
>>
>>
>> On 6/3/14, 4:18 PM, Suneel Marthi wrote:
>>
>>> You are missing the Lucene jars from your classpath. Mahout is presently
>>> at Lucene 4.6.1, so that's the version you should be including.
>>>
>>>
>>>
>>> On Tuesday, June 3, 2014 3:40 PM, Terry Blankers <te...@amritanet.com>
>>> wrote:
>>>
>>>
>>> Hello, can anyone please give me a clue as to what I may be missing here?
>>>
>>> I'm trying to run a SparseVectorsFromSequenceFiles job via ToolRunner
>>> from a Java project, and I'm getting the following exception:
>>>
>>> Error: java.lang.ClassNotFoundException:
>>> org.apache.lucene.analysis.standard.StandardAnalyzer
>>>       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>       at
>>> org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(
>>> SequenceFileTokenizerMapper.java:62)
>>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>>>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>       at javax.security.auth.Subject.doAs(Subject.java:415)
>>>       at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(
>>> UserGroupInformation.java:1491)
>>>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>>
>>>
>>> I've tried adding the location of lucene-analyzers-common-4.6.1.jar to
>>> my Hadoop classpath, which doesn't make any difference.
>>>
>>>
>>> I'm running against Hadoop 2.2 and Mahout trunk, compiled with:
>>>
>>>       mvn clean install  -Dhadoop2.version=2.2.0 -DskipTests
>>>
>>>
>>> I'm trying to run the job like this:
>>>
>>>       String[] args = {"--input","/input/index"
>>>               ,"--output","/output/vectors"
>>>               ,"--maxNGramSize","3"
>>>               ,"--namedVector", "--overwrite"
>>>       };
>>>       SparseVectorsFromSequenceFiles sparse = new
>>> SparseVectorsFromSequenceFiles();
>>>       ToolRunner.run(configuration, sparse, args);
>>>
>>>
>>> Running seq2sparse from the command line works successfully with no
>>> exceptions:
>>>
>>>       $MAHOUT_HOME/bin/mahout seq2sparse -i /input/index --namedVector -o
>>> /output/vectors -ow --maxNGramSize 3
>>>
>>>
>>> Many thanks,
>>>
>>> Terry
>>>
>>
>>
>
