I'm not sure what you're trying, but if you're not seeing any compilation errors, then you're definitely missing the Lucene jars at runtime.
On Wed, Jun 4, 2014 at 1:16 AM, Terry Blankers <te...@amritanet.com> wrote:

> Hi Suneel, can you please provide a little more detail, since I still can't
> get this to work.
>
> Which classpath are the Lucene jars supposed to be added to? My Java
> project? Or the Hadoop instance?
>
> Thanks,
>
> Terry
>
>
> On 6/3/14, 5:35 PM, Terry Blankers wrote:
>
>> Thanks, Suneel. I thought having the jar as a dependency and the class
>> imported was enough.
>>
>>
>> On 6/3/14, 4:18 PM, Suneel Marthi wrote:
>>
>>> You're missing the Lucene jars from your classpath. Mahout is presently
>>> at Lucene 4.6.1, so that's what you should be including.
>>>
>>>
>>> On Tuesday, June 3, 2014 3:40 PM, Terry Blankers <te...@amritanet.com>
>>> wrote:
>>>
>>> Hello, can anyone please give me a clue as to what I may be missing here?
>>>
>>> I'm trying to run a SparseVectorsFromSequenceFiles job via ToolRunner
>>> from a Java project, and I'm getting the following exception:
>>>
>>> Error: java.lang.ClassNotFoundException:
>>> org.apache.lucene.analysis.standard.StandardAnalyzer
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>     at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:62)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>>
>>> I've tried adding the location of lucene-analyzers-common-4.6.1.jar to
>>> my Hadoop classpath, which doesn't make any difference.
>>>
>>> I'm running against Hadoop 2.2 and Mahout trunk, compiled with:
>>>
>>>     mvn clean install -Dhadoop2.version=2.2.0 -DskipTests
>>>
>>> I'm trying to run the job like this:
>>>
>>>     String[] args = {"--input", "/input/index"
>>>         , "--output", "/output/vectors"
>>>         , "--maxNGramSize", "3"
>>>         , "--namedVector", "--overwrite"
>>>     };
>>>     SparseVectorsFromSequenceFiles sparse = new SparseVectorsFromSequenceFiles();
>>>     ToolRunner.run(configuration, sparse, args);
>>>
>>> Running seq2sparse from the command line works successfully, with no
>>> exceptions:
>>>
>>>     $MAHOUT_HOME/bin/mahout seq2sparse -i /input/index --namedVector -o
>>>     /output/vectors -ow --maxNGramSize 3
>>>
>>> Many thanks,
>>>
>>> Terry
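[Editor's note: the exception is thrown in SequenceFileTokenizerMapper.setup(), i.e. inside a task JVM on the cluster, so adding the jar to the client-side classpath is not enough; it has to be shipped with the job. A minimal sketch of one way to do that, assuming the jar path is illustrative: ToolRunner parses Hadoop's generic options, so a `-libjars` entry placed before the tool-specific arguments should distribute the Lucene jar to each task's classpath. This mirrors what `bin/mahout` effectively achieves by running a job jar that bundles its dependencies.]

```java
// Sketch, not a verified fix: -libjars is a Hadoop generic option handled by
// GenericOptionsParser inside ToolRunner.run(); generic options must precede
// the tool's own options. The jar path below is an assumption -- substitute
// the real location of lucene-analyzers-common-4.6.1.jar.
String[] args = {
    "-libjars", "/path/to/lucene-analyzers-common-4.6.1.jar",
    "--input", "/input/index",
    "--output", "/output/vectors",
    "--maxNGramSize", "3",
    "--namedVector", "--overwrite"
};
SparseVectorsFromSequenceFiles sparse = new SparseVectorsFromSequenceFiles();
ToolRunner.run(configuration, sparse, args);
```

This requires a running Hadoop cluster and the Mahout/Hadoop jars on the driver classpath, so it is shown as a fragment rather than a standalone program.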