Hi, I installed Mahout and Solr. I created an index from dictionary.txt using the command below:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@dictionary.txt"

To create vectors from my index I needed the org.apache.mahout.utils.vectors.lucene.Driver class. I could not locate this class in mahout-core-0.7-job.jar; I could only find it in mahout-examples-0.7-job.jar, so I uploaded mahout-examples-0.7-job.jar to an S3 bucket. I also uploaded the dictionary index to a separate S3 bucket, and I created another bucket with two folders to store my dictOut and vectors.

I created a job flow on the CLI:

    /elastic-mapreduce --create --alive --log-uri s3n://mahout-output/logs/ --name dict_vectorize

I then added a step to vectorize my index using the following command:

    ./elastic-mapreduce -j j-2NSJRI6N9EQJ4 \
      --jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar \
      --main-class org.apache.mahout.utils.vectors.lucene.Driver \
      --arg --dir s3n://mahout-input/input1/index/ \
      --arg --field doc1 \
      --arg --dictOut s3n://mahout-output/solr-dict-out/dict.txt \
      --arg --output s3n://mahout-output/solr-vect-out/vectors

But in the logs I get the following error:

    2012-12-12 09:37:17,883 ERROR org.apache.mahout.utils.vectors.lucene.Driver (main): Exception
    org.apache.commons.cli2.OptionException: Missing value(s) --dir
        at org.apache.commons.cli2.option.ArgumentImpl.validate(ArgumentImpl.java:241)
        at org.apache.commons.cli2.option.ParentImpl.validate(ParentImpl.java:124)
        at org.apache.commons.cli2.option.DefaultOption.validate(DefaultOption.java:176)
        at org.apache.commons.cli2.option.GroupImpl.validate(GroupImpl.java:265)
        at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:104)
        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

What am I doing wrong?

Another question: what is the correct value of the --field argument? Is it doc1 (the id) or dictionary (from the filename dictionary.txt)? I am asking because when I issue the query q=doc1 on Solr I get no results, but when I issue the query q=dictionary, I see my content.

Thank you so much for your help. I am a newbie, so please excuse me for being so verbose.
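For reference, the two Solr queries behave differently because of Lucene query syntax: a bare term such as doc1 is matched against the default search field (normally configured through df or defaultSearchField), whereas a field-qualified query such as id:doc1 is matched against the id field, which is where literal.id=doc1 stored the value. The minimal Java sketch below builds both request URLs. It assumes the stock /solr/select handler on the same localhost:8983 instance used in the curl command; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds Solr select URLs for a given Lucene query string.
class SolrQueryUrls {
    // Assumed select handler on the same Solr instance as the curl command.
    static final String SELECT = "http://localhost:8983/solr/select";

    // URL-encodes the query and appends it as the q parameter.
    static String selectUrl(String query) {
        try {
            return SELECT + "?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bare term: matched against the default search field only.
        System.out.println(selectUrl("doc1"));
        // Field-qualified term: matched against the id field.
        System.out.println(selectUrl("id:doc1"));
    }
}
```

URLEncoder escapes the colon as %3A, which Solr decodes back to id:doc1 before parsing the query.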