Hi,
I installed Mahout and Solr.

I created an index from dictionary.txt using the command below:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@dictionary.txt"

To create the vectors from my index, I needed the
org.apache.mahout.utils.vectors.lucene.Driver class. I could not locate
this class in mahout-core-0.7-job.jar; I could only find it in
mahout-examples-0.7-job.jar, so I uploaded mahout-examples-0.7-job.jar
to an S3 bucket.

I also uploaded the dictionary index to a separate S3 bucket. I created
another bucket with two folders to store my dictOut and vectors.

I created a job flow from the CLI:

./elastic-mapreduce --create --alive --log-uri s3n://mahout-output/logs/ --name dict_vectorize

I added the step to vectorize my index using the following command:

./elastic-mapreduce -j j-2NSJRI6N9EQJ4 \
  --jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar \
  --main-class org.apache.mahout.utils.vectors.lucene.Driver \
  --arg --dir s3n://mahout-input/input1/index/ \
  --arg --field doc1 \
  --arg --dictOut s3n://mahout-output/solr-dict-out/dict.txt \
  --arg --output s3n://mahout-output/solr-vect-out/vectors


But in the logs I get the following error

2012-12-12 09:37:17,883 ERROR org.apache.mahout.utils.vectors.lucene.Driver (main): Exception
org.apache.commons.cli2.OptionException: Missing value(s) --dir
    at org.apache.commons.cli2.option.ArgumentImpl.validate(ArgumentImpl.java:241)
    at org.apache.commons.cli2.option.ParentImpl.validate(ParentImpl.java:124)
    at org.apache.commons.cli2.option.DefaultOption.validate(DefaultOption.java:176)
    at org.apache.commons.cli2.option.GroupImpl.validate(GroupImpl.java:265)
    at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:104)
    at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)


What am I doing wrong?
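
One possible cause, offered as an assumption rather than a confirmed diagnosis: the elastic-mapreduce CLI appears to forward only the single token that follows each --arg, so in the step above the Driver may receive --dir, --field, --dictOut and --output with no values, which would match "Missing value(s) --dir". The sketch below shows the idea of giving every token, option names and values alike, its own --arg. The helper name wrap_args is hypothetical, and the command is only printed for inspection, not run.

```shell
# Sketch only: wrap_args is a hypothetical helper, not part of elastic-mapreduce.
# It prefixes every token with --arg so that each option name and each value
# is forwarded to the main class as a separate argument.
wrap_args() {
  out=""
  for tok in "$@"; do
    out="$out --arg $tok"
  done
  # Drop the leading space left by the first iteration.
  printf '%s\n' "${out# }"
}

# Same Driver options as in the step above, unchanged.
step_args=$(wrap_args \
  --dir s3n://mahout-input/input1/index/ \
  --field doc1 \
  --dictOut s3n://mahout-output/solr-dict-out/dict.txt \
  --output s3n://mahout-output/solr-vect-out/vectors)

# Print the full command so it can be checked before it is run by hand.
echo "./elastic-mapreduce -j j-2NSJRI6N9EQJ4" \
  "--jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar" \
  "--main-class org.apache.mahout.utils.vectors.lucene.Driver" \
  "$step_args"
```

If the assumption holds, the printed command contains pairs such as "--arg --dir --arg s3n://mahout-input/input1/index/" where the original step had a single --arg.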
Another question: what is the correct value of the --field argument? Is it doc1
(the id) or dictionary (from the filename dictionary.txt)? I am asking
this because when I issue the query with q=doc1 on Solr I get no
results, but when I issue the query with q=dictionary, I see my content.
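
A note on the two queries, hedged since it depends on the schema in use: a bare q=doc1 searches Solr's default search field, not the id field, so it can return nothing even though the document exists. A field-qualified query such as q=id:doc1 targets the id directly. As far as I understand the lucene Driver, --field is likewise expected to name a field in the index (one stored with term vectors), rather than a document id or a file name, but that is an assumption to verify against the Mahout documentation. The sketch below only builds the query URLs; the helper solr_query_url is hypothetical, and the /solr/select path is assumed from the single-core URL used for the update above.

```shell
# Sketch only: solr_query_url is a hypothetical helper that builds a
# field-qualified select URL. The host, port and /solr/select path are
# assumptions based on the update URL used earlier in this post.
solr_query_url() {
  field=$1
  value=$2
  printf 'http://localhost:8983/solr/select?q=%s:%s\n' "$field" "$value"
}

# Query the id field explicitly instead of the default search field.
solr_query_url id doc1

# To run the query against a live Solr instance:
#   curl "$(solr_query_url id doc1)"
```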

Thank you so much for your help. I am a newbie, so please excuse my being too verbose.
