Yes.  The --alive option is the one that keeps the flow around.

Excuse me for not reading carefully.
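
Just remember that a flow created with --alive stays up (and keeps accruing charges) until you shut it down yourself; with the same CLI that should be something like:

./elastic-mapreduce --terminate j-2NSJRI6N9EQJ4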

On Wed, Dec 12, 2012 at 7:58 AM, hellen maziku <nahe...@yahoo.com> wrote:

> Hi Ted,
> If it is running as a single step, then how come I can add more steps to
> it? Currently there are 6 steps. Every time I get an error, I just add
> another step to the same job ID, so I don't understand.
>
> Also, the command to create the job flow is
>
> ./elastic-mapreduce --create --alive --log-uri s3n://mahout-output/logs/ --name dict_vectorize
>
> Doesn't that mean that keep-alive is set?
>
>
>
> ________________________________
>  From: Ted Dunning <ted.dunn...@gmail.com>
> To: user@mahout.apache.org; hellen maziku <nahe...@yahoo.com>
> Sent: Wednesday, December 12, 2012 9:48 AM
> Subject: Re: Creating vectors from lucene index on EMR via the CLI
>
> You are trying to run this job as a single step in an EMR flow.  Mahout's
> command-line programs assume that you are running against a live cluster
> that will hang around (since many Mahout steps involve more than one
> map-reduce job).
>
> It would probably be best to separate the creation of the cluster (with the
> keep-alive flag set) from the execution of the Mahout jobs with a
> subsequent explicit tear-down of the cluster.
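>
> Roughly, that sequence would look something like this (the job flow id is
> whatever --create prints back, and the step arguments are the ones you
> already have):
>
> ./elastic-mapreduce --create --alive --name dict_vectorize --log-uri s3n://mahout-output/logs/
> ./elastic-mapreduce -j <job flow id> --jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar --main-class org.apache.mahout.utils.vectors.lucene.Driver --arg ...
> ./elastic-mapreduce --terminate <job flow id>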
>
> On Wed, Dec 12, 2012 at 3:55 AM, hellen maziku <nahe...@yahoo.com> wrote:
>
> > Hi,
> > I installed Mahout and Solr.
> >
> > I created an index from dictionary.txt using the command below:
> >
> > curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@dictionary.txt"
> >
> > To create the vectors from my index, I needed the
> > org.apache.mahout.utils.vectors.lucene.Driver class. I could not locate
> > this class in mahout-core-0.7-job.jar; I could only find it in
> > mahout-examples-0.7-job.jar, so I uploaded mahout-examples-0.7-job.jar to
> > an S3 bucket.
> >
> > I also uploaded the dictionary index to a separate S3 bucket. I created
> > another bucket with two folders to store my dictOut and vectors.
> >
> > I created a job flow from the CLI:
> >
> > ./elastic-mapreduce --create --alive --log-uri s3n://mahout-output/logs/ --name dict_vectorize
> >
> > I added the step to vectorize my index using the following command:
> >
> > ./elastic-mapreduce -j j-2NSJRI6N9EQJ4 --jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar --main-class org.apache.mahout.utils.vectors.lucene.Driver --arg --dir s3n://mahout-input/input1/index/ --arg --field doc1 --arg --dictOut s3n://mahout-output/solr-dict-out/dict.txt --arg --output s3n://mahout-output/solr-vect-out/vectors
> >
> >
> > But in the logs I get the following error
> >
> > 2012-12-12 09:37:17,883 ERROR org.apache.mahout.utils.vectors.lucene.Driver (main): Exception
> > org.apache.commons.cli2.OptionException: Missing value(s) --dir
> >     at org.apache.commons.cli2.option.ArgumentImpl.validate(ArgumentImpl.java:241)
> >     at org.apache.commons.cli2.option.ParentImpl.validate(ParentImpl.java:124)
> >     at org.apache.commons.cli2.option.DefaultOption.validate(DefaultOption.java:176)
> >     at org.apache.commons.cli2.option.GroupImpl.validate(GroupImpl.java:265)
> >     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:104)
> >     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
> >
> > What am I doing wrong?
> > Another question: what is the correct value of the --field argument? Is it
> > doc1 (the id) or dictionary (from the filename dictionary.txt)? I am asking
> > this because when I issue the query q=doc1 on Solr I get no results, but
> > when I issue the query q=dictionary, I see my content.
> >
> > Thank you so much for your help. I am a newbie, so please excuse my being
> > too verbose.
> >
