Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread hellen maziku
Hi, I installed mahout and solr. I created an index from the dictionary.txt using the command below: curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@dictionary.txt" To create the vectors from my index I needed the org.apache.mahout.utils.vectors.lucen

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread Sean Owen
I don't know much about this particular job, but the general problem here is that you are passing arguments to a binary called elastic-mapreduce, and not to the Java program. There is likely some mechanism to package up arguments that need to be sent to the program, as an argument to the elastic-ma
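As a rough illustration of the point above, the sketch below wraps each argument meant for the Java program in its own `--arg`, so that the elastic-mapreduce binary forwards it instead of trying to parse it. This assumes the Ruby `elastic-mapreduce` CLI with its `--jobflow`, `--jar`, `--main-class` and `--arg` options. The job flow ID, bucket, paths, and the Driver option names (`--dir`, `--field`, `--output`, `--dictOut`) are hypothetical placeholders, not values from this thread. The script only prints the command; it does not run it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: wrap each Java program argument in --arg so that the
# elastic-mapreduce binary passes it through to the program.

# Hypothetical placeholders, not values taken from the thread.
JOBFLOW_ID="j-XXXXXXXXXXXXX"
JOB_JAR="s3://my-bucket/mahout-job.jar"
MAIN_CLASS="org.apache.mahout.utils.vectors.lucene.Driver"

# Arguments intended for the Java program (option names are assumptions).
driver_args=(--dir /path/to/index --field text
             --output s3://my-bucket/vectors/out.vec
             --dictOut s3://my-bucket/vectors/dict.txt)

# Prefix every program argument with --arg.
wrapped=()
for a in "${driver_args[@]}"; do
  wrapped+=(--arg "$a")
done

cmd=(./elastic-mapreduce --jobflow "$JOBFLOW_ID" --jar "$JOB_JAR"
     --main-class "$MAIN_CLASS" "${wrapped[@]}")

# Print the assembled command instead of executing it.
printf '%s\n' "${cmd[*]}"
```

Printing first makes it easy to check which tokens go to elastic-mapreduce itself and which are passed through to the Java class.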

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread hellen maziku
Also, what do you mean by "don't know much about this particular job"? Does the type of the job jar file matter? I thought as long as I could locate the org.apache.mahout.utils.vectors.lucene.Driver class then I was good to use that job jar file. Btw, whenever I installed and compiled mahout 0

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread Sean Owen
I just mean I'm not familiar with the particular code you are running, but I think the problem has to do with calling elastic-mapreduce in general, which has nothing to do with the JAR itself. Indeed there's nothing that indicates a problem with the JAR file. As I said on your other message, I think

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread Ted Dunning
You are trying to run this job as a single step in an EMR flow. Mahout's command line programs assume that you are running against a live cluster that will hang around (since many mahout steps involve more than one map-reduce). It would probably be best to separate the creation of the cluster (w

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread hellen maziku
Hi Ted, If I am running it as a single step, then how come I can add more steps to it? Currently there are 6 steps. Every time I get the errors, I just add another step to the same job ID. So I don't understand. Also the command to create the job flow is /elastic-mapreduce --create --alive --

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread Ted Dunning
Yes. The --alive option is the one that keeps the flow around. Excuse me for not reading carefully.

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread Ted Dunning
I would still recommend that you switch to using the mahout programs directly to submit jobs. Those programs really have an assumption baked in that they will be submitting the jobs themselves. The EMR commands that you are using take responsibility for creating the environment that you need for
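A minimal sketch of that recommendation, under stated assumptions: the job flow is already running (created with `--alive`), the Ruby `elastic-mapreduce` CLI's `--ssh` option is used to reach the master node, and Mahout is unpacked on that node at a hypothetical location. The `lucene.vector` program name and its options are assumed here, and every path and ID is a placeholder. The script only prints the two commands so the split between local and on-cluster steps is visible.

```shell
#!/usr/bin/env bash
# Dry-run sketch: keep cluster access separate from job submission and let
# the mahout launcher submit its own map-reduce jobs from the master node.

JOBFLOW_ID="j-XXXXXXXXXXXXX"        # hypothetical job flow ID
MAHOUT_HOME="/home/hadoop/mahout"   # hypothetical install location on the master

# Step 1 (local machine): open a shell on the master node of the job flow.
ssh_cmd=(./elastic-mapreduce --ssh --jobflow "$JOBFLOW_ID")

# Step 2 (on the master node): run the Mahout program directly.
# Option names and paths are assumptions for illustration.
mahout_cmd=("$MAHOUT_HOME/bin/mahout" lucene.vector
            --dir /path/to/index --field text
            --output vectors/out.vec --dictOut vectors/dict.txt)

printf 'local:  %s\n' "${ssh_cmd[*]}"
printf 'master: %s\n' "${mahout_cmd[*]}"
```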

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread hellen maziku
Thank you for the advice. But on my machine I do not have hadoop installed. Running the jobs locally with mahout gives me heap size errors as seen from http://en.wikipedia.org/wiki/User:Bloodysnowrocker/Hadoop. I could only do recommendations locally but clustering and creating of vectors wasn't

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-12 Thread Ted Dunning
You can ssh to the EMR cluster if you like.