Re: Creating vectors from lucene index on EMR via the CLI

2012-12-13 Thread hellen maziku
I think I have no choice but to do that. I am able to ssh to the EMR cluster, but how do I run my Mahout job? I don't know how to proceed. Also, how can I mount my input files? From: Ted Dunning ted.dunn...@gmail.com To: user@mahout.apache.org; hellen maziku

Streaming KMeans Text Clustering Concurrency and Advice

2012-12-13 Thread Brandon Root
This is a question regarding the new KNN library that Ted Dunning and Dan Filimon are working on (as I understand it'll be in Mahout 0.8) so I hope this is the appropriate list for this question instead of mahout-dev. First off, it's great. I was looking for a streaming kmeans library (or writing

Re: Streaming KMeans Text Clustering Concurrency and Advice

2012-12-13 Thread Ted Dunning
On Thu, Dec 13, 2012 at 2:29 PM, Brandon Root brandonr...@gmail.com wrote: This is a question regarding the new KNN library that Ted Dunning and Dan Filimon are working on (as I understand it'll be in Mahout 0.8) so I hope this is the appropriate list for this question instead of mahout-dev.

Re: Streaming KMeans Text Clustering Concurrency and Advice

2012-12-13 Thread Dan Filimon
Hi there! Glad to see someone's using it! :D On Fri, Dec 14, 2012 at 12:29 AM, Brandon Root brandonr...@gmail.com wrote: This is a question regarding the new KNN library that Ted Dunning and Dan Filimon are working on (as I understand it'll be in Mahout 0.8) so I hope this is the appropriate

Re: Streaming KMeans Text Clustering Concurrency and Advice

2012-12-13 Thread Ted Dunning
What Dan says here is correct. The lack of dependence on k in the current code is definitely a problem. The workaround is to set maxClusters to the value the log factor should have grown to. That sucks, so we should fix the heuristic sizing along the lines that Dan says. There should
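As a rough illustration of the workaround, here is a minimal sketch that picks a maxClusters value by hand. It assumes the sketch size should scale like k * ln(n) for k final clusters and n input points, which is the usual streaming k-means sizing; the helper name `suggested_max_clusters` is hypothetical, and the exact base and constant used by the library code may differ.

```python
import math


def suggested_max_clusters(k: int, n: int) -> int:
    """Hypothetical helper: estimate a sketch size of about k * ln(n).

    Assumption: the "log factor" is the natural log of the number of points n.
    The result is never smaller than k itself.
    """
    if k < 1 or n < 2:
        raise ValueError("need k >= 1 and n >= 2")
    # Round up so the sketch is never undersized for the log factor.
    return max(k, math.ceil(k * math.log(n)))
```

For example, with k = 20 and a million documents this suggests 277 sketch centroids, which would then be passed in as maxClusters.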

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-13 Thread Ted Dunning
If your input files are in S3, then the map-reduce steps that Mahout spawns can access them without problems. In order to run Mahout programs, you will need to install Mahout. There are command line programs in $MAHOUT_HOME/bin that will do what you need. On Thu, Dec 13, 2012 at 10:58 AM, hellen
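Whether Hadoop jobs can read S3 input depends on the path being a fully qualified S3 URI. A path without a scheme (for example `/user/hadoop/input`) is resolved against the cluster's default filesystem, usually HDFS, not S3. The minimal sketch below checks for that. It assumes `s3` and `s3n` are the relevant schemes for the Hadoop version on the cluster; the function name `is_qualified_s3_path` and the bucket names are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: schemes accepted for S3 on the cluster; check your Hadoop/EMR version.
S3_SCHEMES = {"s3", "s3n"}


def is_qualified_s3_path(path: str) -> bool:
    """Return True if path looks like s3n://bucket/key (scheme plus bucket).

    Paths with no scheme are resolved against the default filesystem
    (typically HDFS), so they would not point at S3.
    """
    parsed = urlparse(path)
    return parsed.scheme in S3_SCHEMES and bool(parsed.netloc)
```

For instance, `s3n://my-bucket/lucene-index` passes, while `my-bucket/lucene-index` does not, because Hadoop would look for it relative to the default filesystem.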

Re: Creating vectors from lucene index on EMR via the CLI

2012-12-13 Thread hellen maziku
Every time I run my job from the Hadoop cluster, it complains about the S3 format. I got the error "file does not exist" or "unknown directory". When I copied the files to the cluster, it was okay. I do not know what is wrong. From: Ted Dunning ted.dunn...@gmail.com