Re: unable to find the job id.

2011-10-11 Thread Paritosh Ranjan
Looks like there is a flag to control that (run sequential or not). And by default it seems to run on the cluster. Still, I think you can try it out to check whether it's running sequentially or on the cluster. On 11-10-2011 12:12, Gaurav wrote: Hello, When I try to kill a process I am

Re: Generic approach to kNN

2011-10-11 Thread Sean Owen
It doesn't sound like you really have or want a recommender problem, then. Really, this is neither a clustering nor recommender problem; it's just using a similarity metric. Ted's right that you do have to encode these things as numeric vectors to use anything in Mahout. A vector can't have
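
To illustrate the point above about encoding items as numeric vectors and simply applying a similarity metric, here is a minimal sketch in plain Python. It is not Mahout code: the function names, the choice of cosine similarity, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, items, k):
    """Return the names of the k items most similar to the query vector.

    `items` maps a (hypothetical) item name to its numeric vector.
    """
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy data, invented for this example.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = nearest_neighbors([1.0, 0.1], items, k=2)
```

This brute-force scan compares the query against every item, which is fine for small data sets; any other similarity or distance measure could be swapped in for the cosine.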

text classification using mahout and lucene index

2011-10-11 Thread drahman
Hi everyone, I want to use Mahout for text classification. Right now I'm reading through some chapters of the book Mahout in Action, but some of the code examples aren't working yet. So I thought I would ask my question right away: how can I use Mahout for text classification? My problem is about

Re: text classification using mahout and lucene index

2011-10-11 Thread Ted Dunning
The multi-label problem in Mahout needs to be attacked by building multiple binary models. From there, you can use the examples for NaiveBayes and for logistic regression SGD (see the Mahout in Action book) to get things started. You will need to glue the lucene document vector extraction to the
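
The "multiple binary models" idea mentioned above can be sketched as one independent binary classifier per label. The following is a rough illustration in plain Python, not Mahout's NaiveBayes or SGD implementation: the tiny logistic-regression trainer, the function names, the hyperparameters, and the toy documents are all assumptions.

```python
import math


def predict_prob(weights, x):
    """Probability of the positive class; weights[0] is the bias term."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_sgd(examples, n_features, epochs=200, rate=0.5):
    """Train one binary logistic-regression model with plain SGD.

    `examples` is a list of (feature_vector, target) pairs with target 0 or 1.
    """
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, y in examples:
            err = y - predict_prob(weights, x)
            weights[0] += rate * err
            for i, xi in enumerate(x):
                weights[i + 1] += rate * err * xi
    return weights


def train_multilabel(docs, labels, n_features):
    """Build one binary model per label: 'has this label' versus 'does not'."""
    return {
        label: train_binary_sgd(
            [(x, 1 if label in tags else 0) for x, tags in docs], n_features
        )
        for label in labels
    }


def predict_labels(models, x, threshold=0.5):
    """A document receives every label whose binary model fires."""
    return {label for label, w in models.items() if predict_prob(w, x) >= threshold}


# Toy documents as (feature vector, set of labels); invented for this example.
docs = [
    ([1.0, 0.0], {"sports"}),
    ([0.0, 1.0], {"politics"}),
    ([1.0, 1.0], {"sports", "politics"}),
    ([0.0, 0.0], set()),
]
models = train_multilabel(docs, ["sports", "politics"], n_features=2)
```

Because each label gets its own model, a document can end up with zero, one, or several labels. In a real pipeline the feature vectors would come from the document vector extraction step rather than being written by hand.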

Re: Generic approach to kNN

2011-10-11 Thread Ted Dunning
On Tue, Oct 11, 2011 at 8:18 AM, Sean Owen sro...@gmail.com wrote: It doesn't sound like you really have or want a recommender problem, then. Really, this is neither a clustering nor recommender problem; it's just using a similarity metric. Absolutely. Ted's right that you do have to

RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
I'm running trunk RecommenderJob (via build-asf-email.sh) and am not getting any recommendations due to NaNs being calculated in the AggregateAndRecommend step. I'm not quite sure what is going on as it seems like this was working as little as two weeks ago (post Sebastian's big change to

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: Where is the NaN coming up -- what has this value? simColumn seems to be the originator in the Aggregate step. For instance, my current breakpoint shows: {309682:0.9566912651062012,42938:0.9566912651062012,309672:NaN} I can also see some in the

Re: RecommenderJob and NaN

2011-10-11 Thread Sean Owen
NaN is added for all user-item pairs that already exist in the input, to make them ineligible for recommendation. That's normal - could this be the case? On Oct 11, 2011 7:49 PM, Grant Ingersoll gsing...@apache.org wrote: On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: Where is the NaN coming
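
The NaN convention described above (marking already-known user-item pairs so they cannot be recommended) can be sketched roughly as follows. This is plain Python for illustration only and does not reproduce RecommenderJob's internals; the function names, item IDs, and scores are hypothetical.

```python
import math


def mask_known_items(predicted, known_items):
    """Replace the score of every item the user already has with NaN."""
    return {
        item: (math.nan if item in known_items else score)
        for item, score in predicted.items()
    }


def top_recommendations(scores, n):
    """Return the n best-scoring item IDs, skipping NaN-marked entries."""
    eligible = [(score, item) for item, score in scores.items() if not math.isnan(score)]
    eligible.sort(reverse=True)
    return [item for _, item in eligible[:n]]


# Hypothetical predicted scores for one user; item 1 is already in the input.
predicted = {1: 0.9, 2: 0.5, 3: 0.7}
masked = mask_known_items(predicted, known_items={1})
recommended = top_recommendations(masked, n=2)
```

Under this convention a NaN entry is expected wherever the input already contains the pair. The case worth investigating is when every candidate score for a user comes out as NaN, since that leaves nothing to recommend.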

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 2:49 PM, Grant Ingersoll wrote: On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: Where is the NaN coming up -- what has this value? simColumn seems to be the originator in the Aggregate step. For instance, my current breakpoint shows: