unable to find the job id.

2011-10-11 Thread Gaurav
Hello, when I try to kill a process, I am unable to find the process ID after using the command: hadoop job -list. It says no jobs are running. I am running the canopy clustering example by typing the following command: mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job --input --output

Re: unable to find the job id.

2011-10-11 Thread Paritosh Ranjan
Looks like there is a flag, "method", to control that (run sequential or not). And by default it seems to run on the cluster. Still, I think you can try it out to check whether it's running sequentially or on the cluster. On 11-10-2011 12:12, Gaurav wrote: Hello, When i try to kill a process i am una

Re: Generic approach to kNN

2011-10-11 Thread Sean Owen
It doesn't sound like you really have or want a recommender problem, then. Really, this is neither a clustering nor recommender problem; it's just using a similarity metric. Ted's right that you do have to encode these things as numeric vectors to use anything in Mahout. A vector can't have values
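As a rough illustration of the point above (kNN here is just a similarity metric applied to numeric vectors), the following is a minimal sketch in plain Python, not Mahout code. The item names, the vectors, and the choice of cosine similarity are all assumptions made for the example; the thread does not say which metric or encoding fits the poster's data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, items, k=2):
    """Return the names of the k items most similar to the query vector."""
    ranked = sorted(
        items.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical items, already encoded as numeric vectors.
items = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
neighbors = nearest_neighbors([1.0, 0.0, 0.0], items, k=2)
```

This brute-force scan compares the query against every item, which is fine for small data but would need a distributed or indexed approach at scale.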

text classification using mahout and lucene index

2011-10-11 Thread drahman
Hi everyone, I want to use Mahout for text classification. Right now I'm reading through some chapters of the book "Mahout in Action", but some of the code examples aren't working yet. So I thought I would ask my question right away: how can I use Mahout for text classification? My problem is abou

Re: text classification using mahout and lucene index

2011-10-11 Thread Ted Dunning
The multi-label problem in Mahout needs to be attacked by building multiple binary models. From there, you can use the examples for NaiveBayes and for logistic regression SGD (see the Mahout in Action book) to get things started. You will need to glue the lucene document vector extraction to the
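The "multiple binary models" idea can be sketched as follows. This is plain Python rather than Mahout's NaiveBayes or SGD classes, and it assumes documents have already been turned into numeric vectors (the Lucene extraction step is not shown). The tiny logistic-regression trainer, the label names, and the toy vectors are all hypothetical, chosen only to show one model being trained per label.

```python
import math

def train_binary_sgd(examples, epochs=200, rate=0.5):
    """Train one logistic-regression model with plain SGD.

    examples is a list of (feature_vector, target) pairs with target 0 or 1.
    Returns the weight list; the last entry is the bias term.
    """
    dim = len(examples[0][0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        for features, target in examples:
            x = list(features) + [1.0]  # append constant input for the bias
            score = sum(w * v for w, v in zip(weights, x))
            predicted = 1.0 / (1.0 + math.exp(-score))
            error = target - predicted
            weights = [w + rate * error * v for w, v in zip(weights, x)]
    return weights

def predict_probability(weights, features):
    x = list(features) + [1.0]
    score = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-score))

def train_multi_label(documents, labels):
    """Build one independent binary model per label."""
    models = {}
    for label in labels:
        examples = [
            (vector, 1 if label in doc_labels else 0)
            for vector, doc_labels in documents
        ]
        models[label] = train_binary_sgd(examples)
    return models

def predict_labels(models, features, threshold=0.5):
    """Return every label whose binary model fires for this vector."""
    return sorted(
        label
        for label, weights in models.items()
        if predict_probability(weights, features) >= threshold
    )

# Hypothetical two-feature document vectors with their label sets.
documents = [
    ([1.0, 0.0], {"sports"}),
    ([0.0, 1.0], {"politics"}),
    ([1.0, 1.0], {"sports", "politics"}),
    ([0.0, 0.0], set()),
]
models = train_multi_label(documents, ["sports", "politics"])
```

Because each label gets its own model, a document can receive zero, one, or several labels, which is what distinguishes this from a single multi-class classifier.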

Re: Generic approach to kNN

2011-10-11 Thread Ted Dunning
On Tue, Oct 11, 2011 at 8:18 AM, Sean Owen wrote: > It doesn't sound like you really have or want a recommender problem, > then. Really, this is neither a clustering nor recommender problem; > it's just using a similarity metric. > Absolutely. > Ted's right that you do have to encode these thi

RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
I'm running trunk RecommenderJob (via build-asf-email.sh) and am not getting any recommendations due to NaNs being calculated in the AggregateAndRecommend step. I'm not quite sure what is going on as it seems like this was working as little as two weeks ago (post Sebastian's big change to RecJo

Re: RecommenderJob and NaN

2011-10-11 Thread Sean Owen
Where is the NaN coming up -- what has this value? It should be propagated in some cases but not others. I'm not aware of any changes here. Generally small data sets will have this problem of not being able to compute much of anything useful, so NaN might be right here. But you say it was differen

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: > Where is the NaN coming up -- what has this value? simColumn seems to be the originator in the Aggregate step. For instance, my current breakpoint shows: {309682:0.9566912651062012,42938:0.9566912651062012,309672:NaN} I can also see some in the

Re: RecommenderJob and NaN

2011-10-11 Thread Sean Owen
NaN is added for all user item pairs that already exist in the input, to make them ineligible for recommendation. That's normal - could this be the case? On Oct 11, 2011 7:49 PM, "Grant Ingersoll" wrote: > > On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: > > > Where is the NaN coming up -- what h
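The masking idea described here can be illustrated with a small sketch. It is not the RecommenderJob code; the function name, the item IDs, and the scores are hypothetical. It only shows how marking already-known user-item pairs with NaN removes them from the candidate list, and why seeing NaN in intermediate values is not by itself a sign of a bug.

```python
import math

def recommend(scores, known_items, how_many=2):
    """Rank candidate items for one user, skipping items already known.

    scores maps item ID to a predicted preference; known_items holds the
    item IDs that already appear in the input for this user.
    """
    # Mark pairs that already exist in the input as NaN (ineligible).
    masked = {
        item: (float("nan") if item in known_items else score)
        for item, score in scores.items()
    }
    # NaN entries are dropped before ranking.
    candidates = [
        (item, score) for item, score in masked.items()
        if not math.isnan(score)
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:how_many]]

# Hypothetical predicted scores; the user has already interacted with 101.
top_items = recommend({101: 0.9, 102: 0.8, 103: 0.7}, known_items={101})
```

If every candidate for a user ends up as NaN, the result is an empty list, which would look like "no recommendations" from the outside.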

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 2:49 PM, Grant Ingersoll wrote: > > On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: > >> Where is the NaN coming up -- what has this value? > > simColumn seems to be the originator in the Aggregate step. For instance, my > current breakpoint shows: > {309682:0.956691265106

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 2:54 PM, Sean Owen wrote: > NaN is added for all user item pairs that already exist in the input, to > make them ineligible for recommendation. That's normal - could this be the > case? Trying to track down. I don't think it is the self case, but not 100% sure. > On Oct 1