Re: RecommenderJob and NaN

Sean Owen Tue, 11 Oct 2011 09:37:01 -0700

Where is the NaN coming up, i.e. which value is NaN?
It should be propagated in some cases but not others. I'm not aware of
any changes here.
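
To illustrate propagating versus skipping NaN, here is a minimal Java sketch. It assumes a similarity-weighted average of the kind an aggregate-and-recommend step might compute. The class and method names are hypothetical, and this is not Mahout's actual AggregateAndRecommend code.

```java
/**
 * Hypothetical sketch (not Mahout's code): estimates a preference as a
 * similarity-weighted average of a user's known preferences.
 */
public final class NaNAggregationSketch {

  /** Naive version: a single NaN similarity poisons both sums. */
  static double weightedAverage(double[] similarities, double[] prefs) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (int i = 0; i < similarities.length; i++) {
      numerator += similarities[i] * prefs[i];   // NaN * x is NaN, and NaN + y is NaN
      denominator += Math.abs(similarities[i]);  // Math.abs(NaN) is also NaN
    }
    return numerator / denominator;
  }

  /** Variant that treats a NaN similarity as "unknown" and skips that term. */
  static double weightedAverageSkippingNaN(double[] similarities, double[] prefs) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (int i = 0; i < similarities.length; i++) {
      // sim == Double.NaN is always false, so Double.isNaN is required here
      if (Double.isNaN(similarities[i])) {
        continue;
      }
      numerator += similarities[i] * prefs[i];
      denominator += Math.abs(similarities[i]);
    }
    // With no usable terms there is genuinely no estimate, so NaN is kept on purpose
    return denominator == 0.0 ? Double.NaN : numerator / denominator;
  }

  public static void main(String[] args) {
    double[] sims = {0.8, Double.NaN, 0.4};
    double[] prefs = {1.0, 1.0, 1.0};
    System.out.println(weightedAverage(sims, prefs));            // NaN
    System.out.println(weightedAverageSkippingNaN(sims, prefs)); // 1.0
  }
}
```

Whether skipping is appropriate depends on what the NaN means. If it marks a similarity that could not be computed, dropping that one term is reasonable. If no usable terms remain at all, however, the estimate should arguably stay NaN.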


Small data sets generally have this problem: there is too little data to
compute much of anything useful, so NaN might be the correct result here.
But you say it behaved differently recently, which seems to rule that out.
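
The following is a minimal sketch of how a small data set can legitimately produce NaN. It assumes a mean-centered similarity such as Pearson correlation, although the thread does not say which measure the job uses. With only one co-occurring value, or with constant values like the all-1 participation flags described in the quoted message, both variances are zero and the correlation becomes 0.0 / 0.0. The class name is hypothetical.

```java
/**
 * Hypothetical sketch: Pearson correlation over the values two items share.
 * Pearson is an assumption here, used only to show how too little data
 * (or constant data) yields 0.0 / 0.0 = NaN.
 */
public final class PearsonNaNSketch {

  static double pearson(double[] x, double[] y) {
    int n = x.length;
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double cov = 0.0;
    double varX = 0.0;
    double varY = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      cov += dx * dy;
      varX += dx * dx;
      varY += dy * dy;
    }
    // One shared value, or constant values such as all-1 flags, makes both
    // variances 0, so this evaluates 0.0 / Math.sqrt(0.0), i.e. NaN
    return cov / Math.sqrt(varX * varY);
  }

  public static void main(String[] args) {
    System.out.println(pearson(new double[] {1.0}, new double[] {1.0}));
    System.out.println(pearson(new double[] {1.0, 1.0, 1.0}, new double[] {1.0, 1.0, 1.0}));
    System.out.println(pearson(new double[] {1.0, 2.0, 3.0}, new double[] {2.0, 4.0, 7.0}));
  }
}
```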

On Tue, Oct 11, 2011 at 5:34 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> I'm running trunk RecommenderJob (via build-asf-email.sh) and am not getting 
> any recommendations due to NaNs being calculated in the AggregateAndRecommend 
> step.  I'm not quite sure what is going on as it seems like this was working 
> as little as two weeks ago (post Sebastian's big change to RecJob), but I 
> don't see a whole lot of changes in that part of the code.
>
> The data maps user IDs to email thread IDs.  My input data is simply a
> triple of user ID, thread ID, 1 (meaning that user participated in that
> thread).  Most of the inputs to the AggregateAndRecommend step seem to
> have good values, except that one ID's value is NaN, and this then seems to
> get added in and make everything NaN (I realize this is a very naive
> understanding).  I sense that I should be looking upstream in the process for
> a fix, but I am not sure where that is.
>
> Any ideas where I should be looking to eliminate these NaNs?  If you want to 
> try this with a small data set, you can get it here: 
> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout 
> (but note the companion article is not published yet.)
>
> Thanks,
> Grant
