Re: RecommenderJob and NaN

Grant Ingersoll Thu, 13 Oct 2011 20:00:12 -0700

Looks like it is me.  Still not sure why, but getting there.

On Oct 13, 2011, at 10:35 PM, Grant Ingersoll wrote:


> Note, the next version (13df29e4fe97b4370f24d7e91ab5909de76f0f3b) doesn't 
> work.  Debugging.  
> 
> 
> 
> On Oct 13, 2011, at 9:31 PM, Grant Ingersoll wrote:
> 
>> OK, I can confirm that an earlier version 
>> (54300025dbdd6e688a4eb3d043016eb641067c7e in github/lucidimagination/mahout) 
>> worked.  Now, to figure out why.
>> 
>> -Grant
>> 
>> On Oct 13, 2011, at 4:01 AM, Sebastian Schelter wrote:
>> 
>>> Grant,
>>> 
>>> Can you share a few more details about the results? Do you get any
>>> exceptions, or do you just get no results?
>>> 
>>> Using the NaNs inside the similarity matrix vectors has been included in
>>> the job for a very long time and should not cause any problems. As Sean
>>> already mentioned we have unit tests with toy data that should catch the
>>> very obvious errors in this code.
>>> 
>>> Can you share the dataset? I can do a test run on my research cluster.
>>> 
>>> --sebastian
>>> 
>>> On 13.10.2011 08:37, Sean Owen wrote:
>>>> RecommenderJob? The unit tests run it all the time.
>>>> There should not be any glitches with static variables -- don't think
>>>> there are any.
>>>> 
>>>> On Thu, Oct 13, 2011 at 7:33 AM, Lance Norskog <goks...@gmail.com> wrote:
>>>>> Is this job working well for anyone now?
>>>>> When was the last time this job worked for someone?
>>>>> 
>>>>> On Wed, Oct 12, 2011 at 11:30 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>> 
>>>>>> Both local and on EC2
>>>>>> 
>>>>>> On Oct 12, 2011, at 2:10 PM, Ken Krugler wrote:
>>>>>> 
>>>>>>> Hi Grant,
>>>>>>> 
>>>>>>> Just curious, are you running this locally or distributed?
>>>>>>> 
>>>>>>> I'd run into a similar issue, though in a completely different algorithm
>>>>>>> (Jimmy Lin's PageRank implementation) due to the use of a static variable.
>>>>>>> 
>>>>>>> When running locally, this wasn't getting cleared between loops, and thus
>>>>>>> I got wonky results.
>>>>>>> 
>>>>>>> The same thing would have happened with JVM reuse enabled.
>>>>>>> 
>>>>>>> -- Ken
>>>>>>> 
>>>>>>> On Oct 12, 2011, at 3:28pm, Grant Ingersoll wrote:
>>>>>>> 
>>>>>>>> Digging some more:
>>>>>>>> 
>>>>>>>> In AggregateAndRecommend, around line 143, I have, for userId 0, a
>>>>>>>> simColumn of:
>>>>>>>> 
>>>>>>>> {22966:0.9566912651062012,81901:0.9566912651062012,263375:0.9566912651062012,263374:0.9566912651062012,263376:NaN}
>>>>>>>> 
>>>>>>>> Which then becomes the numerator and the denom.
>>>>>>>> 
>>>>>>>> Looping, my next simCol is:
>>>>>>>> 
>>>>>>>> {22966:0.9566912651062012,81901:0.9566912651062012,263375:NaN,263374:0.9566912651062012,263376:0.9566912651062012}
>>>>>>>> 
>>>>>>>> and then
>>>>>>>> 
>>>>>>>> {22966:0.9566912651062012,81901:0.9566912651062012,263375:0.9566912651062012,263374:NaN,263376:0.9566912651062012}
>>>>>>>> 
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> Each time, those are getting added into the numerators/denoms value,
>>>>>>>> such that by the time we are done looping (line 161), we have:
>>>>>>>> numerators: {22966:NaN,81901:NaN,263376:NaN,263375:NaN,263374:NaN}
>>>>>>>> denoms: {22966:NaN,81901:NaN,263376:NaN,263375:NaN,263374:NaN}
>>>>>>>> 
>>>>>>>> numberOfSimilarItemsUsed:
>>>>>>>> {81901:5.0,22966:5.0,263376:5.0,263375:5.0,263374:5.0}
>>>>>>>> 
>>>>>>>> Not sure how to interpret this, as I haven't dug into the math here
>>>>>>>> yet or figured out where those NaNs are coming from originally.
>>>>>>>> 
>>>>>>>> On Oct 11, 2011, at 2:55 PM, Grant Ingersoll wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 11, 2011, at 2:49 PM, Grant Ingersoll wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Oct 11, 2011, at 12:36 PM, Sean Owen wrote:
>>>>>>>>>> 
>>>>>>>>>>> Where is the NaN coming up -- what has this value?
>>>>>>>>>> 
>>>>>>>>>> simColumn seems to be the originator in the Aggregate step.  For
>>>>>>>>>> instance, my current breakpoint shows:
>>>>>>>>>> {309682:0.9566912651062012,42938:0.9566912651062012,309672:NaN}
>>>>>>>>>> 
>>>>>>>>>> I can also see some in the PartialMultiplyMapper via the
>>>>>>>>>> similarityMatrixColumn.
>>>>>>>>>> 
>>>>>>>>>> Is that set by SimilarityMatrixRowWrapperMapper?
>>>>>>>>>> <code>
>>>>>>>>>> /* remove self similarity */
>>>>>>>>>> similarityMatrixRow.set(key.get(), Double.NaN);
>>>>>>>>>> </code>
>>>>>>>>> 
>>>>>>>>> Ah, but that is just taking care of itself, so maybe not the issue.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> It should be propagated in some cases but not others. I'm not aware of
>>>>>>>>>>> any changes here.
>>>>>>>>>> 
>>>>>>>>>> yeah, me neither.  This is all related to MAHOUT-798.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Generally small data sets will have this problem of not being able to
>>>>>>>>>>> compute much of anything useful, so NaN might be right here.
>>>>>>>>>>> But you say it was different recently, which seems to rule that out.
>>>>>>>>>> 
>>>>>>>>>> I also _believe_ I'm seeing it in a much larger data set on Hadoop,
>>>>>>>>>> it's just that's a whole lot harder to debug.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Oct 11, 2011 at 5:34 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>>>>>> I'm running trunk RecommenderJob (via build-asf-email.sh) and am not
>>>>>>>>>>>> getting any recommendations due to NaNs being calculated in the
>>>>>>>>>>>> AggregateAndRecommend step.  I'm not quite sure what is going on, as it
>>>>>>>>>>>> seems like this was working as recently as two weeks ago (post Sebastian's
>>>>>>>>>>>> big change to RecJob), but I don't see a whole lot of changes in that part
>>>>>>>>>>>> of the code.
>>>>>>>>>>>> 
>>>>>>>>>>>> The data is user IDs mapping to email thread IDs.  My input data is
>>>>>>>>>>>> simply a triple of user ID, thread ID, 1 (meaning that user participated
>>>>>>>>>>>> in that thread).  It seems like I will have a lot of good values in the
>>>>>>>>>>>> inputs to the AggregateAndRecommend step, except one ID will be NaN and
>>>>>>>>>>>> this then seems to get added in and makes everything NaN (I realize this
>>>>>>>>>>>> is a very naive understanding).  I sense that I should be looking upstream
>>>>>>>>>>>> in the process for a fix, but I am not sure where that is.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any ideas where I should be looking to eliminate these NaNs?  If you
>>>>>>>>>>>> want to try this with a small data set, you can get it here:
>>>>>>>>>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>>>>>>>>> (but note the companion article is not published yet.)
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Grant
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --------------------------------------------
>>>>>>>>> Grant Ingersoll
>>>>>>>>> http://www.lucidimagination.com
>>>>>>>>> Lucene Eurocon 2011: http://www.lucene-eurocon.com
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --------------------------
>>>>>>> Ken Krugler
>>>>>>> +1 530-210-6378
>>>>>>> http://bixolabs.com
>>>>>>> custom big data solutions & training
>>>>>>> Hadoop, Cascading, Mahout & Solr
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Lance Norskog
>>>>> goks...@gmail.com
>>>>> 
>>> 
>> 
>> 
> 
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
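
For anyone reading this thread later, here is a minimal sketch of the
propagation described in the quoted debugging notes above: each simColumn
carries a NaN at a different item id, and adding the columns together leaves
a NaN at every id. This is not the AggregateAndRecommend code from Mahout.
The class name, the method names, and the three-item toy columns are all
assumptions made for illustration, and skipping NaN entries is only one
possible way to handle them, not necessarily what the job does or should do.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not taken from Mahout. */
public class NanPropagationSketch {

  /** Element-wise sum of all columns. One NaN at an index makes that total NaN. */
  public static double[] sumColumns(double[][] columns) {
    double[] totals = new double[columns[0].length];
    for (double[] column : columns) {
      for (int i = 0; i < column.length; i++) {
        totals[i] += column[i];
      }
    }
    return totals;
  }

  /** Same sum, but NaN entries are skipped instead of being added in. */
  public static double[] sumColumnsSkippingNaN(double[][] columns) {
    double[] totals = new double[columns[0].length];
    for (double[] column : columns) {
      for (int i = 0; i < column.length; i++) {
        if (!Double.isNaN(column[i])) {
          totals[i] += column[i];
        }
      }
    }
    return totals;
  }

  public static void main(String[] args) {
    double s = 0.9566912651062012; // similarity value seen in the thread
    // Toy columns: each one has its NaN at a different position.
    double[][] columns = {
      {s, s, Double.NaN},
      {s, Double.NaN, s},
      {Double.NaN, s, s},
    };
    // Every index has received a NaN from one column: prints [NaN, NaN, NaN]
    System.out.println(Arrays.toString(sumColumns(columns)));
    // Each index keeps the sum of its two non-NaN entries.
    System.out.println(Arrays.toString(sumColumnsSkippingNaN(columns)));
  }
}
```

The point of the toy data is that no single column looks badly broken, yet
the accumulated totals are NaN everywhere, which matches the
numerators/denoms output quoted above.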
