It is a problem, but it should be rare. IDs are hashed to 31-bit
integers, so the probability of any particular collision is small. However, you don't
need very many items before it becomes probable that some two of them have
collided. (IIRC, that's around 2^(31/2), i.e. roughly 46,000 items?)
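As a rough check on that figure, here is a small sketch of the standard birthday-bound approximation, p(n) ≈ 1 - exp(-n(n-1) / 2N), with N = 2^31 possible hash values. It assumes IDs hash uniformly into that space, which is an idealization; the class and method names are purely illustrative.

```java
// Birthday-bound estimate: probability that at least two of n IDs collide
// when hashed (assumed uniformly) into 2^31 buckets. Illustrative sketch only.
public class CollisionEstimate {

    // Number of distinct non-negative 31-bit hash values.
    static final double BUCKETS = Math.pow(2, 31);

    // Approximation p(n) ~= 1 - exp(-n(n-1) / (2N)).
    static double collisionProbability(long n) {
        double nd = (double) n;
        return 1.0 - Math.exp(-nd * (nd - 1.0) / (2.0 * BUCKETS));
    }

    public static void main(String[] args) {
        long[] sizes = {1_000, 10_000, 46_341, 100_000, 1_000_000};
        for (long n : sizes) {
            System.out.printf("n = %,d  p(collision) ~ %.4f%n", n, collisionProbability(n));
        }
    }
}
```

At n = 46,341 (about 2^15.5) the estimate is roughly 0.39, and it passes 0.5 at around 54,600 items, so "about 2^(31/2)" is the right order of magnitude.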
In practice it doesn't hurt much. It just means
It is necessary. We want to support input where IDs may be
64-bit longs, for consistency with the non-distributed code.
But 64-bit values are too large to be used as indexes into a Vector,
so they are hashed to ints and then un-hashed by a dictionary lookup.
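The hash-then-look-up idea can be sketched on a single machine as follows. This is not Mahout's code: the `itemIDIndex` job does this as a distributed step, and the hash used here (`Long.hashCode` masked to 31 bits) is an assumption that may differ from Mahout's actual function. `IdIndexDictionary`, `add`, and `lookup` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: hash 64-bit IDs down to non-negative int indexes
// (usable as Vector indexes) and keep a dictionary to map indexes back.
public class IdIndexDictionary {

    private final Map<Integer, Long> dictionary = new HashMap<>();
    private int collisions = 0;

    // Assumed hash: Long.hashCode folds the two 32-bit halves together;
    // masking the sign bit leaves a non-negative 31-bit value.
    static int idToIndex(long id) {
        return Long.hashCode(id) & 0x7FFFFFFF;
    }

    // Registers an ID and returns its index. If a different ID already owns
    // that index, the first mapping is kept and a collision is counted.
    int add(long id) {
        int index = idToIndex(id);
        Long existing = dictionary.putIfAbsent(index, id);
        if (existing != null && existing.longValue() != id) {
            collisions++;
        }
        return index;
    }

    // "Un-hash": returns the original ID for an index, or null if unknown.
    Long lookup(int index) {
        return dictionary.get(index);
    }

    int collisions() {
        return collisions;
    }

    public static void main(String[] args) {
        IdIndexDictionary dict = new IdIndexDictionary();
        long[] ids = {42L, 9_000_000_000L, 1L, 1L << 32};
        for (long id : ids) {
            int index = dict.add(id);
            System.out.printf("id %d -> index %d -> id %d%n", id, index, dict.lookup(index));
        }
        System.out.println("collisions: " + dict.collisions());
    }
}
```

With this particular hash, 2^32 (4294967296) lands on the same index as 1, so its lookup returns 1: exactly the kind of rare collision discussed above.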
On Tue, Sep 20, 2011 at 11:44 AM, 张玉东 wrote:
Thanks, I understand. I am not familiar with the algorithms used in the
non-distributed version.
-----Original Message-----
From: Sean Owen [mailto:sro...@gmail.com]
Sent: September 20, 2011 18:46
To: user@mahout.apache.org
Subject: Re: why use the job 'itemIDIndex' to convert the itemid to index?