Re: why use the job 'itemIDIndex' to convert the itemid to index?

2011-09-20 Thread Sean Owen
It is a problem -- but should be rare. IDs are hashed to 31-bit integers, so the probability that any two given IDs collide is small. However, you don't need very many items before it's probable that some two have collided. (IIRC, that's about 2 ^ (31/2) ? ) In practice it doesn't hurt much. It just means
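
The 2 ^ (31/2) figure above is the usual birthday-bound estimate. As a rough sketch only, assuming the hash spreads IDs uniformly over the 2^31 possible values (which a real hash on real IDs may not do), the chance of at least one collision can be approximated like this. The function names are made up for illustration and are not part of Mahout.

```python
import math

# Number of distinct values a 31-bit hash can take.
HASH_SPACE = 2 ** 31


def collision_probability(n_items, space=HASH_SPACE):
    """Approximate chance that at least two of n_items share a hash.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / (2N)),
    which assumes hashes are uniform and independent (an assumption).
    """
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))


def items_for_probability(p, space=HASH_SPACE):
    """Approximate item count at which the collision chance reaches p."""
    return math.ceil(math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p))))


if __name__ == "__main__":
    for n in (1_000, 46_341, 100_000, 1_000_000):
        print(n, round(collision_probability(n), 4))
    print("items for a 50% chance:", items_for_probability(0.5))
```

Under that uniformity assumption, 2 ^ (31/2) is roughly 46,000 items, where the approximation gives a collision chance a little under 40%; the 50% point is somewhat higher.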

Re: why use the job 'itemIDIndex' to convert the itemid to index?

2011-09-20 Thread Sean Owen
It is necessary. We want to support input where IDs are possibly 64-bit longs, for consistency with the non-distributed code. But 64-bit values are too large to be used as indexes into a Vector, so they are hashed down to an index and later un-hashed by a dictionary lookup. On Tue, Sep 20, 2011 at 11:44 AM, 张玉东
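
A minimal sketch of the hash-then-look-up idea described above, written in Python for brevity rather than taken from the Mahout source. The fold used here (XOR of the upper and lower 32 bits, masked to 31 bits) is an assumption chosen for illustration, and `id_to_index` and `build_index` are hypothetical names.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF
MASK_31 = 0x7FFFFFFF


def id_to_index(item_id):
    """Fold a 64-bit ID into a non-negative 31-bit index.

    Assumed hash: XOR the upper and lower 32 bits, then keep 31 bits.
    """
    u = item_id & MASK_64  # treat the ID as an unsigned 64-bit value
    return (u ^ (u >> 32)) & MASK_31


def build_index(item_ids):
    """Build the index -> original ID dictionary used to un-hash.

    If two IDs collide, the later one overwrites the earlier one here;
    that is a simplification made for this sketch.
    """
    return {id_to_index(item_id): item_id for item_id in item_ids}


if __name__ == "__main__":
    ids = [5, 2 ** 40 + 17, 9_000_000_000]
    lookup = build_index(ids)
    for item_id in ids:
        idx = id_to_index(item_id)
        print(item_id, "->", idx, "->", lookup[idx])
```

The dictionary is what makes the mapping reversible: the computation works on the small indexes, and the original 64-bit IDs are recovered from the dictionary at the end.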

Re: why use the job 'itemIDIndex' to convert the itemid to index?

2011-09-20 Thread 张玉东
Thanks, I understand. I am not familiar with the algorithms of the non-distributed method. -----Original Message----- From: Sean Owen [mailto:sro...@gmail.com] Sent: September 20, 2011 18:46 To: user@mahout.apache.org Subject: Re: why use the job 'itemIDIndex' to convert the itemid to index? It is necessary. We want to