Re: why use the job 'itemIDIndex' to convert the itemid to index?

Sean Owen Tue, 20 Sep 2011 03:37:51 -0700

It is a problem -- but should be rare. IDs are hashed to 31-bit
integers, so the probability that any two given IDs collide is small.
However, you don't need very many items before it's probable that
some two have collided. (IIRC, that's about 2 ^ (31/2), or roughly
46,000 items?)
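
To put rough numbers on that estimate, here is a minimal sketch using the standard birthday approximation. It assumes hashed IDs are spread uniformly over the 2^31 possible indexes, which real ID sets may not satisfy; the function names are made up for illustration.

```python
import math

INDEX_SPACE = 2 ** 31  # number of distinct 31-bit indexes


def collision_probability(num_items: int, space: int = INDEX_SPACE) -> float:
    """Approximate chance that at least two of num_items share an index."""
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2N)), assuming uniform hashing.
    return 1.0 - math.exp(-num_items * (num_items - 1) / (2.0 * space))


def items_for_probability(p: float, space: int = INDEX_SPACE) -> int:
    """Approximate item count at which the collision probability reaches p."""
    return math.ceil(math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p))))


if __name__ == "__main__":
    for n in (1_000, 10_000, 46_341, 100_000, 1_000_000):
        print(f"{n:>9} items -> P(some collision) ~ {collision_probability(n):.4f}")
    print("items needed for a 50% chance:", items_for_probability(0.5))
```

Under this approximation, 2 ^ (31/2) (about 46,341 items) gives a collision probability of roughly 39%, and the 50% point falls somewhere near 54,600 items.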


In practice it doesn't hurt much. It just means that data from two
different items has been mixed up and treated as if it was all from
one item. That's not ideal, but has a tiny overall effect on
recommendations.
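
As an illustration of what "mixed up" means here, the sketch below groups preferences by hashed index. The folding scheme in id_to_index is an assumption (XOR the two 32-bit halves of the ID, then drop the sign bit), and the users and item IDs are made-up data; the real job may differ in detail.

```python
from collections import defaultdict


def id_to_index(item_id: int) -> int:
    """Assumed scheme: fold a 64-bit ID into a non-negative 31-bit index."""
    bits = item_id & 0xFFFFFFFFFFFFFFFF  # view the ID as 64 raw bits
    return (bits ^ (bits >> 32)) & 0x7FFFFFFF


def group_by_index(preferences):
    """Map each index to the set of users who rated any item hashed to it."""
    users_by_index = defaultdict(set)
    for user, item_id in preferences:
        users_by_index[id_to_index(item_id)].add(user)
    return dict(users_by_index)


# Hypothetical data: items 7 and 2**32 + 6 are different items that collide.
prefs = [("alice", 7), ("bob", 2**32 + 6), ("carol", 42)]
grouped = group_by_index(prefs)
print(grouped)
```

In this toy data, alice's and bob's preferences land on the same index and are treated as preferences for a single item, while carol's item is unaffected.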

Another practical tip: if your item IDs already fit into a
non-negative 31-bit int, then the hash function won't mix them up at
all, since each of them will hash to itself.
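
A small sketch of why small IDs survive unchanged, assuming the hash folds the 64-bit ID by XOR-ing its high and low 32-bit halves and then masking off the sign bit. That scheme is an assumption for illustration, not a quote of the actual job's code, and id_to_index is a hypothetical name.

```python
def id_to_index(item_id: int) -> int:
    """Assumed scheme: fold a signed 64-bit ID into a non-negative 31-bit index."""
    bits = item_id & 0xFFFFFFFFFFFFFFFF          # view the ID as 64 raw bits
    folded = (bits ^ (bits >> 32)) & 0xFFFFFFFF  # XOR the high half into the low half
    return folded & 0x7FFFFFFF                   # drop the sign bit, leaving 31 bits


if __name__ == "__main__":
    for item_id in (0, 12345, 2**31 - 1, 2**31 + 5, 2**32 + 7):
        print(item_id, "->", id_to_index(item_id))
```

With this scheme, any ID from 0 to 2^31 - 1 maps to itself, so no two such IDs can collide. IDs at or above 2^31 lose their upper bits and can land on an index already used by a smaller ID.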

2011/9/20 张玉东 <zhangyud...@vancl.cn>:
> I am having trouble with this problem: if two item IDs are mapped to the
> same index, how do you compute the similarity between them?
