I think my comment mostly addressed his comments. Yes, this is the
definition of cosine distance, and is implemented. No it doesn't work over
true binary data. There is no "0", only "1" or non-existent.

What is the remaining question?

On Tue, Apr 26, 2011 at 3:21 AM, Chris Waggoner <[email protected]>wrote:
>
>
> I've never used Mahout but what this @allclaws wants sounds like a simple
> proposition. Given a vector like
>
> bought
> didn't buy
> didn't buy
> didn't buy
> didn't buy
> didn't buy
> didn't buy
> bought
> didn't buy
> bought
> bought
> bought
>
>
>
> define "bought" == 1 and "didn't buy" == 0. Define distance between two
> such vectors to be { A dot B } over { |A| times |B| }. Not that I find this
> compelling as a definition of similarity but @allclaws called this a first,
> rough pass.

Reply via email to