I think my comment mostly addressed his comments. Yes, this is the definition of cosine distance, and it is implemented. No, it doesn't work over true binary data. There is no "0", only "1" or non-existent.
What is the remaining question?

On Tue, Apr 26, 2011 at 3:21 AM, Chris Waggoner <[email protected]> wrote:

> I've never used Mahout but what this @allclaws wants sounds like a simple
> proposition. Given a vector like
>
> bought
> didn't buy
> didn't buy
> didn't buy
> didn't buy
> didn't buy
> didn't buy
> bought
> didn't buy
> bought
> bought
> bought
>
> define "bought" == 1 and "didn't buy" == 0. Define distance between two
> such vectors to be { A dot B } over { |A| times |B| }. Not that I find this
> compelling as a definition of similarity but @allclaws called this a first,
> rough pass.
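
For concreteness, here is a minimal sketch of the measure as defined in the quoted message, { A dot B } over { |A| times |B| }, for 0/1 vectors. It is plain Python and not Mahout code. The function names and the second vector `b` are hypothetical and only there for illustration, and returning 0.0 for an all-zero vector is an assumed convention. The second function computes the same value when each vector is stored only as the set of positions that hold a "1", with nothing stored for "didn't buy".

```python
import math


def cosine_dense(a, b):
    """(A dot B) / (|A| * |B|) for two equal-length 0/1 lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def cosine_sets(a_items, b_items):
    """Same value, with each vector given as the set of positions holding a 1."""
    if not a_items or not b_items:
        return 0.0  # same assumed convention as above
    # For 0/1 data the dot product is the overlap size, and |A| is sqrt(count of 1s).
    return len(a_items & b_items) / math.sqrt(len(a_items) * len(b_items))


# The vector from the quoted message: bought = 1, didn't buy = 0.
a = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
# A hypothetical second user, made up for this example.
b = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]

a_items = {i for i, v in enumerate(a) if v == 1}
b_items = {i for i, v in enumerate(b) if v == 1}

print(cosine_dense(a, b))
print(cosine_sets(a_items, b_items))
```

Both calls give 3 / sqrt(5 * 4), since the two users share three purchases out of five and four respectively.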
