Re: Cosine distances to Random Vector basis

2011-04-26 Thread Randall McRee
I've done a new, clean implementation of this (just the knn piece) at my current company, which has agreed to allow an open source contribution. Thanks, Randy On Mon, Apr 25, 2011 at 11:09 PM, Ted Dunning wrote: > Available cheaper at my old company. > > > http://www.deepdyve.com/lp/association

Re: Cosine distances to Random Vector basis

2011-04-25 Thread Ted Dunning
Available cheaper at my old company. http://www.deepdyve.com/lp/association-for-computing-machinery/symbolic-regression-using-nearest-neighbor-indexing-GmDA73L5II On Mon, Apr 25, 2011 at 10:22 PM, Randall McRee wrote: > Symbolic Regression using Nearest Neighbor > Indexing >

Re: Cosine distances to Random Vector basis

2011-04-25 Thread Randall McRee
Charikar is the definitive reference for this method. See [1] Charikar, M., Similarity estimation techniques from rounding algorithms. In *Proceedings of the Symposium on Theory of Computing*, 2002. I also created a simple LSH NN method based on this idea (refined, I think), which you can find here: Mc
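
For readers without the paper to hand, the key property of the random-hyperplane scheme from Charikar's paper can be written out as below. The notation is an assumption made here for illustration and does not come from the thread: u and v are data vectors, r is a random vector with a spherically symmetric distribution (for example Gaussian), \theta(u, v) is the angle between u and v, X is the number of random vectors, and d_H is the Hamming distance between the two X-bit signatures.

```latex
\Pr\bigl[\operatorname{sgn}(r \cdot u) \neq \operatorname{sgn}(r \cdot v)\bigr]
  = \frac{\theta(u, v)}{\pi}
\qquad\Longrightarrow\qquad
\theta(u, v) \approx \frac{\pi \, d_H}{X},
\quad
\cos\theta(u, v) \approx \cos\!\left(\frac{\pi \, d_H}{X}\right)
```

The approximation improves as X grows, since d_H / X is an average of X independent bit comparisons.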

Re: Cosine distances to Random Vector basis

2011-04-24 Thread Ted Dunning
Sounds like a variant of LSH to me. See the Wikipedia article on LSH with random projections. On Sun, Apr 24, 2011 at 8:56 PM, Lance Norskog wrote: > I just found this vector distance idea in a technical paper: > > Create a space defined by X random vectors. For your data vectors, > take the cosine

Re: Cosine distances to Random Vector basis

2011-04-24 Thread Jake Mannix
This is the starting point of the way I've always seen people do Locality Sensitive Hashing with floating point vectors. Once you have these bit vectors, you can do minhash stuff on them to complete LSH. On Sun, Apr 24, 2011 at 8:56 PM, Lance Norskog wrote: > I just found this vector distance i

Cosine distances to Random Vector basis

2011-04-24 Thread Lance Norskog
I just found this vector distance idea in a technical paper: Create a space defined by X random vectors. For your data vectors, take the cosine distance to each random vector and save the sign of the value as a bit. This gives a bit set of X bits. There could be another distance and algorithm for
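
A minimal sketch of the bit-signature idea described in this message, in plain Python. The function names (`random_basis`, `signature`, `hamming`), the Gaussian choice for the random vectors, and the example data are all assumptions made for illustration; they are not taken from the paper or from any Mahout code. The sign of the cosine equals the sign of the dot product because vector norms are positive, so the sketch skips the normalization.

```python
import random


def random_basis(num_vectors, dim, seed=0):
    """Draw num_vectors random vectors with Gaussian components (an assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_vectors)]


def signature(vector, basis):
    """One bit per random vector: 1 if the cosine (equivalently the dot product) is >= 0."""
    return [1 if sum(a * b for a, b in zip(vector, r)) >= 0 else 0 for r in basis]


def hamming(sig_a, sig_b):
    """Number of positions where two signatures disagree."""
    return sum(x != y for x, y in zip(sig_a, sig_b))


basis = random_basis(64, 3)       # X = 64 random vectors in 3 dimensions
a = [1.0, 2.0, 3.0]
b = [1.1, 2.1, 2.9]               # points in nearly the same direction as a
c = [-1.0, -2.0, -3.0]            # points in the opposite direction to a

sig_a, sig_b, sig_c = (signature(v, basis) for v in (a, b, c))
print(hamming(sig_a, sig_b), hamming(sig_a, sig_c))
```

Vectors pointing in similar directions should disagree on few bits, while opposite vectors disagree on every bit, so the Hamming distance between signatures acts as a cheap proxy for the angle between the original vectors.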