On Nov 26, 2007, at 6:34 PM, Eswar K wrote:

> Although the algorithm doesn't understand anything about what the
> words *mean*, the patterns it notices can make it seem astonishingly
> intelligent.
>
> When you search such an index, the search engine looks at similarity
> values it has calculated for every content word, and returns the
> documents that it thinks best fit the query. Because two documents may
> be semantically very close even if they do not share a particular
> keyword, this algorithm will often return relevant documents that don't
> contain the keyword at all, where a plain keyword search would fail
> for lack of an exact match.
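For reference, the retrieval step described above (rank documents by similarity to the query) can be sketched with plain term vectors and cosine similarity. This is a toy in pure Python, with made-up helper names, and it deliberately shows the limitation: a raw term-vector match still scores zero when no keyword overlaps, which is the gap LSA's reduction step is meant to close.

```python
import math
from collections import Counter

def term_vector(text):
    # Naive bag-of-words vector; a real engine would also stem,
    # stop-list, and weight terms (e.g. tf-idf).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "stock prices fell sharply",
]
query = term_vector("cat mat")
# Best keyword match ranks first; the unrelated doc scores 0.0 --
# and so would a relevant doc that used only synonyms.
ranked = sorted(docs, key=lambda d: cosine(term_vector(d), query),
                reverse=True)
```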

Perhaps I should have been less curt. I've read a few papers on LSA, so I'm familiar at least in passing with everything you describe above. It would be entertaining to write an implementation, and I've considered it... but it's a low priority while the patent's in force.

A full term-vector space calculation is... expensive :) ... so LSA performs a dimensionality reduction (truncated SVD). Tuning the algorithm for a threshold effect against not an exact "n words in common" but only a rough approximation of it is presumably non-trivial.
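To make the reduction step concrete: the heart of it is finding the strongest "concept" axes of the term-document matrix. A hedged, toy sketch in pure Python (not any real implementation, and only the single top axis via power iteration rather than a full SVD):

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]

def dominant_direction(A, iters=100):
    """Power iteration on A^T A: approximates the top right singular
    vector of A -- the single strongest concept axis, in LSA terms.
    (Deterministic start vector; assumes it has a nonzero component
    along the answer, which holds for this toy matrix.)"""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy term-document matrix: rows are documents, columns are the terms
# ("car", "automobile", "engine"). Docs 0 and 1 use different synonyms,
# but both co-occur with "engine", so they land at the same spot on the
# dominant concept axis.
A = [
    [1, 0, 1],   # "car engine"
    [0, 1, 1],   # "automobile engine"
    [1, 1, 2],   # mentions all three terms
]
v = dominant_direction(A)
projections = [sum(row[j] * v[j] for j in range(3)) for row in A]
```

In the reduced space, similarity is measured between these projections rather than raw term vectors, which is how co-occurrence patterns can match documents that share no literal keyword.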

If you can either find or write open source software that pulls off such "astonishingly intelligent" matches despite the many challenges, kudos. I'd love to see it.

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
