RE: LSA Implementation

Norskog, Lance Tue, 27 Nov 2007 12:22:15 -0800

WordNet itself is English-only. There are various ontology projects for
it.


http://www.globalwordnet.org/ is a separate world language database
project. I found it at the bottom of the WordNet Wikipedia page. Thanks
for starting me on the search!

Lance 

-----Original Message-----
From: Eswar K [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 26, 2007 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: LSA Implementation

The languages also include CJK :) among others.

- Eswar

On Nov 27, 2007 8:16 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote:

> The WordNet project at Princeton (USA) is a large database of
> synonyms. If you're only working in English this might be useful
> instead of running your own analyses.
>
> http://en.wikipedia.org/wiki/WordNet
> http://wordnet.princeton.edu/
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 26, 2007 6:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LSA Implementation
>
> In addition to recording which keywords a document contains, the
> method examines the document collection as a whole to see which other
> documents contain some of those same words. This algo should consider
> documents that have many words in common to be semantically close, and
> ones with few words in common to be semantically distant. This simple
> method correlates surprisingly well with how a human being, looking at
> content, might classify a document collection. Although the algorithm
> doesn't understand anything about what the words *mean*, the patterns
> it notices can make it seem astonishingly intelligent.
>
> When you search such an index, the search engine looks at the
> similarity values it has calculated for every content word and
> returns the documents that it thinks best fit the query. Because two
> documents may be semantically very close even if they do not share a
> particular keyword, this algo will often return relevant documents
> that don't contain the keyword at all, where a plain keyword search
> would fail for lack of an exact match.
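
The idea described above can be sketched in a few lines. This is only a
minimal illustration, assuming NumPy is available, raw term counts as
weights, and a truncated SVD as the "latent" step; the toy documents and
the names build_term_doc_matrix and lsa_search are hypothetical and are
not part of Solr or Lucene.

```python
import numpy as np

# Toy collection (hypothetical). Document 1 never mentions "car".
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana fruit smoothie",
    "fruit banana recipe",
]

def build_term_doc_matrix(texts):
    """Return a (terms x docs) count matrix and a term -> row index map."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.split():
            A[index[w], j] += 1.0
    return A, index

def lsa_search(texts, query, k=2):
    """Cosine similarity of each document to the query in a rank-k space."""
    A, index = build_term_doc_matrix(texts)
    # Singular values come back sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                      # top-k term directions
    doc_vecs = A.T @ U_k                # project every document
    q = np.zeros(A.shape[0])
    for w in query.split():
        if w in index:                  # ignore words outside the vocabulary
            q[index[w]] += 1.0
    q_vec = q @ U_k                     # project the query the same way
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    return doc_vecs @ q_vec / np.maximum(norms, 1e-12)

scores = lsa_search(docs, "car")
for text, score in zip(docs, scores):
    print(f"{score:6.3f}  {text}")
```

In this toy run the "automobile engine repair" document scores close to
the documents that do contain "car", because it shares "engine" and
"repair" with them, while the two fruit documents score near zero. A real
index would need term weighting (for example tf-idf) and a much larger k.
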
>
> - Eswar
>
> On Nov 27, 2007 7:51 AM, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> >
> > On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
> >
> > > We essentially are looking at having an implementation for doing 
> > > search which can return documents having conceptually similar 
> > > words without necessarily having the original word searched for.
> >
> > Very challenging.  Say someone searches for "LSA" and hits an
> > archived version of the mail you sent to this list.  "LSA" is a
> > reasonably discriminating term.  But so is "Eswar".
> >
> > If you knew that the original term was "LSA", then you might look
> > for documents near it in term vector space.  But if you don't know
> > the original term, only the content of the document, how do you know
> > whether you should look for docs near "lsa" or "eswar"?
> >
> > Marvin Humphrey
> > Rectangular Research
> > http://www.rectangular.com/
> >
> >
> >
>
