Please unsubscribe me.
On 12/28/09, Subscriptions <sub.scripti...@metaheuristica.com> wrote:
> I am trying to write a query analyzer to pull:
>
> 1. Common phrases (also known as collocations) within a query
>
> 2. Highly unusual phrases (also known as Statistically Improbable
>    Phrases, or SIPs) within a query
>
> The collocations would be similar to facets, except that I am also
> trying to get multi-word phrases as well as single terms. So I suppose
> I could write something that does a chained query off the facet query,
> looking for words in proximity. Conceptually (as I understand it) this
> should just be a question of using the IDF (inverse document
> frequency, i.e. a measure of how rare the term is across the index).
>
> * Has anyone tried to write an analyzer that looks for the words
>   that typically occur within a given proximity of another word?
>
> The highly unusual phrases, on the other hand, require getting a
> handle on the IDF, which at present appears to be available only via
> the explain function used for debugging.
>
> * Has anyone written something to go directly after the IDF score
>   only?
>
> * If I do have to go down the path of writing this from scratch, is
>   the org.apache.lucene.search.Similarity class the one to leverage?
>
> Most grateful for any feedback or insights,
>
> Christopher
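
For reference, the two ideas in the quoted question can be sketched outside Lucene in a few lines of Python. This is only an illustration under stated assumptions, not the way Lucene or Solr does it: the whitespace tokenizer, the function names (tokenize, idf_table, collocations) and the choice of pointwise mutual information (PMI) as the collocation score are all mine. The IDF formula, 1 + ln(numDocs / (docFreq + 1)), is the one that, as far as I know, Lucene's DefaultSimilarity applies, so the numbers should be comparable to what the explain output reports.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def idf_table(docs):
    """Map each term to 1 + ln(N / (df + 1)).

    N is the number of documents and df is the number of documents
    that contain the term at least once.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: 1.0 + math.log(n / (count + 1)) for term, count in df.items()}


def collocations(docs, window=1, min_count=2):
    """Score ordered word pairs that occur within `window` tokens of each other.

    The score is PMI: ln(P(pair) / (P(first) * P(second))).
    Pairs seen fewer than `min_count` times are dropped as too noisy.
    """
    unigrams = Counter()
    pairs = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        unigrams.update(tokens)
        for i, first in enumerate(tokens):
            for second in tokens[i + 1 : i + 1 + window]:
                pairs[(first, second)] += 1

    total_tokens = sum(unigrams.values())
    total_pairs = sum(pairs.values())
    scores = {}
    for (first, second), count in pairs.items():
        if count < min_count:
            continue
        p_pair = count / total_pairs
        p_first = unigrams[first] / total_tokens
        p_second = unigrams[second] / total_tokens
        scores[(first, second)] = math.log(p_pair / (p_first * p_second))
    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ["new york is big", "new york is cold", "the cat is cold"]
    print(idf_table(sample))
    print(collocations(sample))
```

High-IDF terms are the candidates for the "unusual" end of the scale, and high-PMI pairs are the candidates for collocations. Inside Lucene the raw inputs for the same arithmetic should be available from IndexReader.docFreq(Term) and IndexReader.numDocs(), without going through explain.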