Re: Using IDF to find Collocations and SIPs?

Siddhartha Pahade Sun, 03 Jan 2010 04:41:47 -0800

Please unsubscribe me.


On 12/28/09, Subscriptions <sub.scripti...@metaheuristica.com> wrote:
>
> I am trying to write a query analyzer to pull:
>
> 1. Common phrases (also known as Collocations) within a query
>
> 2. Highly unusual phrases (also known as Statistically Improbable
> Phrases or SIPs) within a query
>
> The Collocations would be similar to facets, except that I am also trying to
> get multi-word phrases as well as single terms. So I suppose I could write
> something that chains a query off the facet query, looking for words in
> proximity. Conceptually (as I understand it) this should just be a question
> of using the IDF (inverse document frequency, i.e. a measure of how rare a
> term is, based on how many documents in the index contain it).
>
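A minimal sketch of a first step toward collocations, assuming the query text has already been split into lowercase tokens: count every single term and every contiguous multi-word run (n-gram) up to a chosen length, so phrases and single terms are tallied the same way. The class name `NGramCounter` and the `maxN` parameter are hypothetical, and this is plain Java rather than any Lucene API.

```java
import java.util.*;

final class NGramCounter {
    // Counts every contiguous run of 1..maxN tokens, so single terms
    // and multi-word phrases end up in the same frequency table.
    static Map<String, Integer> count(List<String> tokens, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                String gram = String.join(" ", tokens.subList(i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "new york is not new jersey but new york is big".split(" "));
        Map<String, Integer> counts = count(tokens, 2);
        System.out.println(counts.get("new york")); // prints 2
    }
}
```

Raw n-gram counts say nothing about whether a phrase is unusual; they are the input that an IDF-style or frequency-ratio weighting would then adjust.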
> * Has anyone tried to write an analyzer that looks for the words that
> typically occur within a given proximity of another word?
>
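One way to approach the proximity question, sketched in plain Java under the assumption that the tokens are already available as a list: for each occurrence of a target word, tally the words within a fixed window on either side. The names `CooccurrenceCounter`, `countNeighbours`, and `window` are illustrative only and are not part of Lucene.

```java
import java.util.*;

final class CooccurrenceCounter {
    // For every occurrence of `target`, counts each token that appears
    // within `window` positions before or after it.
    static Map<String, Integer> countNeighbours(List<String> tokens, String target, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(target)) continue;
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) counts.merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "the quick brown fox jumps over the lazy dog".split(" "));
        System.out.println(countNeighbours(tokens, "the", 2));
        // prints {brown=1, dog=1, jumps=1, lazy=1, over=1, quick=1}
    }
}
```

Raw co-occurrence counts favour very common words, so they would likely need weighting (for example by IDF) before being treated as collocations.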
> The highly unusual phrases, on the other hand, require getting a handle on
> the IDF, which at present only appears to be available via the explain
> output used for debugging.
>
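A simple frequency-ratio heuristic in the spirit of "statistically improbable", offered as an assumption rather than a definitive definition: score each phrase by how much more often it occurs in one text than in the corpus as a whole. The method names, the hypothetical counts, and the add-one smoothing on the corpus side are all choices made for this sketch.

```java
import java.util.*;

final class SipScorer {
    // Relative frequency of a phrase in one text divided by its relative
    // frequency in the whole corpus. Add-one smoothing on the corpus side
    // (an assumption of this sketch) avoids dividing by zero for phrases
    // the corpus has never seen.
    static double improbability(long localCount, long localTotal,
                                long corpusCount, long corpusTotal) {
        double local = (double) localCount / localTotal;
        double corpus = (corpusCount + 1.0) / (corpusTotal + 1.0);
        return local / corpus;
    }

    // Returns the phrases of one text ordered from most to least improbable.
    static List<String> rank(Map<String, Long> local, Map<String, Long> corpus,
                             long corpusTotal) {
        long localTotal = 0;
        for (long c : local.values()) localTotal += c;
        final long total = localTotal;
        List<String> phrases = new ArrayList<>(local.keySet());
        phrases.sort(Comparator.comparingDouble((String p) -> improbability(
                local.get(p), total, corpus.getOrDefault(p, 0L), corpusTotal)).reversed());
        return phrases;
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        Map<String, Long> local = Map.of("the index", 5L, "statistically improbable", 2L);
        Map<String, Long> corpus = Map.of("the index", 50_000L, "statistically improbable", 3L);
        System.out.println(rank(local, corpus, 1_000_000L));
        // prints [statistically improbable, the index]
    }
}
```

The ratio rewards phrases that are frequent locally but rare globally, which is the same intuition as IDF applied at the phrase level.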
> * Has anyone written something to go directly after the IDF score only?
>
> * If I do have to go down the path of writing this from scratch, is the
> org.apache.lucene.search.Similarity class the one to leverage?
>
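On going after the IDF directly: if I recall the Lucene 2.9/3.0 API correctly, `IndexReader.docFreq(Term)` and `IndexReader.numDocs()` supply the two inputs, and `Similarity.idf(int docFreq, int numDocs)` turns them into a score, so the explain output should not be the only route. The plain-Java sketch below mirrors the classic `DefaultSimilarity` formula, 1 + ln(numDocs / (docFreq + 1)); treat that formula as an assumption to verify against the Lucene version in use. The class name `IdfSketch` is hypothetical.

```java
final class IdfSketch {
    // Mirrors the classic DefaultSimilarity.idf shape (assumed; verify
    // against your Lucene version): 1 + ln(numDocs / (docFreq + 1)).
    // The fewer documents contain a term, the higher its score.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        System.out.println(idf(9, numDocs));   // rare term: about 5.6
        System.out.println(idf(999, numDocs)); // prints 1.0
    }
}
```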
> Most grateful for any feedback or insights,
>
> Christopher
