subject:"Phrase Frequency For Analysis"

Re: Phrase Frequency For Analysis

2006-06-22 Thread Bob Carpenter

Adding to this growing thread, there's really no reason to index all the term bigrams, trigrams, etc. It's not only slow, it's very memory/disk intensive. All you need is two passes over the collection.
Pass One: collect counts of bigrams (or trigrams, or whatever -- if size is an
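The snippet above is cut off before the second pass is described, so the following is only a rough sketch of the two-pass idea, not the poster's actual method. Pass one counts n-grams over the whole collection, as stated. Pass two is an assumption based on the goal given elsewhere in this thread: it counts a single document's n-grams and ranks them by how much more frequent they are in the document than in the collection. The function names, the whitespace tokenization, and the add-one smoothing are all hypothetical choices made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def collection_counts(docs, n=2):
    """Pass one: count each n-gram over the whole collection."""
    counts = Counter()
    for text in docs:
        # Lowercased whitespace tokenization is an assumption for this sketch.
        counts.update(ngrams(text.lower().split(), n))
    return counts


def distinctive_phrases(text, counts, n=2, top=5):
    """Pass two (assumed): rank one document's n-grams by the ratio of
    their in-document rate to their collection-wide rate."""
    local = Counter(ngrams(text.lower().split(), n))
    local_total = sum(local.values()) or 1
    global_total = sum(counts.values()) or 1
    scored = []
    for gram, c in local.items():
        doc_rate = c / local_total
        # Add-one smoothing (an assumption) keeps unseen n-grams finite.
        coll_rate = (counts.get(gram, 0) + 1) / (global_total + 1)
        scored.append((doc_rate / coll_rate, " ".join(gram)))
    # Highest ratio first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored[:top]]
```

Only the collection-wide counter has to be kept between the passes; each document's own counts can be discarded once it has been scored.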

Re: Phrase Frequency For Analysis

2006-06-22 Thread Andrzej Bialecki

Nader Akhnoukh wrote: Yes, Chris is correct, the goal is to determine the most frequently occurring phrases in a document compared to the frequency of that phrase in the index. So there are only output phrases, no inputs. Also performance is not really an issue, this would take place on an irre

Re: Phrase Frequency For Analysis

2006-06-22 Thread Kamal Abou Mikhael

I may be coming into this thread without knowing enough. I have implemented a phrase filter, which indexes all token sequences that are 2 to N tokens long. N is defined in the constructor. It takes a stopword Trie for input because the policy I used, based on a published work I read, was that a
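As a rough illustration of what "all token sequences that are 2 to N tokens long" means, here is a minimal sketch. It is not the poster's filter: the function name `token_sequences` is hypothetical, and the stopword policy is left out entirely because the snippet is cut off before it is described.

```python
def token_sequences(tokens, max_n):
    """Return every contiguous token sequence of length 2..max_n,
    joined with single spaces, in order of starting position."""
    if max_n < 2:
        raise ValueError("max_n must be at least 2")
    phrases = []
    for start in range(len(tokens)):
        for n in range(2, max_n + 1):
            end = start + n
            if end > len(tokens):
                break  # longer sequences from this start would overrun the input
            phrases.append(" ".join(tokens[start:end]))
    return phrases
```

For a stream of T tokens this emits roughly T * (N - 1) phrases, which is why indexing every sequence gets expensive as N grows.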

Re: Phrase Frequency For Analysis

2006-06-22 Thread Nader Akhnoukh

Yes, Chris is correct, the goal is to determine the most frequently occurring phrases in a document compared to the frequency of that phrase in the index. So there are only output phrases, no inputs. Also performance is not really an issue, this would take place on an irregular basis and could ru

Re: Phrase Frequency For Analysis

2006-06-22 Thread Andrzej Bialecki

Chris Hostetter wrote: I think either you misunderstood Nader's question or I did: I believe the goal is to determine what the most frequently occurring phrases are -- not determine how frequently a particular input phrase appears. Isn't the latter a prerequisite for the former? ;) Regardi

Re: Phrase Frequency For Analysis

2006-06-22 Thread Chris Hostetter

: > I am trying to get the most frequently occurring phrases in a document and
: > in the index as a whole. The goal is to compare the two to get something like
: > Amazon's SIPs.
: Other than indexing the phrases directly, you could use a SpanNearQuery
: over the words, use getSpans() on its SpanS

Re: Phrase Frequency For Analysis

2006-06-22 Thread Paul Elschot

On Thursday 22 June 2006 01:33, Nader Akhnoukh wrote:
> Hi, I've looked through the archives and it looks like this question has
> been asked in one form or another a few times, but without a satisfactory
> solution.
>
> I am trying to get the most frequently occurring phrases in a document and
>

Phrase Frequency For Analysis

2006-06-21 Thread Nader Akhnoukh

Hi, I've looked through the archives and it looks like this question has been asked in one form or another a few times, but without a satisfactory solution. I am trying to get the most frequently occurring phrases in a document and in the index as a whole. The goal is to compare the two to get some

8 matches
