Re: IndexDocValues and storing Stats

2012-01-04 Thread Hany Azzam
Hi Simon, Thank you for your reply. The document length is just an example of what I need to store. Another stat that I need is a *normalised* sum of the TF's. I can compute this using my own cache during retrieval by extending the SimilarityBase and storing the values in a cache that is used w

Re: IndexDocValues and storing Stats

2012-01-04 Thread Simon Willnauer
Hey, On Wed, Jan 4, 2012 at 1:15 PM, Hany Azzam wrote: > Hi, > > I am experimenting with the Lucene trunk (aka 4.0), especially with the new > IndexDocValues feature. I am trying to store some query-independent > statistics such as PageRank, etc. One stat that I am trying to store is the > sum

Re: IndexDocValues and storing Stats

2012-01-04 Thread Hany Azzam
Hi, I am experimenting with the Lucene trunk (aka 4.0), especially with the new IndexDocValues feature. I am trying to store some query-independent statistics such as PageRank, etc. One stat that I am trying to store is the sum of all the term frequencies in a document. This can be seen as the
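The statistic asked about here, the sum of all term frequencies in a document, is just the number of tokens in it, with one value per document. The sketch below illustrates that in plain Java. It is not the Lucene IndexDocValues API. It assumes whitespace tokenization and lowercasing in place of a real Analyzer. The class and method names (`DocStats`, `termFrequencies`, `sumOfTermFrequencies`) are hypothetical. The sums go into an array indexed by document number, which mirrors the one-value-per-document shape that doc values provide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch in plain Java (not the Lucene API): per-document sum of term frequencies.
class DocStats {

    // Counts how often each term occurs in one document.
    // Assumption: whitespace splitting plus lowercasing stands in for a real Analyzer.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue; // split can yield "" for empty or leading-space input
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // One value per document, indexed by document number (like a doc-values column).
    // The sum of all term frequencies equals the document length in tokens.
    static int[] sumOfTermFrequencies(String[] docs) {
        int[] sums = new int[docs.length];
        for (int docId = 0; docId < docs.length; docId++) {
            int sum = 0;
            for (int freq : termFrequencies(docs[docId]).values()) {
                sum += freq;
            }
            sums[docId] = sum;
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] docs = { "the cat sat on the mat", "Lucene doc values" };
        int[] sums = sumOfTermFrequencies(docs);
        for (int i = 0; i < sums.length; i++) {
            System.out.println("doc " + i + ": " + sums[i]); // doc 0: 6, doc 1: 3
        }
    }
}
```

A real setup would compute this value once at indexing time and store it alongside each document, so that a similarity can read it at query time without recomputing it.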

Inheritance hierarchy in the contrib-queryparser package

2012-01-04 Thread 1983-01-06
Hi folks, I was recommended to use PrecedenceQueryParser if I want boolean precedence in my queries. While examining this class, I have noticed that it and its super class do not extend the QueryParser but have a separate implementation/hierarchy. All other parsers in that package do extend the

Re: Tagging documents as they are indexed -- Is FST a reasonable approach?

2012-01-04 Thread Julien Nioche
Hi Ryan, Why not preprocessing your documents with tools like Apache UIMA, GATE or OpenNLP before indexing them in Lucene? GATE for instance has FST-based gazetteers which would be perfect for your place names, AFAIK there is also a Dictionary component for UIMA which would be a good match. Julie
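The gazetteer idea mentioned above tags place names by looking up token sequences in a dictionary and preferring the longest match. The sketch below illustrates this under stated assumptions. It uses a plain token-level trie instead of an FST. It does not use the GATE, UIMA, OpenNLP or Lucene APIs. It tokenizes by lowercasing and splitting on whitespace, and it ignores punctuation. The names `GazetteerTagger`, `add` and `tag` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: dictionary (gazetteer) tagging with longest match over a token trie.
// A plain trie is used here in place of an FST.
class GazetteerTagger {

    // One trie node per token; isEntry marks the end of a dictionary phrase.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean isEntry;
    }

    private final Node root = new Node();

    // Adds a (possibly multi-word) phrase such as "new york" to the dictionary.
    void add(String phrase) {
        Node node = root;
        for (String token : phrase.toLowerCase().split("\\s+")) {
            node = node.children.computeIfAbsent(token, t -> new Node());
        }
        node.isEntry = true;
    }

    // Scans left to right and records the longest dictionary phrase starting at each position.
    List<String> tag(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            Node node = root;
            int longestEnd = -1;
            for (int j = i; j < tokens.length; j++) {
                node = node.children.get(tokens[j]);
                if (node == null) break;          // no dictionary phrase continues this way
                if (node.isEntry) longestEnd = j; // remember the longest complete match so far
            }
            if (longestEnd >= 0) {
                matches.add(String.join(" ", Arrays.copyOfRange(tokens, i, longestEnd + 1)));
                i = longestEnd + 1;               // continue after the matched phrase
            } else {
                i++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        GazetteerTagger tagger = new GazetteerTagger();
        tagger.add("york");
        tagger.add("new york");
        tagger.add("paris");
        System.out.println(tagger.tag("Flights from New York to Paris")); // [new york, paris]
    }
}
```

Because of the longest-match rule, "New York" is tagged as a single place rather than as "york" alone. An FST-based gazetteer such as the one in GATE performs the same kind of lookup with a more compact dictionary representation.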