Re: Flexible Indexing (was Re: Lucene Planning)

2006-06-02 Thread Grant Ingersoll
I thought it was you, but wasn't sure. I would also like a way to store the frequency of the term in the overall collection (probably should go in the Term dictionary, but not sure, at the cost of an additional VInt per term, but I am open to other places to store it). Right now, in order to
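
To put "an additional VInt per term" in concrete terms, here is a minimal Python sketch of the variable-length integer encoding used in Lucene's file formats: seven data bits per byte, low-order group first, with the high bit set on every byte except the last. The function names are illustrative only and are not Lucene API.

```python
from typing import Tuple


def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer as a VInt (7 data bits per byte)."""
    if value < 0:
        raise ValueError("VInt values must be non-negative")
    out = bytearray()
    while value >= 0x80:
        # Emit the low 7 bits with the continuation bit set.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # final byte, continuation bit clear
    return bytes(out)


def decode_vint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one VInt starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7
```

Under this encoding a collection frequency below 128 costs one byte per term, and one below 16,384 costs two, so the overhead depends on how common the terms are.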

Re: Flexible Indexing (was Re: Lucene Planning)

2006-06-02 Thread Marvin Humphrey
On Jun 2, 2006, at 6:48 AM, Grant Ingersoll wrote: I thought it was you, but wasn't sure. I'm always looking for ways to minimize Term Vectors, because I consider excerpting/highlighting a core feature rather than an add-on, and they seem like such overkill. It bothers me that they

Re: Flexible Indexing (was Re: Lucene Planning)

2006-06-01 Thread Marvin Humphrey
On Jun 1, 2006, at 5:48 AM, Grant Ingersoll wrote: Someone on the list a while ago suggested moving Term Vectors out of the postings and storing them separately, as then they don't have to be merged (but the doc ids would have to be kept up to date). Yes, that was me. :) I suggested

Re: Flexible Indexing (was Re: Lucene Planning)

2006-05-31 Thread Marvin Humphrey
[wild brainstorming...] Another reason to consolidate the freqs, positions, and boosts/norms into one file: we can isolate and distill the code that encodes/decodes that file into a plugin, weakening the current tight coupling between Lucene and its file format. Changing that index format
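
A rough sketch of what the plugin idea could look like, in Python for brevity. Everything here is hypothetical: `Posting`, `PostingsCodec`, and `InterleavedCodec` are made-up names, and the byte layout (doc delta, freq, position deltas, one boost byte, all in a single stream) is just one possible arrangement, not an existing or proposed Lucene format.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Posting:
    doc_id: int
    positions: List[int]  # freq is implied by len(positions)
    boost: int            # single boost/norm byte, 0..255


class PostingsCodec(ABC):
    """Hypothetical plugin interface: the rest of the indexer sees only this."""

    @abstractmethod
    def encode(self, postings: List[Posting]) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> List[Posting]: ...


def _write_vint(out: bytearray, value: int) -> None:
    # 7 data bits per byte, high bit marks "more bytes follow".
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def _read_vint(data: bytes, offset: int) -> Tuple[int, int]:
    value = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


class InterleavedCodec(PostingsCodec):
    """One stream per term: doc delta, freq, position deltas, boost byte."""

    def encode(self, postings: List[Posting]) -> bytes:
        out = bytearray()
        last_doc = 0
        for p in postings:  # postings must be sorted by doc_id
            _write_vint(out, p.doc_id - last_doc)
            last_doc = p.doc_id
            _write_vint(out, len(p.positions))
            last_pos = 0
            for pos in p.positions:
                _write_vint(out, pos - last_pos)
                last_pos = pos
            out.append(p.boost)
        return bytes(out)

    def decode(self, data: bytes) -> List[Posting]:
        postings: List[Posting] = []
        offset = doc_id = 0
        while offset < len(data):
            delta, offset = _read_vint(data, offset)
            doc_id += delta
            freq, offset = _read_vint(data, offset)
            positions, pos = [], 0
            for _ in range(freq):
                step, offset = _read_vint(data, offset)
                pos += step
                positions.append(pos)
            boost = data[offset]
            offset += 1
            postings.append(Posting(doc_id, positions, boost))
        return postings
```

With the format knowledge confined to one class, swapping in a different layout would mean writing another `PostingsCodec` subclass rather than touching the code that walks the postings.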