RE: Catalog backend for document stored fields?

2006-11-14 Thread Robichaud, Jean-Philippe
I'm indexing logs from a transaction-based application. ... millions of documents per month, the size of the indices is ~35 GB per month (that's the lower bound). I have no choice but to 'store' each field's value (as well as indexing/tokenizing it) because

Re: Catalog backend for document stored fields?

2006-10-24 Thread Doron Cohen
I'm indexing logs from a transaction-based application. ... millions of documents per month, the size of the indices is ~35 GB per month (that's the lower bound). I have no choice but to 'store' each field's value (as well as indexing/tokenizing it) because I'll need to retrieve them in

RE: Catalog backend for document stored fields?

2006-10-23 Thread Robichaud, Jean-Philippe
: Friday, October 20, 2006 5:00 PM
To: java-user@lucene.apache.org
Subject: Re: Catalog backend for document stored fields?
On 10/20/06, Robichaud, Jean-Philippe [EMAIL PROTECTED] wrote:
> 3- Any ideas on how else I could do this? I'm fully open to discussion!
How about not storing

Catalog backend for document stored fields?

2006-10-20 Thread Robichaud, Jean-Philippe
Hello to all of you! I'm using Lucene to index millions of relatively small documents. In fact, I'm indexing logs from a transaction-based application. Each document represents what happened during a 'transaction'. Each of them is composed of 5-6 main 'states' which are themselves

Re: Catalog backend for document stored fields?

2006-10-20 Thread eks dev
1- is there someone out there that already wrote an extension to Lucene so that 'stored' string for each document/field is in fact stored in a centralized repository? Meaning, only an 'index' is actually stored in the document and the real data is put somewhere else. 2- If not, how
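To make question 1 concrete, here is a minimal sketch of the idea, not an existing Lucene extension: each distinct field value is kept once in a central catalog, and the index document would store only a short key. The class name FieldValueCatalog, its methods, and the choice of a SHA-1 hex key are all assumptions made for illustration; a real backend would need persistence and would sit outside the index.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical central repository: each distinct field value is kept once, addressed by a hash key. */
final class FieldValueCatalog {
    // In-memory stand-in for the external store (assumption: a real one would be on disk or in a database).
    private final Map<String, String> valuesByKey = new HashMap<>();

    /** Stores the value if it is new and returns the key the index document would keep instead. */
    String put(String value) {
        String key = sha1Hex(value);
        valuesByKey.putIfAbsent(key, value);
        return key;
    }

    /** Resolves a key read back from a document; returns null when the key is unknown. */
    String get(String key) {
        return valuesByKey.get(key);
    }

    /** Number of distinct values currently held. */
    int size() {
        return valuesByKey.size();
    }

    // Hex-encoded SHA-1 of the UTF-8 bytes; collisions are ignored in this sketch.
    private static String sha1Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The key here is 40 characters long, so this only saves space when stored values are longer than that and repeat across many documents.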

Re: Catalog backend for document stored fields?

2006-10-20 Thread Mike Klaas
On 10/20/06, Robichaud, Jean-Philippe [EMAIL PROTECTED] wrote:
> 3- Any ideas on how else I could do this? I'm fully open to discussion!
How about not storing the fields at all, but storing term vectors, and reconstructing the data from termpositions + terminfo?
-Mike
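The reconstruction step suggested above can be sketched without any Lucene calls, assuming the term vector has already been read into a map from each term to its positions. The class TermVectorRebuilder and its rebuild method are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: rebuilds a token sequence from per-term position lists. */
final class TermVectorRebuilder {

    /** Orders every (position, term) pair by position and joins the terms with single spaces. */
    static String rebuild(Map<String, int[]> positionsByTerm) {
        // TreeMap keeps entries sorted by position; positions with no term are simply skipped.
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : positionsByTerm.entrySet()) {
            for (int position : entry.getValue()) {
                byPosition.put(position, entry.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }
}
```

This recovers only the analyzed tokens in order. Anything the analyzer dropped or changed at index time, such as case, punctuation, or stop words, cannot be restored this way.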