Re: Large queries

2005-10-16 Thread jian chen
Hi, Trond, By the way, it appears to me that Lucene uses the iterator pattern a lot, like SegmentTermEnum, TermDocs, TermPositions, etc. Each iterator uses an underlying fixed-size buffer to load a chunk of data at a time. So even if you have millions of documents, you shouldn't run into memory problems…
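A minimal sketch of that streaming access, assuming the Lucene 1.4-era API (current as of this thread's date) and a hypothetical index path and term; TermDocs walks the matching documents through its internal buffer instead of loading them all at once:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public class TermDocsScan {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical path

            // TermDocs is an iterator: each next() refills a fixed-size
            // internal buffer, so memory use stays flat no matter how
            // many documents match.
            TermDocs termDocs = reader.termDocs(new Term("docID", "report.pdf"));
            try {
                while (termDocs.next()) {
                    int doc = termDocs.doc();   // internal document number
                    int freq = termDocs.freq(); // occurrences in that doc
                    System.out.println("doc=" + doc + " freq=" + freq);
                }
            } finally {
                termDocs.close();
                reader.close();
            }
        }
    }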

Re: Large queries

2005-10-16 Thread jian chen
Hi, Trond, It should be no problem for Lucene to handle 6 million documents. For your query, it seems you want to do a disjunctive (OR'ed) query for multiple terms, 10 terms or 1 terms for example. Worst case, you can very easily write your own query class to handle this…
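For the straightforward case, the stock BooleanQuery already expresses an OR over many terms, so no custom query class is needed; a sketch against the Lucene 1.4-era API, with the field name taken from the thread and the docID values purely hypothetical:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class DisjunctiveQuery {
        public static void main(String[] args) throws Exception {
            String[] docIds = { "a.txt", "b.txt", "c.txt" }; // hypothetical values

            // required=false, prohibited=false makes each clause optional,
            // i.e. a plain OR across all the docID terms.
            BooleanQuery query = new BooleanQuery();
            for (int i = 0; i < docIds.length; i++) {
                query.add(new TermQuery(new Term("docID", docIds[i])), false, false);
            }

            IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical path
            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " matching documents");
            searcher.close();
        }
    }

Note that BooleanQuery throws TooManyClauses past its default limit of 1024 clauses; for very large disjunctions, BooleanQuery.setMaxClauseCount() has to be raised, or a filter becomes the better tool.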

Large queries

2005-10-16 Thread Trond Aksel Myklebust
How does Lucene handle very large queries? I have 6 million documents, each of which has a "docID" field. There is a total of 2 distinct docIDs, so many documents share the same docID, which consists of a filename (only the name, not the path). Sometimes, I must get all documents that have one of 10 docIDs…
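One way to "get all documents" without paging through Hits is a HitCollector, which receives each matching document number directly; a sketch under the same Lucene 1.4-era API assumptions, with the path and term hypothetical:

    import java.util.BitSet;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class CollectAllMatches {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical path
            final BitSet matches = new BitSet(searcher.maxDoc());

            // collect() is called once per matching document; a BitSet
            // of doc numbers avoids materializing millions of hits.
            searcher.search(new TermQuery(new Term("docID", "report.pdf")),
                    new HitCollector() {
                        public void collect(int doc, float score) {
                            matches.set(doc);
                        }
                    });

            System.out.println(matches.cardinality() + " documents matched");
            searcher.close();
        }
    }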

Lucene/Digester

2005-10-16 Thread Malcolm Clark
Hi all, I'm using Lucene/Digester etc. for my MSc, and I'm quite new to these APIs. I'm trying to obtain advice, but it's hard to say whether the problem is Lucene or Digester. Firstly: I am trying to index the INEX collection, but when I try to index repeated elements, only the last one is indexed…
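On the Lucene side, "only the last one is indexed" usually means one value is overwriting another before the document is built; Lucene happily takes several fields with the same name on one Document, one per occurrence. A sketch, assuming the Lucene 1.4-era Field.Text() factory and hypothetical element values as Digester might deliver them:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class RepeatedElements {
        public static void main(String[] args) {
            // Hypothetical values of a repeated XML element.
            String[] paragraphs = { "first paragraph", "second paragraph" };

            Document doc = new Document();
            for (int i = 0; i < paragraphs.length; i++) {
                // One field per occurrence keeps every value; all of them
                // are searchable under the same field name.
                doc.add(Field.Text("p", paragraphs[i]));
            }

            String[] kept = doc.getValues("p");
            System.out.println(kept.length + " values kept"); // prints 2
        }
    }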

RE: delete unnecessary files after optimize()

2005-10-16 Thread Koji Sekiguchi
> I've never used Lucene on Windows, but if I recall correctly from past discussions on this topic, the IndexWriter will try to delete any file listed in "deletable" whenever it does any segment merging (i.e., after adding some number of documents, when you call .optimize(), or when you call .close())…
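A minimal sketch of the cycle under discussion, assuming the Lucene 1.4-era IndexWriter API and a hypothetical index path; on Windows, files still held open by a reader cannot be deleted, so they are listed in "deletable" and retried at these points:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class OptimizeAndClose {
        public static void main(String[] args) throws Exception {
            // create=false: open an existing index.
            IndexWriter writer = new IndexWriter("/path/to/index",
                    new StandardAnalyzer(), false);

            Document doc = new Document();
            doc.add(Field.Text("body", "example content"));
            writer.addDocument(doc);

            // Segment merges, optimize(), and close() are the points where
            // the writer retries deleting files listed in "deletable".
            writer.optimize();
            writer.close();
        }
    }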