Hi, Trond,
By the way, it appears to me that Lucene uses the iterator pattern a lot,
e.g. SegmentTermEnum, TermDocs, TermPositions, etc. Each iterator uses an
underlying fixed-size buffer to load a chunk of data at a time. So even if
you have millions of documents, you shouldn't run into memory problems.
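For illustration, here is a minimal sketch of that streaming access pattern
using TermDocs (the index path, field name, and term value below are made-up
placeholders, and this is the pre-3.0 API):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class TermDocsDemo {
    public static void main(String[] args) throws Exception {
        // open an existing index (path is hypothetical)
        IndexReader reader = IndexReader.open("/path/to/index");
        // TermDocs streams <doc, freq> pairs from disk through a small
        // internal buffer, so memory use stays flat however many docs match
        TermDocs termDocs = reader.termDocs(new Term("docID", "report.xml"));
        while (termDocs.next()) {
            System.out.println("doc=" + termDocs.doc()
                               + " freq=" + termDocs.freq());
        }
        termDocs.close();
        reader.close();
    }
}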
Hi, Trond,
It should be no problem for Lucene to handle 6 million documents.
For your query, it seems you want to do a disjunctive (OR'ed) query for
multiple terms, 10 terms or 1 term for example. In the worst case I can
think of, you can quite easily write your own query class to handle this.
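For example, assuming the docID values are indexed as single terms, a plain
BooleanQuery with SHOULD clauses expresses the OR (the index path and values
are placeholders; this uses the BooleanClause.Occur API from Lucene 1.9+):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class DocIdQueryDemo {
    public static void main(String[] args) throws Exception {
        String[] docIDs = { "a.xml", "b.xml", "c.xml" }; // hypothetical values
        BooleanQuery query = new BooleanQuery();
        for (int i = 0; i < docIDs.length; i++) {
            // SHOULD means OR: a document matches if any one clause matches
            query.add(new TermQuery(new Term("docID", docIDs[i])),
                      BooleanClause.Occur.SHOULD);
        }
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " matching documents");
        searcher.close();
    }
}

Note that BooleanQuery caps the number of clauses (1024 by default), which is
plenty for 10 terms; it can be raised with BooleanQuery.setMaxClauseCount()
if you ever need more.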
How does Lucene handle very large queries? I have 6 million documents, each
of which has a "docID" field. There are a total of 2 distinct docIDs, so
many documents share the same docID, which consists of a filename (only the
name, not the path).
Sometimes I must get all documents that have one of 10 docIDs,
Hi all,
I'm using Lucene/Digester etc. for my MSc and I'm quite new to these APIs.
I'm trying to obtain advice, but it's hard to say whether the problem is
Lucene or Digester.
Firstly:
I am trying to index the INEX collection, but when I try to index repetitive
elements, only the last one is indexed.
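In case it turns out to be the Lucene side: Lucene itself has no problem with
repetitive elements, as long as each occurrence is added as its own Field;
calling Document.add() with the same field name repeatedly keeps all the
values rather than overwriting. A sketch (field name and values are invented,
pre-2.4 API):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class RepeatedFieldDemo {
    public static void main(String[] args) {
        Document doc = new Document();
        // stand-ins for repeated XML elements from the collection
        String[] paragraphs = { "first p element", "second p element" };
        for (int i = 0; i < paragraphs.length; i++) {
            // each add() appends another value; it does NOT replace the last
            doc.add(new Field("p", paragraphs[i],
                              Field.Store.YES, Field.Index.TOKENIZED));
        }
        System.out.println(doc.getValues("p").length + " values for 'p'");
    }
}

If only the last value survives, the overwrite is more likely happening in
the Digester rules (e.g. a bean setter replacing the previous value) before
the Lucene Document is ever built.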
> I've never used Lucene on Windows, but if I recall correctly from past
> discussions on this topic, the IndexWriter will try to delete any file
> listed in "deletable" whenever it does any segment merging (i.e. after
> adding some number of documents, when you call .optimize(), or when you
> call .close())
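For anyone following along, the operations mentioned there look roughly like
this in code; merging (and with it the deletion attempt) can fire at any of
the commented points (the path and analyzer are just placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class WriterDemo {
    public static void main(String[] args) throws Exception {
        // 'true' means create a new index at the (hypothetical) path
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        for (int i = 0; i < 1000; i++) {
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(i),
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc); // segment merges happen periodically here
        }
        writer.optimize(); // merges all segments down to one
        writer.close();    // final flush; old files get cleaned up here too
    }
}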