TermsQuery works by pulling the postings lists for each term and OR-ing them together to create a bitset, which is very memory-efficient but means that you don't know at doc collection time which term has actually matched.
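
To make the trade-off concrete, here is a toy sketch in plain Java. It is not Lucene's actual code: the class PostingsDemo, both method names, and the in-memory map of term to sorted doc ids are all hypothetical stand-ins for real postings lists. The first method ORs everything into one bitset, so you can only ask "did this doc match?"; the second keeps a per-document counter, which answers "how many terms matched?" at the cost of one int per document.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Lucene.
class PostingsDemo {

    // OR every postings list into a single bitset.
    // Compact (one bit per doc), but the identity of the matching term is lost.
    static BitSet orPostings(Map<String, int[]> postings) {
        BitSet matches = new BitSet();
        for (int[] docIds : postings.values()) {
            for (int docId : docIds) {
                matches.set(docId);
            }
        }
        return matches;
    }

    // Count how many terms matched each document.
    // Assumes each doc id appears at most once per postings list.
    static int[] countMatchingTerms(Map<String, int[]> postings, int maxDoc) {
        int[] counts = new int[maxDoc];
        for (int[] docIds : postings.values()) {
            for (int docId : docIds) {
                counts[docId]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up postings: term -> ids of the docs containing it.
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("lucene", new int[] {0, 2, 3});
        postings.put("solr", new int[] {2, 3});
        postings.put("query", new int[] {3, 4});

        // The bitset says which docs matched, nothing more.
        System.out.println(orPostings(postings));
        // The counters say how many of the terms each doc contains.
        System.out.println(Arrays.toString(countMatchingTerms(postings, 5)));
    }
}
```

The counting version is roughly what you would need to reproduce yourself at collection time, which is where the extra memory goes.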

For your case you probably want to create a SpanOrQuery, and then iterate through the resulting Spans in a specialised Collector. Depending on how many terms you want, though, you may end up requiring a lot of memory for the search.

Alan Woodward
www.flax.co.uk

On 2 Nov 2015, at 17:14, Upayavira wrote:

> I have a scenario where I want to search for documents that contain many
> terms (maybe 100s or 1000s), and then know the number of terms that
> matched. I'm happy to implement this as a query object/parser.
>
> I understand that Lucene isn't well suited to this scenario. Any
> suggestions as to how to make this more efficient? Does the TermsQuery
> work differently from the BooleanQuery regarding large numbers of terms?
>
> Upayavira