Document retrieval, performance, and DocValues

2016-07-05 Thread Randy Tidd
My Lucene index has about 3 million documents and result sets can be large, often thousands and sometimes as many as 100,000. I am expecting the index size to grow 5-10x as the system matures. I index 5 fields, and per recommendations I’ve read, am storing the minimal data in Lucene, currently ju

Re: Document retrieval, performance, and DocValues

2016-07-05 Thread Sanne Grinovero
Hi Randy, a first quick and easy win would be to rewrite it as: DocumentStoredFieldVisitor visitor = new DocumentStoredFieldVisitor(Collections.singleton("pos_id")); for(int i=0; i …
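A minimal sketch of loading only the `pos_id` stored field per hit, assuming the Lucene 6.x API current when this thread was written (lucene-core and lucene-analyzers-common on the classpath). The loop in the reply above is cut off, so this is not Sanne's exact code: the class, method names, sample index, and the `category` field are hypothetical, and only `pos_id` comes from the thread. It creates a new visitor per hit because a `DocumentStoredFieldVisitor` keeps adding the fields it visits to the same `Document`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class StoredFieldRetrieval {

    // Hypothetical sample index: a "category" field to query and a stored "pos_id".
    static Directory buildIndex(int numDocs) throws IOException {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < numDocs; i++) {
                Document doc = new Document();
                doc.add(new StringField("category", i % 2 == 0 ? "even" : "odd", Field.Store.NO));
                doc.add(new StringField("pos_id", "pos-" + i, Field.Store.YES));
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    // Reads only the stored "pos_id" field of each hit; other stored fields are skipped.
    static List<String> loadPosIds(IndexSearcher searcher, TopDocs hits) throws IOException {
        List<String> ids = new ArrayList<>(hits.scoreDocs.length);
        for (ScoreDoc hit : hits.scoreDocs) {
            // Fresh visitor per hit, since the visitor accumulates fields into one Document.
            DocumentStoredFieldVisitor visitor =
                new DocumentStoredFieldVisitor(Collections.singleton("pos_id"));
            searcher.doc(hit.doc, visitor);
            ids.add(visitor.getDocument().get("pos_id"));
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = buildIndex(10);
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("category", "even")), 100);
            System.out.println(loadPosIds(searcher, hits));
        }
    }
}
```

`IndexSearcher.doc(int, Set<String>)` is a shorter way to get the same field-restricted load. Either way each hit still costs a stored-fields read, which is why doc values come up next in the thread.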

Re: Document retrieval, performance, and DocValues

2016-07-07 Thread Michael McCandless
You should do the MultiDocValues.getBinaryDocValues(indexReader, "pos_id") once up front, not per hit. You could operate per-segment instead by making a custom Collector. Are you sorting by your pos_id field? If so, the value is already available in each FieldDoc and you don't need to separately
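A minimal sketch of the per-segment custom Collector idea, assuming the Lucene 6.x API and assuming `pos_id` was indexed as a `BinaryDocValuesField`. The class names, sample index, and `category` field are hypothetical. Doc values are fetched once per segment in `doSetNextReader`, and `collect` receives segment-relative doc IDs, so there is no per-hit lookup. In Lucene 7 and later, `BinaryDocValues` became an iterator (`advanceExact`/`binaryValue`), and in 8.0 `needsScores()` was replaced by `scoreMode()`, so this code is tied to the 6.x line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public class PosIdDocValues {

    // Collects the pos_id doc value of every matching document, one segment at a time.
    static final class PosIdCollector extends SimpleCollector {
        final List<String> posIds = new ArrayList<>();
        private BinaryDocValues posIdValues; // doc values of the current segment only

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
            // Looked up once per segment, not once per hit; null if the segment has no such field.
            posIdValues = context.reader().getBinaryDocValues("pos_id");
        }

        @Override
        public void collect(int doc) throws IOException {
            // 'doc' is relative to the current segment, matching posIdValues.
            if (posIdValues != null) {
                BytesRef value = posIdValues.get(doc); // Lucene 6.x random-access API
                posIds.add(value.utf8ToString());      // copies out of the reused BytesRef
            }
        }

        @Override
        public boolean needsScores() {
            return false; // only doc values are read, so scoring can be skipped
        }
    }

    // Hypothetical sample index with pos_id stored as binary doc values.
    static Directory buildIndex(int numDocs) throws IOException {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < numDocs; i++) {
                Document doc = new Document();
                doc.add(new StringField("category", i % 2 == 0 ? "even" : "odd", Field.Store.NO));
                doc.add(new BinaryDocValuesField("pos_id", new BytesRef("pos-" + i)));
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = buildIndex(10);
             DirectoryReader reader = DirectoryReader.open(dir)) {
            PosIdCollector collector = new PosIdCollector();
            new IndexSearcher(reader).search(new TermQuery(new Term("category", "even")), collector);
            System.out.println(collector.posIds);
        }
    }
}
```

A collector like this visits every matching document rather than a ranked top-N, which fits result sets in the thousands to 100,000 range described in the question when all IDs are needed anyway.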