On Sat, Dec 15, 2012 at 12:04 PM, S L <sol.leder...@gmail.com> wrote:
> Thanks everyone for the responses.
>
> I did some more queries and watched disk activity with iostat. Sure enough,
> during some of the slow queries the disk was pegged at 100% (or more.)
>
> The requirement for the app I'm building is to be able to retrieve 500
> results in ideally one second. The index has 3 million records. Most
> searches do bring back 500 results, or close to that many.
>
> This is my schema:
>
> <field name="metaDataUrl" type="string" indexed="true" stored="true" required="true"/>
> <field name="title" type="text" stored="true" indexed="true"/>
> <field name="snippet" type="text" indexed="true" stored="true"/>
> <field name="rest" type="string" stored="true" indexed="false" multiValued="true"/>
> <field name="date_indexed" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
> <field name="all" type="text" stored="false" indexed="true" multiValued="true"/>
>
> These are my copyField values:
>
> <copyField source="title" dest="all"/>
> <copyField source="snippet" dest="all"/>
>
> The "rest" field is an array of several fairly small strings.
>
> I've set size and initialSize to 100000 for queryResultCache and
> documentCache. I've got the tomcat configured to use 4 GB of memory.

You might try turning *down* the size of your caches and JVM heap to give
more memory to the OS for caching the index files. What's the size of the
index, and how much memory do you have on the machine?

Some other things you could try:
- try current nightly build of 4x - 4.1 has compressed stored fields by
  default, meaning more of the index will fit into the OS cache.
- more RAM (or find other ways to make the index smaller) if the index
  size is just slightly larger than your free RAM
- use an SSD for the index
- reframe the problem so you don't need to bring back 500 documents per
  request (most normal search applications don't actually display 500
  results to the user at once... so you must be doing something
  interesting)
- in the future, when DocValues are supported (4.2 probably), if you only
  need to retrieve certain smaller fields, you will be able to put all
  those field values into a single column/file (hence much better caching
  by the OS).

> Yonik, can you please explain what you mean by this?
>
>> For scalability, we don't read all of the stored fields for the whole
>> list into memory before starting to send the results, but stream them
>> back instead.
>
> How do you implement this streaming?

while there are more document ids in the list:
  - get next document id (it's an integer)
  - load the stored fields
  - send back the stored fields for that document

-Yonik
http://lucidworks.com
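
For illustration only, the loop above can be sketched as a Python generator. This is not Solr's actual code: stream_stored_fields, load_stored_fields and the toy in-memory "index" are hypothetical names made up for this example. The point is just that each document's stored fields are loaded and handed back one at a time, instead of building the whole result list in memory first.

```python
from typing import Callable, Dict, Iterable, Iterator

StoredFields = Dict[str, object]


def stream_stored_fields(
    doc_ids: Iterable[int],
    load_stored_fields: Callable[[int], StoredFields],
) -> Iterator[StoredFields]:
    """Yield stored fields one document at a time (hypothetical helper)."""
    for doc_id in doc_ids:  # get next document id (it's an integer)
        fields = load_stored_fields(doc_id)  # load the stored fields
        yield fields  # send back the stored fields for that document


if __name__ == "__main__":
    # Toy stand-in for the index; a real loader would read from disk.
    toy_index = {
        0: {"title": "first", "snippet": "alpha"},
        1: {"title": "second", "snippet": "beta"},
    }
    for fields in stream_stored_fields([1, 0], toy_index.__getitem__):
        print(fields)
```

Because the generator is lazy, nothing is loaded until the consumer asks for the next document, so memory use stays at roughly one document's worth of stored fields however long the id list is.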