On Sat, Dec 15, 2012 at 12:04 PM, S L <sol.leder...@gmail.com> wrote:
> Thanks everyone for the responses.
>
> I did some more queries and watched disk activity with iostat. Sure enough,
> during some of the slow queries the disk was pegged at 100% (or more.)
>
> The requirement for the app I'm building is to be able to retrieve 500
> results in ideally one second. The index has 3 million records. Most
> searches do bring back 500 results, or close to that many.
>
> This is my schema:
>
>    <field name="metaDataUrl" type="string" indexed="true" stored="true"
> required="true"/>
>    <field name="title" type="text" stored="true" indexed="true"/>
>    <field name="snippet" type="text" indexed="true" stored="true"/>
>    <field name="rest" type="string" stored="true" indexed="false"
> multiValued="true"/>
>    <field name="date_indexed" type="date" indexed="true" stored="true"
> default="NOW" multiValued="false"/>
>    <field name="all" type="text" stored="false" indexed="true"
> multiValued="true"/>
>
> These are my copyField values:
>
>    <copyField source="title" dest="all"/>
>    <copyField source="snippet" dest="all"/>
>
> The "rest" field is an array of several fairly small strings.
>
> I've set size and initialSize to 100000 for queryResultCache and
> documentCache. I've configured Tomcat to use 4 GB of memory.

You might try turning *down* the size of your caches and JVM heap to
give more memory to the OS for caching the index files.
What's the size of the index, and how much memory do you have on the machine?
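
As a rough sketch of that trade-off (the RAM and index sizes below are
made-up numbers, since they aren't stated in this thread; only the 4 GB
heap comes from the message above, and the 1 GB reserve for everything
else is an assumption too):

```python
def os_cache_budget_gb(total_ram_gb: float, jvm_heap_gb: float,
                       other_gb: float = 1.0) -> float:
    """Memory left for the OS page cache after the JVM heap and other processes.

    other_gb is an assumed allowance for the OS itself and anything else running.
    """
    return max(0.0, total_ram_gb - jvm_heap_gb - other_gb)


def index_fits_in_cache(index_size_gb: float, total_ram_gb: float,
                        jvm_heap_gb: float, other_gb: float = 1.0) -> bool:
    """True if the whole index could, in principle, stay cached by the OS."""
    return index_size_gb <= os_cache_budget_gb(total_ram_gb, jvm_heap_gb, other_gb)


if __name__ == "__main__":
    # Hypothetical machine: 8 GB RAM, 5 GB index.
    for heap_gb in (4.0, 2.0):
        fits = index_fits_in_cache(index_size_gb=5.0, total_ram_gb=8.0,
                                   jvm_heap_gb=heap_gb)
        print(f"heap={heap_gb} GB -> index fits in OS cache: {fits}")
```

With those hypothetical numbers, a 4 GB heap leaves about 3 GB for the OS
cache, while a 2 GB heap leaves about 5 GB. Real behavior depends on what
else is running on the box.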

Some other things you could try:
 - try a current nightly build of 4.x: 4.1 has compressed stored fields
by default, meaning more of the index will fit into the OS cache.
 - more RAM (or find other ways to make the index smaller) if the
index size is just slightly larger than your free RAM
 - use an SSD for the index
 - reframe the problem so you don't need to bring back 500 documents
per request (most normal search applications don't actually display
500 results to the user at once... so you must be doing something
interesting)
 - in the future, when DocValues are supported (4.2 probably), if you
only need to retrieve certain smaller fields, you will be able to put
all those field values into a single column/file (hence much better
caching by the OS).
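
One way the "reframe the problem" idea might look, if the application can
show results a page at a time: ask for smaller pages with the standard
`start` and `rows` request parameters instead of all 500 at once. This is
only a sketch. The host, port, and path are hypothetical, the page size of
50 is arbitrary, and whether paging is acceptable depends on what the app
does with the 500 results.

```python
from typing import Iterator
from urllib.parse import urlencode

# Hypothetical Solr endpoint; substitute your own host, port, and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def page_urls(query: str, total_wanted: int = 500, page_size: int = 50,
              base_url: str = SOLR_SELECT_URL) -> Iterator[str]:
    """Yield one request URL per page, covering total_wanted results."""
    for start in range(0, total_wanted, page_size):
        # The last page may be shorter than page_size.
        rows = min(page_size, total_wanted - start)
        params = {"q": query, "start": start, "rows": rows, "wt": "json"}
        yield base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # "all" is the catch-all field from the schema quoted above.
    for url in page_urls("all:solr"):
        print(url)
```

Only the first page has to come back before the user sees something; the
later pages are fetched if and when they are actually needed.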


> Yonik, can you please explain what you mean by this?
>
>> For scalability, we don't read all of the stored fields for the whole
>> list into memory before starting to send the results, but stream them
>> back instead.
>
> How do you implement this streaming?

while there are more document ids in the list:
 - get next document id (it's an integer)
 - load the stored fields
 - send back the stored fields for that document
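
The loop above can be sketched with a generator. This is an illustration
of the idea only, not Solr's actual code: `FAKE_STORED_FIELDS`,
`stream_documents`, and `write_response` are hypothetical names, and a
dict stands in for the stored fields on disk.

```python
import json
import sys
from typing import Callable, Dict, Iterable, Iterator, TextIO

# Hypothetical stand-in for stored fields on disk, keyed by integer doc id.
FAKE_STORED_FIELDS: Dict[int, Dict[str, str]] = {
    0: {"title": "first", "snippet": "alpha"},
    1: {"title": "second", "snippet": "beta"},
    2: {"title": "third", "snippet": "gamma"},
}


def stream_documents(
    doc_ids: Iterable[int],
    load_stored_fields: Callable[[int], Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Yield stored fields one document at a time, never building a full list."""
    for doc_id in doc_ids:                # get next document id (an integer)
        yield load_stored_fields(doc_id)  # load the stored fields for that id


def write_response(
    doc_ids: Iterable[int],
    load_stored_fields: Callable[[int], Dict[str, str]],
    out: TextIO,
) -> int:
    """Write each document as soon as it is loaded; return the count written."""
    written = 0
    for fields in stream_documents(doc_ids, load_stored_fields):
        out.write(json.dumps(fields) + "\n")  # send back this document's fields
        written += 1
    return written


if __name__ == "__main__":
    write_response([2, 0], FAKE_STORED_FIELDS.__getitem__, sys.stdout)
```

Because each document is written before the next one is loaded, memory use
stays roughly constant regardless of how many ids are in the list.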


-Yonik
http://lucidworks.com
