I have an oddball question that I have been battling with for the last day
or two.

I have an 8-million-document Solr index, split roughly down the middle by
an identifying "product" field that takes one of two distinct values.  The
documents on both "sides" are very similar, with stored text fields, etc.
I have two nearly identical request handlers, one for each "side".

When I perform very similar queries on either "side" for random phrases,
requesting 500 rows with highlighting on titles and summaries, I get very
different response times.  One "side" consistently returns in around 1-2
seconds, whereas the other consistently takes 6-10 seconds.  I don't see
any reason why it's worse; each run of queries is deliberately randomized
so that the caches don't skew the timings.  In most cases each test query
returns the full first 500 rows.
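
To make the test concrete, here is a minimal sketch of the kind of timing
run described above. It is only an illustration: the base URL, the handler
paths (/search_a, /search_b), and the field names title and summary are
placeholders, not the actual configuration. The fetch argument is any
callable that retrieves a URL, so the timing logic can be exercised without
a live server.

```python
import random
import time
from urllib.parse import urlencode

# Placeholder values (assumptions): host, core, and handler paths are hypothetical.
SOLR_BASE = "http://localhost:8983/solr/collection1"
HANDLERS = {"side_a": "/search_a", "side_b": "/search_b"}


def build_url(handler, phrase, rows=500):
    """Build a phrase-query URL that requests highlighting on two fields."""
    params = {
        "q": '"%s"' % phrase,        # quoted so Solr treats it as a phrase
        "rows": rows,
        "hl": "true",
        "hl.fl": "title,summary",    # assumed field names
        "wt": "json",
    }
    return SOLR_BASE + handler + "?" + urlencode(params)


def random_phrases(words, count, seed=None):
    """Generate random two-word phrases so repeated runs avoid warm caches."""
    rng = random.Random(seed)
    return [" ".join(rng.sample(words, 2)) for _ in range(count)]


def time_queries(fetch, handler, phrases):
    """Return the wall-clock seconds taken by fetch() for each phrase."""
    timings = []
    for phrase in phrases:
        start = time.perf_counter()
        fetch(build_url(handler, phrase))
        timings.append(time.perf_counter() - start)
    return timings
```

Against a real server, fetch could be something like
lambda url: urllib.request.urlopen(url).read(), run once per handler with
the same list of phrases so the two "sides" can be compared directly.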

My filter query cache configuration looks like:

<filterCache class="solr.FastLRUCache"
                 size="750000"
                 initialSize="10000"
                 autowarmCount="0"/>

(I have been increasing the size, hoping this would help.)  The other
caches are quite small; the customer's use cases don't involve much in the
way of paging, just returning a large initial result set with highlighting
in the shortest possible time.
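
As a rough sanity check on that cache size, the arithmetic below estimates
the worst-case memory a full filterCache could hold. It assumes every
cached filter is stored as a bitset with one bit per document in the index;
filters matching few documents may be stored more compactly, so treat this
as an upper bound under that assumption, not a measurement.

```python
def bitset_bytes(max_doc):
    """Bytes needed for a bitset holding one bit per document."""
    return (max_doc + 7) // 8


MAX_DOC = 8_000_000      # total documents in the index (from the post)
CACHE_SIZE = 750_000     # filterCache size="750000"

per_entry = bitset_bytes(MAX_DOC)          # bytes per cached filter
worst_case = per_entry * CACHE_SIZE        # bytes if the cache filled completely

print("per entry: %.1f MB" % (per_entry / 1e6))
print("worst case: %.1f GB" % (worst_case / 1e9))
```

Under that assumption each entry is about 1 MB, so the configured size is
far larger than could ever be filled in practice.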

I'm trying to tune this so the disparity between the two "halves" is not
so dramatic.  Are there any optimizations or settings I should be looking
at?  Or is it just "the way it is"?  I've argued for reducing the size of
the result set, turning off highlighting, etc., but these seem to be out
of the question.  I would at least like a concrete reason why one filter
query would be so much slower than the other, given that the document
counts are very nearly equal (3.8 million vs. 4.0 million on the slower
side).

Any pointers or suggestions would be appreciated.  Thanks in advance.

Neal Ensor
nen...@gmail.com
