Gosh,  I'm sorry to be so unclear.  Hmm.  Trying to clarify below:

On Nov 28, 2008, at 3:52 PM, Chris Hostetter wrote:

Having read through this thread, i'm not sure i understand what exactly
the problem is.  my naive understanding is...

1) you want to sort by a field
2) you want to be able to "paginate" through all docs in order of this
field.
3) you want to be able to start your pagination at any arbitrary value for
this field.

so (assuming the field is a simple number for now) you could use something
like

  q=yourField:[42 TO *]&sort=yourField+asc&rows=10&start=0

where "42" is the arbitrary ID someone wants to start at.


perfect.  This is the query I'm using.

The results are correct.  But the response time sucks.
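For the record, that request can be assembled with proper URL encoding like so (a minimal Python sketch; the host, core path, and field name are placeholders, not taken from this thread):

```python
from urllib.parse import urlencode

def range_query_url(base, field, start_value, rows=10):
    # Open-ended range query: everything from start_value to the end of
    # the index, sorted ascending, returning the first `rows` hits.
    params = {
        "q": "%s:[%s TO *]" % (field, start_value),
        "sort": "%s asc" % field,
        "rows": rows,
        "start": 0,
    }
    return "%s/select?%s" % (base, urlencode(params))

# e.g. range_query_url("http://localhost:8983/solr", "shelfkey", "A123")
```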

Reading the docs about caches, I thought I could populate the query result cache with an autowarming query and the response time would be okay. But that hasn't worked. (See excerpts from my solrconfig.xml below.)

A repeated query is very fast, implying caching happens for a particular starting point ("42" above).

Is there a way to populate the cache with the ENTIRE sorted list of values for the field, so any arbitrary starting point will get results from the cache, rather than grabbing all results from (x) to the end, then sorting all these results, then returning the first 10?
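What's being described is essentially keyset pagination: each request is a cheap "first N keys at or after X" lookup against an already-sorted list, rather than re-sorting everything from X to the end. A toy sketch of that access pattern over an in-memory sorted list (names illustrative, not a Solr API):

```python
import bisect

def page_from(sorted_keys, start_value, rows=10):
    # Simulates q=field:[start_value TO *]&sort=field asc&rows=N:
    # binary-search for the first key >= start_value, return the next rows.
    i = bisect.bisect_left(sorted_keys, start_value)
    return sorted_keys[i:i + rows]

keys = sorted(["A12", "A123", "B34", "B35", "C1", "C2"])
page = page_from(keys, "B", rows=3)  # ['B34', 'B35', 'C1']
```

If the whole sorted key list were held in memory like this, any arbitrary starting point would be an O(log n) seek plus a short scan, which is the behavior the cache question above is reaching for.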


This sentence below seems to imply that you have a solution which produces
correct results, but doesn't produce results quickly...

right.

: I have a performance problem and I haven't thought of a clever way around it.

...however this line seems to suggest that you're having trouble
getting at least 10 results from any query (?)

: Call numbers are squirrelly, so we can't predict the string that will
: appropriately grab at least 10 subsequent documents. They are certainly not
:
: so from
: A123 B34 1970
:
: we're unable to predict if any of these will return at least 10 results:

I was trying to express that I couldn't do this:

myfield:[X TO Y]

because I can't algorithmically compute Y.

Glen Newton suggested a workaround: represent my squirrelly, but sortable, field values as floating-point numbers, so that I can compute Y.
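One way to picture that workaround (purely illustrative, not Glen's actual code): map a short prefix of each sortable key to a fraction in [0, 1). Ordering is preserved for keys that differ within the prefix, and an upper bound Y can then be computed arithmetically from X.

```python
def key_to_float(key, width=6):
    # Treat the first `width` bytes of the key as base-256 digits of a
    # fraction. 6 bytes = 48 bits, which fits in a float's 53-bit
    # mantissa; keys sharing a longer common prefix would need a wider
    # (or exact integer) encoding.
    b = key.encode("ascii")[:width].ljust(width, b"\x00")
    return sum(c / 256 ** (i + 1) for i, c in enumerate(b))

# Ordering is preserved: key_to_float("A123 B34") < key_to_float("B34")
```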

...but i'm not sure what exactly that means. for any given field, there
is always going to be some value X such that myField:[X TO *] won't
return at least 10 docs ... they are the last values in the index in order -- surely it's okay for your app to have an "end" state when you run out of data? :)

yes.  Understood.  This is not an issue.

Oh, and BTW...

: numbers in sort order".  I have also mucked about with the cache
: initialization, but that's not working either:
:
: <listener event="firstSearcher" class="solr.QuerySenderListener">

...make sure you also do a newSearcher listener that does the same thing,
otherwise your FieldCache (used for sorting) may not be warmed when
commits happen.

Yup yup yup.

from solrconfig:

    <filterCache
      class="solr.LRUCache"
      size="20000000"
      initialSize="10000000"
      autowarmCount="500000"/>

    <queryResultCache
      class="solr.LRUCache"
      size="10000000"
      initialSize="5000000"
      autowarmCount="5000000"/>


    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- populate query result cache for sorted queries -->
        <lst>
                <str name="q">shelfkey:[0 TO *]</str>
                <str name="sort">shelfkey asc</str>
        </lst>
      </arr>
    </listener>

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- populate query result cache for sorted queries -->
        <lst>
                <str name="q">shelfkey:[0 TO *]</str>
                <str name="sort">shelfkey asc</str>
        </lst>
      </arr>
    </listener>
