Perhaps the tricky part here is that Solr builds its caches for #parts# of
the query. In other words, a query that sorts on field A will populate
the cache for field A. Any other query that sorts on field A will use the
same cache. So you really need just enough queries to populate, in this
case, the fields you'll sort by. You could even put multiple sorts on a
single query and populate the sort caches all at once if you wanted.

The same goes for faceting and filter queries. You might well be able to
make just a few queries that fill all the relevant caches rather than
using 100s, but you know your schema way better than I do.
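
To make the "few queries instead of 100s" idea concrete, here is a rough
sketch in Python that builds one query string touching several sort fields,
facet fields and filter queries at once. The field names (price, name,
category, brand, inStock) are made up for illustration, not taken from your
schema, and I keep rows at 1 rather than 0 on the assumption that a zero-row
request might skip the sort entirely.

```python
from urllib.parse import urlencode


def warming_query_string(sort_fields, facet_fields, filter_queries):
    """Build one Solr query string that exercises several caches at once.

    All field names passed in are hypothetical; substitute the fields
    you actually sort, facet and filter on.
    """
    # Match everything; only the cache side effects matter, not the results.
    params = [("q", "*:*"), ("rows", "1")]
    if sort_fields:
        # Solr accepts a comma-separated list of "<field> <direction>" clauses.
        params.append(("sort", ",".join(f"{f} asc" for f in sort_fields)))
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field.
        params.extend(("facet.field", f) for f in facet_fields)
    # fq may also be repeated; each one is cached separately.
    params.extend(("fq", fq) for fq in filter_queries)
    return urlencode(params)


qs = warming_query_string(
    ["price", "name"], ["category", "brand"], ["inStock:true"]
)
```

The resulting string could be appended to your select URL, or its parameters
copied into a firstSearcher/newSearcher entry.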

What I meant about replicating work is that trying to use your after
hook to fire off the queries probably doesn't buy you anything
over firstSearcher/newSearcher lists.

All that said, though, if you really don't want to put your queries in
the config file, it would be relatively trivial to write a small Java app
that uses SolrJ to query the server, reading the queries from
anyplace you choose, and call it from the after hook. Personally, I
think this is a high-cost option compared to keeping the list
in the config file, because of the added complexity, but that's your
call.
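
For what it's worth, the replay idea doesn't strictly need SolrJ. Below is a
minimal sketch that swaps in plain HTTP from Python instead: it reads one raw
query string per line and sends each to the select handler. The file name
warm_queries.txt and the URL http://localhost:8983/solr/select are
assumptions, so adjust both for your setup.

```python
from urllib.request import urlopen


def load_queries(lines):
    """Return the raw query strings, skipping blank lines and # comments."""
    return [
        ln.strip()
        for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    ]


def warm(base_url, queries, fetch=urlopen):
    """Send each query string to base_url; return how many succeeded.

    `fetch` is injectable so the loop can be exercised without a server.
    """
    succeeded = 0
    for qs in queries:
        try:
            with fetch(f"{base_url}?{qs}") as resp:
                resp.read()  # drain the response so the request completes
            succeeded += 1
        except OSError:
            # URLError and HTTPError are OSError subclasses; a failed
            # warming query is skipped rather than aborting the run.
            pass
    return succeeded


def main(path="warm_queries.txt",
         base_url="http://localhost:8983/solr/select"):
    # Both defaults are assumed values, not anything Solr mandates.
    with open(path) as fh:
        queries = load_queries(fh)
    print(f"warmed {warm(base_url, queries)} of {len(queries)} queries")
```

Calling main() from the after hook, before the node rejoins the cluster,
would replay the file against the freshly replicated index.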

Best
Erick

On Wed, Dec 8, 2010 at 12:25 PM, Mark <static.void....@gmail.com> wrote:

> We only replicate twice an hour so we are far from real-time indexing. Our
> application never writes to master; rather, we just pick up all changes using
> updated_at timestamps when delta-importing using DIH.
>
> We don't have any warming queries in firstSearcher/newSearcher event
> listeners. My initial post was asking how I would go about doing this with a
> large number of queries. Our queries themselves tend to have a lot of
> faceting and other restrictions on them so I would rather not list them all
> out using xml. I was hoping there was some sort of log replayer handler or
> class that would replay a bunch of queries while the node is offline. When
> it's done, it would bring the node back online, ready to serve requests.
>
>
> On 12/8/10 6:15 AM, Jonathan Rochkind wrote:
>
>> How often do you replicate? Do you know how long your warming queries take
>> to complete?
>>
>> As others in this thread have mentioned, if your replications (or ordinary
>> commits, if you weren't using replication) happen quicker than warming takes
>> to complete, you can get overlapping indexes being warmed up, and run out of
>> RAM (causing garbage collection to take lots of CPU, if not an out-of-memory
>> error), or otherwise block on CPU with lots of new indexes being warmed at
>> once.
>>
>> Solr is not very good at providing 'real time indexing' for this reason,
>> although I believe there are some features in post-1.4 trunk meant to
>> support 'near real time search' better.
>> ________________________________________
>> From: Mark [static.void....@gmail.com]
>> Sent: Tuesday, December 07, 2010 10:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Warming searchers/Caching
>>
>> Maybe I should explain my problem a little more in detail.
>>
>> The problem we are experiencing is that after a delta-import we notice an
>> extremely high load time on the slave machines that just replicated. It
>> goes away after a minute or so of production traffic once everything is cached.
>>
>> I already have a before/after hook that is in place before/after
>> replication takes place. The before hook removes the slave from the
>> cluster and then starts to replicate. When it's done it calls the after
>> hook and I would like to warm up the cache in this method so no users
>> experience extremely long wait times.
>>
>> On 12/7/10 4:22 PM, Markus Jelsma wrote:
>>
>>> XInclude works fine but that's not what you're looking for, I guess. Having
>>> the
>>> 100 top queries is overkill anyway and it can take too long for a new
>>> searcher
>>> to warmup.
>>>
>>> Depending on the type of requests, i usually tend to limit warming to
>>> popular
>>> filter queries only as they generate a very high hit ratio and make
>>> caching
>>> useful [1].
>>>
>>> If there are very popular user entered queries having a high initial
>>> latency,
>>> i'd have them warmed up as well.
>>>
>>> [1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs
>>>
>>>  Warning: I haven't used this personally, but Xinclude looks like what
>>>> you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude
>>>>
>>>>
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Tue, Dec 7, 2010 at 6:33 PM, Mark<static.void....@gmail.com>
>>>> wrote:
>>>>
>>>>> Is there any plugin or easy way to auto-warm/cache a new searcher with
>>>>> a
>>>>> bunch of searches read from a file? I know this can be accomplished
>>>>> using
>>>>> the EventListeners (newSearcher, firstSearcher) but I'd rather not add
>>>>> 100+
>>>>> queries to my solrconfig.xml.
>>>>>
>>>>> If there is no hook/listener available, is there some sort of Handler
>>>>> that performs this sort of function? Thanks!
>>>>>
>>>>
