Re: Including Small Amounts of New Data in Searches (MultiSearcher ?)

Lance Norskog Sun, 09 Jan 2011 20:55:43 -0800

OK. I was talking about the tools that are available now; much better
things are coming in the NRT work. I don't know how merges currently
behave with respect to multitasking and thread contention. Most of the
Solr sites I know of have indexes much larger than RAM and expect
everything to work smoothly.


Lance

On Sun, Jan 9, 2011 at 9:18 AM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
>> The older MergePolicies followed a strategy which is quite disruptive in an 
>> NRT environment.
>
> Can you elaborate as to why (maybe we need to place this in a wiki)?
> If large merges are running in their own thread, they should not
> disrupt queries, e.g., there won't be CPU contention.  The IO contention
> can be disruptive, depending on the size and type of hardware; however,
> in the ideal case of the index 'fitting' into RAM/IO cache, a
> large merge should not affect queries (or indexing).
>
> I think one useful development for keeping merges from disrupting NRT
> is DirectIOLinuxDirectory:
> https://issues.apache.org/jira/browse/LUCENE-2500  It's also useful
> for the non-NRT use case because any time IO cache pages are evicted,
> queries will slow down (unless the index is too large to fit in RAM
> anyway).
>
> On Sat, Jan 8, 2011 at 7:55 PM, Lance Norskog <goks...@gmail.com> wrote:
>> There are always slowdowns when merging new segments during indexing.
>> A MergePolicy decides when to merge segments.  The older MergePolicies
>> followed a strategy which is quite disruptive in an NRT environment.
>>
>> There is a new feature in 3.x & the trunk called
>> 'BalancedSegmentMergePolicy'. This new MergePolicy is designed for the
>> near-real-time use case. It was contributed by LinkedIn. You may find
>> it works well enough for your case.
>>
>> Lance
>>
>> On Thu, Jan 6, 2011 at 10:21 AM, Stephen Boesch <java...@gmail.com> wrote:
>>> Thanks Yonik,
>>>  Using a stable release of Solr, what would you suggest doing, given
>>> MultiSearcher's demise and that the other work is still ongoing?
>>>
>>> 2011/1/6 Yonik Seeley <yo...@lucidimagination.com>
>>>
>>>> On Thu, Jan 6, 2011 at 12:37 PM, Stephen Boesch <java...@gmail.com> wrote:
>>>> > Solr/lucene newbie here ..
>>>> >
>>>> > We would like searches against a solr/lucene index to immediately be able to
>>>> > view data that was added.  I stress "small" amount of new data given that
>>>> > any significant amount would require excessive latency.
>>>>
>>>> There has been significant ongoing work in lucene-core for NRT (near real
>>>> time).
>>>> We need to overhaul Solr's DirectUpdateHandler2 to take advantage of
>>>> all this work.
>>>> Mark Miller took a first crack at it (sharing a single IndexWriter,
>>>> letting lucene handle the concurrency issues, etc)
>>>> but if there's a JIRA issue, I'm having trouble finding it.
>>>>
>>>> > Looking around, I'm wondering if the direction would be a MultiSearcher
>>>> > living on top of our standard directory-based IndexReader as well as a
>>>> > custom Searchable that handles the newest documents - and then combines the
>>>> > two results?
>>>>
>>>> If you look at trunk, MultiSearcher has already gone away.
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>



-- 
Lance Norskog
goks...@gmail.com
