Could be garbage collection. Could be larger and larger merges; at some point
a commit will cause all segments to be merged. But most likely you're
hitting the "magic combination" of events, in particular the problem of
too many overlapping warming searchers.

So, look at your log files or the admin page and see what your searcher
warmup times are. This provides a lower bound for your commit interval.
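
If warming does turn out to be the culprit, the relevant knobs live in
solrconfig.xml. A minimal sketch (the 60-second value is illustrative,
not a recommendation; tune it against your measured warmup times):

```xml
<!-- solrconfig.xml: cap concurrent warming searchers so overlapping
     commits fail fast instead of piling up (the default is 2) -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- let Solr batch hard commits on a timer instead of the client
       committing on every update -->
  <autoCommit>
    <maxTime>60000</maxTime> <!-- milliseconds; illustrative value -->
  </autoCommit>
</updateHandler>
```

Exact placement of these elements depends on your Solr version and
existing config, so treat this as a starting point, not a drop-in.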

I'm guessing you have a single machine that's indexing and searching.
Consider a master/slave setup, which avoids the contention between
indexing and searching. Since you say you're going to handle
many more queries in the future, this may be required anyway...
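
For the master/slave route, Solr's built-in replication handler does the
job. A rough sketch of both sides (the host, port, core path, and poll
interval below are placeholders you'd replace with your own):

```xml
<!-- solrconfig.xml on the master: publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave then serves all queries from a recently replicated copy, so
heavy indexing on the master never blocks searches.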

NRT is not limited to fetching documents by ID; it's intended for exactly
this kind of problem, so I believe it is a possibility. But that means
running trunk, I think.

I _strongly_ encourage you to think about whether such rapid search
availability is really required. Users, when asked, often find 3-5
minutes acceptable, which gives you ample time to avoid this problem.
That said, you have a relatively small index here, so you may be able
to get away with, say, 30-second commits.
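
One way to get those 30-second commits without the client issuing an
explicit commit on every update is the commitWithin attribute on the
update message itself (see the CommitWithin wiki page linked below);
a sketch, with a placeholder document:

```xml
<!-- update message: ask Solr to make this doc searchable within 30s,
     letting the server coalesce commits across many such updates -->
<add commitWithin="30000">
  <doc>
    <field name="id">example-1</field> <!-- placeholder field/value -->
  </doc>
</add>
```

With this, fifty updates arriving in a burst share one commit instead
of triggering fifty.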

Best
Erick

On Thu, Mar 29, 2012 at 4:49 AM, Rafal Gwizdala
<rafal.gwizd...@gmail.com> wrote:
> That's bad news.
> If 5-7 seconds is not safe then what is the safe interval for updates?
> Near real-time is not for me as it works only when querying by document Id
> - this doesn't solve anything in my case. I just want the index to be
> updated in real-time, 30-40 seconds delay is acceptable but not much more
> than that. Is there anything that can be done, or should I start looking
> for some other indexing tool?
> I'm wondering why there's such terrible performance degradation over time -
> SOLR runs fine for the first 10-20 hours, updates are extremely fast, and then
> they become slower and slower until eventually they stop executing at all.
> Is there any issue with garbage collection or index fragmentation or some
> internal data structures that can't manage their data effectively when
> updates are frequent?
>
> Best regards
> RG
>
>
> On Thu, Mar 29, 2012 at 10:24 AM, Lance Norskog <goks...@gmail.com> wrote:
>
>> 5-7 seconds- there's the problem. If you want to have documents
>> visible for search within that time, you want to use the trunk and
>> "near-real-time" search. A hard commit does several hard writes to the
>> disk (with the fsync() system call). It does not run smoothly at that
>> rate. It is no surprise that eventually you hit a thread-locking bug.
>>
>>
>> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/RealTimeGet
>>
>> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/CommitWithin
>>
>> On Wed, Mar 28, 2012 at 11:08 PM, Rafal Gwizdala
>> <rafal.gwizd...@gmail.com> wrote:
>> > Lance, I know there are many variables that's why I'm asking where to
>> start
>> > and what to check.
>> > Updates are sent every 5-7 seconds, each update contains between 1 and 50
>> > docs. Commit is done every time (on each update).
>> > Currently queries aren't very frequent - about 1 query every 3-5 seconds,
>> > but the system is going to handle much more (of course if the problem is
>> > fixed).
>> > The system has 2 core CPU (virtualized) and 4 GB memory (SOLR uses about
>> > 300 MB)
>> >
>> > R
>> >
>> > On Thu, Mar 29, 2012 at 1:53 AM, Lance Norskog <goks...@gmail.com>
>> wrote:
>> >
>> >> How often are updates? And when are commits? How many CPUs? How much
>> >> query load? There are so many variables.
>> >>
>> >> Check the mailing list archives and Solr issues, there might be a
>> >> similar problem already discussed. Also, attachments do not work with
>> >> Apache mailing lists. (Well, ok, they work for direct subscribers, but
>> >> not for indirect subscribers and archive site users.)
>> >>
>> >> --
>> >> Lance Norskog
>> >> goks...@gmail.com
>> >>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
