Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system 
suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock 
?

I'm not sure; I haven't seen a similar issue in a sharded environment, 
probably because it was a controlled environment.
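
If it is that deadlock, the workaround described on that wiki page is, as far
as I remember, to give the servlet container enough request threads so that
the inter-shard sub-requests cannot starve each other. With the Jetty that
ships with Solr that would be something roughly like this in the example
jetty.xml (untested sketch; the thread pool class name and the numbers depend
on your Jetty version and load, they are not values from this thread):

  <Set name="ThreadPool">
    <!-- raise maxThreads well above the expected number of concurrent
         top-level requests plus the sub-requests they fan out to shards -->
    <New class="org.mortbay.thread.QueuedThreadPool">
      <Set name="minThreads">10</Set>
      <Set name="maxThreads">10000</Set>
    </New>
  </Set>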


> Hello,
> 
> 2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> 
> > That depends on your GC settings and generation sizes. And, instead of
> > UseParallelGC you'd better use UseParNewGC in combination with CMS.
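
To be concrete, that combination would be something along these lines, an
untested sketch where the heap sizes and the occupancy settings are
placeholders rather than tuned values:

  java -Xms4g -Xmx4g \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:+CMSParallelRemarkEnabled \
       -XX:CMSInitiatingOccupancyFraction=75 \
       -XX:+UseCMSInitiatingOccupancyOnly \
       -jar start.jar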
> 
> JConsole now shows a different profile output but load is still high and
> performance is still bad.
> 
> Btw, here is the thread profile from newrelic:
> 
> https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
> 
> Note that we do use a form of sharding, so maybe all the time spent
> waiting in handleRequestBody results from the sharding?
> 
> > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> > 
> > > It's actually, as I understand it, expected JVM behavior to see the
> > > heap rise to close to its limit before it gets GC'd; that's how Java
> > > GC works. Whether that should happen every 20 seconds or what, I
> > > don't know.
> > 
> > > Another option is setting better JVM garbage collection arguments, so
> > > GC doesn't "stop the world" so often. I have had good luck with my
> > > Solr using this:  -XX:+UseParallelGC
> > > 
> > > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > > Hello again,
> > > > 
> > > > 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > > > 
> > > >>> Hello,
> > > >>> 
> > > >>> 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > > >>> 
> > > >>>> Hi Doğacan,
> > > >>>> 
> > > >>>> Are you, at some point, running out of heap space? In my
> > > >>>> experience, that's the common cause of increased load and
> > > >>>> excessively high response times (or time outs).
> > > >>> 
> > > >>> How much heap would be enough? Our index size is growing slowly,
> > > >>> but we did not have this problem a couple of weeks ago when the
> > > >>> index was maybe 100mb smaller.
> > > >> 
> > > >> It isn't easy to say how much heap space is needed. It usually
> > > >> needs to be increased when you run out of memory and get those
> > > >> nasty OOM errors; are you getting them?
> > > >> Replication events will increase heap usage due to cache warming
> > > >> queries and autowarming.
> > > > 
> > > > Nope, no OOM errors.
> > > > 
> > > >>> We left most of the caches in solrconfig as default and only
> > > >>> increased filterCache to 1024. We only ask for "id"s (which are
> > > >>> unique) and no other fields during queries (though we do
> > > >>> faceting). Btw, 1.6gb of our index is stored fields (we store
> > > >>> everything for now, even though we do not get them during
> > > >>> queries), and about 1gb of index.
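
For reference, that filterCache setting in solrconfig.xml would look roughly
like the snippet below; only size="1024" is from your mail, the class and the
other attributes are guesses based on the stock example config:

  <filterCache class="solr.FastLRUCache"
               size="1024"
               initialSize="512"
               autowarmCount="0"/>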
> > > >> 
> > > >> Hmm, it seems 4000 would be enough indeed. What about the
> > > >> fieldCache, are there
> > > >> a lot of entries? Is there an insanity count? Do you use boost
> > > >> functions?
> > > > 
> > > > Insanity count is 0 and fieldCache has 12 entries. We do use some
> > > > boosting functions.
> > > > 
> > > > Btw, I am monitoring output via jconsole with 8gb of heap and it
> > > > still goes to 8gb every 20 seconds or so, gc runs, and it falls
> > > > back down to 1gb.
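
If you want to cross-check what jconsole shows there, jstat on the Solr pid
prints the same generation usage and GC counts to the console, e.g. (the pid
and interval are just examples):

  jstat -gcutil 12345 5000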
> > > > 
> > > > Btw, our current revision was just a random choice, but up until
> > > > two weeks ago it had been rock-solid, so we have been reluctant to
> > > > update to another version. Would you recommend upgrading to latest
> > > > trunk?
> > > > 
> > > >> It might not have anything to do with memory at all, but I'm just
> > > >> asking. There may be a bug in your revision causing this.
> > > >> 
> > > >>> Anyway, Xmx was 4000m; we tried increasing it to 8000m but did
> > > >>> not get any improvement in load. I can try monitoring with
> > > >>> Jconsole with 8gigs of heap to see if it helps.
> > > >>> 
> > > >>>> Cheers,
> > > >>>> 
> > > >>>>> Hello everyone,
> > > >>>>> 
> > > >>>>> First of all here is our Solr setup:
> > > >>>>> 
> > > >>>>> - Solr nightly build 986158
> > > >>>>> - Running solr inside the default jetty that comes with the
> > > >>>>>   solr build
> > > >>>>> - 1 write-only Master, 4 read-only Slaves (quad core 5640 with
> > > >>>>>   24gb of RAM)
> > > >>>>> - Index replicated (on optimize) to slaves via Solr Replication
> > > >>>>> - Size of index is around 2.5gb
> > > >>>>> - No incremental writes, index is created from scratch (delete
> > > >>>>>   old documents -> commit new documents -> optimize) every 6
> > > >>>>>   hours
> > > >>>>> - Avg # of requests per second is around 60 (for a single slave)
> > > >>>>> - Avg time per request is around 25ms (before having problems)
> > > >>>>> - Load on each slave is around 2
> > > >>>>> 
> > > >>>>> We have been using this set-up for months without any problem.
> > > >>>>> However, last week we started to experience very weird
> > > >>>>> performance problems like:
> > > >>>>> 
> > > >>>>> - Avg time per request increased from 25ms to 200-300ms (even
> > > >>>>>   higher if we don't restart the slaves)
> > > >>>>> - Load on each slave increased from 2 to 15-20 (solr uses
> > > >>>>>   400%-600% cpu)
> > > >>>>> 
> > > >>>>> When we profile solr we see two very strange things:
> > > >>>>> 
> > > >>>>> 1 - This is the jconsole output:
> > > >>>>> 
> > > >>>>> https://skitch.com/meralan/rwwcf/mail-886x691
> > > >>>>> 
> > > >>>>> As you see, gc runs every 10-15 seconds and collects more than
> > > >>>>> 1gb of memory. (Actually if you wait more than 10 minutes you
> > > >>>>> see spikes up to 4gb consistently.)
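
A plain GC log would also make those spikes easier to share than screenshots;
something like the following JVM options should capture them (the log path is
just an example):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/solr/gc.log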
> > > >>>>> 
> > > >>>>> 2 - This is the newrelic output:
> > > >>>>> 
> > > >>>>> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > >>>>> 
> > > >>>>> As you see, solr spent a ridiculously long time in the
> > > >>>>> SolrDispatchFilter.doFilter() method.
> > > >>>>> 
> > > >>>>> 
> > > >>>>> Apart from these, when we clean the index directory,
> > > >>>>> re-replicate and restart each slave one by one, we see some
> > > >>>>> relief in the system, but after some time the servers start to
> > > >>>>> melt down again. Although deleting the index and re-replicating
> > > >>>>> doesn't solve the problem, we think that these problems are
> > > >>>>> somehow related to replication, because the symptoms started
> > > >>>>> after a replication and the system once healed itself after a
> > > >>>>> replication. I also see lucene-write.lock files on the slaves
> > > >>>>> (we don't have write.lock files on the master), which I think
> > > >>>>> we shouldn't see.
> > > >>>>> 
> > > >>>>> 
> > > >>>>> If anyone can give any sort of ideas, we will appreciate it.
> > > >>>>> 
> > > >>>>> Regards,
> > > >>>>> Dogacan Guney
