Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?
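In a distributed search every slave both answers top-level queries and fans sub-requests out to the other shards from handleRequestBody, so those sub-requests compete for the same servlet-container threads as the incoming queries; if every worker thread ends up blocked waiting on a sub-request, nothing can make progress. Just to illustrate the fan-out (the hostnames below are made up, not taken from your setup):

  # hypothetical hosts; only "id" is requested, as described later in the thread
  curl 'http://slave1:8983/solr/select?q=*:*&fl=id&shards=slave1:8983/solr,slave2:8983/solr,slave3:8983/solr,slave4:8983/solr'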
I'm not sure, I haven't seen a similar issue in a sharded environment, probably
because it was a controlled environment.

> Hello,
>
> 2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
>
> > That depends on your GC settings and generation sizes. And, instead of
> > UseParallelGC you'd better use UseParNewGC in combination with CMS.
>
> JConsole now shows a different profile output, but load is still high and
> performance is still bad.
>
> Btw, here is the thread profile from newrelic:
>
> https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
>
> Note that we do use a form of sharding, so maybe all the time spent waiting
> in handleRequestBody results from sharding?
>
> > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> >
> > > It's actually, as I understand it, expected JVM behavior to see the
> > > heap rise to close to its limit before it gets GC'd, that's how Java
> > > GC works. Whether that should happen every 20 seconds or what, I
> > > don't know.
> > >
> > > Another option is setting better JVM garbage collection arguments, so
> > > GC doesn't "stop the world" so often. I have had good luck with my
> > > Solr using this: -XX:+UseParallelGC
> > >
> > > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > > Hello again,
> > > >
> > > > 2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> > > >
> > > >>> Hello,
> > > >>>
> > > >>> 2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> > > >>>
> > > >>>> Hi Doğacan,
> > > >>>>
> > > >>>> Are you, at some point, running out of heap space? In my
> > > >>>> experience, that's the common cause of increased load and
> > > >>>> excessively high response times (or timeouts).
> > > >>>
> > > >>> How much of a heap size would be enough? Our index size is growing
> > > >>> slowly, but we did not have this problem a couple of weeks ago,
> > > >>> when the index was maybe 100mb smaller.
> > > >>
> > > >> Telling how much heap space is needed isn't easy. It usually needs
> > > >> to be increased when you run out of memory and get those nasty OOM
> > > >> errors, are you getting them?
> > > >> Replication events will increase heap usage due to cache warming
> > > >> queries and autowarming.
> > > >
> > > > Nope, no OOM errors.
> > > >
> > > >>> We left most of the caches in solrconfig at their defaults and only
> > > >>> increased filterCache to 1024. We only ask for "id"s (which are
> > > >>> unique) and no other fields during queries (though we do faceting).
> > > >>> Btw, 1.6gb of our index is stored fields (we store everything for
> > > >>> now, even though we do not fetch them during queries) and about 1gb
> > > >>> is the index itself.
> > > >>
> > > >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
> > > >> are there a lot of entries? Is there an insanity count? Do you use
> > > >> boost functions?
> > > >
> > > > Insanity count is 0 and fieldCache has 12 entries. We do use some
> > > > boosting functions.
> > > >
> > > > Btw, I am monitoring output via jconsole with 8gb of RAM and it still
> > > > goes to 8gb every 20 seconds or so; gc runs and it falls back down to
> > > > 1gb.
> > > >
> > > > Btw, our current revision was just a random choice, but up until two
> > > > weeks ago it had been rock-solid, so we have been reluctant to update
> > > > to another version. Would you recommend upgrading to latest trunk?
> > > >
> > > >> It might not have anything to do with memory at all, but I'm just asking.
> > > >> There may be a bug in your revision causing this.
> > > >>
> > > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
> > > >>> any improvement in load. I can try monitoring with JConsole with 8 gigs
> > > >>> of heap to see if it helps.
> > > >>>
> > > >>>> Cheers,
> > > >>>>
> > > >>>>> Hello everyone,
> > > >>>>>
> > > >>>>> First of all, here is our Solr setup:
> > > >>>>>
> > > >>>>> - Solr nightly build 986158
> > > >>>>> - Running Solr inside the default Jetty that comes with the Solr build
> > > >>>>> - 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM)
> > > >>>>> - Index replicated (on optimize) to slaves via Solr Replication
> > > >>>>> - Size of index is around 2.5gb
> > > >>>>> - No incremental writes; the index is created from scratch (delete old
> > > >>>>>   documents -> commit new documents -> optimize) every 6 hours
> > > >>>>> - Avg # of requests per second is around 60 (for a single slave)
> > > >>>>> - Avg time per request is around 25ms (before having problems)
> > > >>>>> - Load on each slave is around 2
> > > >>>>>
> > > >>>>> We have been using this set-up for months without any problem. However,
> > > >>>>> last week we started to experience very weird performance problems:
> > > >>>>>
> > > >>>>> - Avg time per request increased from 25ms to 200-300ms (even higher if
> > > >>>>>   we don't restart the slaves)
> > > >>>>> - Load on each slave increased from 2 to 15-20 (Solr uses 400-600% cpu)
> > > >>>>>
> > > >>>>> When we profile Solr we see two very strange things:
> > > >>>>>
> > > >>>>> 1 - This is the jconsole output:
> > > >>>>>
> > > >>>>> https://skitch.com/meralan/rwwcf/mail-886x691
> > > >>>>>
> > > >>>>> As you can see, gc runs every 10-15 seconds and collects more than 1 gb
> > > >>>>> of memory. (Actually, if you wait more than 10 minutes you see spikes up
> > > >>>>> to 4gb consistently.)
> > > >>>>>
> > > >>>>> 2 - This is the newrelic output:
> > > >>>>>
> > > >>>>> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > >>>>>
> > > >>>>> As you can see, Solr spends a ridiculously long time in the
> > > >>>>> SolrDispatchFilter.doFilter() method.
> > > >>>>>
> > > >>>>> Apart from these, when we clean the index directory, re-replicate and
> > > >>>>> restart each slave one by one, we see some relief in the system, but
> > > >>>>> after some time the servers start to melt down again. Although deleting
> > > >>>>> the index and re-replicating doesn't solve the problem, we think these
> > > >>>>> problems are somehow related to replication, because the symptoms started
> > > >>>>> after a replication and, once, the system healed itself after a
> > > >>>>> replication. I also see lucene-write.lock files on the slaves (we don't
> > > >>>>> have write.lock files on the master), which I think we shouldn't see.
> > > >>>>>
> > > >>>>> If anyone can give any sort of ideas, we will appreciate it.
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Dogacan Guney
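For what it's worth, the ParNew + CMS combination suggested earlier in the thread (instead of UseParallelGC) would look roughly like this on the stock Jetty start command. This is only a sketch: the 8000m heap matches what was already being tested, the GC-logging flags are just there to make collector behaviour visible, and none of the values are tuned for this index:

  # sketch only -- verbose GC logging added so pause times show up in gc.log
  java -Xms8000m -Xmx8000m \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:+CMSParallelRemarkEnabled \
       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
       -jar start.jar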
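Since the trouble seems to track replication, it may also be worth comparing index versions between the master and each slave right after an optimize/replication cycle. Assuming the stock ReplicationHandler is mounted at /replication (hostnames again made up), something like:

  curl 'http://master:8983/solr/replication?command=indexversion'  # version/generation on the master
  curl 'http://slave1:8983/solr/replication?command=details'       # replication state as a slave sees it
  curl 'http://slave1:8983/solr/replication?command=fetchindex'    # force a pull if a slave has fallen behind

Once replication has finished, the version and generation reported by the master and by each slave should match.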