> Nope, no OOM errors.

That's a good start!

> Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
> functions.
> 
> Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
> up to 8gb every 20 seconds or so, then gc runs and it falls back down to 1gb.

Hmm, maybe the garbage collector is taking up a lot of CPU time. Could you check 
your garbage collector log? It needs to be enabled via JVM options, for example:

JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -
Xloggc:/var/log/tomcat6/gc.log"
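If restarting with new options isn't convenient right away, jstat against the 
running JVM gives a rough picture of how much time goes into GC (this assumes a 
Sun/Oracle JDK; <pid> is the Solr process id, 1000 is the sampling interval in ms):

jstat -gcutil <pid> 1000

The YGCT/FGCT/GCT columns are cumulative young-, full- and total-GC times in 
seconds; if they grow quickly compared to wall-clock time, the collector really 
is eating your CPU.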

Also, what JVM version are you using and what are your other JVM settings? Are 
Xms and Xmx set to the same value? I see you're using the throughput collector. 
You might want to try CMS (the low-pause collector) instead, because it runs 
partially concurrently and causes fewer stop-the-world pauses.

http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
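For example, something along these lines (untested, just a sketch; it assumes a 
Sun/Oracle 6 JVM, and you should pick heap sizes that fit your machine):

JAVA_OPTS="$JAVA_OPTS -Xms8000m -Xmx8000m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

-XX:+UseConcMarkSweepGC enables CMS for the old generation and -XX:+UseParNewGC 
pairs it with the parallel young-generation collector.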

Again, this may not be the issue ;)

> 
> Btw, our current revision was just a random choice, but up until two weeks
> ago it had been rock-solid, so we have been reluctant to update to another
> version. Would you recommend upgrading to latest trunk?

I don't know what changes have been made since your revision. Please consult 
the CHANGES.txt for that.

> 
> > It might not have anything to do with memory at all but i'm just asking.
> > There
> > may be a bug in your revision causing this.
> > 
> > > Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
> > > any improvement in load. I can try monitoring with Jconsole
> > > with 8gigs of heap to see if it helps.
> > > 
> > > > Cheers,
> > > > 
> > > > > Hello everyone,
> > > > > 
> > > > > First of all here is our Solr setup:
> > > > > 
> > > > > - Solr nightly build 986158
> > > > > - Running solr inside the default jetty that comes with the solr build
> > > > > - 1 write only Master, 4 read only Slaves (quad core 5640 with 24gb
> > > > > of RAM)
> > > > > - Index replicated (on optimize) to slaves via Solr Replication
> > > > > - Size of index is around 2.5gb
> > > > > - No incremental writes, index is created from scratch (delete old
> > > > > documents -> commit new documents -> optimize) every 6 hours
> > > > > - Avg # of request per second is around 60 (for a single slave)
> > > > > - Avg time per request is around 25ms (before having problems)
> > > > > - Load on each slave is around 2
> > > > > 
> > > > > We have been using this set-up for months without any problem. However
> > > > > last week we started to experience very weird performance problems like:
> > > > > 
> > > > > - Avg time per request increased from 25ms to 200-300ms (even
> > > > > higher if we don't restart the slaves)
> > > > > - Load on each slave increased from 2 to 15-20 (solr uses 400%-600%
> > > > > cpu)
> > > > > 
> > > > > When we profile solr we see two very strange things :
> > > > > 
> > > > > 1 - This is the jconsole output:
> > > > > 
> > > > > https://skitch.com/meralan/rwwcf/mail-886x691
> > > > > 
> > > > > As you can see, gc runs every 10-15 seconds and collects more than
> > > > > 1gb of memory. (Actually if you wait more than 10 minutes you see
> > > > > spikes up to 4gb consistently)
> > > > > 
> > > > > 2 - This is the newrelic output:
> > > > > 
> > > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > > > 
> > > > > As you can see, solr spends a ridiculously long time in the
> > > > > SolrDispatchFilter.doFilter() method.
> > > > > 
> > > > > 
> > > > > Apart from these, when we clean the index directory, re-replicate,
> > > > > and restart each slave one by one, we see some relief in the system,
> > > > > but after some time the servers start to melt down again. Although
> > > > > deleting the index and re-replicating doesn't solve the problem, we
> > > > > think these problems are somehow related to replication, because the
> > > > > symptoms started after replication and once healed after replication.
> > > > > I also see lucene-write.lock files on the slaves (we don't have
> > > > > write.lock files on the master) which I think we shouldn't see.
> > > > > 
> > > > > 
> > > > > If anyone can offer any ideas, we would appreciate it.
> > > > > 
> > > > > Regards,
> > > > > Dogacan Guney
