Hello again,

2011/3/14 Markus Jelsma <markus.jel...@openindex.io>

> > Hello,
> >
> > 2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> >
> > > Hi Doğacan,
> > >
> > > Are you, at some point, running out of heap space? In my experience,
> > > that's the common cause of increased load and excessively high response
> > > times (or timeouts).
> >
> > How much heap would be enough? Our index size is growing slowly, but we
> > did not have this problem a couple of weeks ago, when the index was maybe
> > 100 MB smaller.
>
> How much heap space is needed isn't easy to say. It usually needs to be
> increased when you run out of memory and get those nasty OOM errors; are
> you getting them? Replication events will increase heap usage due to
> cache-warming queries and autowarming.
>
>
Nope, no OOM errors.
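
For reference, the cache section of our solrconfig.xml looks roughly like
this (a sketch from memory; only the filterCache size of 1024 is our real
setting, the autowarm counts and the warming query below are illustrative):

  <!-- inside the <query> section: caches that get autowarmed whenever a
       new searcher is opened, e.g. after each replication -->
  <filterCache class="solr.FastLRUCache"
               size="1024"
               initialSize="512"
               autowarmCount="256"/>

  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>

  <!-- static warming queries, also run against the new searcher -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="rows">10</str></lst>
    </arr>
  </listener>

If the autowarming and warming queries run right after each replication, I
guess that would line up with the heap usage you mention.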


> >
> > We left most of the caches in solrconfig as default and only increased
> > filterCache to 1024. We only ask for "id"s (which are unique) and no other
> > fields during queries (though we do faceting). Btw, 1.6 GB of our index is
> > stored fields (we store everything for now, even though we do not fetch
> > them during queries), and about 1 GB is the index data itself.
>
> Hmm, it seems 4000m would be enough indeed. What about the fieldCache: are
> there a lot of entries? Is there an insanity count? Do you use boost
> functions?
>
>
Insanity count is 0 and fieldCache has 12 entries. We do use some boost
functions.
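
To give an idea of the boosting, it is along these lines, i.e. function
boosts in the handler defaults (a sketch only; the handler and field names
below are made up, not our real schema):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- we only request ids -->
      <str name="fl">id</str>
      <!-- recency boost; created_at is a hypothetical date field -->
      <str name="bf">recip(ms(NOW,created_at),3.16e-11,1,1)</str>
      <str name="facet">true</str>
      <!-- hypothetical facet field -->
      <str name="facet.field">category</str>
    </lst>
  </requestHandler>

I assume functions like ms() on a date field are what account for those 12
fieldCache entries.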

Btw, I am monitoring via jconsole with an 8 GB heap, and usage still climbs
to 8 GB every 20 seconds or so; then GC runs and it falls back down to 1 GB.

Btw, our current revision was a fairly arbitrary choice, but up until two
weeks ago it had been rock-solid, so we have been reluctant to move to
another version. Would you recommend upgrading to the latest trunk?


> It might not have anything to do with memory at all, but I'm just asking.
> There may be a bug in your revision causing this.
>
> >
> > Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
> > improvement in load. I can try monitoring with jconsole with 8 GB of heap
> > to see if it helps.
> >
> > > Cheers,
> > >
> > > > Hello everyone,
> > > >
> > > > First of all, here is our Solr setup:
> > > >
> > > > - Solr nightly build 986158
> > > > - Running Solr inside the default Jetty that comes with the Solr build
> > > > - 1 write-only master, 4 read-only slaves (quad-core 5640 with 24 GB
> > > >   of RAM)
> > > > - Index replicated (on optimize) to slaves via Solr replication
> > > > - Size of index is around 2.5 GB
> > > > - No incremental writes; index is created from scratch (delete old
> > > >   documents -> commit new documents -> optimize) every 6 hours
> > > > - Avg # of requests per second is around 60 (for a single slave)
> > > > - Avg time per request is around 25 ms (before having problems)
> > > > - Load on each slave is around 2
> > > >
> > > > We have been using this setup for months without any problem. However,
> > > > last week we started to experience very weird performance problems:
> > > >
> > > > - Avg time per request increased from 25 ms to 200-300 ms (even higher
> > > >   if we don't restart the slaves)
> > > > - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600%
> > > >   CPU)
> > > >
> > > > When we profile Solr, we see two very strange things:
> > > >
> > > > 1 - This is the jconsole output:
> > > >
> > > > https://skitch.com/meralan/rwwcf/mail-886x691
> > > >
> > > > As you can see, GC runs every 10-15 seconds and collects more than
> > > > 1 GB of memory. (Actually, if you wait more than 10 minutes, you see
> > > > spikes up to 4 GB consistently.)
> > > >
> > > > 2 - This is the New Relic output:
> > > >
> > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > >
> > > > As you can see, Solr spends a ridiculously long time in the
> > > > SolrDispatchFilter.doFilter() method.
> > > >
> > > >
> > > > Apart from these, when we clean the index directory, re-replicate, and
> > > > restart each slave one by one, we see some relief in the system, but
> > > > after some time the servers start to melt down again. Although deleting
> > > > the index and re-replicating doesn't solve the problem, we think these
> > > > problems are somehow related to replication, because the symptoms
> > > > started after a replication and the system temporarily heals itself
> > > > after each replication. I also see lucene-write.lock files on the
> > > > slaves (we don't have write.lock files on the master), which I think
> > > > we shouldn't see.
> > > >
> > > >
> > > > If anyone has any sort of ideas, we would appreciate it.
> > > >
> > > > Regards,
> > > > Dogacan Guney
>
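
P.S. In case it is useful, the replication setup itself is essentially the
stock ReplicationHandler config, roughly like this (a sketch; the master URL
and poll interval below are placeholders, not our real values):

  <!-- master solrconfig.xml: publish the index after optimize -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">optimize</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml: poll the master for new index versions -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:10:00</str>
    </lst>
  </requestHandler>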



-- 
Doğacan Güney
