One more point and I'll stop - I've hit my email quota for the day ;)

While it's a pain to have to juggle GC params and tune, when you require a heap that's more than a gig or two, I personally believe it's essential to do so for good performance. The default settings (ergonomics with the throughput collector) just don't cut it. Sad fact of life :)

Luckily, you don't generally have to do that much to get things nice - the number of options is not that staggering, and you don't usually need to get into most of them. Choosing the right collector and tweaking a setting or two can often be enough.
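If you're not sure which collector the JVM actually picked, one way to find out from inside the process is the standard java.lang.management API. What follows is only a minimal sketch, not anything Solr ships: the class name GcReport and the helper gcFraction are made up for illustration, and it reports on the JVM it runs in, so to inspect a live Solr instance you would need to run the equivalent inside that JVM or read the same MXBeans over JMX (jConsole shows them). Note also that the accumulated collection time is only an approximation of pause time, since concurrent collectors do part of their work while the application keeps running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    /** Fraction of elapsed time spent in GC, clamped to [0, 1]; 0 if uptime is not positive. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / uptimeMillis));
    }

    public static void main(String[] args) {
        long totalGcMillis = 0;
        // One MXBean per collector in use; the names reveal which collectors were selected.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            System.out.printf("%s: %d collections, %d ms%n", gc.getName(), count, millis);
            if (millis > 0) {
                totalGcMillis += millis;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.1f%%%n",
                100.0 * gcFraction(totalGcMillis, uptimeMillis));
    }
}
```

Run it with the same GC flags you give the server to see which collector names those flags select.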
The most important thing to do with a large heap and the throughput collector is to turn on parallel tenured collection. I've said it before, but it really is key. At least if you have more than a processor or two - which, for your sake, I hope you do :)

- Mark

Mark Miller wrote:
> That's a good point too - if you can reduce your need for such a large
> heap, by all means, do so.
>
> However, considering you already need at least 10GB or you get OOM, you
> have a long way to go with that approach. Good luck :)
>
> How many docs do you have? I'm guessing it's mostly FieldCache type
> stuff, and that's the type of thing you can't really side step, unless
> you give up the functionality that's using it.
>
> Grant Ingersoll wrote:
>
>> On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:
>>
>>> Hi to all!
>>> Lately my solr servers seem to stop responding once in a while. I'm
>>> using solr 1.3.
>>> Of course I'm having more traffic on the servers.
>>> So I logged the Garbage Collection activity to check if it's because of
>>> that. It seems like 11% of the time the application runs, it is stopped
>>> because of GC. And sometimes the GC takes up to 10 seconds!
>>> Is it normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon
>>> servers. My index is around 10GB and I'm giving the instances 10GB of
>>> RAM.
>>>
>>> How can I check which GC is being used? If I'm right, JVM
>>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do
>>> you have any recommendation on this?
>>>
>> As I said in Eteve's thread on JVM settings, some extra time spent on
>> application design/debugging will save a whole lot of headache in
>> Garbage Collection and trying to tune the gazillion different options
>> available. Ask yourself: What is on the heap and does it need to be
>> there? For instance, do you, if you have them, really need sortable
>> ints? If your servers seem to come to a stop, I'm going to bet you
>> have major collections going on. Major collections in a production
>> system are very bad. They tend to happen right after commits in
>> poorly tuned systems, but can also happen in other places if you let
>> things build up due to really large heaps and/or things like really
>> large cache settings. I would pull up jConsole and have a look at
>> what is happening when the pauses occur. Is it a major collection?
>> If so, then hook up a heap analyzer or a profiler and see what is on
>> the heap around those times. Then have a look at your schema/config,
>> etc. and see if there are things that are memory intensive (sorting,
>> faceting, excessively large filter caches).
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>

--
- Mark

http://www.lucidimagination.com