On 11/22/2013 8:13 AM, Dave Seltzer wrote:
> Regarding memory: Including duplicate data in shard replicas, the entire
> index is 350GB. Each server hosts a total of 44GB of data. Each server has
> 28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java
> would take the memory it needs and leave the rest to the OS for cache.

That's not how Java works. Java has a min heap and max heap setting. If you (or the auto-detected settings) tell it that the max heap is 4GB, it will only ever use slightly more than 4GB of RAM. If the app needs more than that, this will lead to terrible performance and/or out of memory errors.

You can see how much the max heap is in the Solr admin UI dashboard - it'll be the right-most number on the JVM-Memory graph. On my 64-bit linux development machine with 16GB of RAM, it looks like Java defaults to a 4GB max heap. I have the heap size manually set to 7GB for Solr on that machine. The 6GB heap you have mentioned might not be enough, or it might be more than you need. It all depends on the kind of queries you are doing and exactly how Solr is configured.
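If you'd rather check from the command line, one way (assuming you use the same java binary that Solr runs on) is to ask the JVM for its ergonomic defaults:

  java -XX:+PrintFlagsFinal -version | grep -i maxheapsize

The value is reported in bytes. On 64-bit machines with plenty of RAM, the default max heap usually works out to roughly a quarter of physical memory, which is consistent with the 4GB default I see on a 16GB machine.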

If it were me, I'd want a memory size between 48 and 64GB for a total index size of 44GB. Whether you really need that much is very dependent on your exact requirements, index makeup, and queries. To support the high query load you're sending, it probably is a requirement. More memory is likely to help performance, but I can't guarantee it without looking a lot deeper into your setup, and that's difficult to do via email.

One thing I can tell you about checking performance - see how much of your 70% CPU usage is going to I/O wait. If it's more than a few percent, more memory might help. First try increasing the max heap by 1 or 2GB.
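On Linux, a quick way to watch that (assuming the sysstat package is installed) is iostat, which reports %iowait in its avg-cpu line; the %wa column in top shows the same thing:

  iostat -x 5

If %iowait is consistently more than a few percent during your test, the OS disk cache isn't big enough to hold the hot parts of the index, and more memory will likely help.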

> Given that I'll never need to serve 200 concurrent connections in
> production, do you think my servers need more memory?
> Should I be tinkering with -Xmx and -Xms?

If you'll never need to serve that many, test with a lower number. Make it higher than you'll need, but not a lot higher. The test with 200 connections isn't a bad idea -- you do want to stress test things way beyond your actual requirements, but you'll also want to see how it does with a more realistic load.

Those are the min/max heap settings I just mentioned. IMHO you should set at least the max heap. If you want to handle a high load, it's a good idea to set the min heap to the same value as the max heap, so that Java doesn't have to spend time growing the heap while it's under load. It'll eventually allocate the max heap anyway.
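As a sketch, assuming you're launching Solr with the stock Jetty example that ships with 4.x, a 6GB min/max heap would look something like this (substitute whatever size you settle on):

  java -Xms6g -Xmx6g -jar start.jar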

> Regarding commits: My end-users want new data to be made available quickly.
> Thankfully I'm only inserting between 1 and 3 documents per second so the
> change-rate isn't crazy.
>
> Should I just slow down my commit frequency, and depend on soft-commits? If
> I do this, will the commits take even longer?
> Given 1000 documents, is it generally faster to do 10 commits of 100, or 1
> commit of 1000?

Fewer commits are always better. The amount of time a commit takes isn't strongly affected by the number of new documents, unless there are a LOT of them. Figure out the maximum amount of time (in milliseconds) that you think people are willing to wait for new data to become visible. Use that as your autoSoftCommit interval, or as the commitWithin parameter on your indexing requests. Set your autoCommit interval to around five minutes, as described on the wiki page I linked. If you are using the auto settings and/or commitWithin, you will never need to send an explicit commit command. Reducing commit frequency is one of the first things you'll want to try -- frequent commits use a *lot* of I/O and CPU resources.
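As a rough sketch, assuming a 15-second visibility requirement (pick whatever delay your users will actually accept), the relevant pieces inside the <updateHandler> section of solrconfig.xml would look something like this:

  <autoCommit>
    <maxTime>300000</maxTime>            <!-- hard commit every 5 minutes -->
    <openSearcher>false</openSearcher>   <!-- don't open a new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>             <!-- new documents visible within ~15 seconds -->
  </autoSoftCommit>

If you'd rather not use autoSoftCommit, sending commitWithin=15000 as a parameter on your update requests accomplishes much the same thing.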

Although there are exceptions, most installs rarely NEED commits to happen more often than about once a minute, and longer intervals are often perfectly acceptable. Even in situations where a higher frequency is required, 10-15 seconds is often good enough. Getting sub-second commit times is *possible*, but usually requires significant hardware investment or changing the config in a way that is detrimental to query performance.

Thanks,
Shawn
