It is interesting to note that when I ran the experiment with Solr 9, the CPU usage was about the same as with Solr 6.

On Fri, Aug 26, 2022 at 7:02 PM Shawn Heisey wrote:

> On 8/26/22 02:55, Sidharth Negi wrote:
> > We set up Solr 6 and Solr 8 on two identical AWS instances (16 cores,
> > 128 GB of which Solr was given Xmx=50GB) and indexed the same data on
> > them and tested under the same load of traffic. The schema and
> > solrconfig.xml are exactly identical - the schema file is just renamed
> > as managed-schema in Solr 8. None of the two machines are indexing
> > data or taking replication and both have about equal number of
> > segments (42 and 45 segments for Solr 6 and Solr 8 respectively)
>
> Are you really sure that the heap needs to be that big? It really is
> huge, and due to the way that Java works, anything 32GB or larger
> requires 64-bit pointers. So a heap size of 31GB actually has more
> memory available than a heap size of 32GB. At 50 GB, you have likely
> passed the break-even point. But unless you're dealing with hundreds of
> millions of documents, it is very unlikely that you need a heap that big.

I agree - we are dealing with ~30 million documents, so most of the heap (over 30 GB) is unused.

> > What's surprising is that Solr 6.6.1 CPU usage is considerably lower
> > than Solr 8.11.2. Just look at the screenshot attached. The blue line
> > is Solr 8.11.2 while the orange one is Solr 6.6.1. Note that the Solr
> > 8 CPU usage is considerably higher with identical traffic.
>
> You have higher CPU usage, but does Solr 8 actually perform worse than
> Solr 6? What do other metrics show, like CPU iowait percentage?

I don't think Solr 8 performs any "worse" in terms of query times - those are about the same. The issue is that CPU usage increases linearly with traffic, and the screenshot was taken at 30% of full traffic. Since we want to keep CPU usage well under 70% on a production instance, Solr 6 wins out at full-scale traffic because it will need fewer machines.
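
The capacity argument above can be made concrete with a little arithmetic. The following is only a minimal sketch, assuming CPU usage really does scale linearly with traffic as described. The function names, the fleet size of 10, and the CPU readings are hypothetical placeholders, not values read from the screenshot.

```python
import math


def projected_cpu(cpu_at_sample: float, sample_fraction: float) -> float:
    """Extrapolate per-machine CPU utilisation to 100% traffic.

    Assumes CPU scales linearly with traffic (the behaviour reported
    in the thread, not something this code verifies).
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return cpu_at_sample / sample_fraction


def machines_needed(cpu_at_sample: float, sample_fraction: float,
                    current_machines: int, cpu_ceiling: float = 0.70) -> int:
    """Machines required so that each stays under cpu_ceiling at full traffic."""
    # Total load at full traffic, expressed in "fully busy machine" units.
    total_load = projected_cpu(cpu_at_sample, sample_fraction) * current_machines
    return math.ceil(total_load / cpu_ceiling)


if __name__ == "__main__":
    # Hypothetical CPU readings at 30% traffic; NOT taken from the screenshot.
    for label, cpu in (("Solr 6", 0.15), ("Solr 8", 0.24)):
        print(label, machines_needed(cpu, 0.30, current_machines=10))
```

With these placeholder readings, a fleet of 10 would need to grow to 12 machines on Solr 8 but could shrink to 8 on Solr 6. That gap is the kind of cost difference in question, even when query times are equal.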

> You've talked about segment counts, but haven't talked about index
> size. Is the total disk space consumed by the index about the same on
> both?

The index takes about 35 GB of disk space on both Solr versions, and both hold ~30 million docs.

> I can think of two differences between 6 and 8 that are fundamental:
> First: 6 uses CMS for garbage collection and 8 uses G1. G1 has better
> overall performance because more of its work can function in parallel
> with the application, and I can imagine that it uses a little bit more
> of resources like memory and CPU. Second: 6 uses log4j 1 and 8 uses
> log4j 2. The later logging library is much faster because it takes
> advantage of threads, which could increase the overall CPU usage.
> Whether that would cause a significant impact depends mostly on how busy
> the server is and whether the logging configuration has been changed.
> With default settings, at least one log message is created for almost
> every request that Solr receives.
>

Let me run an experiment using the same GC settings on both to see whether that accounts for the difference. Is there anything else we can do to pin down the cause? Combined, the slaves will have to serve over 80k requests per second once we size the fleet so that CPU usage on every slave stays well below 70% at peak.

> There have also been a lot of advancements in other areas, and those
> probably contribute. Higher CPU usage does not automatically mean that
> performance is worse. Sometimes applications actually perform better
> when using more CPU.
>

I agree - higher CPU usage does not directly mean worse performance, but as mentioned above, for us it would translate into more infrastructure and hence added cost.

> Thanks,
> Shawn
>
