Thanks again, Erick, for pointing us in the right direction. Yes, I am seeing heavy disk I/O while querying. I queried a single collection: a query for 10 rows can cause 100-150 MB of disk reads on each node, while a query for 1000 rows causes disk reads in the range of 2-7 GB per node.

Is this normal? I don't quite understand what is happening behind the scenes: just 1000 rows causing up to 2-7 GB of disk reads? It may be something basic, but it would be helpful if you could shed some light on it. Disk I/O seems to be the bottleneck here. With the amount of data we are dealing with, is increasing the number of hosts the only option, or might we have missed something in configuring Solr properly?
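One way to pin down where those reads go is to sample /proc/diskstats around a single request and compare different rows= and fl= values. The following is only a rough sketch, under assumptions: a Linux node, Solr listening on localhost:8983, a placeholder collection name "test", an "id" field, and a data disk named "sda"; substitute the real values for your environment.

```python
import time
import urllib.request

NODE = "http://localhost:8983"   # assumption: a Solr node queried directly
COLLECTION = "test"              # assumption: placeholder collection name
DEVICE = "sda"                   # assumption: block device holding the index

def sectors_read(device):
    """Total sectors read for a block device, from /proc/diskstats (field 6)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[5])
    raise ValueError(f"device {device} not found in /proc/diskstats")

def probe(rows, fl="*"):
    """Run one query and report elapsed time and approximate MB read from disk."""
    url = f"{NODE}/solr/{COLLECTION}/select?q=*:*&rows={rows}&fl={fl}&wt=json"
    before = sectors_read(DEVICE)
    start = time.time()
    urllib.request.urlopen(url).read()
    elapsed = time.time() - start
    mb_read = (sectors_read(DEVICE) - before) * 512 / 1e6   # sectors are 512 bytes
    print(f"rows={rows:<6} fl={fl:<4} {elapsed:6.2f}s  ~{mb_read:.0f} MB read")

# Compare a small page, a large page, a large page with one field, and rows=0.
for rows, fl in [(10, "*"), (1000, "*"), (1000, "id"), (0, "*")]:
    probe(rows, fl)
```

This only captures reads on the node where it runs; every node hosting a replica of a queried shard does its own reads, so it would have to be repeated per node. If the rows=0 case stays cheap once the index is warm while rows=1000 with all fields does not, most of the I/O is going into fetching stored fields rather than into the search itself, which is what the docValues/rows=0 suggestion below is probing for.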
On Fri, May 29, 2020 at 5:13 PM Erick Erickson <erickerick...@gmail.com> wrote:
> Right, you’re running into the “laggard” problem: you can’t get the overall result back until every shard has responded. There’s an interesting parameter, “shards.info=true”, that will give you some information about the time taken by the sub-search on each shard.
>
> But given your numbers, I think your root problem is that your hardware is overmatched. In total, you have 14B documents, correct, and a replication factor of 2? Meaning each of your 10 machines has 2.8 billion docs in 128G total memory. Another way to look at it is that you have 7 x 20 x 2 = 280 replicas, each with a 40G index. So each node has 28 replicas and handles over a terabyte of index in aggregate. At first blush, you’ve overloaded your hardware. My guess here is that one node or the other has to do a lot of swapping/GC/whatever quite regularly when you query. Given that you’re on HDDs, this can be quite expensive.
>
> I think you can simplify your analysis problem a lot by concentrating on a single machine: load it up and analyze it heavily. Monitor I/O, analyze GC and CPU utilization. My first guess is you’ll see heavy I/O. Once the index is read into memory, a well-sized Solr installation won’t see much I/O, _especially_ if you simplify the query to only ask for, say, some docValues field or rows=0. I think that your OS is swapping significant segments of the Lucene index in and out to satisfy your queries.
>
> GC is always a place to look. You should have GC logs available to see if you spend lots of CPU cycles in GC. I doubt that GC tuning will fix your performance issues, but it’s something to look at.
>
> A quick-n-dirty way to see if it’s swapping as I suspect is to monitor CPU utilization. A red flag is if it’s low and your queries _still_ take a long time. That’s a “smoking gun” that you’re swapping. While indicative, that’s not definitive, since I’ve also seen CPU pegged because of GC, so if CPUs are running hot, you have to dig deeper...
>
> So to answer your questions:
>
> 1> I doubt decreasing the number of shards will help. I think you simply have too many docs per node; changing the number of shards isn’t going to change that.
>
> 2> I strongly suspect that increasing replicas will make the problem _worse_, since it’ll just add more docs per node.
>
> 3> It Depends (tm). I wrote “the sizing blog” a long time ago because this question comes up so often and is impossible to answer in the abstract. Well, it’s possible in theory, but in practice there are just too many variables. What you need to do to fully answer this in your situation, with your data, is set up a node and “test it to destruction”.
>
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> I want to emphasize that you _must_ have realistic queries when you test as in that blog, and you _must_ have a bunch of them in order to not get fooled by hitting, say, your queryResultCache.
> I had one client who “stress tested” with the same query and was getting 3ms response times because, after the first one, they never needed to do any searching at all; everything was in caches. Needless to say, that didn’t hold up when they used a realistic query mix.
>
> Best,
> Erick
>
> > On May 29, 2020, at 4:53 AM, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> >
> > Thanks for your reply, Erick. You helped me improve my understanding of how Solr distributed requests work internally.
> >
> > Actually my ultimate goal is to improve search performance in one of our test environments, where the queries are taking up to 60 seconds to execute. *We want to fetch at least the top 100 rows within seconds (< 5 seconds).*
> >
> > Right now, we have 7 Collections across 10 Solr nodes, each Collection having approx 2B records equally distributed across 20 shards with rf 2. Each replica/core is ~40GB in size. The number of users is very small (<10). We are using HDDs and each host has 128 GB RAM. The Solr JVM heap size is 24GB. In the actual production environment, we are planning for 100 such machines and we will be ingesting ~2B records on a daily basis. We will retain data for up to 3 months.
> >
> > I followed your suggestion of not querying more than 100 rows and this is my observation: I ran queries with the debugQuery param and found that the query response time depends on the worst-performing shard, as some shards take longer to execute the query than others.
> >
> > Here are my questions:
> >
> > 1. Is decreasing the number of shards going to help us, as there will be fewer shards to query?
> > 2. Is increasing the number of replicas going to help us, as there will be load balancing?
> > 3. How many records should we keep in each Collection or in each replica/core? Will we face performance issues if the core size becomes too big?
> >
> > Any other suggestions are appreciated.
> >
> > On Wed, May 27, 2020 at 9:23 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> First of all, asking for that many rows will spend a lot of time gathering the document fields. Assuming you have stored fields, each request requires:
> >> 1> the aggregator node getting the candidate 100000 docs from each shard
> >> 2> the aggregator node sorting those 100000 docs from each shard into the true top 100000 based on the sort criteria (score by default)
> >> 3> the aggregator node going back to the shards and asking them for those docs of that 100000 that are resident on that shard
> >> 4> the aggregator node assembling the final docs to be sent to the client and sending them.
> >>
> >> So my guess is that when you fire requests at a particular replica that has to get them from the other shard’s replica on another host, the network back-and-forth is killing your perf. It’s not that your network is having problems, just that you’re pushing a lot of data back and forth in your poorly-performing cases.
> >>
> >> So first of all, specifying 100K rows is an anti-pattern. Outside of streaming, Solr is built on the presumption that you’re after the top few rows (< 100, say). The times vary a lot depending on whether you need to read stored fields, BTW.
> >>
> >> Second, I suspect your test is bogus.
> >> If you run the tests in the order you gave, the first one will read the necessary data from disk and probably have it in the OS disk cache for the second and subsequent runs. And/or you’re getting results from your queryResultCache (although you’d have to have a big one). Specifying the exact same query when trying to time things is usually a mistake.
> >>
> >> If your use-case requires 100K rows, you should be using streaming or cursorMark. While that won’t make the end-to-end time shorter, it won’t put such a strain on the system.
> >>
> >> Best,
> >> Erick
> >>
> >>> On May 27, 2020, at 10:38 AM, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> >>>
> >>> I have a SolrCloud setup (Solr 7.4) with a collection "test" having two shards on two different nodes. There are 4M records equally distributed across the shards.
> >>>
> >>> If I query the collection like below, it is slow.
> >>> http://localhost:8983/solr/*test*/select?q=*:*&rows=100000
> >>> QTime: 6930
> >>>
> >>> If I query a particular shard like below, it is also slow.
> >>> http://localhost:8983/solr/*test_shard1_replica_n2*/select?q=*:*&rows=100000&shards=*shard2*
> >>> QTime: 5494
> >>> *Notice shard2 in the shards parameter and shard1 in the core being queried.*
> >>>
> >>> But this is faster:
> >>> http://localhost:8983/solr/*test_shard1_replica_n2*/select?q=*:*&rows=100000&shards=*shard1*
> >>> QTime: 57
> >>>
> >>> This is also faster:
> >>> http://localhost:8983/solr/*test_shard2_replica_n4*/select?q=*:*&rows=100000&shards=*shard2*
> >>> QTime: 71
> >>>
> >>> I don't think it is the network, as I performed similar tests with a single-node setup as well. If you query a particular core and the corresponding logical shard, it is much faster than querying a different shard or core.
> >>>
> >>> Why is this the behaviour? How can I make the first two queries work as fast as the last two?
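Erick's May 27 reply above suggests streaming or cursorMark for cases that genuinely need 100K rows. As an illustration of the cursorMark pattern only, here is a minimal sketch; it uses the "test" collection and localhost:8983 node from the original question, and it assumes the schema's uniqueKey field is named "id", which is not confirmed anywhere in the thread.

```python
import json
import urllib.parse
import urllib.request

NODE = "http://localhost:8983"   # node address from the original question
COLLECTION = "test"              # collection from the original question
UNIQUE_KEY = "id"                # assumption: the schema's uniqueKey field

def fetch_all(query="*:*", page_size=1000):
    """Page through a result set with cursorMark instead of one huge rows= request."""
    cursor = "*"                                   # initial cursor value
    while True:
        params = urllib.parse.urlencode({
            "q": query,
            "rows": page_size,
            "sort": f"{UNIQUE_KEY} asc",           # cursorMark requires a sort on the uniqueKey
            "cursorMark": cursor,
            "wt": "json",
        })
        url = f"{NODE}/solr/{COLLECTION}/select?{params}"
        with urllib.request.urlopen(url) as resp:
            body = json.load(resp)
        for doc in body["response"]["docs"]:
            yield doc
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:                  # unchanged cursor means no more pages
            break
        cursor = next_cursor

if __name__ == "__main__":
    total = sum(1 for _ in fetch_all())
    print(f"fetched {total} docs")
```

Each request asks for only one page, so the aggregator never has to merge and then re-fetch 100K documents at once; as Erick notes, the end-to-end time does not shrink, but the per-request strain on the cluster does.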