Thanks Erick, I am using TLOG replicas in this SolrCloud cluster - 3 shards, each with 3 replicas.
Here's my decision logic, based on my (limited) understanding: all shards seem to be equally used, so to improve performance by adding shards I think I'd have to double from 3 shards to 6 (since indexing load is distributed equally), and when I do that I should also double the number of AWS instances?

Thanks!

-Frank

Frank Kelly
Principal Software Engineer
AAA Identity Profile Team (SCBE / CDA)
HERE
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W

On 5/21/18, 11:04 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:

>"replication falls behind and then starts to recover which causes more
>usage"
>
>I'm not quite sure what you mean by this. Are you using TLOG or PULL
>replica types? Or stand-alone Solr? There shouldn't really be any
>replication in the ideal state for NRT replicas.
>
>If you're using SolrCloud, the usual scaling approach if you're
>index-heavy is to add more shards, and since you're CPU bound they'd
>have to be on new AWS instances. Or, if you're running multiple
>replicas on each instance, move some of the replicas to new instances.
>Assuming NRT Solr replicas.
>
>Best,
>Erick
>
>On Mon, May 21, 2018 at 10:25 AM, Kelly, Frank <frank.ke...@here.com>
>wrote:
>> Using Solr 5.3.1 - index
>>
>> We have an indexing-heavy workload (we do more indexing than searching),
>> and for those searches we do perform we have very few cache hits (25% of
>> our index is in memory and the hit rate is < 0.1%).
>>
>> We are currently using r3.xlarge (memory-optimized) instances, as we
>> originally thought we'd have a higher cache hit rate, with EBS
>> optimization to IOPS-configurable EBS drives.
>> Our EBS bandwidth seems to work great, so searches on disk are
>> pretty fast.
>> Now, though, we seem CPU bound, and if/when Solr CPU gets pegged for too
>> long, replication falls behind and then starts to recover, which causes
>> more usage, and eventually shards go "Down".
>>
>> Our key question: scale up (fewer instances to manage) or scale out
>> (more instances to manage)? And do we switch to compute-optimized
>> instances (given our usage, I assume the answer is probably yes)?
>>
>> Would appreciate any thoughts folks have on this.
>>
>> Thanks!
>>
>> -Frank
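
For reference, the "double from 3 shards to 6" plan described above could be driven through the SolrCloud Collections API (SPLITSHARD to split each shard in two, then ADDREPLICA to place replicas on the new AWS instances). This is only a sketch: the collection name, Solr URL, and node address below are hypothetical, and it prints the calls as a dry run rather than hitting a live cluster:

```shell
# Hypothetical names: adjust SOLR_URL, COLL, and the node address for a real cluster.
SOLR_URL="http://localhost:8983/solr"
COLL="mycollection"

# Dry run: print the Collections API calls instead of executing them.
# Splitting each of the 3 shards produces two sub-shards apiece, giving 6 total.
CMDS=""
for SHARD in shard1 shard2 shard3; do
  CMD="curl \"$SOLR_URL/admin/collections?action=SPLITSHARD&collection=$COLL&shard=$SHARD\""
  CMDS="$CMDS$CMD
"
  echo "$CMD"
done

# Then place replicas of the new sub-shards on the new instances, e.g.:
echo "curl \"$SOLR_URL/admin/collections?action=ADDREPLICA&collection=$COLL&shard=shard1_0&node=new-ec2-host:8983_solr\""
```

SPLITSHARD is index- and CPU-intensive while it runs, so if the cluster is already CPU bound it may be gentler to do the splits one shard at a time, or to reindex into a new 6-shard collection and switch over with an alias.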