Wei, probably no need to answer my earlier questions; I think I see
the problem here, and believe it is indeed a bug, introduced in 8.3.
Will file an issue and submit a patch shortly.
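
For anyone following along, here is a rough sketch of what I suspect is
going on (plain Java for illustration only; this is not Solr's actual
routing code and the replica names are made up): if the replica list is
sorted by the preference rules with a stable sort and the equally-preferred
entries are never shuffled, the same replica always lands first.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ReplicaShuffleSketch {
    public static void main(String[] args) {
        // Hypothetical replicas of one shard; all three satisfy replica.type:TLOG.
        List<String> replicas = new ArrayList<>(
                List.of("core_node1_tlog", "core_node2_tlog", "core_node3_tlog"));

        // Preference comparator: TLOG replicas sort ahead of anything else.
        Comparator<String> preferTlog =
                Comparator.comparingInt(r -> r.endsWith("_tlog") ? 0 : 1);

        // Stable sort, no shuffle: ties keep their original (cluster-state)
        // order, so the same replica is picked first for every request.
        replicas.sort(preferTlog);
        System.out.println("always first: " + replicas.get(0));

        // Shuffle first, then apply the preference sort: the stable sort keeps
        // the randomized order among the ties, spreading the load.
        Collections.shuffle(replicas);
        replicas.sort(preferTlog);
        System.out.println("randomized among ties: " + replicas.get(0));
    }
}

Shuffling among the equally-preferred replicas is what spreads the load;
that is the part I believe regressed in 8.3.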
Michael

On Mon, May 11, 2020 at 12:49 PM Michael Gibney
<mich...@michaelgibney.net> wrote:
>
> Hi Wei,
>
> In considering this problem, I'm stumbling a bit on terminology
> (particularly, where you mention "nodes", I think you're referring to
> "replicas"?). Could you confirm that you have 10 TLOG replicas per
> shard, for each of 6 shards? How many *nodes* (i.e., running solr
> server instances) do you have, and what is the replica placement like
> across those nodes? What, if any, non-TLOG replicas do you have per
> shard (not that it's necessarily relevant, but just to get a complete
> picture of the situation)?
>
> If you're able without too much trouble, can you determine what the
> behavior is like on Solr 8.3? (there were different changes introduced
> to potentially relevant code in 8.3 and 8.4, and knowing whether the
> behavior you're observing manifests on 8.3 would help narrow down
> where to look for an explanation).
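>
> (One low-effort way to see which replicas the internal requests are going
> to, without changing any config, might be to send a few queries with
> shards.info=true and look at the replica URLs reported back for each
> shard.)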
>
> Michael
>
> On Fri, May 8, 2020 at 7:34 PM Wei <weiwan...@gmail.com> wrote:
> >
> > Update:  after I removed the shards.preference parameter from
> > solrconfig.xml, the issue is gone and internal shard requests are now
> > balanced. The same parameter works fine with Solr 7.6.  Still not sure of
> > the root cause, but I observed a strange coincidence: the node most
> > frequently picked for shard requests is the first node in each shard
> > returned from the CLUSTERSTATUS API.  Seems something is wrong with the
> > shuffling of nodes that compare as equal when shards.preference is set.
> > Will report back if I find more.
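> >
> > (By "first node" I mean the first replica listed for each shard in the
> > output of the Collections API call, e.g.
> > http://<host>:8983/solr/admin/collections?action=CLUSTERSTATUS,
> > where <host> is a placeholder.)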
> >
> > On Mon, Apr 27, 2020 at 5:59 PM Wei <weiwan...@gmail.com> wrote:
> >
> > > Hi Eric,
> > >
> > > I am measuring the number of shard requests, and it's for queries only, no
> > > indexing requests.  I have an external load balancer and see that each node
> > > receives about an equal number of external queries. However, for the
> > > internal shard queries the distribution is uneven: 6 nodes (one in each
> > > shard, some of them leaders and some non-leaders) get about 80% of the
> > > shard requests, while the other 54 nodes get about 20%.  I checked a few
> > > other parameters that are set:
> > >
> > > -Dsolr.disable.shardsWhitelist=true
> > > shards.preference=replica.location:local,replica.type:TLOG
> > >
> > > None of these seems to explain the strange behavior.  Any suggestions on
> > > how to debug this?
> > >
> > > -Wei
> > >
> > >
> > > On Mon, Apr 27, 2020 at 5:42 PM Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Wei:
> > >>
> > >> How are you measuring utilization here? The number of incoming requests
> > >> or CPU?
> > >>
> > >> The leader for each shard is certainly handling all of the indexing
> > >> requests since they’re TLOG replicas, so that’s one thing that might be
> > >> skewing your measurements.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> > On Apr 27, 2020, at 7:13 PM, Wei <weiwan...@gmail.com> wrote:
> > >> >
> > >> > Hi everyone,
> > >> >
> > >> > I have a strange issue after upgrading from 7.6.0 to 8.4.1. My cloud has 6
> > >> > shards with 10 TLOG replicas per shard.  After the upgrade I noticed that
> > >> > one of the replicas in each shard is handling most of the distributed shard
> > >> > requests, so 6 nodes are heavily loaded while the other nodes are idle.
> > >> > There is no change in the shard handler configuration:
> > >> >
> > >> > <shardHandlerFactory name="shardHandlerFactory"
> > >> >                      class="HttpShardHandlerFactory">
> > >> >    <int name="socketTimeout">30000</int>
> > >> >    <int name="connTimeout">30000</int>
> > >> >    <int name="maxConnectionsPerHost">500</int>
> > >> > </shardHandlerFactory>
> > >> >
> > >> >
> > >> > What could cause the unbalanced internal distributed requests?
> > >> >
> > >> >
> > >> > Thanks in advance.
> > >> >
> > >> >
> > >> >
> > >> > Wei
> > >>
> > >>
