Will, I think in one of your other emails (which I am not able to find) you asked whether I was indexing directly from MapReduce jobs. Yes, I am indexing directly from the map tasks, and that is done using SolrJ with a CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use something like MapReduceIndexerTool, which I suppose writes the index to HDFS and then moves it into the Solr index in a subsequent step? If so, why?
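Roughly, the map-side indexing looks like the sketch below (this is a simplified illustration, not my actual job code: the ZK host string and the input/field handling are placeholders, and the collection and field names are just the ones that show up in the query examples further down in this thread):

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Simplified sketch of map-side indexing with SolrJ 4.x; the ZK ensemble
// string, the input handling and the field values are placeholders.
public class IndexingMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private CloudSolrServer solr;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One CloudSolrServer per map task, pointed at the ZK ensemble so that
        // updates are routed to the current shard leaders.
        solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("dyCollection1");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("thingURL", value.toString());
        try {
            // Documents are sent straight from the map task; no soft commits
            // from the client, the server-side autoCommit (below) makes them visible.
            solr.add(doc);
        } catch (SolrServerException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        solr.shutdown();
    }
}

In the real job each map task can run up to 5 fetch threads against the same CloudSolrServer, which is how the indexing load reaches the 75 concurrent threads mentioned further down in this thread.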
I don't use any soft commits, and I autocommit every 15 seconds; the relevant snippet from the configuration is below.

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

I looked at the localhost_access.log file; all the GET and POST requests have sub-second response times. (A sketch of the AccessLogValve configuration Will suggested is at the very bottom of this mail.)

On Tue, Oct 28, 2014 at 2:06 AM, Will Martin <wmartin...@gmail.com> wrote:
> The easiest, and coarsest, measure of response time [not service time in a
> distributed system] can be picked up in your localhost_access.log file.
> You're using Tomcat, right? Look up AccessLogValve in the docs and
> server.xml. You can add configuration to report the payload and the time to
> service the request without touching any code.
>
> Queueing theory is what Otis was talking about when he said you've
> saturated your environment. In AWS people just auto-scale up and don't
> worry about where the load comes from; it's dumb if it happens more than 2
> times. Capacity planning is tough, let's hope it doesn't disappear
> altogether.
>
> G'luck
>
>
> -----Original Message-----
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: Monday, October 27, 2014 9:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> out of synch.
>
> Good point about ZK logs; I do see the following exceptions intermittently
> in the ZK log.
>
> 2014-10-27 06:54:14,621 [myid:1] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
> 2014-10-27 07:00:06,697 [myid:1] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
> connection from /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:00:06,725 [myid:1] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
> establish new session at /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:00:06,746 [myid:1] - INFO
> [CommitProcessor:1:ZooKeeperServer@617] - Established session
> 0x14949db9da40037 with negotiated timeout 10000 for client
> /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:01:06,520 [myid:1] - WARN [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x14949db9da40037, likely client has closed socket
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:744)
>
> As for queueing theory, I don't know of any way to see how fast the
> requests are being served by SolrCloud, or whether a queue builds up
> because the service rate is slower than the rate of requests from the
> incoming multiple threads.
>
> On Mon, Oct 27, 2014 at 7:09 PM, Will Martin <wmartin...@gmail.com> wrote:
>
> > 2 naïve comments, of course.
> >
> > - Queuing theory
> > - Zookeeper logs.
> >
> > From: S.L [mailto:simpleliving...@gmail.com]
> > Sent: Monday, October 27, 2014 1:42 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
> > replicas out of synch.
> >
> > Please find the clusterstate.json attached.
> >
> > Also, in this case at least the Shard1 replicas are out of sync, as can
> > be seen below.
> >
> > Shard 1 replica 1 *does not* return a result with distrib=false.
> >
> > Query:
> > http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
> >
> > Result:
> >
> > <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int>
> > <lst name="params"><str name="q">*:*</str><str name="shards.info">true</str>
> > <str name="distrib">false</str><str name="debug">track</str><str name="wt">xml</str>
> > <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst>
> > <result name="response" numFound="0" start="0"/><lst name="debug"/></response>
> >
> > Shard 1 replica 2 *does* return the result with distrib=false.
> >
> > Query:
> > http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
> >
> > Result:
> >
> > <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int>
> > <lst name="params"><str name="q">*:*</str><str name="shards.info">true</str>
> > <str name="distrib">false</str><str name="debug">track</str><str name="wt">xml</str>
> > <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst>
> > <result name="response" numFound="1" start="0"><doc>
> > <str name="thingURL">http://www.xyz.com</str>
> > <str name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str>
> > <long name="_version_">1483135330558148608</long></doc></result>
> > <lst name="debug"/></response>
> >
> > On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > On Mon, Oct 27, 2014 at 9:40 PM, S.L <simpleliving...@gmail.com> wrote:
> >
> > > One is not smaller than the other, because numDocs is the same for
> > > both "replicas", and essentially they seem to be disjoint sets.
> >
> > That is strange. Can we see your clusterstate.json? With that, please
> > also specify the two replicas which are out of sync.
> >
> > > Also, manually purging the replicas is not an option, because this is a
> > > "frequently" indexed index and we need everything to be automated.
> > >
> > > What other options do I have now?
> > >
> > > 1. Turn off replication completely in SolrCloud.
> > > 2. Use the traditional master-slave replication model.
> > > 3. Introduce a "replica"-aware field in the index, to figure out
> > > which "replica" the request should go to from the client.
> > > 4. Try a distribution like Helios to see if it behaves any differently.
> > >
> > > Just thinking out loud here...
> > >
> > > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <
> > > markus.jel...@openindex.io> wrote:
> > >
> > > > Hi - if there is a very large discrepancy, you could consider purging
> > > > the smallest replica; it will then resync from the leader.
> > > >
> > > > -----Original message-----
> > > > > From: S.L <simpleliving...@gmail.com>
> > > > > Sent: Monday 27th October 2014 16:41
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> > > > > out of synch.
> > > > >
> > > > > Markus,
> > > > >
> > > > > I would like to ignore it too, but what is happening is that there is a
> > > > > lot of discrepancy between the replicas; queries like
> > > > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on
> > > > > which replica the request goes to, because of the huge amount of
> > > > > discrepancy between the replicas.
> > > > >
> > > > > Thank you for confirming that it is a known issue; I was thinking I was
> > > > > the only one facing this due to my setup.
> > > > >
> > > > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
> > > > > markus.jel...@openindex.io> wrote:
> > > > >
> > > > > > It is an ancient issue. One of the major contributors to the issue was
> > > > > > resolved some versions ago, but we are still seeing it sometimes too;
> > > > > > there is nothing to see in the logs. We ignore it and just reindex.
> > > > > >
> > > > > > -----Original message-----
> > > > > > > From: S.L <simpleliving...@gmail.com>
> > > > > > > Sent: Monday 27th October 2014 16:25
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> > > > > > > out of synch.
> > > > > > >
> > > > > > > Thanks Otis,
> > > > > > >
> > > > > > > I have checked the logs, in my case the default catalina.out, and I
> > > > > > > don't see any OOMs or any other exceptions.
> > > > > > >
> > > > > > > What other metrics do you suggest?
> > > > > > >
> > > > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > > > > > > otis.gospodne...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > You may simply be overwhelming your cluster nodes. Have you checked
> > > > > > > > various metrics to see if that is the case?
> > > > > > > >
> > > > > > > > Otis
> > > > > > > > --
> > > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > > > > >
> > > > > > > > > On Oct 26, 2014, at 9:59 PM, S.L <simpleliving...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Folks,
> > > > > > > > >
> > > > > > > > > I have posted previously about this. I am using SolrCloud 4.10.1 and
> > > > > > > > > have a sharded collection with 6 nodes, 3 shards and a replication
> > > > > > > > > factor of 2.
> > > > > > > > >
> > > > > > > > > I am indexing into Solr using a Hadoop job; I have 15 map fetch tasks,
> > > > > > > > > each of which can have up to 5 threads, so the load on the indexing
> > > > > > > > > side can get as high as 75 concurrent threads.
> > > > > > > > >
> > > > > > > > > I am facing an issue where the replicas of a particular shard(s) are
> > > > > > > > > consistently getting out of sync. Initially I thought this was because
> > > > > > > > > I was using a custom component, but I did a fresh install, removed the
> > > > > > > > > custom component, and reindexed using the Hadoop job, and I still see
> > > > > > > > > the same behavior.
> > > > > > > > >
> > > > > > > > > I do not see any exceptions in my catalina.out, like OOMs or any other
> > > > > > > > > exceptions. I suspect this could be because of the multi-threaded
> > > > > > > > > indexing nature of the Hadoop job. I use CloudSolrServer from my Java
> > > > > > > > > code to index, and initialize the CloudSolrServer using a 3-node ZK
> > > > > > > > > ensemble.
> > > > > > > > >
> > > > > > > > > Does anyone know of any known issues with highly multi-threaded
> > > > > > > > > indexing and SolrCloud?
> > > > > > > > >
> > > > > > > > > Can someone help? This issue has been slowing things down on my end
> > > > > > > > > for a while now.
> > > > > > > > >
> > > > > > > > > Thanks and much appreciated!
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
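P.S. For completeness, the access-log change Will is referring to goes in Tomcat's server.xml, inside the <Host> element. This is only a sketch assuming the stock Tomcat 7 AccessLogValve: the directory/prefix/suffix values are the defaults, %b reports the response size, and %D at the end of the pattern adds the time taken to service each request (in milliseconds on Tomcat 7).

<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b %D" />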