Will, I think in one of your other emails (which I am not able to find) you asked whether I was indexing directly from MapReduce jobs. Yes, I am indexing directly from the map tasks, and that is done using SolrJ with a CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use something like MapReduceIndexerTool, which I suppose writes the index to HDFS and then moves it into the Solr index in a subsequent step? If so, why?
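Roughly, the map-side indexing looks like the sketch below (this is a simplified illustration, not my actual job code: the ZK host string and the input/field handling are placeholders, and the collection and field names are just the ones that show up in the query examples further down in this thread):

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Simplified sketch of map-side indexing with SolrJ 4.x; the ZK ensemble
// string, the input handling and the field values are placeholders.
public class IndexingMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private CloudSolrServer solr;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One CloudSolrServer per map task, pointed at the ZK ensemble so that
        // updates are routed to the current shard leaders.
        solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("dyCollection1");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("thingURL", value.toString());
        try {
            // Documents are sent straight from the map task; no soft commits
            // from the client, the server-side autoCommit (below) makes them visible.
            solr.add(doc);
        } catch (SolrServerException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        solr.shutdown();
    }
}

In the real job each map task can run up to 5 fetch threads against the same CloudSolrServer, which is how the indexing load reaches the 75 concurrent threads mentioned further down in this thread.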
I don't use any soft commits, and I autocommit every 15 seconds; the relevant snippet from the configuration is below.

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

I looked at the localhost_access.log file; all the GET and POST requests have sub-second response times. (A sketch of the AccessLogValve configuration Will suggested is at the very bottom of this mail.)

On Tue, Oct 28, 2014 at 2:06 AM, Will Martin <wmartin...@gmail.com> wrote:
> The easiest, and coarsest, measure of response time [not service time in a
> distributed system] can be picked up in your localhost_access.log file.
> You're using Tomcat, right? Look up AccessLogValve in the docs and
> server.xml. You can add configuration to report the payload and the time to
> service the request without touching any code.
>
> Queueing theory is what Otis was talking about when he said you've
> saturated your environment. In AWS people just auto-scale up and don't
> worry about where the load comes from; it's dumb if it happens more than 2
> times. Capacity planning is tough, let's hope it doesn't disappear
> altogether.
>
> G'luck
>
>
> -----Original Message-----
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: Monday, October 27, 2014 9:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> out of synch.
>
> Good point about ZK logs; I do see the following exceptions intermittently
> in the ZK log.
>
> 2014-10-27 06:54:14,621 [myid:1] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
> 2014-10-27 07:00:06,697 [myid:1] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
> connection from /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:00:06,725 [myid:1] - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
> establish new session at /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:00:06,746 [myid:1] - INFO
> [CommitProcessor:1:ZooKeeperServer@617] - Established session
> 0x14949db9da40037 with negotiated timeout 10000 for client
> /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:01:06,520 [myid:1] - WARN [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x14949db9da40037, likely client has closed socket
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:744)
>
> As for queueing theory, I don't know of any way to see how fast the
> requests are being served by SolrCloud, or whether a queue builds up
> because the service rate is slower than the rate of requests from the
> incoming multiple threads.
>
> On Mon, Oct 27, 2014 at 7:09 PM, Will Martin <wmartin...@gmail.com> wrote:
>
> > 2 naïve comments, of course.
> >
> > - Queuing theory
> > - Zookeeper logs.
> >
> > From: S.L [mailto:simpleliving...@gmail.com]
> > Sent: Monday, October 27, 2014 1:42 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
> > replicas out of synch.
> >
> > Please find the clusterstate.json attached.
> >
> > Also, in this case at least the Shard1 replicas are out of sync, as can
> > be seen below.
> >
> > Shard 1 replica 1 *does not* return a result with distrib=false.
> >
> > Query:
> > http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
> >
> > Result:
> >
> > <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int>
> > <lst name="params"><str name="q">*:*</str><str name="shards.info">true</str>
> > <str name="distrib">false</str><str name="debug">track</str><str name="wt">xml</str>
> > <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst>
> > <result name="response" numFound="0" start="0"/><lst name="debug"/></response>
> >
> > Shard 1 replica 2 *does* return the result with distrib=false.
> >
> > Query:
> > http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
> >
> > Result:
> >
> > <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int>
> > <lst name="params"><str name="q">*:*</str><str name="shards.info">true</str>
> > <str name="distrib">false</str><str name="debug">track</str><str name="wt">xml</str>
> > <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst>
> > <result name="response" numFound="1" start="0"><doc>
> > <str name="thingURL">http://www.xyz.com</str>
> > <str name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str>
> > <long name="_version_">1483135330558148608</long></doc></result>
> > <lst name="debug"/></response>
> >
> > On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > On Mon, Oct 27, 2014 at 9:40 PM, S.L <simpleliving...@gmail.com> wrote:
> >
> > > One is not smaller than the other, because numDocs is the same for
> > > both "replicas", and essentially they seem to be disjoint sets.
> >
> > That is strange. Can we see your clusterstate.json? With that, please
> > also specify the two replicas which are out of sync.
> >
> > > Also, manually purging the replicas is not an option, because this is a
> > > "frequently" indexed index and we need everything to be automated.
> > >
> > > What other options do I have now?
> > >
> > > 1. Turn off replication completely in SolrCloud.
> > > 2. Use the traditional master-slave replication model.
> > > 3. Introduce a "replica"-aware field in the index, to figure out
> > > which "replica" the request should go to from the client.
> > > 4. Try a distribution like Helios to see if it behaves any differently.
> > >
> > > Just thinking out loud here...
> > >
> > > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <
> > > markus.jel...@openindex.io> wrote:
> > >
> > > > Hi - if there is a very large discrepancy, you could consider purging
> > > > the smallest replica; it will then resync from the leader.
> > > >
> > > > -----Original message-----
> > > > > From: S.L <simpleliving...@gmail.com>
> > > > > Sent: Monday 27th October 2014 16:41
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> > > > > out of synch.
> > > > >
> > > > > Markus,
> > > > >
> > > > > I would like to ignore it too, but what is happening is that there is a
> > > > > lot of discrepancy between the replicas; queries like
> > > > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on
> > > > > which replica the request goes to, because of the huge amount of
> > > > > discrepancy between the replicas.
> > > > >
> > > > > Thank you for confirming that it is a known issue; I was thinking I was
> > > > > the only one facing this due to my setup.
> > > > >
> > > > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
> > > > > markus.jel...@openindex.io> wrote:
> > > > >
> > > > > > It is an ancient issue. One of the major contributors to the issue was
> > > > > > resolved some versions ago, but we are still seeing it sometimes too;
> > > > > > there is nothing to see in the logs. We ignore it and just reindex.
> > > > > >
> > > > > > -----Original message-----
> > > > > > > From: S.L <simpleliving...@gmail.com>
> > > > > > > Sent: Monday 27th October 2014 16:25
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> > > > > > > out of synch.
> > > > > > >
> > > > > > > Thanks Otis,
> > > > > > >
> > > > > > > I have checked the logs, in my case the default catalina.out, and I
> > > > > > > don't see any OOMs or any other exceptions.
> > > > > > >
> > > > > > > What other metrics do you suggest?
> > > > > > >
> > > > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > > > > > > otis.gospodne...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > You may simply be overwhelming your cluster nodes. Have you checked
> > > > > > > > various metrics to see if that is the case?
> > > > > > > >
> > > > > > > > Otis
> > > > > > > > --
> > > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > > > > >
> > > > > > > > > On Oct 26, 2014, at 9:59 PM, S.L <simpleliving...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Folks,
> > > > > > > > >
> > > > > > > > > I have posted previously about this. I am using SolrCloud 4.10.1 and
> > > > > > > > > have a sharded collection with 6 nodes, 3 shards and a replication
> > > > > > > > > factor of 2.
> > > > > > > > >
> > > > > > > > > I am indexing into Solr using a Hadoop job; I have 15 map fetch tasks,
> > > > > > > > > each of which can have up to 5 threads, so the load on the indexing
> > > > > > > > > side can get as high as 75 concurrent threads.
> > > > > > > > >
> > > > > > > > > I am facing an issue where the replicas of a particular shard(s) are
> > > > > > > > > consistently getting out of sync. Initially I thought this was because
> > > > > > > > > I was using a custom component, but I did a fresh install, removed the
> > > > > > > > > custom component, and reindexed using the Hadoop job, and I still see
> > > > > > > > > the same behavior.
> > > > > > > > >
> > > > > > > > > I do not see any exceptions in my catalina.out, like OOMs or any other
> > > > > > > > > exceptions. I suspect this could be because of the multi-threaded
> > > > > > > > > indexing nature of the Hadoop job. I use CloudSolrServer from my Java
> > > > > > > > > code to index, and initialize the CloudSolrServer using a 3-node ZK
> > > > > > > > > ensemble.
> > > > > > > > >
> > > > > > > > > Does anyone know of any known issues with highly multi-threaded
> > > > > > > > > indexing and SolrCloud?
> > > > > > > > >
> > > > > > > > > Can someone help? This issue has been slowing things down on my end
> > > > > > > > > for a while now.
> > > > > > > > >
> > > > > > > > > Thanks and much appreciated!
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
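P.S. For completeness, the access-log change Will is referring to goes in Tomcat's server.xml, inside the <Host> element. This is only a sketch assuming the stock Tomcat 7 AccessLogValve: the directory/prefix/suffix values are the defaults, %b reports the response size, and %D at the end of the pattern adds the time taken to service each request (in milliseconds on Tomcat 7).

<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b %D" />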