Hi,

We are facing the same issues on our setup: 3 ZooKeeper nodes, 1 shard, 10 collections, 1 replica, Solr 5.0.0 with default startup parameters. Solr servers: 2-core CPU, 7 GB memory. Index size: 28 GB, with a 3 GB heap.
This setup was running on 4.6 before upgrading to 5, without any of these errors. The timeout seems to happen randomly, and (fortunately) only on 1 of the replicas at a time. Joe: did you get anywhere with the perf hints? If not, any other tips appreciated.

null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:630)
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:582)
        at org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:932)
        at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:256)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)

- Eirik

On Fri, 5 Jun 2015 at 15:58, Joseph Obernberger <j...@lovehorsepower.com> wrote:

> Thank you Shawn! Yes - it is now a Solr 5.1.0 cloud on 27 nodes, and we
> use the startup scripts. The current index size is 3.0T - about 115G per
> node. The index is stored in HDFS, which is spread across those 27 nodes
> and (a guess) about 256 spindles. Each node has 26G of HDFS cache
> (MaxDirectMemorySize) allocated to Solr. Zookeeper storage is on local
> disk. Solr and HDFS run on the same machines. Each node is connected to
> a switch over 1G Ethernet, but the backplane is 40G.
> Do you think the clusterstatus and the zookeeper timeouts are related to
> performance issues talking to HDFS?
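Joe's question can be probed from the client side: rather than waiting out the server-side 180 s CLUSTERSTATUS limit seen in the stack trace above, a short client-side timeout makes a stuck overseer visible quickly. A minimal sketch, assuming Python 3; the host, port, and collection names are placeholders, not values from this thread:

```python
# Build the Collections API CLUSTERSTATUS URL and probe it with a short
# client-side timeout. Host/port/collection below are placeholders.
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.error import URLError

def cluster_status_url(host, port, collection=None):
    """V1 admin endpoint URL for the Collections API CLUSTERSTATUS action."""
    params = {"action": "CLUSTERSTATUS", "wt": "json"}
    if collection:
        params["collection"] = collection
    return "http://%s:%d/solr/admin/collections?%s" % (host, port, urlencode(params))

def responds_quickly(url, timeout=10):
    """True if the endpoint answers within `timeout` seconds, else False."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False
```

Run periodically against each node, a probe like this would show whether the 180 s server-side limit is masking a hang that actually starts much earlier.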
>
> The JVM parameters are:
> -----------------------------------------
> -DSTOP.KEY=solrrocks
> -DSTOP.PORT=8100
> -Dhost=helios
> -Djava.net.preferIPv4Stack=true
> -Djetty.port=9100
> -DnumShards=27
> -Dsolr.clustering.enabled=true
> -Dsolr.install.dir=/opt/solr
> -Dsolr.lock.type=hdfs
> -Dsolr.solr.home=/opt/solr/server/solr
> -Duser.timezone=UTC
> -DzkClientTimeout=15000
> -DzkHost=eris.querymasters.com:2181,daphnis.querymasters.com:2181,triton.querymasters.com:2181,oberon.querymasters.com:2181,portia.querymasters.com:2181,puck.querymasters.com:2181/solr5
>
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseLargePages
> -XX:+UseParNewGC
> -XX:CMSFullGCsBeforeCompaction=1
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:CMSTriggerPermRatio=80
> -XX:ConcGCThreads=8
> -XX:MaxDirectMemorySize=26g
> -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 9100 /opt/solr/server/logs
> -XX:ParallelGCThreads=8
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -Xloggc:/opt/solr/server/logs/solr_gc.log
> -Xms8g
> -Xmx16g
> -Xss256k
> -verbose:gc
> --------------------
>
> The directoryFactory is configured as follows:
>
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>   <bool name="solr.hdfs.blockcache.enabled">true</bool>
>   <int name="solr.hdfs.blockcache.slab.count">200</int>
>   <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>   <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
>   <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
>   <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
>   <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
>   <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
>   <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
>   <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr5</str>
>   <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
> </directoryFactory>
>
> -Joe
>
> On 6/5/2015 9:34 AM, Shawn Heisey wrote:
> > On 6/3/2015 6:39 PM, Joseph Obernberger wrote:
> >> Hi All - I've run into a problem where every once in a while one or more
> >> of the shards (27-shard cluster) will lose connection to ZooKeeper and
> >> report "updates are disabled". In addition to the CLUSTERSTATUS
> >> timeout errors, which don't seem to cause any issue, this one certainly
> >> does, as that shard no longer takes any (you guessed it!) updates!
> >> We are using ZooKeeper with 7 nodes (7 servers in our quorum).
> >> The stack trace is:
> >
> > Other messages you have sent talk about Solr 5.x, and one of them
> > mentions a 16-node cluster with a 2.9 terabyte index, with the index
> > data stored on HDFS.
> >
> > I'm going to venture a guess that you don't have anywhere near enough
> > RAM for proper disk caching, leading to general performance issues,
> > which ultimately cause timeouts. With HDFS, I'm not sure whether OS
> > disk cache on the Solr server matters very much, or whether that needs
> > to be on the HDFS servers. I would guess the latter. Also, if your
> > storage networking is gigabit or slower, HDFS may have significantly
> > more latency than local storage. For good network storage speed, you
> > want 10gig ethernet or Infiniband.
> >
> > If it's Solr 5.x and you are using the included startup scripts, then
> > long GC pauses are probably not a major issue. The startup scripts
> > include significant GC tuning. If you have deployed in your own
> > container, GC tuning might be an issue -- it is definitely required.
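The pause arithmetic behind that last point is worth spelling out: with `-DzkClientTimeout=15000` from Joe's parameters, any single stop-the-world collection longer than 15 s can expire the ZooKeeper session, which is one route to a shard reporting "updates are disabled". A sketch of the comparison; the 20 s pause is a hypothetical figure for illustration, not a measurement from this thread:

```python
# Compare a hypothetical worst-case GC pause against the ZooKeeper session
# timeout. zk_client_timeout_ms is from Joe's -DzkClientTimeout=15000; the
# pause figure is illustrative only.
zk_client_timeout_ms = 15_000
hypothetical_full_gc_pause_ms = 20_000  # plausible for a poorly tuned 16 GB heap

session_can_expire = hypothetical_full_gc_pause_ms > zk_client_timeout_ms
print(session_can_expire)  # True: the session could expire during such a pause
```

This is why the GC log flags in the parameter list matter: solr_gc.log will show whether any real pause ever approaches the 15 s budget.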
> >
> > Here is where I have written down everything I've learned about Solr
> > performance problems, most of which are due to one problem or another
> > with memory:
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Is your zookeeper database on local storage or HDFS? I would suggest
> > keeping that on local storage for optimal performance.
> >
> > Thanks,
> > Shawn
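One concrete check on the memory discussion above: the HDFS block cache in Joe's directoryFactory config has to fit inside -XX:MaxDirectMemorySize, since direct.memory.allocation is true. Assuming a cache block size of 8192 bytes (the usual default for Solr's HDFS block cache; the value is not stated in the thread), the math works out as follows:

```python
# Off-heap footprint of the HDFS block cache from the quoted config:
# slab.count=200, blocksperbank=16384, assumed 8192-byte cache blocks.
def block_cache_bytes(slab_count, blocks_per_bank, block_size=8192):
    # Each slab holds blocks_per_bank blocks of block_size bytes.
    return slab_count * blocks_per_bank * block_size

cache_bytes = block_cache_bytes(slab_count=200, blocks_per_bank=16384)
max_direct_bytes = 26 * 1024**3  # -XX:MaxDirectMemorySize=26g

print(cache_bytes / 1024**3)            # 25.0 (GiB)
print(cache_bytes <= max_direct_bytes)  # True: fits with ~1 GiB of headroom
```

So the cache itself fits, and the roughly 1 GiB left over presumably covers the other direct-memory users in the JVM (e.g. NIO buffers).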