Hi,

We are facing the same issues on our setup: 3 ZooKeeper nodes, 1 shard, 10
collections, 1 replica, Solr 5.0.0 with default startup params.
Solr servers: 2-core CPU, 7 GB memory
Index size: 28 GB, 3 GB heap

This setup was running on v4.6 before the upgrade to 5 without any of these
errors. The timeout seems to happen randomly, and (fortunately) only to one
of the replicas at a time. Joe: did you get anywhere with the perf hints?
If not, any other tips are appreciated.

null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:630)
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:582)
        at org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:932)
        at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:256)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
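For what it's worth, the same call can be timed outside the admin UI by hitting the Collections API directly. A minimal sketch of building that request URL (host, port, and collection name below are placeholders, not our actual setup):

```python
# Sketch: build the Collections API URL for a CLUSTERSTATUS call, so the
# request can be timed with curl or any HTTP client. Host/port/collection
# are placeholders for illustration only.
from urllib.parse import urlencode

def clusterstatus_url(host="localhost", port=8983, collection=None):
    params = {"action": "CLUSTERSTATUS", "wt": "json"}
    if collection is not None:
        params["collection"] = collection  # omit to ask about all collections
    return "http://{}:{}/solr/admin/collections?{}".format(
        host, port, urlencode(params))

print(clusterstatus_url(collection="mycollection"))
```

If the raw call also stalls for the full 180s, the Overseer itself is the bottleneck rather than anything in the UI.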

- Eirik


On Fri, 5 Jun 2015 at 15:58, Joseph Obernberger <
j...@lovehorsepower.com> wrote:

> Thank you Shawn!  Yes - it is now a Solr 5.1.0 cloud on 27 nodes and we
> use the startup scripts.  The current index size is 3.0T - about 115G
> per node - index is stored in HDFS which is spread across those 27 nodes
> and about (a guess) - 256 spindles.  Each node has 26G of HDFS cache
> (MaxDirectMemorySize) allocated to Solr.  Zookeeper storage is on local
> disk.  Solr and HDFS run on the same machines. Each node is connected to
> a switch over 1G Ethernet, but the backplane is 40G.
> Do you think the clusterstatus and the zookeeper timeouts are related to
> performance issues talking to HDFS?
>
> The JVM parameters are:
> -----------------------------------------
> -DSTOP.KEY=solrrocks
> -DSTOP.PORT=8100
> -Dhost=helios
> -Djava.net.preferIPv4Stack=true
> -Djetty.port=9100
> -DnumShards=27
> -Dsolr.clustering.enabled=true
> -Dsolr.install.dir=/opt/solr
> -Dsolr.lock.type=hdfs
> -Dsolr.solr.home=/opt/solr/server/solr
> -Duser.timezone=UTC
> -DzkClientTimeout=15000
> -DzkHost=eris.querymasters.com:2181,daphnis.querymasters.com:2181,triton.querymasters.com:2181,oberon.querymasters.com:2181,portia.querymasters.com:2181,puck.querymasters.com:2181/solr5
>
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseLargePages
> -XX:+UseParNewGC
> -XX:CMSFullGCsBeforeCompaction=1
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:CMSTriggerPermRatio=80
> -XX:ConcGCThreads=8
> -XX:MaxDirectMemorySize=26g
> -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 9100 /opt/solr/server/logs
> -XX:ParallelGCThreads=8
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -Xloggc:/opt/solr/server/logs/solr_gc.log
> -Xms8g
> -Xmx16g
> -Xss256k
> -verbose:gc
> --------------------
>
> The directoryFactory is configured as follows:
>
> <directoryFactory name="DirectoryFactory"
>          class="solr.HdfsDirectoryFactory">
>          <bool name="solr.hdfs.blockcache.enabled">true</bool>
>          <int name="solr.hdfs.blockcache.slab.count">200</int>
>          <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>          <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
>          <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
>          <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
>          <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
>          <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
>          <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
>          <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr5</str>
>          <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
>      </directoryFactory>
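A quick back-of-envelope check on those block cache settings, assuming the default 8 KB cache block size (an assumption, not something stated in the config): the direct memory the cache claims is roughly slab.count × blocksperbank × block size, and it should fit under MaxDirectMemorySize.

```python
# Rough sizing check for the HDFS block cache, using the values from the
# directoryFactory config above. BLOCK_SIZE is the assumed default of 8 KB.
BLOCK_SIZE = 8 * 1024       # bytes per cache block (assumed default)
slabs = 200                 # solr.hdfs.blockcache.slab.count
blocks_per_bank = 16384     # solr.hdfs.blockcache.blocksperbank

cache_bytes = slabs * blocks_per_bank * BLOCK_SIZE
print(cache_bytes / 2**30)  # cache size in GiB
```

Under that assumption the cache works out to about 25 GiB, which sits just under the -XX:MaxDirectMemorySize=26g setting, so the numbers at least look consistent with each other.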
>
> -Joe
>
> On 6/5/2015 9:34 AM, Shawn Heisey wrote:
> > On 6/3/2015 6:39 PM, Joseph Obernberger wrote:
> >> Hi All - I've run into a problem where every once in a while one or more
> >> of the shards (27-shard cluster) will lose connection to ZooKeeper and
> >> report "updates are disabled".  In addition to the CLUSTERSTATUS
> >> timeout errors, which don't seem to cause any issue, this one certainly
> >> does, as that shard no longer takes any (you guessed it!) updates!
> >> We are using ZooKeeper with 7 nodes (7 servers in our quorum).
> >> The stack trace is:
> > Other messages you have sent talk about Solr 5.x, and one of them
> > mentions a 16-node cluster with a 2.9 terabyte index, with the index
> > data stored on HDFS.
> >
> > I'm going to venture a guess that you don't have anywhere near enough
> > RAM for proper disk caching, leading to general performance issues,
> > which ultimately cause timeouts.  With HDFS, I'm not sure whether OS
> > disk cache on the Solr server matters very much, or whether that needs
> > to be on the HDFS servers.  I would guess the latter.  Also, if your
> > storage networking is gigabit or slower, HDFS may have significantly
> > more latency than local storage.  For good network storage speed, you
> > want 10gig ethernet or Infiniband.
> >
> > If it's Solr 5.x and you are using the included startup scripts, then
> > long GC pauses are probably not a major issue.  The startup scripts
> > include significant GC tuning. If you have deployed in your own
> > container, GC tuning might be an issue -- it is definitely required.
> >
> > Here is where I have written down everything I've learned about Solr
> > performance problems, most of which are due to one problem or another
> > with memory:
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Is your zookeeper database on local storage or HDFS?  I would suggest
> > keeping that on local storage for optimal performance.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
