Hi everyone,
We have a Windows Server 2022 system running Solr 9.1.1. Every day the
application is stopped at a fixed time, 19:00, for a backup that normally takes
about 10 minutes. At 20:00 the application is started again, together with Solr.
Solr is installed with nssm.exe and runs as a Windows service. The batch file
that starts Solr uses the command "net start solr-svc".
Solr runs on a single machine in cloud mode, with an "EMBEDDED STANDALONE
ZOOKEEPER SERVER at port 9983". Every 3-4 days, Solr fails to start with the
error "o.a.s.c.SolrCore null => org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
localhost:9983 within 30000 ms". When this error appears, if the system admin
runs the same command "net start solr-svc" about 10 minutes later, Solr starts
properly most of the time. (If it still fails, we just wait another 10 minutes
and try to start the Solr service again.)
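The manual workaround above (start, check, wait 10 minutes, start again) could be scripted roughly as follows. This is a minimal sketch under assumptions, not part of the setup described: the HTTP port 8983 and the /solr/admin/info/system URL are Solr defaults that the post does not mention, all function names are made up for illustration, and it assumes that re-running "net start solr-svc" after a failed start behaves the same as when the admin runs it by hand.

```python
import http.client
import subprocess
import time
import urllib.request

# Assumption: Solr listens on its default HTTP port 8983 (not stated in the post).
SOLR_HEALTH_URL = "http://localhost:8983/solr/admin/info/system"


def solr_is_healthy(url=SOLR_HEALTH_URL, timeout=5.0):
    """Return True if Solr answers the given URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and non-2xx answers all count as "not up".
        return False


def poll(check, tries=12, interval=10, sleep=time.sleep):
    """Call check() up to `tries` times, sleeping `interval` seconds in between."""
    for i in range(tries):
        if check():
            return True
        if i < tries - 1:
            sleep(interval)
    return False


def start_with_retry(start, is_up, attempts=3, wait_seconds=600, sleep=time.sleep):
    """Run start(), then is_up(); on failure wait and try again.

    Returns the 1-based number of the attempt that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        start()
        if is_up():
            return attempt
        if attempt < attempts:
            sleep(wait_seconds)
    return 0


def start_solr_service():
    # "solr-svc" is the service name from the post. check=False because
    # "net start" exits non-zero if the service fails or is already running.
    subprocess.run(["net", "start", "solr-svc"], check=False)


def main():
    # Give Solr up to about two minutes per attempt before declaring failure.
    attempt = start_with_retry(start_solr_service, lambda: poll(solr_is_healthy))
    print("Solr is up after attempt", attempt if attempt else "none (gave up)")
```

Calling main() from the 20:00 job instead of the plain "net start solr-svc" would only automate the retry. It does not explain why ZooKeeper on port 9983 is slow to accept connections in the first place.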
We checked:
[1] The system has 16 GB of physical memory, and Solr holds around 10,000
objects. When the issue happens, the system has about 8 GB of free memory, so
Solr has enough memory.
[2] Free disk space is 850 GB.
What are the possible causes and countermeasures?
I'd appreciate any thoughts or suggestions you might have about this.
The log from a failed start follows:
2024-11-17 20:00:07.691 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle STARTED
@6108ms ScheduledExecutorScheduler@15f35bc3{STARTED}
2024-11-17 20:00:07.691 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle starting
ClientSelectorManager@16a5eb6d{STOPPED}
2024-11-17 20:00:07.696 DEBUG (main) [] o.e.j.u.c.ContainerLifeCycle
EatWhatYouKill@31120021/SelectorProducer@2740e316/IDLE/p=false/NoTryExecutor@5b5a4aed[org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@138aa3cc[Running,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
0]][pc=0,pic=0,pec=0,epc=0]@2024-11-17T20:00:07.6961328Z added
{SelectorProducer@2740e316,POJO}
2024-11-17 20:00:07.847 INFO (main) [] o.a.s.c.SolrZkServerProps Reading
configuration from: D:\MyApplication\Solr\server\solr\zoo.cfg
2024-11-17 20:00:07.850 INFO (main) [] o.a.s.c.SolrZkServer STARTING EMBEDDED
STANDALONE ZOOKEEPER SERVER at port 9983
2024-11-17 20:00:07.850 WARN (main) [] o.a.s.c.SolrZkServer Embedded Zookeeper
is not recommended in production environments. See Reference Guide for details.
2024-11-17 20:00:08.350 INFO (main) [] o.a.s.c.ZkContainer Zookeeper
client=localhost:9983
2024-11-17 20:00:08.372 INFO (main) [] o.a.s.c.DistributedClusterStateUpdater
Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false.
Solr will be using Overseer based cluster state updates.
2024-11-17 20:00:08.379 DEBUG (main) [] o.a.s.c.c.ZkClientConnectionStrategy
Attempting to load zk connection strategy 'null'
2024-11-17 20:00:08.381 DEBUG (main) [] o.a.s.c.ZkController Added new
OnReconnect listener
org.apache.solr.cloud.ZkController$$Lambda$321/0x00000001004f2c40@5d512ddb
2024-11-17 20:00:25.983 WARN (embeddedZkServer) [] o.a.z.s.ServerCnxnFactory
maxCnxns is not configured, using default value 0.
2024-11-17 20:00:26.481 INFO (main) [] o.a.s.c.c.ConnectionManager Waiting up
to 30000ms for client to connect to ZooKeeper
2024-11-17 20:00:37.521 WARN (main-SendThread(localhost:9983)) []
o.a.z.ClientCnxn Session 0x0 for server localhost/0:0:0:0:0:0:0:1:9983, Closing
socket connection. Attempting reconnect except it is a SessionExpiredException.
=> java.net.ConnectException: Connection refused: no further information
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344)
~[?:?]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282)
~[?:?]
2024-11-17 20:00:59.084 DEBUG (main-EventThread) [] o.a.s.c.c.SolrZkClient
Submitting job to respond to event WatchedEvent state:Closed type:None path:null
2024-11-17 20:00:59.085 ERROR (main-EventThread) [] o.a.z.ClientCnxn Error
while calling watcher. => java.util.concurrent.RejectedExecutionException: Task
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$299/0x000000010044c440@12196d68
rejected from
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@571be14f[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
java.util.concurrent.RejectedExecutionException: Task
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$299/0x000000010044c440@12196d68
rejected from
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@571be14f[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
~[?:?]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:252)
~[?:?]
at
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
~[?:?]
at
org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.process(SolrZkClient.java:1019)
~[?:?]
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:578)
~[?:?]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553)
~[?:?]
2024-11-17 20:00:59.087 ERROR (main) [] o.a.s.s.CoreContainerProvider Could not
start Solr. Check solr/home property and the logs
2024-11-17 20:00:59.100 ERROR (main) [] o.a.s.c.SolrCore null =>
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException:
Could not connect to ZooKeeper localhost:9983 within 30000 ms
at
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225)
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException:
Could not connect to ZooKeeper localhost:9983 within 30000 ms
at
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225) ~[?:?]
...
at org.eclipse.jetty.start.Main.main(Main.java:77)
~[start.jar:9.4.48.v20220622]
Caused by: java.util.concurrent.TimeoutException:
Could not connect to ZooKeeper localhost:9983 within 30000 ms
at
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:297)
~[?:?]
at
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:216) ~[?:?]
... 54 more
2024-11-17 20:00:59.102 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle starting
SolrRequestFilter==org.apache.solr.servlet.SolrDispatchFilter@162c1dfb{inst=false,async=false,src=DESCRIPTOR:file:///D:/MyApplication/Solr/server/solr-webapp/webapp/WEB-INF/web.xml}
2024-11-17 20:00:59.106 ERROR (main) [] o.a.s.c.SolrCore null =>
javax.servlet.UnavailableException: Error processing the request. CoreContainer
is either not initialized or shutting down.
at
org.apache.solr.servlet.CoreContainerProvider.waitForCoreContainer(CoreContainerProvider.java:150)
javax.servlet.UnavailableException: Error processing the request. CoreContainer
is either not initialized or shutting down.