Doss:

Tomcat often puts things in "catalina.out", so you might check there. I've often seen logging information from Solr go there by default.
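If that file is large, a small script can pull out the lines worth reading first. What follows is only a rough sketch, not anything Solr or Tomcat ships: the function name `scan_log` is made up for illustration, the timestamp pattern is inferred from the single Tomcat 8 log line quoted further down this thread, and the sample input is that same line. It flags WARNING and SEVERE entries plus any line mentioning OutOfMemoryError.

```python
import re
from collections import Counter

# Line prefix inferred from the Tomcat log line quoted in this thread, e.g.
# "18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] ..."
# (assumption: other Tomcat logging configurations may use a different layout)
LINE_RE = re.compile(
    r"^\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} "
    r"(?P<level>SEVERE|WARNING) "
)


def scan_log(lines):
    """Return (counts, hits) for an iterable of log lines.

    counts: how many WARNING / SEVERE / OutOfMemoryError lines were seen.
    hits:   the matching lines themselves, in file order.
    """
    counts = Counter()
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        # OutOfMemoryError can show up on stack-trace lines with no prefix,
        # so it is checked independently of the level match.
        is_oom = "OutOfMemoryError" in line
        if match:
            counts[match.group("level")] += 1
        if is_oom:
            counts["OutOfMemoryError"] += 1
        if match or is_oom:
            hits.append(line)
    return counts, hits


# Sample input: the warning quoted later in this thread, plus one stack frame.
sample = [
    "18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] "
    "org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads "
    "The web application [/mima] appears to have started a thread named "
    "[localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed "
    "to stop it.",
    " sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)",
]
counts, hits = scan_log(sample)
print(dict(counts))
for hit in hits:
    print(hit)
```

To run it against a real file, pass an open file handle instead of the sample list, for example `scan_log(open(path, errors="replace"))`, where `path` is wherever your Tomcat install writes catalina.out.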
Without having some idea what kinds of problems Solr is reporting when you see this situation, it's really hard to say. Some things I'd check first though, in order of what I _guess_ is most likely.

> There have been anecdotal reports (in fact, I'm trying to understand the why of it right now) of the suggester taking a long time to initialize, even if you don't use it! So if you're not using the suggest component, try commenting out those sections in solrconfig.xml for the cores in question. I like this explanation since it fits with your symptoms, but I don't like it since the index you are using isn't all that big. So it's something of a shot in the dark. I expect that the core will _eventually_ come up, but I've seen reports of 10-15 minutes being required, far beyond my patience! That said, this would also explain why deleting the index works.

> OutOfMemory errors. You might be able to attach jConsole (part of the standard Java stuff) to the process and monitor the memory usage. If it's being pushed near the 5G limit that's the first thing I'd suspect.

> If you're using the default setups, then the Zookeeper timeout may be too low, I think the default (not sure about whether it's been changed in 4.9) is 15 seconds, 30-60 is usually much better.

Best,
Erick

On Thu, Nov 20, 2014 at 3:47 AM, Doss <itsmed...@gmail.com> wrote:
> Dear Erick,
>
> Forgive my ignorance.
>
> Please find some of the details you required.
>
> *have you looked at the solr logs?*
>
> Sorry I haven't defined the log4j.properties file, so I don't have solr logs. Since it requires tomcat restart I am planning to do it in next restart.
>
> But found the following in tomcat log
>
> 18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/mima] appears to have started a thread named [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to stop it.
> This is very likely to create a memory leak. Stack trace of thread:
> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>
> *How big are the cores?*
>
> We have 16 cores, out of it only 5 are big ones. Total size of all 16 cores is 10+ GB
>
> *How many docs in the cores when the problem happens?*
>
> 1 core with 163 fields and 33,00,000 documents (Index size 2+ GB)
> 4 cores with 3 fields and has 150,00,000 (approx) documents (1.2 to 1.5 GB)
> remaining cores are 1,00,000 to 40,00,000 documents
>
> *How much memory are you allocating the JVM?*
>
> 5GB for JVM, Total RAM available in the systems is 30 GB
>
> *can you restart Tomcat without a problem?*
>
> This problem is occurring in production, I never tried.
>
> Thanks,
> Doss.
>
> On Wed, Nov 19, 2014 at 7:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> You've really got to provide details for us to say much of anything. There are about a zillion things that it could be.
>>
>> In particular, have you looked at the solr logs? Are there any interesting things in them? How big are the cores? How much memory are you allocating the JVM? How many docs in the cores when the problem happens? Before the nodes stop responding, can you restart Tomcat without a problem?
>>
>> You might review:
>> http://wiki.apache.org/solr/UsingMailingLists
>>
>> Best,
>> Erick
>>
>> On Wed, Nov 19, 2014 at 1:04 AM, Doss <itsmed...@gmail.com> wrote:
>> > I have two node SOLR (4.9.0) cloud with Tomcat (8), Zookeeper.
>> > At times SOLR in Node 1 stops responding, to fix the issue I am restarting tomcat in Node 1, but SOLR not starting up, but if I remove the solr cores in both nodes and try restarting it starts working, and then I have to reindex the whole data again. We are using this setup in production because of this issue we are having 1 to 1.30 hours of service down time. Any suggestions would be greatly appreciated.
>> >
>> > Thanks,
>> > Doss.