[ 
https://issues.apache.org/jira/browse/SOLR-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Russakovskii updated SOLR-1482:
-------------------------------------

    Attachment: catalina2.out
                catalina.out

One thing I forgot to mention: when the hang occurs on the slave, 1 of the 8 
CPUs on the machine goes to 100% usage, which might point to a bug rather than 
a Java memory issue. Remember, the slave never writes those Java errors to the 
log; only the master does. The slave just hangs. Using htop, I can see one of 
the child Java processes using that 100% CPU.

Bill, the appserver log is catalina.out, right? In any case, I'm tailing every 
file in the tomcat log dir and that's the log I've been pasting and talking 
about.

I've attached 2 full thread dumps, taken with kill -3 on both slaves (the 
output is quite verbose; both slaves are affected now).

The first one, catalina.out, is from the slave that had the PermGen limit 
raised to 512MB; the second one, catalina2.out, is from the server without any 
changes to the PermGen limits.
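
A possible way to connect the 100% CPU observation to the attached dumps (a 
sketch, not something confirmed on these servers): by default htop lists each 
Java thread with its own ID in the PID column, and HotSpot thread dumps print 
that same native thread ID in hex as nid=0x... on each thread's header line. 
The Python helper below converts a decimal thread ID to that form and pulls the 
matching block out of a dump. The function names, the sample dump text, and the 
thread ID 6699 are made up for illustration; they are not taken from 
catalina.out.

```python
import re

# Hypothetical two-thread dump in HotSpot's format; NOT from the attachments.
SAMPLE_DUMP = '''"http-8080-42" daemon prio=10 tid=0x00007f0a1c0d2000 nid=0x1a2b runnable [0x00007f0a0b5f4000]
   java.lang.Thread.State: RUNNABLE
        at example.Hypothetical.spin(Hypothetical.java:1)

"http-8080-7" daemon prio=10 tid=0x00007f0a1c0d4000 nid=0x1a2c waiting on condition [0x00007f0a0b4f3000]
   java.lang.Thread.State: WAITING (parking)
'''


def tid_to_nid(tid):
    """Format a decimal native thread ID the way a thread dump prints it."""
    return hex(tid)


def find_thread(dump_text, tid):
    """Return the dump block whose nid matches the thread ID, or None."""
    pattern = re.compile(r"\bnid=" + re.escape(tid_to_nid(tid)) + r"\b")
    # Thread entries in the dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        if pattern.search(block):
            return block.strip()
    return None


if __name__ == "__main__":
    # 6699 stands in for the ID htop shows next to the busy thread.
    print(find_thread(SAMPLE_DUMP, 6699))
```

Run against the real file, this would be something like 
find_thread(open("catalina.out").read(), <ID from htop>). It only helps if the 
dump was taken while the thread was still spinning.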

> Solr master and slave freeze after query
> ----------------------------------------
>
>                 Key: SOLR-1482
>                 URL: https://issues.apache.org/jira/browse/SOLR-1482
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Nightly 9/28/09.
> 14 individual instances per server, using JNDI.
> replicateAfter commit, 5 min interval polling.
> All caches are currently commented out, on both slave and master.
> Lots of ongoing commits - large chunks of data, each accompanied by a commit. 
> This is to guarantee that anything we think is now in Solr remains there in 
> case the server crashes.
>            Reporter: Artem Russakovskii
>            Priority: Critical
>         Attachments: catalina.out, catalina2.out
>
>
> We're having issues with the deployment of 2 master-slave setups.
> One of the master-slave setups is OK (so far) but on the other both the 
> master and the slave keep freezing, but only after I send a query to them. 
> And by freezing I mean indefinite hanging, with almost no output to log, no 
> errors, nothing. It's as if there's some sort of a deadlock. The hanging 
> servers need to be killed with -9, otherwise they keep hanging.
> The query I send queries all instances at the same time using the ?shards= 
> syntax.
> On the slave, the logs just stop: nothing shows up after the query 
> is issued. On the master, they're a bit more descriptive. This information 
> seeps through very, very slowly, as you can see from the timestamps:
> {quote}
> SEVERE: java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:16:00 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:19:37 PM org.apache.catalina.connector.CoyoteAdapter service
> SEVERE: An exception or error occurred in the container during the request 
> processing
> java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:19:37 PM org.apache.coyote.http11.Http11Processor process
> SEVERE: Error processing request
> java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:19:39 PM org.apache.catalina.connector.CoyoteAdapter service
> SEVERE: An exception or error occurred in the container during the request 
> processing
> java.lang.OutOfMemoryError: PermGen space
> Exception in thread "ContainerBackException in thread "pool-29-threadOct 1, 
> 2009 2:21:47 PM org.apache.catalina.connector.CoyoteAdapter service
> SEVERE: An exception or error occurred in the container during the request 
> processing
> java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:21:47 PM 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
> SEVERE: Error reading request, ignored
> java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:21:47 PM 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
> SEVERE: Error reading request, ignored
> java.lang.OutOfMemoryError: PermGen space
> -22" java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:21:47 PM 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
> SEVERE: Error reading request, ignored
> java.lang.OutOfMemoryError: PermGen space
> Exception in thread "http-8080-42" Oct 1, 2009 2:21:47 PM 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
> SEVERE: Error reading request, ignored
> java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:21:47 PM 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
> SEVERE: Error reading request, ignored
> java.lang.OutOfMemoryError: PermGen space
> Oct 1, 2009 2:21:47 PM 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
> SEVERE: Error reading request, ignored
> java.lang.OutOfMemoryError: PermGen space
> Exception in thread "http-8080-26" Exception in thread "http-8080-32" 
> Exception in thread "http-8080-25" Exception in thread "http-8080-22" 
> Exception in thread "http-8080-15" Exception in thread "http-8080-45" 
> Exception in thread "http-8080-13" Exception in thread "http-8080-48" 
> Exception in thread "http-8080-7" Exception in thread "http-8080-38" 
> Exception in thread "http-8080-39" Exception in thread "http-8080-28" 
> Exception in thread "http-8080-1" Exception in thread "http-8080-2" Exception 
> in thread "http-8080-12" Exception in thread "http-8080-44" Exception in 
> thread "http-8080-47" Exception in thread "http-8080-29" Exception in thread 
> "http-8080-33" Exception in thread "http-8080-27" Exception in thread 
> "http-8080-36" Exception in thread "http-8080-113" Exception in thread 
> "http-8080-112" Exception in thread "http-8080-37" Exception in thread 
> "http-8080-18" java.lang.OutOfMemoryError: PermGen space
> java.lang.OutOfMemoryError: PermGen space
> java.lang.OutOfMemoryError: PermGen space
> java.lang.OutOfMemoryError: PermGen space
> java.lang.OutOfMemoryError: PermGen space
> Exception in thread "http-8080-34" java.lang.OutOfMemoryError: PermGen space
> java.lang.OutOfMemoryError: PermGen space
> Exception in thread "http-8080-103"
> {quote}
> So the problem seems to be related to PermGen space. I found 
> http://www.nabble.com/Number-of-webapps-td22198080.html and tried 
> -XX:MaxPermSize=256m, but it didn't fix the problem. The current 
> CATALINA_OPTS looks like this:
> {quote}
> export CATALINA_OPTS="-XX:MaxPermSize=256m -Xmx6500m -XX:+UseConcMarkSweepGC"
> {quote}
> Is the only solution at this point going multicore, as Noble suggested (is 
> Noble your first name? I always assumed it was Paul and Noble was part of the 
> nickname)? Will multicore get rid of the problem, before we spend time 
> looking at it? For multicore, will the existing data dirs be compatible or 
> would a complete reindex be needed? 
> I'm willing to provide any information to you guys, just not sure what at the 
> moment. I'm also open to communicating outside of JIRA, at artem [_aT_] plaxo 
> {dot} com.
> Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
