Just to confirm I'm not doing something insane, this is my general setup:

- index approx 1MM documents including HTML, pictures, office files, etc.
- files are not local to solr process
- use update/extract to extract text from them through tika
- use commit=1 on each POST (reasons below)
- use optimize=1 every 150 documents or so (reasons below)
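
For reference, the upload loop looks roughly like the sketch below. This is
a simplified reconstruction, not the actual script: the base URL, the
function names, the use of literal.id for the document id, and the timeout
are all assumptions made for illustration.

```python
import urllib.request
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8080/solr"  # assumed base URL, adjust to your setup
OPTIMIZE_EVERY = 150                     # "every 150 documents or so"


def extract_url(doc_id, commit=True):
    """Build the extract URL for one document (doc_id is a hypothetical unique key)."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"  # commit on every POST
    return SOLR_URL + "/update/extract?" + urlencode(params)


def should_optimize(count):
    """True after every OPTIMIZE_EVERY successfully posted documents."""
    return count > 0 and count % OPTIMIZE_EVERY == 0


def post_file(path, doc_id, opener=urllib.request.urlopen):
    """Stream the file body to Solr, since the files are not local to it."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(
            extract_url(doc_id),
            data=fh.read(),
            headers={"Content-Type": "application/octet-stream"},
        )
    with opener(req, timeout=300) as resp:
        return resp.status


def optimize(opener=urllib.request.urlopen):
    """Ask Solr to optimize the index via the update handler."""
    with opener(SOLR_URL + "/update?optimize=true", timeout=3600) as resp:
        return resp.status
```

The driver just calls post_file() for each document, counts successes, and
calls optimize() whenever should_optimize(count) is true.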

Through many manual restarts and modifications to the upload script,
I've got about halfway (numDocs : 467372, disk usage 1.6G).  The
biggest problem is that Solr cannot recover from any serious problem
without a restart of tomcat, and the client cannot tell serious
problems apart from non-serious ones (eg tika exceptions thrown by
bad documents).
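
One workaround I can imagine on the client side is to follow every failed
upload with a cheap health check, and to treat the failure as a bad
document only if the health check still succeeds. The sketch below assumes
a ping handler is reachable at /admin/ping and that it also fails once the
server is in the broken state; the URL and the function names are
hypothetical.

```python
import urllib.error
import urllib.request

PING_URL = "http://localhost:8080/solr/admin/ping"  # assumed location


def solr_alive(url=PING_URL, opener=urllib.request.urlopen):
    """Return True if a trivial request to Solr still succeeds."""
    try:
        with opener(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP errors, refused connections, and timeouts all count as "not alive".
        return False


def classify(status, alive):
    """Classify one upload attempt.

    status: HTTP status of the upload, or None if no response came back.
    alive:  result of solr_alive() taken right after the attempt.
    """
    if status is not None and 200 <= status < 300:
        return "ok"
    if alive:
        return "bad_document"   # server still answers, so skip this file
    return "server_broken"      # nothing works any more, restart needed
```

This is only a heuristic: a document that takes the server down would be
reported as server_broken rather than bad_document.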

On Wed, Jun 9, 2010 at 10:13 AM, Jim Blomo <jim.bl...@pbworks.com> wrote:
> In any case I bumped up the heap to 3G as suggested, which has helped
> stability.  I have found that in practice I need to commit every
> extraction because a crash or error will wipe out all extractions
> after the last commit.

I've also found that I need to optimize very regularly because I kept
getting "too many file handles" errors (though they usually came up as
the more cryptic "directory, but cannot be listed: list() returned
null" error).

What I am running into now is

SEVERE: Exception invoking periodic operation:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.substring(String.java:1940)
[full backtrace below]

After a restart and optimize this goes away for a while (~100
documents) but then comes back and every request after the error
fails.  Even if I can't prevent this error, is there a way I can
recover from it better?  Perhaps an option to solr or tomcat to just
restart itself if it hits that error?
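
Failing a built-in option, the fallback would be an external watchdog
along these lines. It is a sketch only: the restart command is a
placeholder for whatever restarts tomcat on a given machine, and the
health check is passed in as a function, so nothing here relies on a
particular Solr or tomcat feature.

```python
import subprocess
import time

RESTART_CMD = ["/etc/init.d/tomcat6", "restart"]  # placeholder, system specific


def restart_tomcat():
    """Run the (assumed) restart command and raise if it fails."""
    subprocess.run(RESTART_CMD, check=True)


def watchdog(is_healthy, restart, max_failures=3, interval=30,
             sleep=time.sleep, rounds=None):
    """Poll is_healthy() and call restart() after max_failures misses in a row.

    rounds limits the number of polls (None means run forever).
    Returns the number of restarts performed.
    """
    failures = 0
    restarts = 0
    polls = 0
    while rounds is None or polls < rounds:
        polls += 1
        if is_healthy():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

Requiring several consecutive failures keeps one slow response (for
example during an optimize) from triggering a restart.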

Jim

SEVERE: Exception invoking periodic operation:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.substring(String.java:1940)
        at java.lang.String.substring(String.java:1905)
        at java.io.File.getName(File.java:401)
        at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229)
        at java.io.File.isDirectory(File.java:754)
        at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1000)
        at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1214)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
        at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559)
        at java.lang.Thread.run(Thread.java:619)
Jul 3, 2010 1:32:20 AM org.apache.solr.update.processor.LogUpdateProcessor finish
