Just to confirm I'm not doing something insane, this is my general setup:

- index approx 1MM documents including HTML, pictures, office files, etc.
- files are not local to solr process
- use upload/extract to extract text from them through tika
- use commit=1 on each POST (reasons below)
- use optimize=1 every 150 documents or so (reasons below)

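For reference, the per-document upload step in the setup above can be sketched roughly as follows. This is a minimal illustration, not the actual upload script: the base URL, the `/update/extract` handler path, the use of `literal.id` for the document id, and the helper names `build_extract_url` and `post_document` are all assumptions made for the example.

```python
from urllib.parse import urlencode
import urllib.request

# Assumption: Solr runs under Tomcat at this address; adjust to the real host.
SOLR_BASE = "http://localhost:8080/solr"


def build_extract_url(doc_id, commit=True, base=SOLR_BASE):
    """Build the extract-handler URL for one document.

    `literal.id` sets the id field; `commit=true` asks Solr to commit
    right after this document is indexed.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base + "/update/extract?" + urlencode(params)


def post_document(doc_id, data, content_type="application/octet-stream",
                  commit=True, timeout=300):
    """POST the raw file bytes so Tika can extract text on the server side.

    Returns the HTTP status code; urllib raises HTTPError on 4xx/5xx.
    """
    req = urllib.request.Request(
        build_extract_url(doc_id, commit),
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Since the files are not local to the solr process, the bytes are fetched by the client and streamed in the request body rather than referenced by path.
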
Through many manual restarts and modifications to the upload script, I've gotten about halfway (numDocs : 467372, disk usage 1.6G). The biggest problem is that no serious problem can be recovered from without restarting tomcat, and serious problems can't be distinguished at the client level from non-serious ones (e.g. tika exceptions thrown by bad documents).

On Wed, Jun 9, 2010 at 10:13 AM, Jim Blomo <jim.bl...@pbworks.com> wrote:
> In any case I bumped up the heap to 3G as suggested, which has helped
> stability. I have found that in practice I need to commit every
> extraction because a crash or error will wipe out all extractions
> after the last commit.

I've also found that I need to optimize very regularly because I kept getting "too many file handles" errors (though they usually came up as the more cryptic "directory, but cannot be listed: list() returned null" error).

What I am running into now is

SEVERE: Exception invoking periodic operation:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.substring(String.java:1940)
[full backtrace below]

After a restart and optimize this goes away for a while (~100 documents), but then it comes back and every request after the error fails. Even if I can't prevent this error, is there a way I can recover from it better? Perhaps an option to solr or tomcat to just restart itself if it hits that error?

Jim

SEVERE: Exception invoking periodic operation:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.substring(String.java:1940)
        at java.lang.String.substring(String.java:1905)
        at java.io.File.getName(File.java:401)
        at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229)
        at java.io.File.isDirectory(File.java:754)
        at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1000)
        at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1214)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
        at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559)
        at java.lang.Thread.run(Thread.java:619)
Jul 3, 2010 1:32:20 AM org.apache.solr.update.processor.LogUpdateProcessor finish
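
One possible client-side heuristic for the recovery question above, sketched under assumptions: a bad document tends to fail on its own, while the OutOfMemoryError makes every later request fail, so a run of consecutive failures is a reasonable hint that the server needs a restart. The class name `FailureTracker` and the threshold value are hypothetical, and this is only a sketch of the idea, not a tested recovery mechanism.

```python
class FailureTracker:
    """Tell isolated bad documents apart from a wedged server.

    Heuristic (an assumption, not a Solr feature): once `threshold`
    uploads in a row have failed, treat the server as needing a restart.
    """

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.consecutive = 0   # failures since the last success
        self.failed_docs = []  # ids to retry or inspect later

    def record(self, doc_id, ok):
        """Record one upload result; return True if the server looks wedged."""
        if ok:
            self.consecutive = 0
            return False
        self.consecutive += 1
        self.failed_docs.append(doc_id)
        return self.consecutive >= self.threshold
```

When `record` returns True, the upload script could pause, have tomcat restarted by whatever external means is available, and then retry the ids in `failed_docs` from the start of the failing run; ids that still fail after a restart are more likely to be genuinely bad documents.
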