Hi Tom,

32MB is very low and 320MB is medium; I think you could go higher still. Just pick whichever garbage collector is good for throughput. Java 1.6 update 18 also has some HotSpot and possibly GC fixes, so I'd use that. Finally, this sounds like a good use case for reindexing with Hadoop!
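To put rough numbers on what the bigger buffer saves (back-of-the-envelope, assuming flushed segments come out roughly the size of the RAM buffer):

  ramBufferSizeMB=32, mergeFactor=10:
    ten ~32MB segments -> one merge pass: ~320MB read + ~320MB written
  ramBufferSizeMB=320:
    the same ~320MB of documents flushes as one segment, no merge pass

Each tenfold bump of the buffer moves one whole level of merging from disk into RAM, so going from 320 to 3200 saves you another level, as long as the JVM heap can comfortably hold the buffer plus everything else.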
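For reference, here's roughly what I'd set (the element names match a stock Solr 1.4 solrconfig.xml; the heap size and flags below are illustrative, not tested values):

  <!-- solrconfig.xml, in the <indexDefaults>/<mainIndex> section -->
  <ramBufferSizeMB>320</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>

and start the JVM with the parallel (throughput) collector and enough heap headroom for the buffer:

  java -Xmx4g -XX:+UseParallelGC -XX:+UseParallelOldGC -jar start.jar

(start.jar being the example Jetty launcher that ships with Solr; substitute your own container's startup.)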
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: "Burton-West, Tom" <tburt...@umich.edu>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Wed, February 17, 2010 5:16:26 PM
> Subject: What is largest reasonable setting for ramBufferSizeMB?
>
> Hello all,
>
> At some point we will need to re-build an index that totals about 2 terabytes
> in size (split over 10 shards). At our current indexing speed we estimate that
> this will take about 3 weeks. We would like to reduce that time. It appears
> that our main bottleneck is disk I/O.
>
> We currently have ramBufferSizeMB set to 32 and our merge factor is 10. If we
> increase ramBufferSizeMB to 320, we avoid a merge and the 9 disk writes and
> reads needed to merge 9+1 32MB segments into one 320MB segment.
>
> Assuming we allocate enough memory to the JVM, would it make sense to increase
> ramBufferSizeMB to 3200? What are people's experiences with very large
> ramBufferSizeMB settings?
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org