Do you do a lot of deletes (or "updates" of existing documents)? Do you store lots of large fields? If so, maybe you can use compressed fields (we have never tried them, so I can't confirm how well they work or perform).
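If you want to experiment with that, stored-field compression is a per-field attribute in schema.xml (supported as compressed="true" in Solr 1.4.x, but note it was removed in later releases, so check your version first). A hypothetical sketch — the field name and threshold are placeholders:

```xml
<!-- schema.xml (Solr 1.4.x): hypothetical large stored field.
     compressed="true" gzips the stored value on disk;
     compressThreshold skips compression for small values. -->
<field name="body" type="text" indexed="true" stored="true"
       compressed="true" compressThreshold="4096"/>
```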
You can also turn off things like norms and term vectors, if you aren't already, to make the index a bit smaller. Most likely, larger disks are your best option IMO.

On Nov 1, 2011, at 12:13 PM, Jason Biggin wrote:

> Thanks Robert.
>
> We optimize less frequently than we used to. Down to twice a month from
> once a day.
>
> Without optimizing the search speed stays the same; however, the index size
> increases to 70+ GB.
>
> Perhaps there is a different way to restrict disk usage.
>
> Thanks,
> Jason
>
> Robert Stewart <bstewart...@gmail.com> wrote:
>
> Optimization merges the index into a single segment (one huge file), so the
> entire index will be copied on replication. So you really do need 2x disk
> in some cases then.
>
> Do you really need to optimize? We have a pretty big total index (about 200
> million docs) and we never optimize. But we do have a sharded index, so our
> largest indexes are only around 10 million docs. We have a merge factor of 2.
> We run replication every minute.
>
> In our tests search performance was not much better with optimization,
> but that may be specific to our types of searches, etc. You may have
> different results.
>
> Bob
>
> On Nov 1, 2011, at 12:46 AM, Jason Biggin wrote:
>
>> Wondering if anyone has experience with replicating large indexes. We have
>> a Solr deployment with 1 master, 1 master/slave, and 5 slaves. Our index
>> contains 15+ million articles and is ~55 GB in size.
>>
>> Performance is great on all systems.
>>
>> Debian Linux
>> Apache Tomcat
>> 100 GB disk
>> 6 GB RAM
>> 2 procs
>>
>> on VMware ESXi 4.0
>>
>> We notice, however, that whenever the master is optimized, the complete
>> index is replicated to the slaves. This causes 100%+ bloat in disk
>> requirements.
>>
>> Is this normal? Is there a way around this?
>>
>> Currently our optimize is configured as such:
>>
>> curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true'
>>
>> Willing to share our experiences with Solr.
>>
>> Thanks,
>> Jason
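For reference, the low merge factor and minute-by-minute polling Bob describes correspond to solrconfig.xml settings along these lines — the master URL here is just a placeholder, so adjust to your setup:

```xml
<!-- solrconfig.xml (master, Solr 3.x era): a low mergeFactor keeps
     segments merged aggressively, so replication transfers small
     changed segments rather than one huge optimized file. -->
<indexDefaults>
  <mergeFactor>2</mergeFactor>
</indexDefaults>

<!-- slave side: poll the master every minute for changed segments -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- hypothetical master host -->
    <str name="masterUrl">http://master-host:8080/solr/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

If you do keep optimizing, note that maxSegments does not have to be 1; passing a larger value (e.g. maxSegments=2 or higher) performs a partial optimize that rewrites less of the index, which should reduce how much the slaves have to copy.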