Do you do a lot of deletes (or 'updates' of existing documents)?

Do you store lots of large fields? If so, maybe you can use compressed fields
(we have never tried them, so I cannot confirm how well they work or perform).
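
If your Solr version still supports it (compressed="true" was a Solr 1.x-era
schema attribute and was later removed), the declaration would look roughly
like this - a sketch only, with a made-up field name:

    <!-- schema.xml: hypothetical large stored field, compressed on disk -->
    <field name="body" type="text" indexed="true" stored="true" compressed="true"/>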

You can also turn off things like norms and term vectors if you aren't doing
so already, to make the index a bit smaller.
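
For example, a field definition with those features disabled might look like
this (again just a sketch, the field name is made up):

    <!-- schema.xml: hypothetical field with norms and term vectors disabled -->
    <field name="body" type="text" indexed="true" stored="true"
           omitNorms="true" termVectors="false"/>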

Most likely having larger disks is your best option IMO.
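
If you do keep optimizing, you could also try targeting more than one segment,
which should rewrite (and therefore replicate) less of the index - an untested
sketch, adjust the segment count to taste:

    curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=10'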

On Nov 1, 2011, at 12:13 PM, Jason Biggin wrote:

> Thanks Robert.
> 
> We optimize less frequently than we used to - down from once a day to twice
> a month.
> 
> Without optimizing the search speed stays the same, however the index size 
> increases to 70+ GB.
> 
> Perhaps there is a different way to restrict disk usage.
> 
> Thanks,
> Jason
> 
> Robert Stewart <bstewart...@gmail.com> wrote:
> 
> 
> Optimization merges the index down to a single segment (one huge file), so the
> entire index will be copied on the next replication.  So in some cases you
> really do need 2x the disk space.
> 
> Do you really need to optimize?  We have a pretty big total index (about 200
> million docs) and we never optimize.  But we do have a sharded index, so our
> largest individual indexes are only around 10 million docs.  We use a merge
> factor of 2, and we run replication every minute.
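> 
> For reference, the relevant pieces of such a setup look roughly like this (a
> sketch only - the host name and replicateAfter trigger are placeholders, not
> our exact config).  On each slave, in solrconfig.xml:
> 
>     <requestHandler name="/replication" class="solr.ReplicationHandler">
>       <lst name="slave">
>         <str name="masterUrl">http://master-host:8080/solr/replication</str>
>         <!-- poll the master for changes every 60 seconds -->
>         <str name="pollInterval">00:00:60</str>
>       </lst>
>     </requestHandler>
> 
> and on the master:
> 
>     <requestHandler name="/replication" class="solr.ReplicationHandler">
>       <lst name="master">
>         <str name="replicateAfter">commit</str>
>       </lst>
>     </requestHandler>
> 
> with the merge factor set in the index section of solrconfig.xml:
> 
>     <mergeFactor>2</mergeFactor>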
> 
> In our tests, search performance was not much better after optimization, but
> that may be specific to our types of searches; you may see different results.
> 
> Bob
> 
> On Nov 1, 2011, at 12:46 AM, Jason Biggin wrote:
> 
>> Wondering if anyone has experience with replicating large indexes.  We have 
>> a Solr deployment with 1 master, 1 master/slave and 5 slaves.  Our index 
>> contains 15+ million articles and is ~55GB in size.
>> 
>> Performance is great on all systems.
>> 
>> Debian Linux
>> Apache-Tomcat
>> 100GB disk
>> 6GB RAM
>> 2 proc
>> 
>> on VMWare ESXi 4.0
>> 
>> 
>> We notice however that whenever the master is optimized, the complete index 
>> is replicated to the slaves.  This causes a 100%+ bloat in disk requirements.
>> 
>> Is this normal?  Is there a way around this?
>> 
>> Currently our optimize is configured as such:
>> 
>>      curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true'
>> 
>> Willing to share our experiences with Solr.
>> 
>> Thanks,
>> Jason
> 
