Here is the situation where merging can require 3X space. It can only happen if 
you force merge, then reindex everything with merging turned off, but we had 
Ultraseek customers do exactly that.

* All documents are merged into a single segment by the force merge.
* With merging turned off, every document is then replaced.
* This leaves one full-size segment of deleted documents and one of new 
documents (2X).
* A merge finally runs, creating a new segment of the same size, thus 3X.

For normal operation, 2X is plenty of room.
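
As a back-of-the-envelope sketch of the sequence above (Python, with a 
hypothetical index size; the ratios are what matter, not the numbers):

```python
# Sketch of the 3X worst case: force merge, reindex with merging
# off, then a final merge. index_size is a hypothetical figure.
index_size = 5  # GB; after the force merge: one fully merged segment

# Step 1: reindex everything with merging turned off. The old
# segment now holds only deleted documents; the replacement
# documents occupy roughly the same space again.
deleted_segment = index_size
new_segments = index_size
after_reindex = deleted_segment + new_segments  # 2X

# Step 2: a merge finally runs, writing a third full-size copy
# before the old files can be deleted.
merge_output = index_size
peak = after_reindex + merge_output  # 3X

print(after_reindex, peak)  # 10 15, i.e. 2X then 3X the index size
```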

wunder

On Apr 11, 2013, at 6:46 AM, Michael Ryan wrote:

> I've investigated this in the past. The worst case is 2*indexSize additional 
> disk space (3*indexSize total) during an optimize.
> 
> In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor 
> of 10. We would see the worst case happen when there were exactly 20 segments 
> (or some other multiple of 10, I believe) at the start of the optimize. IIRC, 
> it would merge those 20 segments down to 2 segments, and then merge those 2 
> segments down to 1 segment. 1*indexSize space was used by the original index 
> (because there is still a reader open on it), 1*indexSize space was used by the 2 
> segments, and 1*indexSize space was used by the 1 segment. This is the worst 
> case because there are two full additional copies of the index on disk. 
> Normally, when the number of segments is not a multiple of the mergeFactor, 
> there will be some part of the index that was not part of both merges (and 
> this part that is excluded usually would be the largest segments).
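
A rough model of that 20-segment case (hypothetical equal-size segments; real 
segment sizes vary, which is why the multiple-of-mergeFactor case is the worst):

```python
# Model of the worst case described above: 20 equal-size segments,
# mergeFactor 10, optimized down to 1 segment while an open reader
# keeps the original files on disk. Sizes are hypothetical units.
merge_factor = 10
segments = [1] * 20          # 20 segments, total size 20

original = sum(segments)     # still on disk: reader holds the files

# First pass: merge groups of mergeFactor segments -> 2 segments.
pass1 = [sum(segments[i:i + merge_factor])
         for i in range(0, len(segments), merge_factor)]

# Second pass: merge those 2 segments -> 1 segment.
pass2 = [sum(pass1)]

# Peak: original index plus both merge outputs coexist on disk.
peak = original + sum(pass1) + sum(pass2)
print(peak / original)  # 3.0 -> 3 * indexSize total
```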
> 
> We worked around this by doing multiple optimize passes, where the first pass 
> merges down to between 2 and 2*mergeFactor-1 segments (based on a great tip 
> from Lance Norskog on the mailing list a couple years ago).
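
One way to issue such a two-pass optimize is Solr's `maxSegments` parameter on 
the update handler; a sketch against a hypothetical local core named `mycore` 
(host, port, core name, and segment counts are assumptions to adjust):

```shell
# With mergeFactor=10, a first-pass target between 2 and 19 segments
# avoids having two full extra copies of the index on disk at once.

# Pass 1: merge down to an intermediate segment count.
curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=16'

# Pass 2: merge the remaining segments down to one.
curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'
```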
> 
> I'm not sure if the current merge policy implementations still have this 
> issue.
> 
> -Michael
> 
> -----Original Message-----
> From: Furkan KAMACI [mailto:furkankam...@gmail.com] 
> Sent: Thursday, April 11, 2013 2:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Approximately needed RAM for 5000 query/second at a Solr machine?
> 
> Hi Walter;
> 
> Is there any document or other source that says the worst case is three 
> times the disk space? Twice versus three times makes a real difference when 
> we are talking about GBs of disk space.
> 
> 
> 2013/4/10 Walter Underwood <wun...@wunderwood.org>
> 
>> Correct, except the worst case maximum for disk space is three times.
>> --wunder
>> 
>> On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
>> 
>>> You're mixing up disk and RAM requirements when you talk about 
>>> having twice the disk size. Solr does _NOT_ require twice the index 
>>> size of RAM to optimize, it requires twice the size on _DISK_.
>>> 
>>> In terms of RAM requirements, you need to create an index, run 
>>> realistic queries at the installation and measure.
>>> 
>>> Best
>>> Erick
>>> 
>>> On Tue, Apr 9, 2013 at 10:32 PM, bigjust <bigj...@lambdaphil.es> wrote:
>>>> 
>>>> 
>>>> 
>>>>>> On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
>>>>>>> These are really good metrics for me:
>>>>>>> You say that RAM size should be at least index size, and it is 
>>>>>>> better to have a RAM size twice the index size (because of worst 
>>>>>>> case scenario).
>>>>>>> On the other hand, let's assume that I have more RAM than twice 
>>>>>>> the index size on the machine. Can Solr use that extra RAM, or is 
>>>>>>> twice the index size approximately the useful maximum?
>>>>>> What we have been discussing is the OS cache, which is memory 
>>>>>> that is not used by programs.  The OS uses that memory to make 
>>>>>> everything run faster.  The OS will instantly give that memory up 
>>>>>> if a program requests it.
>>>>>> Solr is a Java program, and Java uses memory a little 
>>>>>> differently, so Solr most likely will NOT use more memory when it is 
>>>>>> available.
>>>>>> In a "normal" directly executable program, memory can be 
>>>>>> allocated at any time, and given back to the system at any time.
>>>>>> With Java, you tell it the maximum amount of memory the program 
>>>>>> is ever allowed to use.  Because of how memory is used inside 
>>>>>> Java, most long-running Java programs (like Solr) will allocate 
>>>>>> up to the configured maximum even if they don't really need that much 
>>>>>> memory.
>>>>>> Most Java virtual machines will never give the memory back to the 
>>>>>> system even if it is not required.
>>>>>> Thanks, Shawn
>>>>>> 
>>>>>> 
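
The fixed maximum Shawn describes is the JVM heap cap, set with `-Xmx`; a 
minimal sketch for a 2013-era Jetty-based Solr start (the 4 GB figure is a 
hypothetical size to tune for your own index and hardware):

```shell
# Cap the Solr JVM heap at 4 GB (hypothetical size). The JVM will
# grow toward -Xmx and typically never return that memory to the OS;
# everything outside the heap stays free for the OS page cache that
# holds the index files.
java -Xms4g -Xmx4g -jar start.jar
```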
>>>> Furkan KAMACI <furkankam...@gmail.com> writes:
>>>> 
>>>>> I am sorry but you said:
>>>>> 
>>>>> *you need enough free RAM for the OS to cache the maximum amount 
>>>>> of disk space all your indexes will ever use*
>>>>> 
>>>>> I have made an assumption about the indexes on my machine. Let's 
>>>>> assume they total 5 GB. So it is better to have at least 5 GB of RAM? 
>>>>> OK, Solr will use RAM up to what I define for the Java process. When 
>>>>> we think about the indexes on storage being cached in RAM by the OS, 
>>>>> is this what you are talking about: having more than 5 GB, or 
>>>>> 10 GB, of RAM for my machine?
>>>>> 
>>>>> 2013/4/10 Shawn Heisey <s...@elyograg.org>
>>>>> 
>>>> 
>>>> 10 GB.  Because when Solr shuffles the data around, it could use up 
>>>> to twice the size of the index in order to optimize the index on disk.
>>>> 
>>>> -- Justin
>> 
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org


