Thanks Walter, you all gave me some really good ideas about estimating RAM.

2013/4/11 Walter Underwood <wun...@wunderwood.org>

> Here is the situation where merging can require 3X space. It can only
> happen if you force merge, then index with merging turned off, but we had
> Ultraseek customers do that.
>
> * All documents are merged into a single segment.
> * Without a merge, all documents are replaced.
> * This results in one segment of deleted documents and one of new
> documents (2X).
> * A merge takes place, creating a new segment of the same size, thus 3X.
>
> For normal operation, 2X is plenty of room.
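As a quick sanity check, the worst case above works out like this (the 10 GB figure below is made up; only the 3X ratio matters):

```python
# Sketch of the 3X worst case described above, using a hypothetical
# 10 GB index that was force-merged into a single segment.
index_gb = 10

# All documents are replaced with merging turned off: the old segment
# (now 100% deleted docs) and the new segment coexist on disk.
disk_before_merge = index_gb + index_gb       # 2X

# The merge then writes a new segment of the same size before the
# old files can be removed.
peak_disk = disk_before_merge + index_gb      # 3X

print(peak_disk // index_gb)  # -> 3
```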
>
> wunder
>
> On Apr 11, 2013, at 6:46 AM, Michael Ryan wrote:
>
> > I've investigated this in the past. The worst case is 2*indexSize
> additional disk space (3*indexSize total) during an optimize.
> >
> > In our system, we use LogByteSizeMergePolicy, and used to have a
> mergeFactor of 10. We would see the worst case happen when there were
> exactly 20 segments (or some other multiple of 10, I believe) at the start
> of the optimize. IIRC, it would merge those 20 segments down to 2 segments,
> and then merge those 2 segments down to 1 segment. 1*indexSize space was
> used by the original index (because there is still a reader open on it),
> 1*indexSize space was used by the 2 segments, and 1*indexSize space was used by
> the 1 segment. This is the worst case because there are two full additional
> copies of the index on disk. Normally, when the number of segments is not a
> multiple of the mergeFactor, there will be some part of the index that was
> not part of both merges (and this part that is excluded usually would be
> the largest segments).
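The arithmetic behind that worst case can be sketched as follows (segment sizes are hypothetical; the point is the 3X peak when the segment count is an exact multiple of the mergeFactor):

```python
# Sketch of the optimize worst case: mergeFactor = 10 and exactly 20
# equal-sized segments at the start of the optimize.
merge_factor = 10
segment_gb = 1.0
original_index = 20 * segment_gb   # still on disk: an open reader holds it

first_merge = 2 * (merge_factor * segment_gb)   # 20 segments -> 2 segments
final_merge = first_merge                       # 2 segments -> 1 segment

# Just before cleanup, all three full copies exist on disk at once.
peak_disk = original_index + first_merge + final_merge
print(peak_disk / original_index)  # -> 3.0
```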
> >
> > We worked around this by doing multiple optimize passes, where the first
> pass merges down to between 2 and 2*mergeFactor-1 segments (based on a
> great tip from Lance Norskog on the mailing list a couple years ago).
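The multi-pass idea can be sketched like this (a hypothetical helper, not a real Solr API; it just picks the maxSegments target for each pass):

```python
# Hypothetical sketch of the two-pass optimize workaround: first merge
# down to between 2 and 2*mergeFactor - 1 segments so no segment is
# rewritten in two consecutive merges, then merge down to 1 segment.
def optimize_passes(num_segments, merge_factor=10):
    """Return the maxSegments target for each optimize pass."""
    upper = 2 * merge_factor - 1
    if num_segments < 2 * merge_factor:
        return [1]            # already small enough for a single pass
    return [upper, 1]

print(optimize_passes(20))    # -> [19, 1]
print(optimize_passes(12))    # -> [1]
```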
> >
> > I'm not sure if the current merge policy implementations still have this
> issue.
> >
> > -Michael
> >
> > -----Original Message-----
> > From: Furkan KAMACI [mailto:furkankam...@gmail.com]
> > Sent: Thursday, April 11, 2013 2:44 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Approximately needed RAM for 5000 query/second at a Solr
> machine?
> >
> > Hi Walter;
> >
> > Is there any documentation that says the worst case is three times the
> disk space? Twice or three times makes a big difference when we are
> talking about GBs of disk space.
> >
> >
> > 2013/4/10 Walter Underwood <wun...@wunderwood.org>
> >
> >> Correct, except the worst case maximum for disk space is three times.
> >> --wunder
> >>
> >> On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
> >>
> >>> You're mixing up disk and RAM requirements when you talk about
> >>> having twice the disk size. Solr does _NOT_ require twice the index
> >>> size of RAM to optimize, it requires twice the size on _DISK_.
> >>>
> >>> In terms of RAM requirements, you need to create an index, run
> >>> realistic queries at the installation and measure.
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Tue, Apr 9, 2013 at 10:32 PM, bigjust <bigj...@lambdaphil.es>
> wrote:
> >>>>
> >>>>
> >>>>
> >>>>>> On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
> >>>>>>> These are really good metrics for me:
> >>>>>>> You say that RAM size should be at least index size, and it is
> >>>>>>> better to have a RAM size twice the index size (because of worst
> >>>>>>> case scenario).
> >>>>>>> On the other hand, let's assume I have more RAM than twice the
> >>>>>>> index size on the machine. Can Solr use that extra RAM, or is
> >>>>>>> twice the index size an approximate upper limit?
> >>>>>> What we have been discussing is the OS cache, which is memory
> >>>>>> that is not used by programs.  The OS uses that memory to make
> >>>>>> everything run faster.  The OS will instantly give that memory up
> >>>>>> if a program requests it.
> >>>>>> Solr is a java program, and java uses memory a little
> >>>>>> differently, so Solr most likely will NOT use more memory when it
> is available.
> >>>>>> In a "normal" directly executable program, memory can be
> >>>>>> allocated at any time, and given back to the system at any time.
> >>>>>> With Java, you tell it the maximum amount of memory the program
> >>>>>> is ever allowed to use.  Because of how memory is used inside
> >>>>>> Java, most long-running Java programs (like Solr) will allocate
> >>>>>> up to the configured maximum even if they don't really need that
> much memory.
> >>>>>> Most Java virtual machines will never give the memory back to the
> >>>>>> system even if it is not required.
> >>>>>> Thanks, Shawn
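A rough RAM-budget sketch of Shawn's point, with made-up numbers: the JVM heap is capped by -Xmx and typically grows to that cap, while the OS page cache uses whatever RAM is left over.

```python
# Hypothetical RAM budget: heap is fixed by -Xmx; the OS page cache
# gets the remainder and ideally holds the whole index.
total_ram_gb = 16
jvm_heap_gb = 4        # whatever -Xmx is set to
os_overhead_gb = 1     # OS and other processes, rough guess

available_for_cache = total_ram_gb - jvm_heap_gb - os_overhead_gb

index_size_gb = 5
fully_cached = available_for_cache >= index_size_gb
print(available_for_cache, fully_cached)  # -> 11 True
```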
> >>>>>>
> >>>>>>
> >>>> Furkan KAMACI <furkankam...@gmail.com> writes:
> >>>>
> >>>>> I am sorry but you said:
> >>>>>
> >>>>> *you need enough free RAM for the OS to cache the maximum amount
> >>>>> of disk space all your indexes will ever use*
> >>>>>
> >>>>> Let me make an assumption about the indexes on my machine. Let's
> >>>>> assume they total 5 GB. So it is better to have at least 5 GB of
> >>>>> RAM? OK, Solr will use RAM up to whatever I define for the Java
> >>>>> process. When we think about the indexes on storage and the OS
> >>>>> caching them in RAM, is this what you mean: having more than 5 GB,
> >>>>> or 10 GB, of RAM on my machine?
> >>>>>
> >>>>> 2013/4/10 Shawn Heisey <s...@elyograg.org>
> >>>>>
> >>>>
> >>>> 10 GB.  Because when Solr shuffles the data around, it could use up
> >>>> to twice the size of the index in order to optimize the index on disk.
> >>>>
> >>>> -- Justin
> >>
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>
