On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> John Nielsen [j...@mcb.dk]:
> > I never seriously looked at my fieldValueCache. It never seemed to get used:
> > http://screencast.com/t/YtKw7UQfU
> That is strange. As you are using a multi-valued field with the new
> setup, the facet fields should appear there. Can you find them in any of
> the other caches?
> ...I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?
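
For reference, a minimal sketch of what such a facet-enabled request could look like when built from a script. The host, core name and facet field below are hypothetical placeholders, not taken from this thread; `facet`, `facet.field` and `facet.method` are the standard Solr request parameters. The method is set explicitly to `fc` so that it is easy to see in the logs which facet method a request actually uses.

```python
from urllib.parse import urlencode


def facet_request_url(base_url, query, facet_fields, facet_method="fc"):
    """Build a Solr select URL with faceting enabled.

    base_url and the field names are supplied by the caller; the values
    used in the example below are hypothetical.
    """
    params = [
        ("q", query),
        ("rows", "0"),                   # only facet counts, no documents
        ("facet", "true"),
        ("facet.method", facet_method),  # "fc" or "enum"
    ]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical host, core and field name.
url = facet_request_url(
    "http://localhost:8983/solr/collection1/select", "*:*", ["category"]
)
print(url)
```

Comparing the parameters of a real request from the application against something like this should show quickly whether `facet.method=enum` is being sent.
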
> > Yep. We still do a lot of sorting on dynamic field names, so the field cache
> > has a lot of entries. (9411 entries as we speak. This is considerably lower
> > than before.) You mentioned in an earlier mail that faceting on a field
> > shared between all facet queries would bring down the memory needed.
> > Does the same thing go for sorting?
> More or less. Sorting stores the raw string representations (utf-8) in
> memory, so the number of unique values matters more than it does for
> faceting. Just as with faceting, a list of pointers from documents to
> values (1 value/document as we are sorting) is maintained, so the overhead
> is something like
> #documents*log2(#unique_terms*average_term_length) +
> #unique_terms*average_term_length
> (where average_term_length is in bits)
> Caveat: This is with the index-wide sorting structure. I am fairly
> confident that this is what Solr uses, but I have not looked at it lately
> so it is possible that some memory-saving segment-based trickery has been
> implemented.
> > Do those 9411 entries duplicate data between them?
> Sorry, I do not know. SOLR-1111 discusses the problems with the field
> cache and duplication of data, but I cannot tell whether it has been solved
> or not. I am not familiar with the stat breakdown of the fieldCache, but it
> _seems_ to me that there are 2 or 3 entries for each segment for each sort
> field. Guesstimating further, let's say you have 30 segments in your index.
> Going with the guesswork, that would bring the number of sort fields to
> 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?
> Extrapolating from 1.4M documents and 180 clients, let's say that there
> are 1.4M/180/5 ~= 1500 unique terms for each sort-field and that their
> average length is 10 characters. We thus have
> 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
> per sort field or about 4GB for all the 180 fields.
> With this few unique values, the doc->value structure is by far the
> biggest, just as with facets. As opposed to the faceting structure, this is
> fairly close to the actual memory usage. Switching to a single sort field
> would reduce the memory usage from 4GB to about 55MB.
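
The estimate above is easy to replay in a few lines. This is only a sketch of the formula as given in the mail, not a measurement of what Solr allocates. The function and constant names are made up for illustration, and the single-field case assumes that no terms are shared between clients (180 * 1500 unique terms), which is a guess.

```python
import math


def sort_field_bits(num_docs, unique_terms, avg_term_chars, bits_per_char=8):
    """Estimate from the mail, in bits: doc->value pointers plus raw terms."""
    term_bits = unique_terms * avg_term_chars * bits_per_char
    pointer_bits = num_docs * math.log2(term_bits)
    return pointer_bits + term_bits


DOCS = 1_400_000        # documents in the index (from the mail)
CLIENTS = 180           # one sort field per client (from the mail)
TERMS_PER_FIELD = 1500  # ~1.4M/180/5 (from the mail)
AVG_CHARS = 10          # guessed average term length (from the mail)

per_field = sort_field_bits(DOCS, TERMS_PER_FIELD, AVG_CHARS)
# Assumption: terms do not overlap between clients.
shared = sort_field_bits(DOCS, TERMS_PER_FIELD * CLIENTS, AVG_CHARS)

print(f"per sort field:      {per_field / 1e6:.1f} million bits")
print(f"{CLIENTS} sort fields:     {per_field * CLIENTS / 1e9:.2f} billion bits")
print(f"single shared field: {shared / 1e6:.1f} million bits")
```

Note that the formula counts bits. With the numbers above it gives roughly 23.7 million bits per sort field and about 55.7 million bits for a single shared field, so the result has to be divided by 8 to get bytes.
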
> > I do commit a bit more often than I should. I get these in my log file from
> > time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> So 1 active searcher and 2 warming searchers. Ignoring that one of the
> warming searchers is highly likely to finish well ahead of the other one,
> that means that your heap must hold 3 times the structures for a single
> searcher. With the old heap size of 25GB that left "only" 8GB for a full
> dataset. Subtract the 4GB for sorting and a similar amount for faceting and
> you have your OOM.
> Tweaking your ingest to avoid 3 overlapping searchers will lower your
> memory requirements by 1/3. Fixing the facet & sorting logic will bring it
> down to laptop size.
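
The heap arithmetic above can be sketched the same way. This is back-of-the-envelope only: it takes the 4GB sorting figure and the "similar amount" for faceting from the mail at face value and ignores everything else on the heap. The function name is made up for illustration.

```python
def searcher_structures_gb(searchers, sort_gb, facet_gb):
    """Heap needed when every live searcher holds its own sort and facet structures."""
    return searchers * (sort_gb + facet_gb)


SORT_GB = 4.0   # sorting estimate from the mail
FACET_GB = 4.0  # "a similar amount" for faceting

three = searcher_structures_gb(3, SORT_GB, FACET_GB)  # 1 active + 2 warming
two = searcher_structures_gb(2, SORT_GB, FACET_GB)    # 1 active + 1 warming

print(f"3 searchers: {three:.0f} GB, 2 searchers: {two:.0f} GB")
print(f"saving: {1 - two / three:.0%}")
```

With three searchers the sort and facet structures alone come to 24GB, which leaves almost nothing of a 25GB heap; going down to two searchers cuts that by a third.
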
> > The control panel says that the warm up time of the last searcher is 5574.
> > Is that seconds or milliseconds?
> > http://screencast.com/t/d9oIbGLCFQwl
> Milliseconds, I am fairly sure. It is much faster than I anticipated. Are
> you warming all the sort- and facet-fields?
> > Waiting for a full GC would take a long time.
> Until you have fixed the core memory issue, you might consider doing an
> explicit GC every night to clean up and hope that it does not occur
> automatically during the daytime (or whenever your clients use it).
> > Unfortunately I don't know of a way to provoke a full GC on command.
> VisualVM, which is delivered with the Oracle JDK (look somewhere in the
> bin folder), is your friend. Just start it on the server and click on the
> relevant process.
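
If the nightly GC should run unattended rather than through the VisualVM GUI, one alternative (not what is suggested above) is the `jcmd` tool shipped with JDK 7 and later, whose `GC.run` command asks the target JVM for a full collection. A minimal sketch, assuming `jcmd` is on the PATH and the script runs as the same user as the Solr JVM; the function names and the pid are hypothetical.

```python
import subprocess


def full_gc_command(pid, jcmd="jcmd"):
    """Command line that asks the JVM with the given pid to run a full GC."""
    return [jcmd, str(pid), "GC.run"]


def request_full_gc(pid, jcmd="jcmd"):
    """Run jcmd against the JVM and return its output.

    Raises CalledProcessError if jcmd fails, for example when the pid
    does not belong to a JVM owned by the current user.
    """
    result = subprocess.run(
        full_gc_command(pid, jcmd),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical pid; look up the real one with `jps` or `jcmd -l`.
print(" ".join(full_gc_command(12345)))
```

Calling `request_full_gc` from a nightly cron job would do the trick. If the JVM was started with `-XX:+DisableExplicitGC`, the request is ignored.
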
> Regards,
> Toke Eskildsen

