Dominique:

Actually, the memory requirements shouldn't really go up as the number
of hits increases. The general algorithm is (say rows=10):
- Calculate the score of each doc.
- If the score is zero, ignore the doc.
- If the score is > the minimum in my current top 10, replace the
  lowest-scoring doc in my current top 10 with the new doc (a
  PriorityQueue last I knew).
- Else, discard the doc.

When all docs have been scored, assemble the return from the top 10
(or whatever rows is set to).
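
To make this concrete, here is a minimal sketch of the idea in plain
Java. It is not Lucene's actual collector, just an illustration of why
memory stays proportional to rows rather than to the number of hits:

import java.util.PriorityQueue;

/** Sketch only: keep the top N scoring docs in O(N) memory. */
public class TopNCollector {
    private static class ScoredDoc {
        final int docId;
        final float score;
        ScoredDoc(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Min-heap: the head is always the lowest-scoring doc in the current
    // top N, so it is the one to evict when a better doc shows up.
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private final int n;

    public TopNCollector(int n) { this.n = n; }

    public void collect(int docId, float score) {
        if (score == 0.0f) {
            return;                                // score is zero: ignore
        }
        if (heap.size() < n) {
            heap.add(new ScoredDoc(docId, score)); // top N not full yet
        } else if (score > heap.peek().score) {
            heap.poll();                           // evict the current minimum
            heap.add(new ScoredDoc(docId, score));
        }                                          // else: discard the doc
    }
}

No matter how many docs match, the heap never holds more than rows
entries.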

The key here is that most of the Solr index is kept in
MMapDirectory/OS space, see Uwe's excellent blog here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
In terms of _searching_, very little of the Lucene index structure
is kept in JVM memory.

That said, faceting plays a bit loose with the rules. If you have
docValues set to true, most of the memory structures are in the OS
memory space, not the JVM. If you have docValues set to false, then
the "uninverted" structure is built in the JVM heap space.
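
For reference, docValues is just an attribute on the field definition
in the schema, something like this (field name and type are examples
only, adjust to your schema):

<field name="some_date_field" type="tdate" indexed="true"
       stored="true" docValues="true"/>

It only applies to documents indexed after the change, so it requires
a full reindex.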

Additionally, the JVM requirements are sensitive to the number of
unique values in the field being faceted on. For instance, let's say
you faceted on a date field with just facet.field=some_date_field. A
bucket would have to be allocated to hold the count for each and
every unique date value, i.e. one for each distinct millisecond in
your data, which might be something you're seeing. Conceptually this
is just an array[uniqueValues] of ints (longs? I'm not sure). This
should be relatively easy to test by omitting the facets while
measuring.
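
Conceptually the allocation is something like this sketch (not Solr's
actual code; the variable names are made up and the ordinal mapping is
glossed over):

// Sketch: one counter per unique value in the field, regardless of rows.
// For a millisecond-precision date field, uniqueValues can approach
// the number of documents in the index.
static int[] countFacets(int uniqueValues, int[] matchingDocOrdinals) {
    int[] counts = new int[uniqueValues];
    for (int ord : matchingDocOrdinals) {
        counts[ord]++;  // ord = index of this doc's value among the unique values
    }
    return counts;
}

This is also why date faceting is usually done with range faceting
(facet.range with a facet.range.gap such as +1DAY) rather than
facet.field on a raw timestamp: you get one bucket per gap instead of
one per unique millisecond.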

Where the number of rows _does_ make a difference is in the return
packet. Say I have rows=10. In that case I create a single return
packet with the "fl" fields of all 10 docs. If rows=10,000 then that
return packet is obviously 1,000 times as large and must be assembled
in memory.
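
A quick back-of-the-envelope, with invented numbers just for
illustration: if the "fl" fields of a doc average 2 KB, rows=10 is a
roughly 20 KB response, while rows=10,000 is roughly 20 MB that has
to be assembled on the heap for every such query.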

I rather doubt the phonetic filter is to blame. But you can test this
by just omitting the field containing the phonetic filter in the
search query. I've certainly been wrong before.....
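
For example (field names hypothetical): if your request handler uses
qf=name name_phonetic, run the same queries with qf=name only and
compare the hit counts and the GC behavior.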

Best,
Erick

On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean
<dominique.bej...@eolya.fr> wrote:
> Hi,
>
>
> Thank you both for your responses.
>
>
> I only have the Solr log for the very last period of the GC log.
>
>
> A grep command lets me count the queries per minute with hits > 1000
> or > 10000, i.e. those with the biggest impact on memory and CPU
> during faceting.
>
>
> > 1000 (count per minute):
>      59 11:13
>      45 11:14
>      36 11:15
>      45 11:16
>      59 11:17
>      40 11:18
>      95 11:19
>     123 11:20
>     137 11:21
>     123 11:22
>      86 11:23
>      26 11:24
>      19 11:25
>      17 11:26
>
> > 10000 (count per minute):
>      55 11:19
>      78 11:20
>      48 11:21
>     134 11:22
>      93 11:23
>      10 11:24
>
>
> So we see that at the time the GC starts to go nuts, the number of
> large result sets increases.
>
>
> The query field includes a phonetic filter and, because of it, the
> results are really not relevant. I will suggest to:
>
> 1/ remove the phonetic filter in order to have fewer irrelevant
> results and so get smaller result sets
>
> 2/ enable docValues on the fields used for faceting
>
> I expect this to decrease GC requirements and stabilize GC.
>
>
> Regards
>
>
> Dominique
>
>
>
>
>
> Le ven. 1 déc. 2017 à 18:17, Erick Erickson <erickerick...@gmail.com> a
> écrit :
>
>> Your autowarm counts are rather high, but as Toke says this doesn't
>> seem outrageous.
>>
>> I have seen situations where Solr is running close to the limits of
>> its heap and GC only reclaims a tiny bit of memory each time. When
>> you say "full GC with no memory reclaimed", is that really no memory
>> _at all_? Or "almost no memory"? This situation can be alleviated by
>> allocating more memory to the JVM.
>>
>> Your JVM pressure would certainly be reduced by enabling docValues
>> on any field you sort, facet, or group on. That would require a full
>> reindex, of course. Note that this makes your index on disk bigger,
>> but it reduces JVM pressure by roughly the same amount, so it's a
>> win in this situation.
>>
>> Have you attached a memory profiler to the running Solr instance? I'd
>> be curious where the memory is being allocated.
>>
>> Best,
>> Erick
>>
>> On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen <t...@kb.dk> wrote:
>> > Dominique Bejean <dominique.bej...@eolya.fr> wrote:
>> >> We are encountering issue with GC.
>> >
>> >> Randomly nearly once a day there are consecutive full GC with no memory
>> >> reclaimed.
>> >
>> > [... 1.2M docs, Xmx 6GB ...]
>> >
>> >> Gceasy suggest to increase heap size, but I do not agree
>> >
>> > It does seem strange, with your apparently modest index & workload.
>> > Nothing you say sounds problematic to me, and you have covered the
>> > usual culprits: overlapping searchers, faceting, and filterCache.
>> >
>> > Is it possible for you to share the solr.log around the two times
>> > that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01
>> > 08:00-12:00.
>> >
>> > If you cannot share, please check if you have excessive traffic
>> > around that time or if there is a lot of UnInverting going on
>> > (triggered by faceting on non-DocValues String fields). I know your
>> > post implies that you have already done so, so this is more of a
>> > sanity check.
>> >
>> >
>> > - Toke Eskildsen
>>
> --
> Dominique Béjean
> 06 08 46 12 43
