Thanks Erick. Yes, I see the 'sawtooth' pattern. I will try your suggestion,
but I am wondering: why were the queries performant with Solr 4 without
DocValues? Have some defaults changed?

---



On Sat, Nov 11, 2017 at 8:28 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Nawab:
>
> bq: Cache hit ratios are all in 80+% (even when i decreased the
> filterCache to 128)
>
> This suggests that you use a relatively small handful of fq clauses,
> which is perfectly fine. Having 450M docs and a cache size of 1024 is
> _really_ scary! You had a potential for a 57G (yes, gigabyte)
> filterCache. Fortunately you apparently don't use enough different fq
> clauses to fill it up, or they match very few documents. I cheated a
> little: if the result set is small, the individual doc IDs are stored
> rather than a bitset 450M bits wide. Your
> admin>>core>>plugins/stats>>filterCache should show you how many
> evictions there are which is another interesting stat.
>
> As it is, your filterCache might use up 7G or so. Hefty, but you have
> lots of RAM.
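A back-of-the-envelope check of those numbers (the 450M doc count and cache sizes are from this thread; the rest is just one bit per document per cached entry):

```python
# Worst-case filterCache footprint: each cached fq entry can be a
# bitset with one bit per document in the core.
max_doc = 450_000_000      # docs in the core (from this thread)
entry_bytes = max_doc / 8  # ~56 MB per full-bitset entry

for cache_size in (1024, 128):
    total_gb = entry_bytes * cache_size / 1e9
    print(f"filterCache size={cache_size}: ~{total_gb:.1f} GB worst case")
```

That reproduces both figures above: ~57.6 GB at size=1024 and ~7.2 GB at size=128.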
>
> *************
> bq:  Document cache hitratio is really bad,
>
> This is often the case. Getting documents really means, here, getting
> the _stored_ values. The point of the documentCache is to keep entries
> in a cache for the various elements of a single request to use. To
> name just two:
> > you get the stored values for the "fl" list
> > you highlight.
>
> These are separate, and each accesses the stored values. Problem is,
> "accessing the stored values" means
> 1> reading the document from disk
> 2> decompressing a 16K block minimum.
>
> I'm skipping the fact that returning docValues doesn't need the stored
> data, but you get the idea.
>
> Anyway, not having to read/decompress for both the "fl" list and
> highlighting is what the documentCache is about. That's where the
> recommendation "size it as (max # of users) * (max rows)" comes in
> (if you can afford the memory, certainly).
>
> Some users have situations where the documentCache hit ratio is much
> better, but I'd be surprised if any core with 450M docs even got
> close.
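As a sketch, that "(max # of users) * (max rows)" sizing might look like this in solrconfig.xml (the 50 concurrent users and rows=20 here are made-up numbers, not from this thread):

```xml
<!-- documentCache sized as ~50 concurrent queries x rows=20 = 1000 -->
<documentCache class="solr.LRUCache"
               size="1000"
               initialSize="1000"
               autowarmCount="0"/> <!-- documentCache cannot be autowarmed -->
```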
>
> *************
> bq: That supported the hypothesis that the query throughput decreases
> after opening a new searcher and **not** after committing the index
>
> Are you saying that you have something of a sawtooth pattern? I.e.
> queries are slow "for a while" after opening a new searcher but then
> improve until the next commit? This is usually an autowarm problem, so
> you might address it with a more precise autowarm. Look particularly
> for anything that sorts/groups/facets. Any such fields should have
> docValues=true set. Unfortunately this will require a complete
> re-index. Don't be frightened by the fact that enabling docValues will
> cause your index size on disk to grow. Paradoxically that will
> actually _lower_ the size of the JVM heap requirements. Essentially
> the additional size on disk is the serialized structure that would
> have to be built in the JVM. Since it is pre-built at index time, it
> can be MMapped and use OS memory space and not JVM.
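For reference, enabling docValues is a schema.xml change on each field you sort/facet/group on; the field names and types below are hypothetical examples, not from this thread:

```xml
<!-- Fields used for sorting/faceting/grouping get docValues="true".
     A full re-index is required after this change. -->
<field name="created_date" type="pdate"  indexed="true" stored="false" docValues="true"/>
<field name="category"     type="string" indexed="true" stored="false" docValues="true"/>
```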
>
> *************
> 450M docs and 800G index size is quite large and a prime candidate for
> sharding FWIW.
>
> Best,
> Erick
>
>
>
>
> On Sat, Nov 11, 2017 at 4:52 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > ~248 gb
> >
> > Nawab
> >
> >
> > On Sat, Nov 11, 2017 at 2:41 PM Kevin Risden <kris...@apache.org> wrote:
> >
> >> > One machine runs with a 3TB drive, running 3 solr processes (each with
> >> one core as described above).
> >>
> >> How much total memory on the machine?
> >>
> >> Kevin Risden
> >>
> >> On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <
> khi...@gmail.com>
> >> wrote:
> >>
> >> > Thanks for a quick and detailed response, Erick!
> >> >
> >> > Unfortunately I don't have proof, but our servers with Solr 4.5 are
> >> > running really nicely with the above config. I had assumed that the
> >> > same or similar settings would also perform well with Solr 7, but
> >> > that assumption didn't hold, as a lot has changed in three major
> >> > releases.
> >> > I have tweaked the cache values as you suggested, but increasing or
> >> > decreasing doesn't seem to make any noticeable improvement.
> >> >
> >> > At the moment, my one core has an 800GB index, ~450 million
> >> > documents, and 48G Xmx. GC pauses haven't been an issue though.  One
> >> > machine runs with a 3TB drive, running 3 Solr processes (each with
> >> > one core as described above).  I agree that it is a very atypical
> >> > system, so I should probably try different parameters with a fresh
> >> > eye to find the solution.
> >> >
> >> >
> >> > I tried with autocommits (a hard commit with openSearcher=false
> >> > every half minute, and a soft commit every 5 minutes). That
> >> > supported the hypothesis that the query throughput decreases after
> >> > opening a new searcher and **not** after committing the index. Cache
> >> > hit ratios are all 80+% (even when I decreased the filterCache to
> >> > 128, so I will keep it at this lower value). The document cache hit
> >> > ratio is really bad; it drops to around 40% after newSearcher. But I
> >> > guess that is expected, since it cannot be warmed up anyway.
> >> >
> >> >
> >> > Thanks
> >> > Nawab
> >> >
> >> >
> >> >
> >> > On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> > > What evidence do you have that the changes you've made to your
> >> > > configs are useful? There's lots of things in here that are suspect:
> >> > >
> >> > >   <double name="forceMergeDeletesPctAllowed">1</double>
> >> > >
> >> > > First, this is useless unless you are forceMerging/optimizing,
> >> > > which you shouldn't be doing under most circumstances. And you're
> >> > > going to be rewriting a lot of data every time. See:
> >> > >
> >> > > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >> > >
> >> > > A filterCache size of 10240 is far in excess of what we usually
> >> > > recommend. Each entry can be up to maxDoc/8 bytes and you have 10K
> >> > > of them. Why did you choose this? On the theory that "more is
> >> > > better"? If you're using NOW then you may not be using the
> >> > > filterCache well, see:
> >> > >
> >> > > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
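The gist of that article: a bare NOW changes every millisecond, so every such fq is a brand-new cache entry; rounding NOW makes the fq text identical across requests and therefore cacheable. A sketch (the field name `timestamp` is a hypothetical example):

```text
# Never re-hits the filterCache: NOW differs on every request.
fq=timestamp:[NOW-1DAY TO NOW]

# Rounded to day boundaries: identical text for 24 hours, so it's a cache hit.
fq=timestamp:[NOW/DAY-1DAY TO NOW/DAY]
```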
> >> > >
> >> > > autowarmCount="1024"
> >> > >
> >> > > Every time you commit you're firing off 1024 queries which is going
> to
> >> > > spike the CPU a lot. Again, this is super-excessive. I usually start
> >> > > with 16 or so.
> >> > >
> >> > > Why are you committing from a cron job? Why not just set your
> >> > > autocommit settings and forget about it? That's what they're for.
> >> > >
> >> > > Your queryResultCache is likewise kind of large, but it takes up
> much
> >> > > less space than the filterCache per entry so it's probably OK. I'd
> >> > > still shrink it and set the autowarm to 16 or so to start, unless
> >> > > you're seeing a pretty high hit ratio, which is pretty unusual but
> >> > > does happen.
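Those starting points might look like the following in solrconfig.xml (values are illustrative, per the "16 or so" suggestion above, not a tested recommendation):

```xml
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="16"/>
```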
> >> > >
> >> > > 48G of memory is just asking for long GC pauses. How many docs do
> >> > > you have in each core anyway? If you're really using this much
> >> > > heap, then it'd be good to see what you can do to shrink it.
> >> > > Enabling docValues for all fields you facet, sort, or group on
> >> > > will help that a lot if you haven't already.
> >> > >
> >> > > How much memory on your entire machine? And how much is used by
> >> > > _all_ the JVMs you're running on a particular machine?
> >> > > MMapDirectory needs as much OS memory space as it can get, see:
> >> > >
> >> > > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >> > >
> >> > > Lately we've seen some structures that consume memory until a commit
> >> > > happens (either soft or hard). I'd shrink my autocommit down to 60
> >> > > seconds or even less (openSearcher=false).
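A minimal sketch of those settings in solrconfig.xml (the 5-minute soft commit mirrors the cron cadence described in this thread; the exact values are illustrative):

```xml
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60s, flushes the tlog -->
  <openSearcher>false</openSearcher> <!-- no new searcher, so no autowarm spike -->
</autoCommit>
<autoSoftCommit>
  <maxTime>300000</maxTime>          <!-- new searcher (visibility) every 5 min -->
</autoSoftCommit>
```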
> >> > >
> >> > > In short, I'd go back mostly to the default settings and build _up_
> as
> >> > > you can demonstrate improvements. You've changed enough things here
> >> > > that untangling which one is the culprit will be hard. You want
> >> > > the JVM to have as little memory as possible; unfortunately,
> >> > > that's something you figure out by experimentation.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > >
> >> > > On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <
> >> khi...@gmail.com>
> >> > > wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I am committing every 5 minutes using a periodic cron job: "curl
> >> > > > http://localhost:8984/solr/core1/update?commit=true". Besides
> >> > > > this, my app doesn't do any soft or hard commits. With the Solr 7
> >> > > > upgrade, I am noticing that query throughput plummets every 5
> >> > > > minutes - probably when the commit happens.
> >> > > > What can I do to improve this? It didn't happen like this in
> >> > > > Solr 4.5 (i.e., I used to get a stable query throughput of 50-60
> >> > > > queries per second; now there are spikes to 60 qps interleaved
> >> > > > by drops to almost **0**). Between those 5-minute marks, I am
> >> > > > able to achieve high throughput, hence I guess that the issue is
> >> > > > related to indexing or merging, and not the query flow.
> >> > > >
> >> > > > I have 48G allotted to each Solr process, and it seems that only
> >> > > > ~50% is being used at any time; similarly, CPU is not spiking
> >> > > > beyond 50% either. There is frequent merging (every 5 minutes),
> >> > > > but I am not sure if that is a cause of the slowdown.
> >> > > >
> >> > > > Here are my merge and cache settings:
> >> > > >
> >> > > > Thanks
> >> > > > Nawab
> >> > > >
> >> > > > <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> >> > > >   <int name="maxMergeAtOnce">5</int>
> >> > > >   <int name="segmentsPerTier">5</int>
> >> > > >   <int name="maxMergeAtOnceExplicit">10</int>
> >> > > >   <int name="floorSegmentMB">16</int>
> >> > > >   <!-- 50 GB -->
> >> > > >   <double name="maxMergedSegmentMB">50000</double>
> >> > > >   <double name="forceMergeDeletesPctAllowed">1</double>
> >> > > > </mergePolicyFactory>
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > <filterCache class="solr.FastLRUCache"
> >> > > >              size="10240"
> >> > > >              initialSize="5120"
> >> > > >              autowarmCount="1024"/>
> >> > > > <queryResultCache class="solr.LRUCache"
> >> > > >                  size="10240"
> >> > > >                  initialSize="5120"
> >> > > >                  autowarmCount="0"/>
> >> > > > <documentCache class="solr.LRUCache"
> >> > > >                size="10240"
> >> > > >                initialSize="5120"
> >> > > >                autowarmCount="0"/>
> >> > > >
> >> > > >
> >> > > > <useColdSearcher>false</useColdSearcher>
> >> > > >
> >> > > > <maxWarmingSearchers>2</maxWarmingSearchers>
> >> > > >
> >> > > > <listener event="newSearcher" class="solr.QuerySenderListener">
> >> > > >   <arr name="queries">
> >> > > >   </arr>
> >> > > > </listener>
> >> > > > <listener event="firstSearcher" class="solr.QuerySenderListener">
> >> > > >   <arr name="queries">
> >> > > >   </arr>
> >> > > > </listener>
> >> > >
> >> >
> >>
>
