Thanks Erick. Yes, I see the 'sawtooth' pattern. I will try your suggestion, but I am wondering why the queries were performant with Solr 4 without docValues. Have some defaults changed?
--- On Sat, Nov 11, 2017 at 8:28 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Nawab:
>
> bq: Cache hit ratios are all in 80+% (even when i decreased the filterCache to 128)
>
> This suggests that you use a relatively small handful of fq clauses, which is perfectly fine. Having 450M docs and a cache size of 1024 is _really_ scary! You had a potential for a 57G (yes, gigabyte) filterCache. Fortunately you apparently don't use enough different fq clauses to fill it up, or they match very few documents. I cheated a little: if the result set is small, the individual doc IDs are stored rather than a bitset 450M bits wide. Your admin>>core>>plugins/stats>>filterCache should show you how many evictions there are, which is another interesting stat.
>
> As it is, your filterCache might use up 7G or so. Hefty, but you have lots of RAM.
>
> *************
> bq: Document cache hit ratio is really bad
>
> This is often the case. Getting documents really means, here, getting the _stored_ values. The point of the documentCache is to keep entries in a cache for the various elements of a single request to use. To name just two:
>
> - you get the stored values for the "fl" list
> - you highlight.
>
> These are separate, and each accesses the stored values. Problem is, "accessing the stored values" means
> 1> reading the document from disk
> 2> decompressing a 16K block minimum.
>
> I'm skipping the fact that returning docValues doesn't need the stored data, but you get the idea.
>
> Anyway, not having to read/decompress for both the "fl" list and highlighting is what the documentCache is about. That's where the recommendation "size it as (max # of users) * (max rows)" comes in (if you can afford the memory, certainly).
>
> Some users have situations where the documentCache hit ratio is much better, but I'd be surprised if any core with 450M docs even got close.
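For concreteness, the worst-case filterCache arithmetic above, and the documentCache rule of thumb, can be sketched as a back-of-the-envelope calculation (the user/row counts in the second half are invented for illustration, not numbers from this thread):

```python
# Worst-case filterCache footprint: each cached filter can be a full
# bitset with one bit per document in the index, i.e. maxDoc / 8 bytes.
max_doc = 450_000_000                 # ~450M docs, from the thread
filter_entries = 1024                 # filterCache "size" setting

filter_gb = (max_doc / 8) * filter_entries / 1e9
print(f"filterCache worst case: {filter_gb:.1f} GB")   # 57.6 GB -- Erick's "57G"

# documentCache rule of thumb from the thread:
# size = (max concurrent users) * (max rows per request).
# users/rows below are hypothetical example values.
users, rows = 100, 50
print(f"documentCache size: {users * rows}")           # 5000
```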
>
> *************
> bq: That supported the hypothesis that the query throughput decreases after opening a new searcher and **not** after committing the index
>
> Are you saying that you have something of a sawtooth pattern? I.e., queries are slow "for a while" after opening a new searcher but then improve until the next commit? This is usually an autowarm problem, so you might address it with more precise autowarming. Look particularly for anything that sorts/groups/facets; any such fields should have docValues=true set. Unfortunately this will require a complete re-index. Don't be frightened by the fact that enabling docValues will cause your index size on disk to grow. Paradoxically, that will actually _lower_ the JVM heap requirements: essentially, the additional size on disk is the serialized structure that would otherwise have to be built in the JVM. Since it is pre-built at index time, it can be MMapped into OS memory space rather than JVM heap.
>
> *************
> 450M docs and an 800G index is quite large and a prime candidate for sharding, FWIW.
>
> Best,
> Erick
>
> On Sat, Nov 11, 2017 at 4:52 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> > ~248 gb
> >
> > Nawab
> >
> > On Sat, Nov 11, 2017 at 2:41 PM Kevin Risden <kris...@apache.org> wrote:
> >
> >> > One machine runs with a 3TB drive, running 3 solr processes (each with one core as described above).
> >>
> >> How much total memory on the machine?
> >>
> >> Kevin Risden
> >>
> >> On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> >>
> >> > Thanks for a quick and detailed response, Erick!
> >> >
> >> > Unfortunately I don't have proof, but our servers with Solr 4.5 are running really nicely with the above config. I had assumed that the same or similar settings would also perform well with Solr 7, but that assumption didn't hold, as a lot has changed in three major releases.
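[Editor's note: the docValues change Erick recommends above is a one-attribute schema edit followed by a full re-index. A sketch, in which the field name "manufacturer" is hypothetical and stands in for any field you sort, group, or facet on:]

```xml
<!-- schema.xml: enable docValues on a field used for sorting/grouping/
     faceting. A complete re-index is required after this change. -->
<field name="manufacturer" type="string" indexed="true"
       stored="false" docValues="true"/>
```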
> >> > I have tweaked the cache values as you suggested, but increasing or decreasing doesn't seem to make any noticeable improvement.
> >> >
> >> > At the moment, my one core has an 800GB index, ~450 million documents, and 48G Xmx. GC pauses haven't been an issue though. One machine runs with a 3TB drive, running 3 solr processes (each with one core as described above). I agree that it is a very atypical system, so I should probably try different parameters with a fresh eye to find the solution.
> >> >
> >> > I tried with autocommits (commit with openSearcher=false every half minute, and softcommit every 5 minutes). That supported the hypothesis that the query throughput decreases after opening a new searcher and **not** after committing the index. Cache hit ratios are all in 80+% (even when I decreased the filterCache to 128, so I will keep it at this lower value). Document cache hit ratio is really bad; it drops to around 40% after newSearcher. But I guess that is expected, since it cannot be warmed up anyway.
> >> >
> >> > Thanks
> >> > Nawab
> >> >
> >> > On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >
> >> > > What evidence do you have that the changes you've made to your configs are useful? There's lots of things in here that are suspect:
> >> > >
> >> > > <double name="forceMergeDeletesPctAllowed">1</double>
> >> > >
> >> > > First, this is useless unless you are forceMerging/optimizing, which you shouldn't be doing under most circumstances. And you're going to be rewriting a lot of data every time. See:
> >> > >
> >> > > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >> > >
> >> > > A filterCache size of "10240" is far in excess of what we usually recommend.
> >> > > Each entry can be up to maxDoc/8 bytes, and you have 10K of them. Why did you choose this? On the theory that "more is better"? If you're using NOW then you may not be using the filterCache well; see:
> >> > >
> >> > > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
> >> > >
> >> > > autowarmCount="1024"
> >> > >
> >> > > Every time you commit, you're firing off 1024 queries, which is going to spike the CPU a lot. Again, this is super-excessive. I usually start with 16 or so.
> >> > >
> >> > > Why are you committing from a cron job? Why not just set your autocommit settings and forget about it? That's what they're for.
> >> > >
> >> > > Your queryResultCache is likewise kind of large, but it takes up much less space than the filterCache per entry, so it's probably OK. I'd still shrink it and set the autowarm to 16 or so to start, unless you're seeing a pretty high hit ratio, which is pretty unusual but does happen.
> >> > >
> >> > > 48G of memory is just asking for long GC pauses. How many docs do you have in each core anyway? If you're really using this much heap, then it'd be good to see what you can do to shrink it. Enabling docValues for all fields you facet, sort or group on will help that a lot if you haven't already.
> >> > >
> >> > > How much memory on your entire machine? And how much is used by _all_ the JVMs you're running on a particular machine? MMapDirectory needs as much OS memory space as it can get; see:
> >> > > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >> > >
> >> > > Lately we've seen some structures that consume memory until a commit happens (either soft or hard). I'd shrink my autocommit down to 60 seconds or even less (openSearcher=false).
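[Editor's note: replacing the cron-driven commit with autocommit, as suggested above, corresponds to something like the following solrconfig.xml fragment. This is a sketch: the 60-second hard commit with openSearcher=false and the 5-minute soft commit come from the intervals discussed in this thread, the surrounding element is standard boilerplate:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush segments to disk every 60s without
       opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: make newly indexed docs visible every 5 minutes -->
  <autoSoftCommit>
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```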
> >> > >
> >> > > In short, I'd go back mostly to the default settings and build _up_ as you can demonstrate improvements. You've changed enough things here that untangling which one is the culprit will be hard. You want the JVM to have as little memory as possible; unfortunately, that's something you figure out by experimentation.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > >
> >> > > On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I am committing every 5 minutes using a periodic cron job: "curl http://localhost:8984/solr/core1/update?commit=true". Besides this, my app doesn't do any soft or hard commits. With the Solr 7 upgrade, I am noticing that query throughput plummets every 5 minutes, probably when the commit happens. What can I do to improve this? It didn't happen like this in Solr 4.5; i.e., I used to get a stable query throughput of 50-60 queries per second, and now there are spikes to 60 qps interleaved by drops to almost **0**. Between those 5-minute marks I am able to achieve high throughput, hence I guess that the issue is related to indexing or merging, and not query flow.
> >> > > >
> >> > > > I have 48G allotted to each solr process, and it seems that only ~50% is being used at any time; similarly, CPU is not spiking beyond 50% either. There is frequent merging (every 5 minutes), but I am not sure if that is a cause of the slowdown.
> >> > > >
> >> > > > Here are my merge and cache settings:
> >> > > >
> >> > > > Thanks
> >> > > > Nawab
> >> > > >
> >> > > > <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> >> > > >   <int name="maxMergeAtOnce">5</int>
> >> > > >   <int name="segmentsPerTier">5</int>
> >> > > >   <int name="maxMergeAtOnceExplicit">10</int>
> >> > > >   <int name="floorSegmentMB">16</int>
> >> > > >   <!-- 50 gb -->
> >> > > >   <double name="maxMergedSegmentMB">50000</double>
> >> > > >   <double name="forceMergeDeletesPctAllowed">1</double>
> >> > > > </mergePolicyFactory>
> >> > > >
> >> > > > <filterCache class="solr.FastLRUCache"
> >> > > >              size="10240"
> >> > > >              initialSize="5120"
> >> > > >              autowarmCount="1024"/>
> >> > > > <queryResultCache class="solr.LRUCache"
> >> > > >              size="10240"
> >> > > >              initialSize="5120"
> >> > > >              autowarmCount="0"/>
> >> > > > <documentCache class="solr.LRUCache"
> >> > > >              size="10240"
> >> > > >              initialSize="5120"
> >> > > >              autowarmCount="0"/>
> >> > > >
> >> > > > <useColdSearcher>false</useColdSearcher>
> >> > > > <maxWarmingSearchers>2</maxWarmingSearchers>
> >> > > >
> >> > > > <listener event="newSearcher" class="solr.QuerySenderListener">
> >> > > >   <arr name="queries">
> >> > > >   </arr>
> >> > > > </listener>
> >> > > > <listener event="firstSearcher" class="solr.QuerySenderListener">
> >> > > >   <arr name="queries">
> >> > > >   </arr>
> >> > > > </listener>
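[Editor's note: as a concrete starting point for Erick's "go back to defaults and build up" advice, a trimmed-down version of the cache block above might look like the following. The sizes are illustrative starting values assembled from numbers mentioned in the thread (filterCache of 128, autowarmCount of 16), not a recommendation from any one message:]

```xml
<filterCache class="solr.FastLRUCache"
             size="128"
             initialSize="128"
             autowarmCount="16"/>
<queryResultCache class="solr.LRUCache"
             size="256"
             initialSize="256"
             autowarmCount="16"/>
<!-- documentCache cannot be autowarmed, so autowarmCount stays 0 -->
<documentCache class="solr.LRUCache"
             size="1024"
             initialSize="1024"
             autowarmCount="0"/>
```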