Thanks Erick and Emir -- we are going to start with <1> and possibly <2>.

On Thu, Oct 26, 2017 at 7:06 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Fengtan,
> I would just add that when merging collections, you might want to use
> document routing (https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting).
> Since you are keeping separate collections, I guess you have a
> “collection ID” to use as a routing key. This will enable you to have one
> collection but query only the shard(s) with data from one “collection”.
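> For example, with the compositeId router you could prefix each document id
> with the old collection name and pass the same prefix as _route_ at query
> time. A rough sketch (the merged collection name and ids below are only
> placeholders):
>
>   # indexing: prefix each document id with its old collection name
>   {"id": "collectionA!doc42", "title": "example"}
>
>   # querying: hit only the shard(s) that hold documents with that prefix
>   /solr/merged/select?q=*:*&_route_=collectionA!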
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 25 Oct 2017, at 19:25, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > <1> It's not that explicit commits are expensive, it's that they happen
> > too fast. An explicit commit and an internal autocommit have exactly
> > the same cost. Your "overlapping ondeck searchers" warning is definitely an
> > indication that your commits are coming from somewhere too quickly
> > and are piling up.
> >
> > <2> Likely a good thing; each collection increases overhead. And
> > 1,000,000 documents is quite small in Solr's terms unless the
> > individual documents are enormous. I'd do this for a number of
> > reasons.
> >
> > <3> Certainly an option, but I'd put that last. Fix the commit problem
> > first ;)
> >
> > <4> If you do this, make the autowarm count quite small. That said,
> > this will be of very little use if you have frequent commits. Let's say
> > you commit every second. The autowarming will warm caches, which will
> > then be thrown out a second later, and it will increase the time it takes
> > to open a new searcher.
> >
> > <5> Yeah, this would probably just be a band-aid.
> >
> > If I were prioritizing these, I'd do
> > <1> first. If you control the client, just don't call commit. If you
> > do not control the client, then what you've outlined is fine. Tip: set
> > your soft commit settings to be as long as you can stand. If you must
> > have very short intervals, consider disabling your caches completely.
> > Here's a long article on commits....
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
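> >
> > A rough solrconfig.xml sketch of what "as long as you can stand" could
> > look like (the intervals below are only placeholders to tune):
> >
> >   <autoCommit>
> >     <maxTime>60000</maxTime>           <!-- hard commit every 60s; flushes and rolls the tlog -->
> >     <openSearcher>false</openSearcher> <!-- do not open a new searcher on hard commit -->
> >   </autoCommit>
> >   <autoSoftCommit>
> >     <maxTime>300000</maxTime>          <!-- new searcher (visibility) at most every 5 minutes -->
> >   </autoSoftCommit>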
> >
> > <2> Actually, this and <1> are pretty close in priority.
> >
> > Then re-evaluate. Fixing the commit issue may buy you quite a bit of
> > time. Having 1,000 collections is pushing the boundaries presently.
> > Each collection will establish watchers on the bits it cares about in
> > ZooKeeper, and reducing the watchers by a factor approaching 1,000 is
> > A Good Thing.
> >
> > Frankly, between these two things I'd pretty much expect your problems
> > to disappear. Wouldn't be the first time I've been totally wrong, but
> > it's where I'd start ;)
> >
> > Best,
> > Erick
> >
> > On Wed, Oct 25, 2017 at 8:54 AM, Fengtan <fengtan...@gmail.com> wrote:
> >> Hi,
> >>
> >> We run a SolrCloud 6.4.2 cluster with ZooKeeper 3.4.6 on 3 VMs.
> >> Each VM runs RHEL 7 with 16 GB RAM, 8 CPUs and OpenJDK 1.8.0_131;
> >> each VM has one Solr and one ZK instance.
> >> The cluster hosts 1,000 collections; each collection has 1 shard and
> >> between 500 and 50,000 documents.
> >> Documents are indexed incrementally every day; the Solr client mostly
> >> does searching.
> >> Solr runs with -Xms7g -Xmx7g.
> >>
> >> Everything has been working fine for about one month, but a few days ago
> >> we started to see Solr timeouts: https://pastebin.com/raw/E2prSrQm
> >>
> >> Also we have always seen these:
> >>  PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> >>
> >>
> >> We are not sure what is causing the timeouts, although we have identified
> >> a few things that could be improved:
> >>
> >> 1) Ignore explicit commits using IgnoreCommitOptimizeUpdateProcessorFactory
> >> -- we are aware that explicit commits are expensive
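> >>
> >> A minimal solrconfig.xml sketch of what we have in mind (the chain name is
> >> arbitrary; statusCode=200 makes ignored commits return success instead of
> >> an error):
> >>
> >>   <updateRequestProcessorChain name="ignore-explicit-commits" default="true">
> >>     <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
> >>       <int name="statusCode">200</int>
> >>     </processor>
> >>     <processor class="solr.LogUpdateProcessorFactory"/>
> >>     <processor class="solr.DistributedUpdateProcessorFactory"/>
> >>     <processor class="solr.RunUpdateProcessorFactory"/>
> >>   </updateRequestProcessorChain>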
> >>
> >> 2) Drop the 1,000 collections and use a single one instead (all our
> >> collections use the same schema/solrconfig.xml), since stability problems
> >> are expected when the number of collections reaches the low hundreds
> >> <https://wiki.apache.org/solr/SolrPerformanceProblems#SolrCloud>. The
> >> downside is that the new collection would contain 1,000,000 documents,
> >> which may bring new challenges.
> >>
> >> 3) Tune the GC and possibly switch from CMS to G1, as it seems to bring
> >> better performance according to this
> >> <https://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems>,
> >> this
> >> <https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector>
> >> and this
> >> <http://lucene.472066.n3.nabble.com/java-util-concurrent-TimeoutException-Idle-timeout-expired-50001-50000-ms-td4321209.html>.
> >> The downside is that Lucene explicitly discourages the usage of G1
> >> <https://wiki.apache.org/lucene-java/JavaBugs#Java_Bugs_in_various_JVMs_affecting_Lucene_.2F_Solr>,
> >> so we are not sure what to expect. We use the default GC settings:
> >>  -XX:NewRatio=3
> >>  -XX:SurvivorRatio=4
> >>  -XX:TargetSurvivorRatio=90
> >>  -XX:MaxTenuringThreshold=8
> >>  -XX:+UseConcMarkSweepGC
> >>  -XX:+UseParNewGC
> >>  -XX:ConcGCThreads=4
> >>  -XX:ParallelGCThreads=4
> >>  -XX:+CMSScavengeBeforeRemark
> >>  -XX:PretenureSizeThreshold=64m
> >>  -XX:+UseCMSInitiatingOccupancyOnly
> >>  -XX:CMSInitiatingOccupancyFraction=50
> >>  -XX:CMSMaxAbortablePrecleanTime=6000
> >>  -XX:+CMSParallelRemarkEnabled
> >>  -XX:+ParallelRefProcEnabled
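> >>
> >> If we did try G1, we would probably start from a small set of flags like
> >> the following (the values are only guesses we would have to tune):
> >>  -XX:+UseG1GC
> >>  -XX:+ParallelRefProcEnabled
> >>  -XX:G1HeapRegionSize=8m
> >>  -XX:MaxGCPauseMillis=250
> >>  -XX:InitiatingHeapOccupancyPercent=75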
> >>
> >> 4) Tune the caches, possibly by increasing autowarmCount on filterCache --
> >> our current config is:
> >>  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
> >>  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
> >>  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
> >>
> >> 5) Tweak the timeout settings, although this would not fix the underlying
> >> issue.
> >>
> >>
> >> Does any of these options seem relevant? Is there anything else that
> >> might address the timeouts?
> >>
> >> Thanks
>
>
