Re: Solr4x: Separate Indexer and Query Instances for Performance

Otis Gospodnetic Tue, 05 Mar 2013 19:44:41 -0800

Mark - just added https://issues.apache.org/jira/browse/SOLR-4532 for
shipping post-analyzed fields.


As for Mike, it sounds like he should just stick with master-slave on 4.x for
now, although I see what he is saying: what Mike is after could be thought
of as "SolrCloud with very non-real-time replication: push replication like
in SolrCloud, but done periodically rather than in real time."
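One way to approximate this "periodic, non-real-time" replication with the 4.x master-slave model is to have each query replica pull a fresh index from the indexing master on a fixed schedule. The sketch below is a hedged illustration, not a recommended deployment: it assumes the standard ReplicationHandler is registered at `/replication` on each replica (slave) core and triggers its `fetchindex` command. The host names, core name, and interval are hypothetical placeholders, and the helpers `fetchindex_url`, `http_get`, and `replicate_periodically` are invented for this example. In practice the slave-side `pollInterval` setting in solrconfig.xml achieves the same thing without an external script; the sketch just makes the schedule explicit.

```python
import time
import urllib.request
from typing import Callable, Iterable, List

# Hypothetical replica (slave) core URLs; substitute your own cores.
REPLICAS = [
    "http://query1.example.com:8983/solr/collection1",
    "http://query2.example.com:8983/solr/collection1",
]


def fetchindex_url(core_url: str) -> str:
    """Build the ReplicationHandler URL that asks a slave core to pull
    the latest index from its configured master."""
    return core_url.rstrip("/") + "/replication?command=fetchindex"


def http_get(url: str) -> int:
    """Issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.status


def replicate_periodically(
    replicas: Iterable[str],
    interval_s: float,
    rounds: int,
    fetch: Callable[[str], int] = http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Trigger fetchindex on every replica once per round, sleeping
    interval_s seconds between rounds. Returns the URLs requested."""
    replicas = list(replicas)
    requested: List[str] = []
    for i in range(rounds):
        for core in replicas:
            url = fetchindex_url(core)
            fetch(url)  # ask this slave to pull the master's current index
            requested.append(url)
        if i < rounds - 1:
            sleep(interval_s)  # no sleep after the final round
    return requested
```

The `fetch` and `sleep` hooks exist only so the loop can be exercised without a live Solr; with the defaults, `replicate_periodically(REPLICAS, 300, 12)` would trigger a pull on both replicas every five minutes for an hour. Putting only these replicas behind the query load balancer keeps query traffic off the indexing master, at the cost of the features Mark mentions below.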

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html

On Tue, Mar 5, 2013 at 6:58 PM, Mark Miller <markrmil...@gmail.com> wrote:

>
> On Mar 5, 2013, at 3:44 PM, Mike Schultz <mike.schu...@gmail.com> wrote:
>
> > Solr 3.x had a master/slave architecture, which meant that indexing did not
> > happen in the same process as querying, in fact normally not even on the
> > same machine. The querier only needed to copy down snapshots of the new
> > index files and commit them. Great isolation for maximum query performance
> > and indexing performance. Now in Solr 4.x this is gone. Does anyone have
> > any answers or tuning approaches to address this?
>
> No, it's not; you can still use the old model if you want.
>
> >
> > We have a high-query-load, high-indexing-load environment. I see TP99
> > query latency go from under 100 ms to 4-10 seconds during indexing. Even
> > TP90 hits 2 seconds. Looking at GC in VisualVM, I see a pretty sawtooth
> > turn into a scraggly forest when indexing happens and the eden space gets
> > burned through.
> >
> > It seems like one approach is to have the shard leaders replicate (a la
> > 3.x) to their replicas instead of sending them the document stream. I know
> > the replicas do that when they get "too far behind", so this would simply
> > mean always doing that at some given interval. This would make it possible
> > to put only the replicas into a query load balancer. In the event of a
> > leader failure, a replica would be promoted and you'd have to deal with it,
> > but it'd be no worse than what is now the steady state in standard 4.x.
>
> You can't really do this without losing important SolrCloud features such
> as durability. In SolrCloud, replication happens only when a node is in
> recovery mode; during that time it buffers updates and is not involved in
> searches.
>
> >
> > Another approach might be to have separate Solr instances point to the
> > same index directory. One instance is used for indexing and tuned for
> > that, the other tuned for querying. It's not like having the operations on
> > separate machines as in 3.x, but it would still be better isolation than
> > standard 4.x. Would this at least work in theory, if, say, the query
> > instance started up a new IndexSearcher when necessary?
> >
> > Any insight, advice, or experience on this would be appreciated.
> >
> > Mike
>
> It's basically a trade-off at the moment: use master-slave with 4.x and get
> this isolation, or use SolrCloud and get its alternate benefits.
>
> One possible future optimization with SolrCloud may be to send
> pre-analyzed documents to the replicas. Just a possibility, though.
>
>
> You can look at tuning GC and/or other settings to make things better. I
> think I've seen this hold up better than you describe in the past, though
> I'm sure it depends on a lot of factors (data size, hardware, RAM, etc.).
>
> Mark
>
>
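For reference, the TP90 and TP99 figures Mike quotes are the 90th and 99th percentile query latencies: the time within which 90% (or 99%) of requests complete. The sketch below assumes the common nearest-rank definition (other interpolating definitions exist and give slightly different values); the function name and the latency samples are purely illustrative, not measurements from this thread. It shows how just 3 slow queries out of 100 can push TP99 into seconds while the median and TP90 stay at 80 ms.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method:
    the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid float error such as 0.99 * 100.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative latencies in ms (made up): 97 fast queries and
# 3 that stalled, e.g. behind indexing-induced GC pauses.
latencies_ms = [80.0] * 97 + [2000.0, 4000.0, 10000.0]

tp50 = percentile_nearest_rank(latencies_ms, 50)
tp90 = percentile_nearest_rank(latencies_ms, 90)
tp99 = percentile_nearest_rank(latencies_ms, 99)
print(tp50, tp90, tp99)  # prints: 80.0 80.0 4000.0
```

Because tail percentiles are driven by a handful of outliers, they are the first numbers to degrade when GC pauses from indexing stall a few queries.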