Mark, I just added https://issues.apache.org/jira/browse/SOLR-4532 for shipping post-analyzed fields.
For Mike, it sounds like he should just stick to master-slave with 4.x for now... although I see what he is saying. What Mike is after could be thought of as "SolrCloud with very non-RT replication: push replication like in SolrCloud, but done periodically and not in RT."

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html


On Tue, Mar 5, 2013 at 6:58 PM, Mark Miller <markrmil...@gmail.com> wrote:

>
> On Mar 5, 2013, at 3:44 PM, Mike Schultz <mike.schu...@gmail.com> wrote:
>
> > Solr 3.x had a master/slave architecture, which meant that indexing did not
> > happen in the same process as querying; in fact, normally not even on the
> > same machine. The querier only needed to copy down snapshots of the new
> > index files and commit them. Great isolation for maximum query performance
> > and indexing performance. Now in Solr 4.x this is gone. Does anyone have any
> > answers or tuning approaches to address this?
>
> No, it's not; you still have the old model if you want it.
>
> > We have a high-query-load, high-indexing-load environment. I see TP99 query
> > latency go from under 100 ms to 4-10 seconds during indexing. Even TP90 hits
> > 2 seconds. Looking at GC in VisualVM, I see a pretty sawtooth turn into a
> > scraggly forest when indexing happens and the eden space gets burned
> > through.
> >
> > It seems like one approach is to have the shard leaders replicate (a la 3.x)
> > to their replicas instead of sending them the document stream. I know the
> > replicas do that when they get "too far behind", so this would simply mean
> > always doing that at some given interval. This would make it possible to
> > put only replicas into a query load balancer. In the event of a leader
> > failure, a replica would be promoted and you'd have to deal with it, but
> > it'd be no worse than what is now the steady state in standard 4.x.
>
> You can't really do this without losing important SolrCloud features like
> durability and such. In SolrCloud, replication only happens when a node is
> in recovery mode; during this time it's buffering updates and not involved
> in searches.
>
> > Another approach might be to have separate Solr instances point to the same
> > index directory. One instance is used for indexing and tuned for that, the
> > other tuned for querying. It's not like having the operations on separate
> > machines as in 3.x, but it still would be better isolation than standard
> > 4.x. Would this at least work in theory, if, say, the query instance started
> > up a new IndexSearcher when necessary?
> >
> > Any insight, advice or experience appreciated.
> >
> > Mike
>
> It's basically a trade-off at the moment: use master/slave with 4.x and
> get this isolation, or use SolrCloud and get its alternate benefits.
>
> One possible future optimization with SolrCloud may be to send
> pre-analyzed docs to the replicas. Just a possibility, though.
>
> You can look at tuning GC and/or other settings to make things better. I
> have certainly seen this hold up better than you describe in the past,
> though I'm sure it depends on a lot of factors (data size, hardware, RAM,
> etc.).
>
> Mark
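For anyone taking Mark's advice and staying on master/slave with 4.x: slaves normally pull from the master on the `pollInterval` set in their ReplicationHandler config, but a pull can also be triggered explicitly with the handler's `fetchindex` command (for example, after disabling automatic polling with `disablepoll`). The Python below is a minimal sketch of "replicate at a fixed interval" under that model, assuming each slave core exposes the ReplicationHandler at `/replication` and already has its master URL configured. The helper names and host URLs are hypothetical, and this is not a SolrCloud feature.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable


def fetchindex_url(slave_core_url: str) -> str:
    """Build the ReplicationHandler URL that asks a slave core to pull the
    current index from the master configured in its solrconfig.xml."""
    query = urllib.parse.urlencode({"command": "fetchindex", "wt": "json"})
    return f"{slave_core_url.rstrip('/')}/replication?{query}"


def http_fetch(url: str) -> bytes:
    """Issue a plain HTTP GET and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def replicate_periodically(
    slave_core_urls: Iterable[str],
    interval_s: float,
    rounds: int,
    fetch: Callable[[str], object] = http_fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Trigger fetchindex on every slave `rounds` times, waiting
    `interval_s` seconds between rounds (no wait after the last round).
    `fetch` and `sleep` are injectable so the schedule can be tested
    without a running Solr."""
    urls = [fetchindex_url(u) for u in slave_core_urls]
    for i in range(rounds):
        for url in urls:
            fetch(url)
        if i < rounds - 1:
            sleep(interval_s)
```

Usage would look like `replicate_periodically(["http://query1:8983/solr/collection1"], interval_s=300, rounds=12)` with hypothetical hosts. Queries would then only ever hit slaves, which keeps indexing-time GC pressure off the query JVMs, at the cost of index freshness lagging by up to one interval.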