This is a different issue. You are seeing the latency between a master index 
update and its replication to the slave(s).
Solve this by pointing your monitoring script directly at the slave instead of 
the master.
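
For example, a minimal freshness check in Python could talk to the slave 
directly. This is only a sketch: the slave host/core URL and the "timestamp" 
field are placeholders for whatever your setup actually uses.

# Freshness check that queries the slave directly, so it only reports "fresh"
# once the slave has actually replicated the update.
# The host, core path and "timestamp" field below are placeholders.
import json
import time
import urllib.request

SLAVE_URL = "http://slave1:8983/solr/select"
MAX_AGE_SECONDS = 10 * 60  # alarm if the newest document is older than this

def newest_doc_age():
    # Ask the slave for its single newest document, sorted by timestamp.
    params = "?q=*:*&sort=timestamp+desc&rows=1&fl=timestamp&wt=json"
    with urllib.request.urlopen(SLAVE_URL + params) as resp:
        docs = json.load(resp)["response"]["docs"]
    if not docs:
        return None
    # Assumes the field holds epoch seconds; adjust the parsing if your
    # schema stores an ISO-8601 date instead.
    return time.time() - float(docs[0]["timestamp"])

if __name__ == "__main__":
    age = newest_doc_age()
    if age is None or age > MAX_AGE_SECONDS:
        print("ALARM: newest document is stale or missing")
    else:
        print("OK: newest document is %.0f seconds old" % age)

That way the check only goes green once the update is actually visible on the 
server your users query.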

What this thread is about is a potential difference in index state during the 
execution of a single sharded query, caused not by master/slave replication 
but by the index being updated between phase 1 and phase 2.

I'm pretty sure the 2nd phase, which fetches the doc summaries, goes directly 
to the same server as the first phase. But what if you put a load balancer in 
between? Then perhaps the first phase could go to the master and the second to 
a slave?
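
To make that window concrete, here is a rough client-level simulation of the 
two phases in Python. This is not how Solr issues its internal shard requests, 
just an illustration of what could go wrong if a load balancer sends phase 1 
and phase 2 to servers with different index states; the host names and the 
"id"/"text" fields are made up.

# Phase 1 collects matching IDs; phase 2 fetches stored fields for those IDs.
# If the two requests hit different servers (master vs. slave) whose indexes
# differ, phase 2 can return fewer documents than phase 1 found.
import json
import urllib.parse
import urllib.request

def solr_select(base_url, **params):
    # Plain /select query returning the list of matching documents.
    params.setdefault("wt", "json")
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]

# Phase 1: IDs (and, in real life, facet counts); say this lands on the master.
phase1 = solr_select("http://master:8983/solr", q="text:foo", fl="id", rows=10)
ids = [d["id"] for d in phase1]

# Phase 2: stored fields for the top hits; say the LB routes this to a slave.
if ids:
    id_query = " OR ".join("id:%s" % i for i in ids)
    phase2 = solr_select("http://slave1:8983/solr", q=id_query, fl="*",
                         rows=len(ids))
    missing = set(ids) - {d["id"] for d in phase2}
    if missing:
        print("Found in phase 1 but missing in phase 2:", missing)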

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 12 Oct 2010, at 17:12, Shawn Heisey wrote:

> On 10/11/2010 6:32 PM, Peter Keegan wrote:
>> When Solr does a distributed search across shards, it does this in 2 phases
>> (correct me if I'm wrong):
>> 
>> 1. 1st query to get the docIds and facet counts
>> 2. 2nd query to retrieve the stored fields of the top hits
>> 
>> The problem here is that the index could change between (1) and (2), so it's
>> not an atomic transaction. If the stored fields were kept outside of Lucene,
>> only the first query would be necessary. However, this would mean that the
>> external NoSQL data store would have to be synchronized with the Lucene
>> index, which might present its own problems. (I'm just throwing this out for
>> discussion)
> 
> I've got a related issue that I have run into because of my use of a load 
> balancer.
> 
> I have a total of seven shards, each of which has a replica.  I've got one 
> set of machines set up as brokers that have the shards parameter in the 
> standard request handler.  Queries are sent to the load balancer, which sends 
> them to one of the brokers.  The shards parameter sends requests back to the 
> load balancer to be ultimately sent to an actual server.
> 
> I have a monitoring script that retrieves the latest document and alarms if 
> it's older than ten minutes.  Something that happens on occasion:
> 
> 1) An update is made to the master (happens every two minutes).
> 2) Monitoring script requests newest document.
> 3) Initial request is sent to master, finds ID.
> 4) Second request is sent to the slave, document not found.
> 5) Up to 15 seconds later, the slave replicates.
> 
> I solved this problem by having the monitoring script try several times on 
> failure, waiting a few seconds on each loop.  Do I need to be terribly 
> concerned about this impacting real queries?
> 
> I do not actually need to load balance; I have slave servers purely for 
> failover.  Currently the load balancer has a 3 to 1 weight ratio favoring the 
> slaves, which I plan to increase.  At one time I had the master set up as a 
> backup rather than a lower-weight target, but haproxy seemed to take longer 
> to recover from failures in that mode.  I will have to do some more 
> comprehensive testing.  If there's a better solution than haproxy that works 
> with heartbeat, I can change that.
> 
> Thanks,
> Shawn
> 
