[ 
https://issues.apache.org/jira/browse/GEODE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McMahon reassigned GEODE-6607:
-----------------------------------

    Assignee: Ryan McMahon

> Possible client subscription data inconsistency due to race between 
> retrieving filter info and distributing event
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6607
>                 URL: https://issues.apache.org/jira/browse/GEODE-6607
>             Project: Geode
>          Issue Type: Bug
>          Components: client queues
>            Reporter: Ryan McMahon
>            Assignee: Ryan McMahon
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A client can miss subscription events (either CQ or register interest) in 
> the following scenario:
> A cluster has four servers, and client subscriptions are configured with two 
> redundant copies.  The client's primary subscription endpoint is on server 
> 1, and the redundant copies are on servers 2 and 3.  Server 2 is killed or 
> lost to a network partition, so we attempt to restore redundancy by copying 
> the client queue from server 3 to server 4.  
> Two things happen when server 4 gets the client queue from server 3.  First, 
> we request the client's filter info, which represents its CQs and registered 
> interests.  Second, we perform the GII (get initial image) to obtain the 
> image of the queue.  
> A race can occur when an event is distributed across the cluster while 
> server 4 is still initializing the client queue.  If server 4 processes the 
> distributed event before it has retrieved the filter info, the event will 
> not match the client subscription filter, because the filter does not exist 
> on server 4 yet.  Then, if server 3 processes the same event after GII has 
> started, the event will not be part of the client queue image either.  
> The event is therefore never added to the client queue on server 4 and is 
> lost.
> We have a special queue for handling events while a client is initializing, 
> but it sits at too low a level (MessageDispatcher) to handle this scenario.  
> One possible solution is to move this special queue to a higher level 
> (CacheClientNotifier or CacheClientProxy) so that events are queued before 
> we even attempt to get filter info.  Then, when initialization finishes, we 
> drain the queue, check each queued event against the initialized client's 
> filter, and send along the events that match.  A similar solution could be 
> implemented on the GII provider side, but it might be a bit messier.
>  
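
The proposed fix described above (hold events from the moment initialization
starts, then drain and filter them once filter info and the queue image are
in place) could be sketched roughly as follows.  This is a simplified
illustration, not Geode code: the class InitializingProxySketch, its methods,
and the use of plain strings as event IDs are all hypothetical, and real
events would be deduplicated by their event IDs rather than by a list lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a per-client proxy on the server restoring redundancy. */
final class InitializingProxySketch {
    // Events that arrived before initialization finished (assumed "special queue").
    private final List<String> pending = new ArrayList<>();
    // Stands in for the client's subscription queue on this server.
    private final List<String> clientQueue = new ArrayList<>();
    // Unknown (null) until filter info has been retrieved.
    private Predicate<String> filter;
    private boolean initialized;

    /** Called for every distributed event, even before filter info is known. */
    synchronized void onEvent(String eventId) {
        if (!initialized) {
            // Hold the event instead of testing it against a filter that does not exist yet.
            pending.add(eventId);
        } else if (filter.test(eventId)) {
            clientQueue.add(eventId);
        }
    }

    /** Called once the filter info and the queue image from the provider are available. */
    synchronized void finishInitialization(Predicate<String> filterInfo, List<String> queueImage) {
        filter = filterInfo;
        clientQueue.addAll(queueImage);
        // Drain held events: keep those that match and are not already in the image.
        for (String eventId : pending) {
            if (filter.test(eventId) && !clientQueue.contains(eventId)) {
                clientQueue.add(eventId);
            }
        }
        pending.clear();
        initialized = true;
    }

    /** Returns a copy of the events currently queued for the client. */
    synchronized List<String> queuedEvents() {
        return new ArrayList<>(clientQueue);
    }
}
```

In this sketch, an event that reaches the new server before the filter is
known is kept in the pending list rather than discarded, so it can still be
matched once initialization completes, even if the queue image from the
provider does not contain it.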



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
