[ 
https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider resolved GEODE-8338.
-------------------------------------
    Fix Version/s: 1.14.0
       Resolution: Fixed

> Redis commands may be repeated when server dies
> -----------------------------------------------
>
>                 Key: GEODE-8338
>                 URL: https://issues.apache.org/jira/browse/GEODE-8338
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Assignee: Darrel Schneider
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Since we have one redundant copy of the data, and since we modify the data 
> using a function, I think we may have a data corruption issue with 
> non-idempotent operations. What can happen is that an operation like APPEND 
> can:
>  0) executor called on non-primary redis server, 
>  1) modify the primary (by sending a function exec to it), 
>  2) modify the secondary (by sending a geode delta to it), 
>  3) the primary server fails now (before the function executing on it 
> completes), 
>  4) the non-primary redis server sees the function fail and, because it is 
> marked as HA, retries it. This time it sends it to the secondary, which is 
> the new primary, but the operation was already applied to the secondary, so 
> this retry ends up doing the operation twice.
> This may be okay for certain ops (like SADD) that are idempotent (but even 
> they could cause extra key events in the future), but for ops like APPEND we 
> end up appending twice.
> This will only happen when a server executing a function dies and our 
> function service retries the function on another server because it is marked 
> HA. The easy way to fix this is to change our function to not be HA. This is 
> a one-line change.
>  Note that our clients can already see exceptions/errors if the server they 
> are connected to dies. When that happens the operation they requested may 
> have happened, and if they have multiple geode redis servers running it may 
> have been stored and may still be in memory. So clients will need some logic to 
> decide if they should redo such an operation or not (because it is already 
> done).
> *Note:* Making the function non-HA just gives the client one more case in 
> which they need to handle a server crash: the crashed server may now be one 
> they were not connected to but that was involved in performing the operation 
> they requested.
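
The failure sequence in the description can be illustrated with a small,
self-contained simulation. This is only a sketch under stated assumptions: it
does not use any Geode classes, and the names HaRetryDemo, Copy, ServerDied,
appendThenCrash and run are hypothetical stand-ins invented for illustration.
It models one APPEND that reaches both copies before the primary dies, and
compares what the surviving copy holds with and without an HA retry.

```java
public class HaRetryDemo {
    /** Hypothetical stand-in for the copy of a value held by one server. */
    static final class Copy {
        final StringBuilder value = new StringBuilder();
    }

    /** Hypothetical signal that the server executing the function died. */
    static final class ServerDied extends RuntimeException {}

    /**
     * Steps 1-3 of the description: the function modifies the primary,
     * the change is propagated to the secondary, and then the primary
     * dies before the function execution completes.
     */
    static void appendThenCrash(Copy primary, Copy secondary, String suffix) {
        primary.value.append(suffix);
        secondary.value.append(suffix);
        throw new ServerDied();
    }

    /**
     * Performs one client APPEND of "abc" and returns the value held by
     * the surviving copy (the old secondary, now the new primary).
     */
    static String run(boolean isHA) {
        Copy primary = new Copy();
        Copy secondary = new Copy();
        try {
            appendThenCrash(primary, secondary, "abc");
        } catch (ServerDied e) {
            if (isHA) {
                // Step 4: an HA function is retried on the new primary,
                // which has already applied the operation once.
                secondary.value.append("abc");
            }
            // Non-HA: no retry; the failure would be reported to the client.
        }
        return secondary.value.toString();
    }

    public static void main(String[] args) {
        System.out.println("HA retry: " + run(true));   // prints "HA retry: abcabc"
        System.out.println("non-HA:   " + run(false));  // prints "non-HA:   abc"
    }
}
```

In Geode itself, whether a function is retried is controlled by the isHA()
method of the org.apache.geode.cache.execute.Function interface. The actual
change made for this issue is not shown in this message, so the simulation
above should not be read as the committed fix.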



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
