[jira] [Resolved] (OAK-6659) Cold standby should fail loudly when a big blob can't be timely transferred

Andrei Dulceanu (JIRA) Thu, 14 Sep 2017 08:16:20 -0700

     [ 
https://issues.apache.org/jira/browse/OAK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andrei Dulceanu resolved OAK-6659.
----------------------------------
    Resolution: Fixed

Fixed at r1808355.

bq. I would also definitely appreciate the removal of the logOnly property. It 
was always ugly to begin with.
Created OAK-6667 to track this. Thanks for reviewing, [~frm]!

> Cold standby should fail loudly when a big blob can't be timely transferred
> ---------------------------------------------------------------------------
>
>                 Key: OAK-6659
>                 URL: https://issues.apache.org/jira/browse/OAK-6659
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>    Affects Versions: 1.7.6
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Critical
>              Labels: cold-standby
>             Fix For: 1.7.8
>
>         Attachments: OAK-6659.patch
>
>
> Due to changes made in OAK-4969, there are currently two 'sync blob' cycles 
> triggered by {{StandbyDiff#childNodeChanged}}. The test scenario is the same 
> as the one in {{DataStoreTestBase#testSyncBigBlob}}: on the primary file 
> store, a new big blob (1GB) is added and then a standby sync is triggered to 
> sync this content to the secondary file store. 
> The first 'sync blob' cycle happens as a result of {{#process}} being called 
> in {{StandbyDiff#childNodeChanged}}. Therefore, a new 'get blob' request is 
> created on the client and the server starts sending chunks from the big blob. 
> If the time needed to transfer the entire blob from server to client 
> exceeds {{readTimeoutMs}}, an {{IllegalStateException}} is correctly thrown 
> by {{StandbyDiff#readBlob}}, but it is swallowed by the catch clause in 
> {{StandbyDiff#childNodeChanged}}. A second 'sync blob' cycle is then 
> triggered and, if {{readTimeoutMs * 2}} is enough, the blob will be synced 
> on the standby. This happens because the server continues sending the 
> remaining chunks after the {{IllegalStateException}} was thrown in the 
> first 'sync blob' cycle.
> The consequence of these two 'sync blob' cycles is that deleting the 
> temporary file to which chunks are spooled on the client sometimes fails 
> (on Windows, for example; see OAK-6641 specifically). As a result, instead 
> of the previous incomplete transfer being deleted, new chunks from the 
> second 'sync blob' cycle are added to it. The blob persisted in the blob 
> store on the client then won't have the same size and id as the initial 
> blob sent by the server.
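
The difference between swallowing the timeout and failing loudly can be
sketched roughly as below. This is a minimal illustration, not the code from
OAK-6659.patch: the class FailLoudlySketch and its methods are hypothetical
stand-ins for StandbyDiff#readBlob and StandbyDiff#childNodeChanged, and the
timeout is simulated by always throwing.

```java
// Hypothetical sketch: none of these names come from the Oak code base.
class FailLoudlySketch {

    // Counts how many times a blob read was attempted.
    static int attempts = 0;

    // Stand-in for a blob read that always exceeds the read timeout.
    static void readBlob() {
        attempts++;
        throw new IllegalStateException("blob not received within readTimeoutMs");
    }

    // Swallowing variant: the timeout is caught and turned into a plain
    // "false", so the caller cannot tell it apart from an ordinary result
    // and may simply start another sync cycle.
    static boolean childNodeChangedSwallowing() {
        try {
            readBlob();
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Loud variant: the exception is not caught here, so it propagates
    // and aborts the current sync cycle.
    static boolean childNodeChangedLoud() {
        readBlob();
        return true;
    }

    public static void main(String[] args) {
        System.out.println("swallowing variant returned "
                + childNodeChangedSwallowing());
        try {
            childNodeChangedLoud();
        } catch (IllegalStateException e) {
            System.out.println("loud variant failed: " + e.getMessage());
        }
    }
}
```

In the swallowing variant the caller only sees a boolean, which matches the
behaviour described above where a second 'sync blob' cycle starts as if
nothing had gone wrong.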



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
