[ https://issues.apache.org/jira/browse/OAK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrei Dulceanu resolved OAK-6659.
----------------------------------
    Resolution: Fixed

Fixed at r1808355.

bq. I would also definitely appreciate the removal of the logOnly property. It was always ugly to begin with.
Created OAK-6667 to track this.

Thanks for reviewing, [~frm]!

> Cold standby should fail loudly when a big blob can't be timely transferred
> ---------------------------------------------------------------------------
>
>                 Key: OAK-6659
>                 URL: https://issues.apache.org/jira/browse/OAK-6659
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>   Affects Versions: 1.7.6
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Critical
>              Labels: cold-standby
>             Fix For: 1.7.8
>
>         Attachments: OAK-6659.patch
>
>
> Due to changes done in OAK-4969, currently there are two 'sync blob' cycles
> triggered by {{StandbyDiff#childNodeChanged}}. The test scenario is the same
> as the one in {{DataStoreTestBase#testSyncBigBlob}}: on the primary file
> store, a new big blob (1GB) is added and then a standby sync is triggered to
> sync this content to the secondary file store.
> The first 'sync blob' cycle happens as a result of {{#process}} being called
> in {{StandbyDiff#childNodeChanged}}. Therefore, a new 'get blob' request is
> created on the client and the server starts sending chunks from the big blob.
> Now, if the time needed for transferring the entire blob from server to
> client exceeds {{readTimeoutMs}} an {{IllegalStateException}} will be
> correctly thrown by {{StandbyDiff#readBlob}}, but will be swallowed by the
> {{StandbyDiff#childNodeChanged}} in its catch clause. A second 'sync blob'
> cycle will be triggered and, -this might succeed with the same
> {{readTimeoutMs}} for which it was failing before-, if {{readTimeoutMs * 2}}
> is enough, the blob will be synced on the standby.
> This happens because the server will continue sending the remaining chunks
> after {{IllegalStateException}} was thrown (first 'sync blob' cycle).
> The consequence of these two 'sync blob' cycles is that deleting the
> temporary file to which chunks are spooled on the client sometimes fails (see
> Windows for example and OAK-6641 specifically). In that case, instead of the
> previous incomplete transfer being deleted, new chunks from the second
> 'sync blob' cycle are appended to it. The blob persisted in the blob store on
> the client won't have the same size and id as the initial blob sent by the
> server.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
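
The swallowed-timeout behaviour described in the issue can be illustrated with a minimal, self-contained sketch. This is not the Oak code: the class {{SwallowedTimeoutSketch}} and all of its methods are hypothetical stand-ins, and the "second cycle" is modelled as a plain retry after a failed first pass. It only shows why catching the {{IllegalStateException}} hides the timeout and leads to a second transfer attempt, whereas letting it propagate stops after the first.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Oak.
public class SwallowedTimeoutSketch {

    // Stand-in for a blob read that always exceeds the read timeout.
    public static void readBlob(AtomicInteger attempts) {
        attempts.incrementAndGet();
        throw new IllegalStateException("Timed out waiting for blob chunks");
    }

    // Swallowing variant: the timeout is reduced to a boolean result,
    // so the caller cannot tell a timeout from any other failed pass.
    public static boolean processSwallowing(AtomicInteger attempts) {
        try {
            readBlob(attempts);
            return true;
        } catch (IllegalStateException e) {
            return false; // timeout hidden from the caller
        }
    }

    // Fail-loudly variant: the exception propagates to the caller.
    public static void processFailingLoudly(AtomicInteger attempts) {
        readBlob(attempts);
    }

    // Models the two 'sync blob' cycles: a failed first pass is
    // silently followed by a second one.
    public static int syncSwallowing() {
        AtomicInteger attempts = new AtomicInteger();
        if (!processSwallowing(attempts)) {
            processSwallowing(attempts); // second cycle
        }
        return attempts.get();
    }

    // The sync cycle aborts on the first timeout; no second cycle starts.
    public static int syncFailingLoudly() {
        AtomicInteger attempts = new AtomicInteger();
        try {
            processFailingLoudly(attempts);
        } catch (IllegalStateException e) {
            System.out.println("Sync aborted: " + e.getMessage());
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println("swallowing attempts: " + syncSwallowing());
        System.out.println("fail-loudly attempts: " + syncFailingLoudly());
    }
}
```

In the swallowing variant the transfer is attempted twice without the timeout ever being reported; in the fail-loudly variant it is attempted once and the timeout reaches the caller.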