[ 
https://issues.apache.org/jira/browse/OAK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco Mari resolved OAK-6749.
---------------------------------
    Resolution: Fixed

> Segment-Tar standby sync fails with "in-memory" blobs present in the source 
> repo
> --------------------------------------------------------------------------------
>
>                 Key: OAK-6749
>                 URL: https://issues.apache.org/jira/browse/OAK-6749
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob, tarmk-standby
>    Affects Versions: 1.6.2
>            Reporter: Csaba Varga
>            Assignee: Francesco Mari
>            Priority: Major
>             Fix For: 1.10.1, 1.8.12, 1.10
>
>         Attachments: OAK-6749-01.patch, OAK-6749-02.patch
>
>
> We ran into an issue when trying to transition from an active/active Mongo 
> NodeStore cluster to a single Segment-Tar server with a cold standby. The 
> issue itself manifests when the standby server tries to pull changes from the 
> primary after the first round of online revision GC.
> Let me summarize how we ended up in the current state, along with my 
> hypothesis about what happened, based on my debugging so far:
> # We started with a Mongo NodeStore and an external FileDataStore as the blob 
> store. The FileDataStore was set up with minRecordLength=4096. The Mongo 
> store stores blobs below minRecordLength as special "in-memory" blob IDs, 
> where the data itself is hex-encoded into the ID string (see the sketch after 
> the log excerpt below).
> # We executed a sidegrade of the Mongo store into a Segment-Tar store. 
> Our datastore is over 1TB in size, so copying the binaries wasn't an option; 
> the new repository simply reuses the existing datastore. The "in-memory" 
> blob IDs still look like external blob IDs to the sidegrade process, so they 
> were copied into the Segment-Tar repository as-is, instead of being converted 
> into the more efficient in-line format.
> # The server started up without issues on the new Segment-Tar store. The 
> migrated "in-memory" blob IDs seem to work fine, if a bit sub-optimally.
> # At this point, we created a cold standby instance by copying the files 
> of the stopped primary instance and making the necessary config changes on 
> both servers.
> # Everything worked fine until the primary server started its first round of 
> online revision GC. After that process completed, the standby node started 
> throwing exceptions about missing segments, and eventually stopped 
> altogether. In the meantime, the following warning showed up in the primary 
> log:
> {code:java}
> 29.09.2017 06:12:08.088 *WARN* [nioEventLoopGroup-3-10] org.apache.jackrabbit.oak.segment.standby.server.ExceptionHandler Exception caught on the server
> io.netty.handler.codec.TooLongFrameException: frame length (8208) exceeds the allowed maximum (8192)
>         at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:146)
>         at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:142)
>         at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:99)
>         at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:75)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:611)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:552)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:466)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:438)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
>         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
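> To make the encoding from step 1 concrete, here is a minimal illustrative 
> sketch of how a small binary ends up hex-encoded inside its blob ID; the 
> "0x" prefix and the helper are assumptions for illustration only, not the 
> actual Oak API:
> {code:java}
> // Illustrative only: a binary below minRecordLength is not written to the
> // datastore; instead its bytes are hex-encoded into the blob ID itself.
> public class InMemoryBlobIdSketch {
>
>     static String toInMemoryId(byte[] data) {
>         StringBuilder id = new StringBuilder("0x"); // assumed prefix
>         for (byte b : data) {
>             id.append(String.format("%02x", b));
>         }
>         return id.toString();
>     }
>
>     public static void main(String[] args) {
>         byte[] smallBinary = new byte[4095]; // just below minRecordLength=4096
>         String id = toInMemoryId(smallBinary);
>         // Two hex characters per byte plus the prefix: 8192 characters,
>         // i.e. the ID alone is roughly twice the size of the binary.
>         System.out.println("blob ID length: " + id.length());
>     }
> }
> {code}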
> This is what seems to be happening:
> # The revision GC creates brand new segments, and the standby instance starts 
> pulling them into its own store.
> # When the standby sees an "in-memory" blob ID, it decides that it doesn't 
> have this blob in its own blobstore, so it asks the primary for the blob's 
> bytes, even though they are already encoded in the ID itself.
> # The longest blob IDs can be more than 8K in size (the ~4K blob is doubled 
> by hex encoding). When such a long blob ID is submitted to the primary, the 
> request is rejected for exceeding the frame length limit, as reproduced in 
> the sketch below. The standby keeps waiting until the request times out, and 
> no progress is made in syncing.
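> A minimal sketch of step 3, assuming Netty 4.x on the classpath; the "b " 
> command prefix stands in for whatever framing the standby protocol actually 
> puts around the blob ID:
> {code:java}
> import io.netty.buffer.Unpooled;
> import io.netty.channel.embedded.EmbeddedChannel;
> import io.netty.handler.codec.LineBasedFrameDecoder;
> import io.netty.util.CharsetUtil;
>
> public class FrameLimitSketch {
>
>     public static void main(String[] args) {
>         // The server decodes line-based frames with an 8192-byte maximum,
>         // matching the "allowed maximum (8192)" from the warning above.
>         EmbeddedChannel server = new EmbeddedChannel(new LineBasedFrameDecoder(8192));
>
>         // A request line carrying a ~8K hex-encoded "in-memory" blob ID.
>         StringBuilder request = new StringBuilder("b ");
>         for (int i = 0; i < 4096; i++) {
>             request.append("ab"); // two hex characters per byte of binary data
>         }
>         request.append('\n');
>
>         // Throws TooLongFrameException: the line exceeds the 8192-byte limit,
>         // so the request never reaches a handler and the standby times out.
>         server.writeInbound(Unpooled.copiedBuffer(request, CharsetUtil.US_ASCII));
>     }
> }
> {code}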
> The issue doesn't show up in repositories that started out as Segment-Tar, 
> since Segment-Tar always inlines blobs below some hardcoded threshold (16K, 
> if I remember correctly).
> I think there could be multiple ways to approach this, not mutually exclusive:
> * Special-case the "in-memory" blob IDs during sidegrade and replace them with 
> the "native" segment values. If hardcoding knowledge of this implementation 
> detail isn't desired, there could be a new sidegrade option that forces 
> "inlining" of blobs below a certain threshold, even if they aren't in-line in 
> the source repo.
> * Special-case the "in-memory" blob IDs in StandbyDiff so they aren't 
> requested from the primary, but are either kept as-is or converted to the 
> "native" format.
> * Increase the network frame size limit in the sync protocol, or make it 
> configurable (see the sketch after this list). This is the least efficient 
> option, but also the one with the least impact on the code.
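> For the last option, a minimal sketch of what a configurable frame limit 
> could look like on the server side; the initializer and its parameter are 
> hypothetical, not the actual standby server code:
> {code:java}
> import io.netty.channel.ChannelInitializer;
> import io.netty.channel.socket.SocketChannel;
> import io.netty.handler.codec.LineBasedFrameDecoder;
>
> class ConfigurableFrameInitializer extends ChannelInitializer<SocketChannel> {
>
>     private final int maxFrameLength;
>
>     ConfigurableFrameInitializer(int maxFrameLength) {
>         // Today the limit is effectively 8192; letting operators raise it
>         // (e.g. to 16384) would leave room for ~8K hex-encoded blob IDs.
>         this.maxFrameLength = maxFrameLength;
>     }
>
>     @Override
>     protected void initChannel(SocketChannel ch) {
>         // Only the decoder relevant to this issue is shown; the real pipeline
>         // adds the snapshot/segment/blob request handlers after it.
>         ch.pipeline().addLast(new LineBasedFrameDecoder(maxFrameLength));
>     }
> }
> {code}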
> I can work on detailed reproduction steps if needed, but I'd rather not do it 
> beforehand because the issue is rather cumbersome to reproduce.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
