[ https://issues.apache.org/jira/browse/OAK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francesco Mari resolved OAK-6749.
---------------------------------
    Resolution: Fixed

> Segment-Tar standby sync fails with "in-memory" blobs present in the source repo
> ---------------------------------------------------------------------------------
>
>                 Key: OAK-6749
>                 URL: https://issues.apache.org/jira/browse/OAK-6749
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob, tarmk-standby
>    Affects Versions: 1.6.2
>            Reporter: Csaba Varga
>            Assignee: Francesco Mari
>            Priority: Major
>             Fix For: 1.10.1, 1.8.12, 1.10
>
>         Attachments: OAK-6749-01.patch, OAK-6749-02.patch
>
>
> We have run into an issue while trying to transition from an active/active Mongo NodeStore cluster to a single Segment-Tar server with a cold standby. The issue manifests when the standby server tries to pull changes from the primary after the first round of online revision GC.
> Let me summarize how we ended up with the current state, and my hypothesis about what happened, based on my debugging so far:
> # We started with a Mongo NodeStore and an external FileDataStore as the blob store. The FileDataStore was set up with minRecordLength=4096. The Mongo store saves blobs below minRecordLength as special "in-memory" blob IDs, where the data itself is baked into the ID string in hex (see the sketch after this list).
> # We executed a sidegrade of the Mongo store into a Segment-Tar store. Our datastore is over 1TB in size, so copying the binaries wasn't an option; the new repository simply reuses the existing datastore. The "in-memory" blob IDs still look like external blob IDs to the sidegrade process, so they were copied into the Segment-Tar repository as-is instead of being converted into the efficient in-line format.
> # The server started up without issues on the new Segment-Tar store. The migrated "in-memory" blob IDs seem to work fine, if a bit sub-optimally.
> # At this point we created a cold standby instance by copying the files of the stopped primary instance and making the necessary config changes on both servers.
> # Everything worked fine until the primary server started its first round of online revision GC. After that process completed, the standby node started throwing exceptions about missing segments, and eventually stopped altogether.
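> For illustration, here is a minimal, self-contained sketch of the idea behind such IDs. The prefix and encoder below are hypothetical (the real in-memory record format is an implementation detail of the Jackrabbit datastore); the point is only that hex encoding roughly doubles the content size, so an ID for a blob just under the 4K threshold already approaches 8K characters:
> {code:java}
> // Illustration only: hex-encode small content into the blob ID, mimicking how the
> // datastore keeps blobs below minRecordLength "in memory". Not the actual Oak code.
> public class InMemoryBlobIdSketch {
>
>     private static final int MIN_RECORD_LENGTH = 4096; // our FileDataStore setting
>
>     // Hypothetical encoder; the real ID syntax differs, but the length math is the same.
>     static String toInMemoryId(byte[] content) {
>         StringBuilder id = new StringBuilder("0x"); // hypothetical prefix
>         for (byte b : content) {
>             id.append(String.format("%02x", b));
>         }
>         return id.toString();
>     }
>
>     public static void main(String[] args) {
>         byte[] smallBlob = new byte[MIN_RECORD_LENGTH - 1]; // largest blob still below the threshold
>         String id = toInMemoryId(smallBlob);
>         // Prints 8192: the 4095-byte payload doubles to 8190 hex characters plus the prefix.
>         System.out.println("ID length for a " + smallBlob.length + "-byte blob: " + id.length());
>     }
> }
> {code}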
> While the standby was failing, the following warning showed up in the primary log:
> {code:java}
> 29.09.2017 06:12:08.088 *WARN* [nioEventLoopGroup-3-10] org.apache.jackrabbit.oak.segment.standby.server.ExceptionHandler Exception caught on the server
> io.netty.handler.codec.TooLongFrameException: frame length (8208) exceeds the allowed maximum (8192)
>     at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:146)
>     at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:142)
>     at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:99)
>     at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:75)
>     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:611)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:552)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:466)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:438)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This is what seems to be happening (see the frame-limit sketch after this list):
> # The revision GC creates brand new segments, and the standby instance starts pulling them into its own store.
> # When the standby sees an "in-memory" blob ID, it decides that it doesn't have this blob in its own blob store, so it asks the primary for the bytes of the blob, even though they are encoded in the ID itself.
> # The longest blob IDs can be more than 8K characters (a blob just under the 4K minRecordLength doubles in size when hex-encoded). When such a long blob ID is sent to the primary, the request is rejected for exceeding the maximum frame length. The standby keeps waiting until the request times out, and no progress is made in syncing.
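> The frame limit itself is easy to reproduce in isolation. The sketch below exercises Netty's LineBasedFrameDecoder (the decoder from the stack trace) directly, outside of Oak; the 8208-character line stands in for a request line carrying an over-long blob ID:
> {code:java}
> import io.netty.buffer.Unpooled;
> import io.netty.channel.embedded.EmbeddedChannel;
> import io.netty.handler.codec.LineBasedFrameDecoder;
> import io.netty.util.CharsetUtil;
>
> // Standalone reproduction of the decoder failure from the primary's log: a single
> // line longer than the 8192-byte maximum is rejected before it reaches any of the
> // standby server's request handlers.
> public class FrameLimitSketch {
>     public static void main(String[] args) {
>         // 8192 is the allowed maximum reported in the warning above.
>         EmbeddedChannel channel = new EmbeddedChannel(new LineBasedFrameDecoder(8192));
>
>         // Build an 8208-byte line, matching the frame length reported in the log.
>         StringBuilder line = new StringBuilder();
>         for (int i = 0; i < 8208; i++) {
>             line.append('a');
>         }
>         line.append('\n');
>
>         try {
>             channel.writeInbound(Unpooled.copiedBuffer(line, CharsetUtil.US_ASCII));
>         } catch (Exception e) {
>             // Expected: TooLongFrameException: frame length (8208) exceeds the allowed maximum (8192)
>             System.out.println(e);
>         }
>     }
> }
> {code}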
> The issue doesn't pop up with repositories that started as Segment-Tar, since Segment-Tar always inlines blobs below some hardcoded threshold (16K, if I remember correctly).
> I think there could be multiple ways to approach this, not mutually exclusive (a sketch of the second option follows at the end of this message):
> * Special-case the "in-memory" blob IDs during sidegrade and replace them with the "native" segment values. If hardcoding knowledge about this implementation detail isn't desired, there could be a new option for the sidegrade process to force "inlining" of blobs below a certain threshold, even if they aren't in-line in the source repo.
> * Special-case the "in-memory" blob IDs in StandbyDiff so they aren't requested from the primary, but are either kept as-is or converted to the "native" format.
> * Increase the frame size limit in the sync protocol, or allow it to be configured. This is the least efficient option, but also the one with the least impact on the code.
> I can work on detailed reproduction steps if needed, but I'd rather not do it beforehand, because this is rather cumbersome to reproduce.
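> A rough sketch of the second option, to make the intent concrete. Everything here is placeholder code (looksLikeInMemoryId, syncBlob and the length heuristic are made up for illustration); a real implementation would rely on the datastore's own notion of an in-memory record rather than guessing from the ID's shape:
> {code:java}
> // Sketch of the second option only: skip the network round trip for blob IDs that
> // already carry their content, and fetch everything else from the primary as today.
> public class StandbyBlobSyncSketch {
>
>     // Placeholder heuristic; a real check would ask the blob store whether the ID
>     // denotes an "in-memory" record instead of looking at its length.
>     static boolean looksLikeInMemoryId(String blobId) {
>         return blobId.length() > 2048;
>     }
>
>     static void syncBlob(String blobId) {
>         if (looksLikeInMemoryId(blobId)) {
>             // The content is already encoded in the ID, so nothing needs to be transferred;
>             // it can be kept as-is or re-written locally in the segment store's in-line format.
>             System.out.println("handled locally: " + blobId.substring(0, 16) + "...");
>         } else {
>             // Normal path: request the binary from the primary over the standby protocol.
>             System.out.println("requesting from primary: " + blobId);
>         }
>     }
>
>     public static void main(String[] args) {
>         syncBlob("4c5e7a93..."); // ordinary external blob ID: fetched from the primary
>         StringBuilder inMemoryId = new StringBuilder("0x");
>         for (int i = 0; i < 4095; i++) {
>             inMemoryId.append("00"); // hex-encoded content of a 4095-byte blob
>         }
>         syncBlob(inMemoryId.toString()); // in-memory style ID: no request to the primary
>     }
> }
> {code}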