Aha!  My inexperience with all of this shows :)

Thank you for redirecting me!  I will try there.


John

________________________________
From: Julian Sedding <jsedd...@gmail.com>
Sent: Tuesday, November 8, 2016 12:11:46 PM
To: users@sling.apache.org
Subject: Re: Read cache contents being continuously sent from primary to cold standby?

Hi John

I think you might have more luck with this question on the Jackrabbit
users list. Sling uses Jackrabbit Oak under the hood, but the real
expertise is over there.

Regards
Julian

On Tue, Nov 8, 2016 at 8:08 PM, John Logan <john.lo...@texture.com> wrote:
> Hi,
>
> Is there anyone who can provide guidance on a cold standby issue?
>
> It appears that my cold standby has made progress since my original
> email (below), but now I'm seeing a behavior that I didn't expect.
>
> The tarmk.log (below) for my cold standby shows that the head is being
> updated every 5-10 hours, with the longer intervals corresponding
> to 3pm to midnight local time.
>
> The primary's JMX standby metrics show that the number of transferred
> segments is not changing, while the number of transferred binaries
> continues to increase.
>
> I did some other checks on the read caches and the shared S3 bucket,
> and it looks like read cache contents are being transferred from the
> primary to the standby over and over again.  Is this what I should
> see?  If not, how do I troubleshoot this?
>
> Thanks!  John Logan
>
> tarmk.log:
>
> 2016-11-05 23:30:05,045 sending head request
> 2016-11-05 23:30:05,045 did send head request
> 2016-11-05 23:30:05,081 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-06 10:09:25,044 sending head request
> 2016-11-06 10:09:25,044 did send head request
> 2016-11-06 10:09:25,092 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-06 15:07:50,045 sending head request
> 2016-11-06 15:07:50,045 did send head request
> 2016-11-06 15:07:50,082 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-06 20:10:20,044 sending head request
> 2016-11-06 20:10:20,045 did send head request
> 2016-11-06 20:10:20,120 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-07 06:26:50,044 sending head request
> 2016-11-07 06:26:50,044 did send head request
> 2016-11-07 06:26:50,085 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-07 11:23:15,045 sending head request
> 2016-11-07 11:23:15,045 did send head request
> 2016-11-07 11:23:15,140 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-07 17:07:00,045 sending head request
> 2016-11-07 17:07:00,045 did send head request
> 2016-11-07 17:07:00,141 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-07 22:01:40,052 sending head request
> 2016-11-07 22:01:40,052 did send head request
> 2016-11-07 22:01:40,099 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-08 08:26:35,044 sending head request
> 2016-11-08 08:26:35,044 did send head request
> 2016-11-08 08:26:35,183 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
> 2016-11-08 14:05:50,046 sending head request
> 2016-11-08 14:05:50,046 did send head request
> 2016-11-08 14:05:50,159 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
>
>
> On Wed, 2016-11-02 at 17:28 +0000, John Logan wrote:
>> Hi,
>>
>> I'm setting up a TarMK cold standby for a repository for the first time, and
>> have a couple of questions regarding synchronization and administration.
>> I've included the configuration and current dump of the primary and standby
>> MBeans below.  The primary and standby are in peered VPCs in AWS, using a
>> shared S3 bucket for blob storage.
>>
>> 1.) I'm curious as to how long I should expect to wait for the standby
>> to establish synchronization.  How much data gets moved over the wire?
>> I'm seeing a steady stream of read cache invalidations on the standby -
>> does this mean that all of the blob data must be transferred, even
>> though the two repositories use shared storage?
>>
>> 2.) I see in the logs a period where there are read cache invalidations,
>> and then there is a 12-hour period where nothing is logged, followed by a
>> "org.apache.jackrabbit.oak.plugins.segment.standby.client.SegmentLoaderHandler timeout"
>> message.  The quiet period is consistent with my setting
>> standby.readtimeout=I"43200000".  Would it make sense to choose a
>> shorter timeout to lessen the impact of occasional network issues?
>> At what point might the timeout value be "too short"?
>>
>> 3.) Is there a definitive way to know that the standby is synced?
>> The SyncEndTimestamp value below corresponds to 2016-11-02T09:26:18+00:00,
>> which corresponds exactly to the timestamp of the
>> "SegmentLoaderHandler timeout" message.  This suggests that this
>> value doesn't really tell me that the standby is synchronized.
>> When I tried with small repositories, it appears that synchronization
>> was done when the tarmk.log file started outputting the same repository
>> head every 5 seconds ("interval" setting).
>>
>> 4.) Assuming that the standby eventually becomes synchronized,
>> is there a documented procedure by which I could "split the mirror";
>> that is, convert the standby into a new, independent primary
>> containing a replica of the original?  If the current primary
>> and standby are referring to S3 bucket "P", could I shut down
>> both instances, copy the contents of bucket "P" to a new bucket
>> "S", update the standby Oak S3 configuration to refer to the new
>> bucket "S", and restart what was the standby as a new primary?
>> Are there other steps I would need to take?
>>
>> Thanks!  John
>>
>>
>> CONFIG VALUES FOR BOTH INSTANCES
>>
>>
>> STANDBY CONFIG:
>>
>>
>> /var/lib/sling/install/install.standby/org.apache.jackrabbit.oak.plugins.segment.standby.store.StandbyStoreService.config:
>> org.apache.sling.installer.configuration.persist=B"false"
>> port=I"8023"
>> secure=B"true"
>> mode="standby"
>> primary.host="john-proto.dev"
>> interval=I"5"
>> standby.readtimeout=I"43200000"
>>
>>
>> PRIMARY CONFIG:
>>
>>
>> /var/lib/sling/install/install.primary/org.apache.jackrabbit.oak.plugins.segment.standby.store.StandbyStoreService.config
>> org.apache.sling.installer.configuration.persist=B"false"
>> port=I"8023"
>> secure=B"true"
>> mode="primary"
>> primary.allowed-client-ip-ranges=["0.0.0.0-255.255.255.255"]
>>
>>
>> OAK S3 CONFIG:
>>
>>
>> /var/lib/sling/install/oak_s3/org.apache.jackrabbit.oak.plugins.blob.datastore.SharedS3DataStore.config:
>> accessKey=""
>> secretKey=""
>> s3Bucket="my-primary-bucket"
>> s3Region="us-west-2"
>> s3EndPoint="s3-us-west-2.amazonaws.com"
>> connectionTimeout="120000"
>> socketTimeout="120000"
>> maxConnections="40"
>> writeThreads="30"
>> maxErrorRetry="10"
>>
>>
>> JMX MBEANS
>>
>>
>> STANDBY:
>>
>>
>> #mbean = org.apache.jackrabbit.oak:id="fa2b9a7c-fc69-4a0c-aa7e-b0cfc61bd1c6",name=Status,type="Standby":
>> FailedRequests = 0;
>>
>> SecondsSinceLastSuccess = 24269;
>>
>> SyncStartTimestamp = 1478021232280;
>>
>> SyncEndTimestamp = 1478078778813;
>>
>> Status = running;
>>
>> Running = true;
>>
>> Mode = client: fa2b9a7c-fc69-4a0c-aa7e-b0cfc61bd1c6;
>>
>>
>> PRIMARY:
>>
>> #mbean = org.apache.jackrabbit.oak:id=8023,name=Status,type="Standby":
>> Status = got message;
>>
>> Running = true;
>>
>> Mode = primary;
>>
>> #mbean = org.apache.jackrabbit.oak:id="Client fa2b9a7c-fc69-4a0c-aa7e-b0cfc61bd1c6",name=Status,type="Standby":
>> RemotePort = 44322;
>>
>> RemoteAddress = 10.16.12.44;
>>
>> LastSeenTimestamp = Wed Nov 02 13:48:59 UTC 2016;
>>
>> TransferredSegments = 186780;
>>
>> TransferredSegmentBytes = 1198693232;
>>
>> TransferredBinaries = 5579;
>>
>> TransferredBinariesBytes = 170312256398;
>>
>> LastRequest = b.678851bb77bec68db82c6bda37aca8e763d8a32e#655084301;
>>
>> Name = fa2b9a7c-fc69-4a0c-aa7e-b0cfc61bd1c6;
>
>
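
For anyone wanting to check the "every 5-10 hours" observation against their own tarmk.log, here is a rough Python sketch. It is not part of Oak or Sling. It assumes each "updating current head to" entry sits on one line in the format quoted above, and the names head_updates and summarize are made up for illustration. The sample text is the first three head updates from the log in this thread.

```python
from datetime import datetime

# First three head-update entries from the tarmk.log quoted in the thread,
# with each wrapped entry rejoined onto a single line.
LOG = """\
2016-11-05 23:30:05,081 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
2016-11-06 10:09:25,092 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
2016-11-06 15:07:50,082 updating current head to 4cbe6d89-284c-4c4b-ac34-498068a9bcca.fe09
"""

MARKER = " updating current head to "


def head_updates(text):
    """Return (timestamp, head_id) pairs for every head-update line."""
    updates = []
    for line in text.splitlines():
        if MARKER not in line:
            continue  # skip "sending head request" and other entries
        stamp, head = line.split(MARKER, 1)
        # Timestamps carry no zone, so they are parsed as naive local times;
        # a daylight-saving shift inside the log would not be accounted for.
        when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S,%f")
        updates.append((when, head.strip()))
    return updates


def summarize(updates):
    """Return the set of distinct heads and the gaps between updates in hours."""
    heads = {head for _, head in updates}
    gaps = [
        round((later[0] - earlier[0]).total_seconds() / 3600, 2)
        for earlier, later in zip(updates, updates[1:])
    ]
    return heads, gaps


heads, gaps = summarize(head_updates(LOG))
print("distinct heads:", len(heads))
print("hours between updates:", gaps)
```

If the number of distinct heads stays at one while the gaps run to hours, the standby is re-applying the same head each cycle, which matches the behavior described in the question.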

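The JMX timestamps and the read timeout in the original message are all in milliseconds, which makes them hard to compare by eye. The following is a small Python sketch, using only the values quoted in the thread, that converts them to readable form. The helper name from_epoch_ms is an illustration only, and treating standby.readtimeout as milliseconds is an assumption based on the 12-hour quiet period described in question 2.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the standby MBean dump in the thread.
sync_start = from_epoch_ms(1478021232280)  # SyncStartTimestamp
sync_end = from_epoch_ms(1478078778813)    # SyncEndTimestamp

# standby.readtimeout=I"43200000", assumed here to be in milliseconds.
read_timeout = timedelta(milliseconds=43200000)

print("sync start:  ", sync_start.isoformat(timespec="seconds"))
print("sync end:    ", sync_end.isoformat(timespec="seconds"))
print("sync elapsed:", sync_end - sync_start)
print("read timeout:", read_timeout)
```

The converted SyncEndTimestamp agrees with the 2016-11-02T09:26:18+00:00 value given in question 3, and the timeout works out to 12 hours.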