Author: baedke
Date: Fri Sep 19 10:10:08 2014
New Revision: 1626168

URL: http://svn.apache.org/r1626168
Log:
OAK-1915: TarMK Cold Standby
Improved MBeans, added documentation.

Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/coldstandby/
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/coldstandby/coldstandby.md
    jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/jmx/ClientFailoverStatusMBean.java
Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/site.xml
    jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/client/FailoverClient.java
    jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/BulkTest.java
    jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/MBeanTest.java

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/coldstandby/coldstandby.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/coldstandby/coldstandby.md?rev=1626168&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/coldstandby/coldstandby.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/coldstandby/coldstandby.md Fri Sep 19 10:10:08 2014
@@ -0,0 +1,110 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+  -->
+
+# Cold Standby
+
+### What is it?
+
+The *Cold Standby* feature allows one or more clients to connect to a master instance and ensures automatic on-the-fly synchronization of the repository state from the master to the client(s). The sync process is one-way only. Data stored on the master is never changed. The only purpose of the client installation(s) is to guarantee an (almost live) data copy and enable a quick switch from the master to a client installation without data loss.
+
+### What it isn't
+
+The *Cold Standby* feature does not guarantee file, filesystem or even repository **integrity**! If the content of a tar file is corrupted, a file is missing or anything similar happens to the locally stored files, the installation will break because these situations are not checked, detected or treated!
+
+### How it works
+
+On the master a TCP port is opened and listens for incoming messages. Currently there are two messages implemented:
+
+* give me the segment id of the current head
+* give me the segment data for a specified id
+
+The clients periodically request the segment id of the current head of the master. If the segment is locally unknown it will be retrieved. If it's already present the segments are compared and referenced segments (if necessary) will be requested, too.
+
+
+### Prerequisites
+
+An Oak installation using a SegmentStore using the TarMK.
+
+### Setup
+
+1. Perform a filesystem based copy of the master repository.
+2. On the master activate the feature by specifying the runmode <!-- TODO: this must be changed --> `syncmaster`. If the repository is running within an OSGi environment the feature will be activated by a corresponding configuration. <!-- TODO: add some OSGi specific info here -->
+3. On the client(s) activate the feature by specifying the runmode `syncslave` (add additional parameters if desired) and specify the path to the repository (-tar) files.
+4. Start the master and the client(s).
+
+You can add the additional argument `--secure true` if you want an SSL-secured connection between the client and the master. It must be guaranteed that **all** clients and the master either use secure or standard connections! A mixed configuration will definitely fail.
+
+The clients specify the master host using the `--host` (default is `localhost`) and `--port` (default is `8023`) arguments. For monitoring reasons (see below) the client(s) must be distinguishable. Therefore a generic UUID is automatically created for each running client and this UUID is used to identify the client on the master. If you want to specify the name of the client you can set a system property `failOverID`.
+
+To sum it up a typical client command line could be:
+
+    java -DfailOverID="Client#1" -jar oak-run.jar syncslave --secure false --host 192.168.0.1 crx-quickstart/repository/segmentstore
+
+<!-- TODO: add the master specific arguments (like the accepted incoming IP ranges) -->
+The master can define the TCP port the feature listens on (default is `8023`) using the `--port` argument. If you want to restrict the communication you can specify a list of allowed IPs or IP ranges....
+
+### Robustness
+
+The data flow is designed to detect and handle connection and network related problems automatically. All packets are bundled with checksums and as soon as problems with the connection or damaged packets occur retry mechanisms are triggered.
+
+### Monitoring
+
+The *Cold Standby* feature exposes information using JMX/MBeans. Doing so you can inspect the current state of the client(s) and the master using standard tools like `jconsole` or `jmc` (if running JDK 1.7 or higher). The information can be found if you look for a `org.apache.jackrabbit.oak:type="FailOver"` MBean named `Status`.
+
+##### Client
+Observing a client you will notice exactly one node (the id is either a generic UUID or the name specified by the `failOverID` system property). This node has three readonly attributes:
+
+* `Running`: boolean indicating whether the sync process is running
+* `Mode`: always `client: ` followed by the ID described above
+* `Status`: a textual representation of the current state (like `running`, `stopped` and others)
+
+There are also two invokable methods:
+
+* `start()`: start the sync process
+* `stop()`: stop the sync process
+
+##### Master
+Observing the master exposes some general (non client-specific) information via an MBean whose id value is the port number the `Cold Standby` service is using (usually `8023`). There are the same attributes and methods as described above but the values differ:
+
+* `Mode`: always the constant value `master`
+* `Status`: has more values like `got message`
+
+Furthermore, information for each client (up to 10) can be retrieved. The MBean id is the name of the client (see above). There are no invokable methods for these MBeans but some very useful readonly attributes:
+
+* `Name`: the id of the client
+* `LastSeenTimestamp`: the timestamp of the last request in a textual representation
+* `LastRequest`: the last request of the client
+* `RemoteAddress`: the IP address of the client
+* `RemotePort`: the (generic) port the client used for the last request
+* `TransferredSegments`: the total number of segments transferred to this client
+* `TransferredSegmentBytes`: the total number of bytes transferred to this client
+
+A typical state might look like this:
+![Screenshot showing MBeans](mbeans.png)
+
+### Performance
+
+##### Master
+Enabling the *Cold Standby* feature on the master has almost no measurable impact on performance. The additional CPU consumption is very low and the extra hard disk and network IO shouldn't have any drawbacks.
+
+##### Client
+Things look different on the client! During a sync process you can expect at least one CPU core running close to 100% the whole time. Because the procedure is not multithreaded you can't speed up the process by using multiple cores. If no data is changed/transferred there will be no measurable activity. The expected throughput is about 700 KB / sec. Obviously this number will vary depending on the hardware and network environment but it does not depend on the size of the repository or whether you use SSL encryption or not. You should keep this in mind when estimating the time needed for an initial sync or when much data was changed in the meantime on the master node.
+
+### One word about security
+
+Assuming that the client(s) and the master run in the same intranet security zone there **should** be no security issue enabling the *Cold Standby* feature. Nevertheless you can add extra security by enabling SSL connections between the client(s) and the master (see above). Doing so reduces the possibility that the data is compromised by a man-in-the-middle. Furthermore you can specify the allowed client(s) by restricting the IP-address of incoming requests. This should help to guarantee that no one in the intranet can copy the repository (by accident).
+

Modified: jackrabbit/oak/trunk/oak-doc/src/site/site.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/site.xml?rev=1626168&r1=1626167&r2=1626168&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/site.xml (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/site.xml Fri Sep 19 10:10:08 2014
@@ -51,6 +51,7 @@ under the License.
       <item href="differences.html" name="Differences to Jackrabbit 2" />
       <item href="known_issues.html" name="Known Issues" />
       <item href="dos_and_donts.html" name="Dos and don'ts" />
+      <item href="coldstandby/coldstandby.html" name="Cold Standby" />
       <item href="FAQ.html" name="FAQ" />
     </menu>
     <menu name="Developing Oak">

Modified: jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/client/FailoverClient.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/client/FailoverClient.java?rev=1626168&r1=1626167&r2=1626168&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/client/FailoverClient.java (original)
+++ jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/client/FailoverClient.java Fri Sep 19 10:10:08 2014
@@ -43,6 +43,7 @@ import java.util.concurrent.TimeUnit;
 
 import org.apache.jackrabbit.oak.plugins.segment.SegmentStore;
 import org.apache.jackrabbit.oak.plugins.segment.failover.CommunicationObserver;
+import org.apache.jackrabbit.oak.plugins.segment.failover.jmx.ClientFailoverStatusMBean;
 import org.apache.jackrabbit.oak.plugins.segment.failover.jmx.FailoverStatusMBean;
 import org.apache.jackrabbit.oak.plugins.segment.failover.codec.RecordIdDecoder;
 import org.apache.jackrabbit.oak.plugins.segment.failover.store.FailoverStore;
@@ -54,7 +55,7 @@ import javax.management.ObjectName;
 import javax.management.StandardMBean;
 import javax.net.ssl.SSLException;
 
-public final class FailoverClient implements FailoverStatusMBean, Runnable, Closeable {
+public final class FailoverClient implements ClientFailoverStatusMBean, Runnable, Closeable {
     public static final String CLIENT_ID_PROPERTY_NAME = "failOverID";
 
     private static final Logger log = LoggerFactory
@@ -72,6 +73,8 @@ public final class FailoverClient implem
     private SslContext sslContext;
     private boolean active = false;
     private boolean running;
+    private int failedRequests;
+    private long lastSuccessfulRequest;
     private volatile String state;
 
     private final Object sync = new Object();
@@ -81,6 +84,8 @@ public final class FailoverClient implem
 
     public FailoverClient(String host, int port, SegmentStore store, boolean secure) throws SSLException {
         this.state = STATUS_INITIALIZING;
+        this.lastSuccessfulRequest = -1;
+        this.failedRequests = 0;
         this.host = host;
         this.port = port;
         if (secure) {
@@ -92,7 +97,7 @@ public final class FailoverClient implem
         final MBeanServer jmxServer = ManagementFactory.getPlatformMBeanServer();
         try {
-            jmxServer.registerMBean(new StandardMBean(this, FailoverStatusMBean.class), new ObjectName(this.getMBeanName()));
+            jmxServer.registerMBean(new StandardMBean(this, ClientFailoverStatusMBean.class), new ObjectName(this.getMBeanName()));
         } catch (Exception e) {
             log.error("can register failover status mbean", e);
@@ -171,7 +176,10 @@ public final class FailoverClient implem
             ChannelFuture f = b.connect(host, port).sync();
             // Wait until the connection is closed.
             f.channel().closeFuture().sync();
+            this.failedRequests = 0;
+            this.lastSuccessfulRequest = System.currentTimeMillis() / 1000;
         } catch (Exception e) {
+            this.failedRequests++;
             log.error("Failed synchronizing state.", e);
             stop();
         } finally {
@@ -207,4 +215,25 @@ public final class FailoverClient implem
     public String getStatus() {
         return this.state;
     }
+
+    @Override
+    public int getFailedRequests() {
+        return this.failedRequests;
+    }
+
+    @Override
+    public int getSecondsSinceLastSuccess() {
+        if (this.lastSuccessfulRequest < 0) return -1;
+        return (int)(System.currentTimeMillis() / 1000 - this.lastSuccessfulRequest);
+    }
+
+    @Override
+    public int calcFailedRequests() {
+        return this.getFailedRequests();
+    }
+
+    @Override
+    public int calcSecondsSinceLastSuccess() {
+        return this.getSecondsSinceLastSuccess();
+    }
 }

Added: jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/jmx/ClientFailoverStatusMBean.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/jmx/ClientFailoverStatusMBean.java?rev=1626168&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/jmx/ClientFailoverStatusMBean.java (added)
+++ jackrabbit/oak/trunk/oak-tarmk-failover/src/main/java/org/apache/jackrabbit/oak/plugins/segment/failover/jmx/ClientFailoverStatusMBean.java Fri Sep 19 10:10:08 2014
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jackrabbit.oak.plugins.segment.failover.jmx;
+
+import org.apache.jackrabbit.oak.commons.jmx.Description;
+
+public interface ClientFailoverStatusMBean extends FailoverStatusMBean {
+
+    @Description("number of consecutive failed requests")
+    int getFailedRequests();
+
+    @Description("number of seconds since last successful request")
+    int getSecondsSinceLastSuccess();
+
+    // expose the informations as operations, too
+
+    @Description("number of consecutive failed requests")
+    int calcFailedRequests();
+
+    @Description("number of seconds since last successful request")
+    int calcSecondsSinceLastSuccess();
+
+}

Modified: jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/BulkTest.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/BulkTest.java?rev=1626168&r1=1626167&r2=1626168&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/BulkTest.java (original)
+++ jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/BulkTest.java Fri Sep 19 10:10:08 2014
@@ -18,7 +18,6 @@
  */
 package org.apache.jackrabbit.oak.plugins.segment.failover;
 
-import junit.framework.Assert;
 import org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore;
 import org.apache.jackrabbit.oak.plugins.segment.failover.client.FailoverClient;
 import org.apache.jackrabbit.oak.plugins.segment.failover.jmx.FailoverStatusMBean;

Modified: jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/MBeanTest.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/MBeanTest.java?rev=1626168&r1=1626167&r2=1626168&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/MBeanTest.java (original)
+++ jackrabbit/oak/trunk/oak-tarmk-failover/src/test/java/org/apache/jackrabbit/oak/plugins/segment/failover/MBeanTest.java Fri Sep 19 10:10:08 2014
@@ -98,6 +98,9 @@ public class MBeanTest extends TestBase
             String m = jmxServer.getAttribute(status, "Mode").toString();
             if (!m.startsWith("client: ")) fail("unexpected mode " + m);
 
+            assertEquals("1", jmxServer.getAttribute(status, "FailedRequests").toString());
+            assertEquals("-1", jmxServer.getAttribute(status, "SecondsSinceLastSuccess").toString());
+
             assertEquals(FailoverStatusMBean.STATUS_STOPPED, jmxServer.getAttribute(status, "Status"));
 
             assertEquals(false, jmxServer.getAttribute(status, "Running"));
@@ -125,6 +128,12 @@ public class MBeanTest extends TestBase
         try {
             assertTrue(jmxServer.isRegistered(status));
             assertEquals("client: Foo", jmxServer.getAttribute(status, "Mode"));
+
+            assertEquals("1", jmxServer.getAttribute(status, "FailedRequests").toString());
+            assertEquals("-1", jmxServer.getAttribute(status, "SecondsSinceLastSuccess").toString());
+
+            assertEquals("1", jmxServer.invoke(status, "calcFailedRequests", null, null).toString());
+            assertEquals("-1", jmxServer.invoke(status, "calcSecondsSinceLastSuccess", null, null).toString());
         } finally {
             client.close();
         }
@@ -168,6 +177,18 @@ public class MBeanTest extends TestBase
             assertEquals(true, jmxServer.getAttribute(serverStatus, "Running"));
             assertEquals(true, jmxServer.getAttribute(clientStatus, "Running"));
 
+            assertEquals("0", jmxServer.getAttribute(clientStatus, "FailedRequests").toString());
+            assertEquals("0", jmxServer.getAttribute(clientStatus, "SecondsSinceLastSuccess").toString());
+            assertEquals("0", jmxServer.invoke(clientStatus, "calcFailedRequests", null, null).toString());
+            assertEquals("0", jmxServer.invoke(clientStatus, "calcSecondsSinceLastSuccess", null, null).toString());
+
+            Thread.sleep(1000);
+
+            assertEquals("0", jmxServer.getAttribute(clientStatus, "FailedRequests").toString());
+            assertEquals("1", jmxServer.getAttribute(clientStatus, "SecondsSinceLastSuccess").toString());
+            assertEquals("0", jmxServer.invoke(clientStatus, "calcFailedRequests", null, null).toString());
+            assertEquals("1", jmxServer.invoke(clientStatus, "calcSecondsSinceLastSuccess", null, null).toString());
+
             assertEquals(new Long(2), jmxServer.getAttribute(connectionStatus, "TransferredSegments"));
             assertEquals(new Long(128), jmxServer.getAttribute(connectionStatus, "TransferredSegmentBytes"));
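
Note: the two client attributes added in this revision (`FailedRequests` and `SecondsSinceLastSuccess`) could be polled by a monitoring job in much the same way the tests above read them. The following is only a minimal sketch under stated assumptions, not code from the commit: `DemoStatusMBean`, `DemoStatus`, `looksStale`, the thresholds and the `org.example.demo` object name are hypothetical stand-ins so that the example runs without Oak on the classpath. Only the attribute names and the `-1` "never succeeded" convention are taken from the diff.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class ClientStatusProbe {

    // Hypothetical stand-in for ClientFailoverStatusMBean; mirrors only the two new attributes.
    public interface DemoStatusMBean {
        int getFailedRequests();
        int getSecondsSinceLastSuccess();
    }

    // Hypothetical stand-in for the bookkeeping FailoverClient does in this revision.
    public static final class DemoStatus implements DemoStatusMBean {
        private int failedRequests = 0;
        private long lastSuccessfulRequest = -1; // -1 means "no successful request yet"

        public synchronized void recordSuccess() {
            failedRequests = 0;
            lastSuccessfulRequest = System.currentTimeMillis() / 1000;
        }

        public synchronized void recordFailure() {
            failedRequests++;
        }

        @Override
        public synchronized int getFailedRequests() {
            return failedRequests;
        }

        @Override
        public synchronized int getSecondsSinceLastSuccess() {
            if (lastSuccessfulRequest < 0) return -1;
            return (int) (System.currentTimeMillis() / 1000 - lastSuccessfulRequest);
        }
    }

    // Reads both attributes over JMX and applies caller-chosen (assumed) thresholds.
    public static boolean looksStale(MBeanServer server, ObjectName name,
                                     int maxFailures, int maxAgeSeconds) throws Exception {
        int failed = (Integer) server.getAttribute(name, "FailedRequests");
        int age = (Integer) server.getAttribute(name, "SecondsSinceLastSuccess");
        return failed > maxFailures || age < 0 || age > maxAgeSeconds;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example.demo:type=FailOverDemo,id=probe");
        DemoStatus status = new DemoStatus();
        server.registerMBean(new StandardMBean(status, DemoStatusMBean.class), name);
        try {
            status.recordFailure();
            System.out.println("stale after failure: " + looksStale(server, name, 0, 60));
            status.recordSuccess();
            System.out.println("stale after success: " + looksStale(server, name, 0, 60));
        } finally {
            server.unregisterMBean(name);
        }
    }
}
```

Against a real client the object name would instead be the one the client registers itself under, and the server connection would typically be a remote `MBeanServerConnection` rather than the platform server used here.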