John,

CacheServer.isRunning() is not a reliable way to determine whether the CacheServer's Acceptor is actually listening for connections. The implementation, BridgeServerImpl.isRunning(), only checks two things: that the Acceptor is non-null, and that the Acceptor's own isRunning() returns true. The Acceptor's isRunning() in turn only checks that the Acceptor has not been shut down. So CacheServer.isRunning() can return true before the Acceptor is listening for connections.
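isRunning() only reports that the Acceptor has not been shut down. A check that actually opens a TCP connection is therefore a closer test of "listening". The following is a minimal pure-JDK sketch of such a probe. It is essentially the socket check from John's message below, restructured with try-with-resources and a connect timeout. The names PortProbe, isAcceptingConnections and waitUntilAccepting are hypothetical and are not part of the GemFire/Geode API. A successful probe only shows that something is accepting TCP connections on the port. Because the socket is closed before any client handshake, the server will still log the "failed accepting client connection" EOFException warning that John reports.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of GemFire/Geode): probes a TCP port from the client side.
final class PortProbe {

    private PortProbe() {
    }

    // Returns true if a TCP connection to host:port is established within connectTimeoutMillis.
    // The socket is closed immediately, without any GemFire client handshake, so a true
    // result only means that something is accepting connections on that port.
    static boolean isAcceptingConnections(String host, int port, int connectTimeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, connect timeout, unresolvable host, and so on.
            return false;
        }
    }

    // Polls isAcceptingConnections every intervalMillis until it succeeds or timeoutMillis elapses.
    static boolean waitUntilAccepting(String host, int port, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        // Bound each connect attempt by the poll interval (at least 1 ms, since 0 means "no timeout").
        int connectTimeout = (int) Math.max(1L, Math.min(intervalMillis, Integer.MAX_VALUE));
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (isAcceptingConnections(host, port, connectTimeout)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            TimeUnit.MILLISECONDS.sleep(intervalMillis);
        }
    }
}
```

In a test, a call such as PortProbe.waitUntilAccepting("localhost", port, 20_000, 500) would mirror the 20-second / 500 ms budget of John's waitOnCondition. Where the server-side warning is unwanted, the ClientMembershipListener approach Barry describes avoids opening raw sockets.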
You could use a ClientMembershipListener. If you register one in your client, its memberJoined callback tells you when the client connects to a server. That more or less does what your custom socket code does now. It won't necessarily tell you about all the servers, though, only the ones the client has connected to.

There is another option that lets you see all the servers. It only works if you have a locator, and it uses an API that is public in Java terms but not part of Geode's public API. The client can use it to determine how many servers there are and where they are located. I can point you to it if you're interested.

Barry Oglesby
GemFire Advanced Customer Engineering (ACE)
For immediate support please contact Pivotal Support at http://support.pivotal.io/

On Wed, Jan 20, 2016 at 3:28 PM, John Blum wrote:

> Is there a recommended, (more) reliable way to determine from the client side whether a CacheServer (listening for cache clients) has successfully started in a GemFire server?
>
> Currently, I am using a form of inter-process communication (e.g., a control file) to coordinate the successful startup and general readiness of a server before a client cache attempts to connect inside an integration test.
>
> In this case, the test acts as the cache client and connects to the server. It does not connect before forking a GemFire server process during setup, and ideally not before the server is ready (specifically, not until the ServerSocket is "accepting" connections).
>
> For the most part, this works fairly consistently. However, there are potential timing issues in the test around server readiness (specifically, the CacheServer listening for connections), particularly on the server before it writes the control file. For example, I have included this code block...
>
>     assertThat(waitOnCondition(new Condition() {
>       @Override public boolean evaluate() {
>         return gemfireCacheServer.isRunning();
>       }
>     }), is(true));
>
>     writeProcessControlFile(WORKING_DIRECTORY);
>
> The client (i.e., the test) then checks for the presence of this control file before executing the tests.
>
> The waitOnCondition(:Condition) method (see below) works correctly. It waits on the condition for a specified duration (20 seconds by default) and checks it every 500 ms. However, it would seem that CacheServer.isRunning() [0] can return *true* before the ServerSocket listening for client connections is actually "accepting" connections. From the Javadoc (and thus from the user's point of view), it is not clear what CacheServer.isRunning() actually does without digging into the code.
>
> So I thought a more reliable way to determine whether the server is actually ready (listening for and accepting connections) might be to simply open a Socket connection from the client. If I can connect, the server is presumably ready. So I coded...
>
>     boolean waitForCacheServerToStart(final String host, final int port, long duration) {
>       return waitOnCondition(new Condition() {
>         AtomicBoolean connected = new AtomicBoolean(false);
>
>         public boolean evaluate() {
>           Socket socket = null;
>
>           try {
>             // NOTE: the following code is not meant to be an atomic, compound action (a possible race condition);
>             // opening another connection (at the expense of using system resources) after connectivity
>             // has already been established is not detrimental in this use case
>             if (!connected.get()) {
>               socket = new Socket(host, port);
>               connected.set(true);
>             }
>           }
>           catch (IOException ignore) {
>           }
>           finally {
>             GemFireUtils.close(socket);
>           }
>
>           return connected.get();
>         }
>       }, duration);
>     }
>
> This seems to work OK. However, because I turn around and close the connection right away, before completing the "handshake", Geode logs...
>
>     [warn 2016/01/20 14:12:42.599 PST <Handshaker localhost/127.0.0.1:12480 Thread 0> tid=0x22] Bridge server: failed accepting client connection {0}
>     java.io.EOFException
>         at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl.handleNewClientConnection(AcceptorImpl.java:1508)
>         at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$5.run(AcceptorImpl.java:1391)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> There really does not appear to be a better way to do this with the Geode API. In particular, PoolFactory [1] offers no way to set, say, a *retryConnectionTimeout* property along with a *retryConnectionAttempts* property. Such properties would apply when populating the pool with connections, at least initially during startup, or when adding more connections to the pool (up to the "max") under heavier load. This is unlike the similar properties for read/request operations: setReadTimeout(:int) [2] and setRetryAttempts(:int) [3].
>
> Am I missing anything? Other ideas/recommendations?
>
> Thanks,
> -John
>
> [0] - http://gemfire.docs.pivotal.io/docs-gemfire/latest/javadocs/japi/com/gemstone/gemfire/cache/server/CacheServer.html#isRunning()
> [1] - http://gemfire.docs.pivotal.io/docs-gemfire/latest/javadocs/japi/com/gemstone/gemfire/cache/client/PoolFactory.html
> [2] - http://gemfire.docs.pivotal.io/docs-gemfire/latest/javadocs/japi/com/gemstone/gemfire/cache/client/PoolFactory.html#setReadTimeout(int)
> [3] - http://gemfire.docs.pivotal.io/docs-gemfire/latest/javadocs/japi/com/gemstone/gemfire/cache/client/PoolFactory.html#setRetryAttempts(int)
>
> P.S. Here is the code for waitOnCondition(..), for the curious-minded ;-)
>
>     static final long DEFAULT_WAIT_DURATION = TimeUnit.SECONDS.toMillis(20);
>     static final long DEFAULT_WAIT_INTERVAL = 500L;
>
>     @SuppressWarnings("unused")
>     boolean waitOnCondition(Condition condition) {
>       return waitOnCondition(condition, DEFAULT_WAIT_DURATION);
>     }
>
>     @SuppressWarnings("all")
>     boolean waitOnCondition(Condition condition, long duration) {
>       final long timeout = (System.currentTimeMillis() + duration);
>
>       try {
>         while (!condition.evaluate() && System.currentTimeMillis() < timeout) {
>           synchronized (condition) {
>             TimeUnit.MILLISECONDS.timedWait(condition, DEFAULT_WAIT_INTERVAL);
>           }
>         }
>       }
>       catch (InterruptedException e) {
>         Thread.currentThread().interrupt();
>       }
>
>       return condition.evaluate();
>     }
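John notes that PoolFactory has setReadTimeout(int) and setRetryAttempts(int) for operations but nothing comparable for establishing connections. Absent such properties, one workaround is to retry at the application level. The following is a minimal pure-JDK sketch with a fixed delay between attempts. The names Retry and withRetries are hypothetical and not part of GemFire/Geode. Which call to wrap (for example, the test's first region operation) is an assumption that depends on where connection failures actually surface in a given setup.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of GemFire/Geode): application-level retry with a fixed delay.
final class Retry {

    private Retry() {
    }

    // Calls action up to maxAttempts times, sleeping delayMillis between failed attempts.
    // Returns the first successful result; if every attempt fails, rethrows the last failure.
    static <T> T withRetries(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry through an interrupt; propagate it to the caller.
                throw e;
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    TimeUnit.MILLISECONDS.sleep(delayMillis);
                }
            }
        }
        // Reached only when all attempts failed, so lastFailure is non-null here.
        throw lastFailure;
    }
}
```

A fixed delay keeps the behavior comparable to waitOnCondition. For example, 40 attempts 500 ms apart give roughly the same 20-second budget. Exponential backoff would be a reasonable alternative when many clients start at once.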
