
Peter,

On 1/13/15 1:10 PM, Peter Rifel wrote:
> On 1/13/15, 6:32 AM, "Christopher Schultz"
> <ch...@christopherschultz.net> wrote:
> On 1/12/15 4:32 PM, Peter Rifel wrote:
>>>> On 1/12/15, 11:36 AM, "Christopher Schultz"
>>>> <ch...@christopherschultz.net> wrote:
>>>> On 1/12/15 2:28 PM, Peter Rifel wrote:
>>>>>>> Chris,
>>>>>>> 
>>>>>>> On 1/12/15, 11:08 AM, "Christopher Schultz" 
>>>>>>> <ch...@christopherschultz.net> wrote:
>>>>>>> 
>>>>>>> Peter,
>>>>>>> 
>>>>>>> On 1/12/15 12:51 PM, Peter Rifel wrote:
>>>>>>>>>> I'm running Tomcat 8.0.15 with Java 1.8.0_25 on
>>>>>>>>>> Ubuntu 14.04. We have 5 instances that are all
>>>>>>>>>> set up with session clustering as follows:
>>>>>>>>>> 
>>>>>>>>>> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
>>>>>>>>>>   <Manager className="org.apache.catalina.ha.session.DeltaManager"
>>>>>>>>>>            stateTransferTimeout="5" />
>>>>>>>>>>   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>>>>>>>>>>     <Membership className="org.apache.catalina.tribes.membership.McastService"
>>>>>>>>>>                 address="${multicast}" />
>>>>>>>>>>   </Channel>
>>>>>>>>>> </Cluster>
>>>>>>>>>> 
>>>>>>>>>> -Dmulticast=228.0.0.4
>>>>>>>>>> 
>>>>>>>>>> To help prevent accidental misconfigurations that
>>>>>>>>>> have occurred in the past, I decided to implement
>>>>>>>>>> monitoring on the session replication by checking
>>>>>>>>>> the JMX MBean
>>>>>>>>>> Catalina/Manager/<host>/<context>/activeSessions 
>>>>>>>>>> attribute. Most of the time the values for the 5 
>>>>>>>>>> instances are all within 1 or 2 of each other.
>>>>>>>>>> Over the weekend we consistently had one instance
>>>>>>>>>> that had more sessions than the other 4. It began
>>>>>>>>>> with 102 sessions where every other instance had
>>>>>>>>>> 95. Over the next 36 hours as more sessions were
>>>>>>>>>> expiring over the weekend, the difference grew to
>>>>>>>>>> 49 vs 29. Eventually it resynced and now they all
>>>>>>>>>> report the same active session count. My question
>>>>>>>>>> is: does anyone know why this would happen? And if
>>>>>>>>>> it is to be expected, is there a better way to
>>>>>>>>>> monitor session replication, to make sure no
>>>>>>>>>> instance is being left out of replication? I
>>>>>>>>>> believe this only happens on weekends, when most
>>>>>>>>>> sessions are expiring and very few are being
>>>>>>>>>> created, but I may be wrong.
>>>>>>> 
>>>>>>> How is your load-balancer configured to distribute
>>>>>>> traffic?
>>>>>>> 
>>>>>>>> Two of the instances are behind one load balancer,
>>>>>>>> and the other 3 are behind another.  They each
>>>>>>>> provide a different service but are running the same
>>>>>>>> war application and we want sessions clustered across
>>>>>>>> both services. Each load balancer's initial
>>>>>>>> distribution is based on the least number of
>>>>>>>> connections, with persistence based on source IP.
>>>> 
>>>> So basically all requests are randomly sent to back-end
>>>> nodes? Or are you using session stickiness or anything like
>>>> that?
>>>> 
>>>>> Sorry, I should have clarified.  Stickiness is based on
>>>>> the source IP, so requests from the same IP will be routed
>>>>> to the same instance.  With these applications we don't
>>>>> expect sessions to change IPs very often, if at all, but if
>>>>> you think it would help I could stick based on the
>>>>> JSESSIONID cookie.
> 
> I was wondering, because there is an unfortunate situation with
> session stickiness and long-lived clients where fail-over can cause
> a large number of clients to switch to a particular server and
> /stay there/ even if they have to re-login (when you'd likely
> prefer that they get re-balanced to another node at that point).
> 
> If you had a temporary failure of one node, perhaps the clients
> were re-assigned to another node and they "stayed there" when the
> failing node became available again. Those clients would stay there
> until either their newly-assigned node failed, or they closed their
> browser (assuming they are using cookies that live for the life of
> the client, like JSESSIONIDs typically do).
> 
> After the weekend in your scenario, perhaps all your users
> restarted their browsers (or computers) and thus were re-balanced.
> 
>> But shouldn't all sessions be replicated regardless of how
>> stickiness is (or isn't) setup?

Yes, they should. I was just considering a possible failure scenario.
With completely distributable sessions and the DeltaManager, the whole
cluster should stay in sync.

>> I assume that the DeltaManager only replicates changes to the
>> state of all sessions since its last replication.  Maybe if there
>> was a hiccup in replication on one Tomcat instance, the sessions
>> that it created or extended would not have been replicated, and
>> would then never make their way to the other instances before
>> they expired on their own.

If this happened, you should be able to see some indication in the log
files.

>> If this is the case, I may try to come up with a better method
>> for monitoring the clustering state, unless anyone has any
>> suggestions on how to fix this supposed hiccup.

It seems reasonable to expect that the session count would be
consistent across the cluster, at least over time (obviously, it could
take a few seconds for the cluster to become consistent if you are
watching very carefully).

The logs contain no clues?
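
If you want to automate that comparison, a small JMX client can poll the
same activeSessions attribute on every node and complain when the nodes
drift apart. Here is a minimal sketch, assuming remote JMX is enabled on
each instance; the hostnames, ports, the host/context in the ObjectName,
and the tolerance are placeholders you would need to adjust:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterSessionCountCheck {

    // Placeholder JMX endpoints for the 5 nodes; adjust hosts/ports.
    private static final String[] NODES = {
        "tc1.example.com:9010", "tc2.example.com:9010", "tc3.example.com:9010",
        "tc4.example.com:9010", "tc5.example.com:9010"
    };

    public static void main(String[] args) throws Exception {
        // Manager MBean for the webapp (Tomcat 7+ names it by host/context).
        ObjectName manager = new ObjectName(
                "Catalina:type=Manager,host=localhost,context=/myapp");

        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (String node : NODES) {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + node + "/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                int active = (Integer) conn.getAttribute(manager, "activeSessions");
                System.out.println(node + ": activeSessions=" + active);
                min = Math.min(min, active);
                max = Math.max(max, active);
            }
        }

        // Flag the cluster if the spread exceeds some tolerance.
        int tolerance = 5; // placeholder threshold
        if (max - min > tolerance) {
            System.err.println("Session counts diverge by " + (max - min));
        }
    }
}

Comparing readings over a few minutes, rather than alerting on a single
sample, avoids false positives while the cluster is still converging.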

How many nodes are in the cluster? 5? I know that DeltaManager's
performance degrades as the number of nodes increases, but I think
it's a linear performance drop, as opposed to anything polynomial. I'm
not sure if 5 nodes ends up being the breaking point, especially given
your load profile (which you didn't specify).

When you get nodes out of sync, can you probe them to find out if
those "extra" sessions might be expired, but not yet cleaned-up?
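
The Manager MBean also exposes per-session operations you could use for
that probe. A rough sketch follows; the endpoint, context name, and the
30-minute timeout are assumptions for illustration, and it is worth
confirming the listSessionIds and getLastAccessedTimestamp operation
names in jconsole against your Tomcat version:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class StaleSessionProbe {

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and context; point this at the out-of-sync node.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://tc1.example.com:9010/jmxrmi");
        ObjectName manager = new ObjectName(
                "Catalina:type=Manager,host=localhost,context=/myapp");
        long timeoutMillis = 30L * 60 * 1000; // assumed 30-minute session timeout

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();

            // listSessionIds returns the active session IDs as a single
            // space-separated string.
            String ids = (String) conn.invoke(manager, "listSessionIds",
                    new Object[0], new String[0]);

            long now = System.currentTimeMillis();
            for (String id : ids.trim().split("\\s+")) {
                if (id.isEmpty()) {
                    continue;
                }
                long lastAccessed = (Long) conn.invoke(manager,
                        "getLastAccessedTimestamp",
                        new Object[] { id },
                        new String[] { String.class.getName() });
                long idle = now - lastAccessed;
                if (idle > timeoutMillis) {
                    System.out.println(id + " idle for " + (idle / 1000)
                            + "s -- expired but not yet cleaned up?");
                }
            }
        }
    }
}

If the "extra" sessions on the out-of-sync node all show up as idle past
the timeout, that would support the expired-but-not-yet-cleaned-up theory.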

-chris
