On Wed, Feb 16, 2011 at 5:08 PM, Christopher Schultz
<ch...@christopherschultz.net> wrote:
> On 2/15/2011 3:46 PM, Martin Grotzke wrote:
>> This is how it's implemented for memcached-session-manager (msm): when
>> loading the session from memcached (either due to a tomcat failover or
>> when loading a non-sticky session) the hash of the byte-array is
>> stored. At the end of the request the session is serialized again and
>> the new hash is compared with the previous one. Serialization is
>> pluggable, default is java/jvm serialization, alternatives are e.g.
>> kryo, which shares the same serialization semantics as java
>> serialization (ignoring transient fields etc.).
>
> Okay, so you *do* take data fingerprints. How well does that perform?
Serializing the session at the end of the request takes 1 to 10
milliseconds. The actual hashing of some 10kB of session data is on
the order of 1 microsecond. Storing the hash (an int) is negligible.
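
To make the idea concrete, here is a minimal sketch of such a fingerprint
check. It is not msm's actual code: the class and method names are made up
for this example, plain Java serialization is assumed, and
Arrays.hashCode is just one possible hash.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

// Illustrative only, not msm's implementation. All names are hypothetical.
final class SessionFingerprint {

    // Serializes the attribute map with standard Java serialization.
    static byte[] serialize(HashMap<String, Serializable> attributes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(attributes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Computes an int hash over the serialized bytes.
    static int fingerprint(byte[] serialized) {
        return Arrays.hashCode(serialized);
    }

    // True if the hash at the end of the request differs from the stored one.
    static boolean needsBackup(int hashAtLoad, byte[] serializedAtEnd) {
        return hashAtLoad != fingerprint(serializedAtEnd);
    }

    public static void main(String[] args) {
        ArrayList<String> cart = new ArrayList<>();
        cart.add("book");
        HashMap<String, Serializable> session = new HashMap<>();
        session.put("cart", cart);

        // Hash remembered when the session was loaded.
        int hashAtLoad = fingerprint(serialize(session));

        // A request that only reads the session leaves the hash unchanged.
        System.out.println("read-only request: "
                + needsBackup(hashAtLoad, serialize(session)));

        // An in-place modification shows up in the serialized bytes,
        // even though setAttribute was never called.
        cart.add("pen");
        System.out.println("modifying request: "
                + needsBackup(hashAtLoad, serialize(session)));
    }
}
```

An int hash can collide in theory, so a changed session could be missed in
rare cases; a longer hash would reduce that risk at a slightly higher cost.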


> I suppose the balance of being able to replicate the objects with very
> little in the way of /coding/ overhead has its advantages, but that
> seems like a lot of work to do after every request just to see if
> replication is necessary.
"A lot of work" is relative. In our case more work is done for each
request by the application itself (even without any I/O), so
serialization does not add anything that would show up as noticeably
higher load.


>
>> For msm by default a session would get replicated if at least one
>> attribute was accessed (via getAttribute), so I see this change
>> detection (or detection that state has not changed) as some kind of
>> optimization to reduce I/O and network traffic.
>
> Do you re-replicate the entire session or just the object that was
> accessed? Or, are you saying that you only re-fingerprint the objects
> that are accessed during a particular request?
Because msm started out with a wicket app that has exactly one session
attribute, the entire session is replicated.
It would also be possible to split it up into separately stored
session attributes.
One thing has to be kept in mind with that approach: objects shared by
different session attributes would be deserialized as different
objects (not references to the same object), which might break
behaviour if the application does not account for it. Serializing the
entire session ensures that references to shared objects are also
deserialized correctly.
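
The effect can be reproduced with plain Java serialization. The following
is only a sketch with made-up names (SharedReferenceDemo and its helpers
are not part of msm): it round-trips two attributes that point to the same
list, once as a whole session and once attribute by attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;

// Illustrative only. All names are hypothetical.
final class SharedReferenceDemo {

    static byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Both attributes travel in one stream, so the shared list is written
    // once and both keys point to the same object after deserialization.
    static boolean sharedAfterWholeSessionRoundTrip() {
        ArrayList<String> shared = new ArrayList<>();
        shared.add("item");
        HashMap<String, Object> session = new HashMap<>();
        session.put("a", shared);
        session.put("b", shared);
        HashMap<?, ?> copy = (HashMap<?, ?>) deserialize(serialize(session));
        return copy.get("a") == copy.get("b");
    }

    // Each attribute gets its own stream, so the shared list is written
    // twice and comes back as two independent objects.
    static boolean sharedAfterPerAttributeRoundTrip() {
        ArrayList<String> shared = new ArrayList<>();
        shared.add("item");
        Object a = deserialize(serialize(shared));
        Object b = deserialize(serialize(shared));
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("whole session: " + sharedAfterWholeSessionRoundTrip());
        System.out.println("per attribute: " + sharedAfterPerAttributeRoundTrip());
    }
}
```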


>>>
>>> All of these techniques will kill performance. :(
>>
>> Well, what does this mean? IMHO such statements are not really useful.
>
> Serializing objects after a request is complete delays the return of the
> request processing thread to the thread pool. More threads will be
> required to handle a particular user load and performance across the
> webapp will suffer. Is that more specific and useful?
Request processing is delayed by about 1 millisecond. I'd say that's
not much, unless an application handles around 1000 requests per
second with an average request processing time on the order of
1 millisecond. But that would be an interesting example for
optimization ;-)
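
As a rough back-of-the-envelope check, the number of busy request threads
can be estimated as request rate times the time each request holds a
thread (Little's law). The sketch below uses made-up names, and the 100 ms
processing time in the second case is purely an assumption for
illustration, not a measured value.

```java
// Illustrative arithmetic only. Names and the 100 ms figure are assumptions.
final class ThreadDemand {

    // Average number of busy threads = arrival rate * time per request.
    static double busyThreads(double requestsPerSecond, double millisPerRequest) {
        return requestsPerSecond * millisPerRequest / 1000.0;
    }

    public static void main(String[] args) {
        // Extreme case: 1000 req/s at 1 ms each, without and with 1 ms overhead.
        System.out.println(busyThreads(1000, 1));
        System.out.println(busyThreads(1000, 2));

        // 20 req/s per instance, assuming 100 ms processing time,
        // without and with 1 ms overhead.
        System.out.println(busyThreads(20, 100));
        System.out.println(busyThreads(20, 101));
    }
}
```

In the extreme case the extra millisecond doubles the thread demand; in the
second case it adds about 1%.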


> My assertion is
> that object changes are more easily detectable (or, better yet,
> /knowable/) by the webapp itself and it's trivial to notify the
> replication manager that an object needs to be re-replicated using
> HttpSession.setAttribute.
Of course there are applications working directly with session
attributes, but there are also others where it's not clear which
session attribute is responsible for a currently modified object (or
even whether it's bound to the session at all). It depends on the
application architecture.
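
A small sketch of the problem, using a hypothetical stand-in for a session
that only gets replicated after setAttribute (this is neither the servlet
API nor Tomcat or msm code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical session that is marked for replication only by setAttribute.
final class DirtyTrackingSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private boolean dirty;

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    boolean isDirty() {
        return dirty;
    }

    // Called once the session has been replicated.
    void markReplicated() {
        dirty = false;
    }

    public static void main(String[] args) {
        DirtyTrackingSession session = new DirtyTrackingSession();
        List<String> cart = new ArrayList<>();
        session.setAttribute("cart", cart);
        session.markReplicated();

        // Some component modifies the object in place. The session
        // does not notice, so the change would not be replicated.
        cart.add("book");
        System.out.println("dirty after in-place change: " + session.isDirty());

        // The application has to know that the object lives under "cart"
        // and set the attribute again to trigger replication.
        session.setAttribute("cart", cart);
        System.out.println("dirty after setAttribute: " + session.isDirty());
    }
}
```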


>> I can share some facts + numbers (off the top of my head):
>> - for one of the largest German ecommerce sites we use msm for session
>> replication/failover
>> - wicket is used as web framework, wicket stores entire pagemap
>> (pages/component trees) in the session (for stateful pages)
>> - kryo is used as serialization mechanism (see
>> http://code.google.com/p/kryo/ and
>> https://github.com/eishay/jvm-serializers/wiki).
>> - average (serialized) session size is between 10kB and 100kB
>> - session serialization/replication is done asynchronously (not in the
>> request thread)
>
> That's interesting. Is there one thread that manages all replication?
By default the number of threads equals the number of available
processors (Runtime.getRuntime().availableProcessors()), but you can
override this via configuration.
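
Roughly, the pattern looks like the sketch below. This is not msm's code:
the class, the poolSize and store helpers, and the way the override is
passed in are all made up for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only. All names are hypothetical.
final class AsyncBackupSketch {

    // Uses the configured value if given, otherwise the processor count.
    static int poolSize(Integer configuredThreads) {
        return configuredThreads != null
                ? configuredThreads
                : Runtime.getRuntime().availableProcessors();
    }

    // Stand-in for "send the serialized session to memcached";
    // returns the number of bytes that would be stored.
    static int store(byte[] serializedSession) {
        return serializedSession.length;
    }

    public static void main(String[] args)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(null));
        try {
            byte[] session = new byte[10 * 1024];
            // The request thread only submits the task and can return.
            Callable<Integer> task = () -> store(session);
            Future<Integer> result = pool.submit(task);
            System.out.println("stored bytes: " + result.get());
        } finally {
            pool.shutdown();
        }
    }
}
```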


> Under load, does it ever fall behind? I can imagine a work queue getting
> longer with each request...
We haven't seen such behaviour in the last 6 months, not even during
peak load.


>> - overhead for msm in the request thread is ~1msec
>> - average session serialization time is somewhere between 1msec and 10msec
>> - ~20 requests per second are served by each tomcat instance
>>
>> In conclusion there's not a real impact on user experience. Most of
>> the cpu time is used by the application, the overhead for
>> serialization/replication is very low.
>
> I would think this would be very application-specific: of course a light
> session or trivially-serialized session-stored objects will not incur
> much of a penalty.
Yes, it's application specific. But I can assure you that wicket
sessions are not known to be very small :-) The 10 - 100kB figure was
the serialized, compressed session size (uncompressed it is around
100 - 400kB). I'd assume that most web apps have smaller sessions, but
I'd be really interested in statistics about apps and their session
sizes :-)


Cheers,
Martin
