Interesting results. Are you sure that you didn't accidentally serialize the 
Map twice or something? Or that you didn't use the default serialization method 
for the keys or the Map.Entry objects?

The fact that you got a much lighter object by implementing Externalizable 
suggests that the default JDK serialization probably did something nasty with 
the Map, even if it shouldn't have. The fact that Externalizable still adds a bit 
of overhead with the same serialization logic is probably due to some hidden 
externalization cost (like the SimplePrincipalCollection class name, or 
something similar). It might be useful to hex dump the resulting byte array and 
see if there's anything useless in there.
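A minimal sketch of that hex dump, using only the JDK: serialize with a plain ObjectOutputStream, print the byte count, then dump 16 bytes per row with an ASCII column so class descriptors show up as readable text. The class name SerializationDump and the HashMap stand-in are hypothetical; substitute the object actually being measured.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class SerializationDump {

    /** Serializes obj with default JDK serialization and returns the raw bytes. */
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } // closing the stream flushes any buffered data into 'bytes'
        return bytes.toByteArray();
    }

    /** Formats data as rows of 16 bytes: offset, hex values, printable ASCII. */
    static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < data.length; row += 16) {
            sb.append(String.format("%04x  ", row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < data.length) {
                    int b = data[i] & 0xff;
                    sb.append(String.format("%02x ", b));
                    // Non-printable bytes are shown as '.'
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final row
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a principal collection: realm name -> principal.
        HashMap<String, String> sample = new HashMap<>();
        sample.put("myRealm", "jdoe");
        byte[] data = serialize(sample);
        System.out.println(data.length + " bytes");
        System.out.print(hexDump(data));
    }
}
```

In the output, the java.util.HashMap class descriptor and its field names are readable in the ASCII column, which is the kind of per-stream overhead worth looking for.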

Not sure that protobuf really helps here: you'd need to enumerate the possible 
serializations (e.g. String, Long, UUID) in advance. Or, of course, if one 
wants to be really lightweight, one could achieve the same effect with a 
serialization format like:

BYTE dataType (0=string, 1=long, 2=UUID, -1=java class)
UTF8 realmName
UTF8 className (exists only when dataType = -1)
BYTE[] externalizedData

That would be very tight (as it would replace "java.lang.String" with a single 
byte value, etc.) for the most common cases where the Principal is a common 
object, but it wouldn't work as well if you were wrapping a UUID or something. 
But I think that might be overengineering already :-)
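A minimal Java sketch of the format above, encoding one realm/principal entry. The class and method names are hypothetical, and the payload encodings (writeUTF for strings, writeLong for longs, two longs for a UUID) are assumptions, not taken from SHIRO-226. The -1 fallback assumes the principal class implements Externalizable and has a public no-arg constructor. The methods take ObjectOutput/ObjectInput so they could be called from a writeExternal/readExternal pair.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.StreamCorruptedException;
import java.util.UUID;

/** Hypothetical codec: type byte, realm name, [class name if -1], payload. */
public final class TaggedPrincipalCodec {

    static final byte TYPE_STRING = 0;
    static final byte TYPE_LONG = 1;
    static final byte TYPE_UUID = 2;
    static final byte TYPE_CLASS = -1;

    /** Writes one realm/principal pair using the tag for its type. */
    static void write(ObjectOutput out, String realm, Object principal) throws IOException {
        if (principal instanceof String) {
            out.writeByte(TYPE_STRING);
            out.writeUTF(realm);
            out.writeUTF((String) principal);
        } else if (principal instanceof Long) {
            out.writeByte(TYPE_LONG);
            out.writeUTF(realm);
            out.writeLong((Long) principal);
        } else if (principal instanceof UUID) {
            UUID id = (UUID) principal;
            out.writeByte(TYPE_UUID);
            out.writeUTF(realm);
            out.writeLong(id.getMostSignificantBits());
            out.writeLong(id.getLeastSignificantBits());
        } else if (principal instanceof Externalizable) {
            // Fallback: record the class name, then let the object write itself.
            out.writeByte(TYPE_CLASS);
            out.writeUTF(realm);
            out.writeUTF(principal.getClass().getName());
            ((Externalizable) principal).writeExternal(out);
        } else {
            throw new NotSerializableException(principal.getClass().getName());
        }
    }

    /** Reads one pair written by write(); returns {realm, principal}. */
    static Object[] read(ObjectInput in) throws IOException, ClassNotFoundException {
        byte type = in.readByte();
        String realm = in.readUTF();
        switch (type) {
            case TYPE_STRING:
                return new Object[] {realm, in.readUTF()};
            case TYPE_LONG:
                return new Object[] {realm, in.readLong()};
            case TYPE_UUID:
                // Arguments evaluate left to right: most significant bits first.
                return new Object[] {realm, new UUID(in.readLong(), in.readLong())};
            case TYPE_CLASS:
                // The class name comes straight from the input, so only use this
                // on trusted (e.g. encrypted) data.
                String className = in.readUTF();
                try {
                    Externalizable obj = (Externalizable) Class.forName(className)
                            .getDeclaredConstructor().newInstance();
                    obj.readExternal(in);
                    return new Object[] {realm, obj};
                } catch (ReflectiveOperationException e) {
                    throw new IOException("Cannot instantiate " + className, e);
                }
            default:
                throw new StreamCorruptedException("Unknown type tag " + type);
        }
    }
}
```

For ASCII content, writeUTF costs a 2-byte length prefix plus one byte per character, so a String principal entry is 1 + (2 + realm length) + (2 + value length) bytes, before ObjectOutputStream's own stream header and block-data framing.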

/Janne

On Dec 15, 2010, at 20:57 , Les Hazlewood wrote:

> Thanks Janne.
> 
> So I just tried a test with your SimplePrincipalSerializer's logic as
> the basis for SimplePrincipalCollection's writeObject/readObject
> implementations.
> 
> Here are the stats before this change was made (your original findings):
> 
> --------------------------
> Single principal, single realm
> Default serializer (bytes), Simple serializer (bytes), Size saving
>                        423                        100       76.36%
> 
> Multiple principals, single realm
> Default serializer (bytes), Simple serializer (bytes), Size saving
>                        577                        254       55.98%
> 
> Multiple principals, multiple realms
> Default serializer (bytes), Simple serializer (bytes), Size saving
>                        817                        368       54.96%
> -------------------------------
> 
> Here are the stats after moving SimplePrincipalSerializer's logic into
> SimplePrincipalCollection:
> 
> -------------------------------
> Single principal, single realm
> Default serializer (bytes), Simple serializer (bytes), Size saving
>                        434                        100       76.96%
> 
> Multiple principals, single realm
> Default serializer (bytes), Simple serializer (bytes), Size saving
>                        623                        254       59.23%
> 
> Multiple principals, multiple realms
> Default serializer (bytes), Simple serializer (bytes), Size saving
>                        977                        368       62.33%
> --------------------------------
> 
> Oddly enough, moving the relevant SimplePrincipalSerializer logic into
> SimplePrincipalCollection makes Java's default serialization output
> _larger_ for the sample data set!  That means that the HashMap
> serialization implementation *when using default JDK object
> serialization* is more efficient than manually trying to serialize the
> map ourselves.
> 
> I hate Java serialization voodoo!
> 
> Time to see what happens when we implement Externalizable :)
> 
> I also think it'd be an interesting exercise to create a Serializer
> implementation based on Google's Protocol Buffers project (see:
> http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking)
> 
> Les
> 
> On Wed, Dec 15, 2010 at 4:35 AM, Janne Jalkanen
> <[email protected]> wrote:
>> 
>> See SHIRO-226. It contains a proposed patch against current trunk and the 
>> corresponding unit tests.
>> 
>> /janne
>> 
>> On Dec 14, 2010, at 02:15 , Les Hazlewood wrote:
>> 
>>> I don't see how having multiple cookies solves the data size problem.
>>> In fact, I believe it makes the problem worse: instead of one cookie
>>> header, now you have multiple, contributing to an even greater overall
>>> request size.
>>> 
>>> Also, if a user stores more than one principal in the collection
>>> returned from a Realm, you can't just delete all but the primary
>>> principal without the user's knowledge: they might not have a way to
>>> reconstitute a Subject's identity properly (i.e. there might be
>>> auxiliary information alongside the primary principal that is necessary
>>> when accessing account data).  You have to assume that if they provide
>>> multiple principals, they actually need those to be retained for account
>>> lookup later.
>>> 
>>> My desire was to try to serialize the PrincipalCollection explicitly
>>> and not delegate to the HashMap instance, as Janne also suggested.  I
>>> think if we do that, we may very well find that we don't have to
>>> change anything because the data size will probably be within a
>>> suitable range.
>>> 
>>> It's certainly worth trying before we change anything else IMO.
>>> 
>>> Les
>>> 
>>> On Mon, Dec 13, 2010 at 3:02 PM, Kalle Korhonen
>>> <[email protected]> wrote:
>>>> On Mon, Dec 13, 2010 at 2:53 PM, Janne Jalkanen
>>>> <[email protected]> wrote:
>>>>> By using explicit serialization for things like realm names one should be 
>>>>> able to shave off a number of bytes *especially* for the very common 
>>>>> single-realm, single-principal case. It's a bit late over here, but I'll 
>>>>> try and see if I can generate some data or a patch tomorrow.
>>>> 
>>>> Great. Using the primary principal and a cookie per realm would make
>>>> this quite a bit more generic without losing any of the benefits.
>>>> 
>>>> Kalle
>>>> 
>>>> 
>>>>> On Dec 13, 2010, at 22:25 , Les Hazlewood wrote:
>>>>> 
>>>>>> I think it is a good use case, but I think we may not be on the same 
>>>>>> page yet.
>>>>>> 
>>>>>> Unless I'm mistaken, the ID that Janne was talking about was a single
>>>>>> user or account id in his own application.  That corresponded to one
>>>>>> principal in one realm only.  I don't believe he was creating an ID
>>>>>> that was a pointer to the PrincipalCollection instance, for example.
>>>>>> 
>>>>>> So the question is: how do you efficiently represent a user's
>>>>>> rememberMe identity when that identity could span multiple realms, or
>>>>>> where there might be multiple principals, or a combination thereof?
>>>>>> 
>>>>>> Are you implying that we create a RememberMeDAO to save the
>>>>>> PrincipalCollection instance to a datastore (which will probably be
>>>>>> fronted transparently with a cache) and send out the record's ID only
>>>>>> in the cookie?  That sounds like an extremely complicated solution
>>>>>> since you'd have to come up with a purging strategy to handle orphan
>>>>>> records - it's almost like solving the Session problem over again.
>>>>>> 
>>>>>> My personal opinion is that I'd want to figure out a way to make the
>>>>>> serialization output size more compact before going down that road.
>>>>>> (It's something that should be done even if a DAO were used.)
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Les
