Hi Karol,

The use of reference counters might be a good way around it. To make it 
backward compatible, I think we can optionally use the counters if the third 
map is present in the snapshot. Would it work?
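To make the idea a bit more concrete, here is a rough sketch of reference-counted ACL interning with an optional counter section in the snapshot. Everything in it is hypothetical: RefCountedAclCache, acquire and release are illustrative names, not the actual DataTree code, and ACL lists are modelled as List<String> so the example is self-contained. It also assumes the ACL section is the last thing in the stream, so an old snapshot is detected by hitting EOF where the counters would be. A real snapshot has more data after the ACL maps, so it would need an explicit marker or version field instead.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch, not ZooKeeper's DataTree. ACLs are modelled as List<String>.
class RefCountedAclCache {
    private final Map<Long, List<String>> longKeyMap = new HashMap<>(); // index -> ACL (serialised)
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>(); // ACL -> index (in memory only)
    private final Map<Long, Long> refCounts = new HashMap<>();          // the proposed "third map"
    private boolean countersEnabled = true; // false after loading an old snapshot without counters
    private long nextIndex = 1;

    // On create/setACL: intern the ACL list and bump its reference count.
    synchronized long acquire(List<String> acl) {
        Long idx = aclKeyMap.get(acl);
        if (idx == null) {
            idx = nextIndex++;
            List<String> copy = Collections.unmodifiableList(new ArrayList<>(acl));
            longKeyMap.put(idx, copy);
            aclKeyMap.put(copy, idx);
        }
        if (countersEnabled) refCounts.merge(idx, 1L, Long::sum);
        return idx;
    }

    // On delete/setACL (old ACL): drop the entry from all maps when the count reaches 0.
    synchronized void release(long idx) {
        if (!countersEnabled) return; // counts unknown: keep today's never-evict behaviour
        Long count = refCounts.get(idx);
        if (count == null) return;
        if (count <= 1) {
            refCounts.remove(idx);
            List<String> acl = longKeyMap.remove(idx);
            if (acl != null) aclKeyMap.remove(acl);
        } else {
            refCounts.put(idx, count - 1);
        }
    }

    synchronized int size() { return longKeyMap.size(); }

    // Existing map first; the counters are appended only if we actually have them.
    synchronized void serialize(DataOutputStream out) throws IOException {
        out.writeInt(longKeyMap.size());
        for (Map.Entry<Long, List<String>> e : longKeyMap.entrySet()) {
            out.writeLong(e.getKey());
            out.writeInt(e.getValue().size());
            for (String s : e.getValue()) out.writeUTF(s);
        }
        if (countersEnabled) {
            out.writeInt(refCounts.size());
            for (Map.Entry<Long, Long> e : refCounts.entrySet()) {
                out.writeLong(e.getKey());
                out.writeLong(e.getValue());
            }
        }
        out.flush();
    }

    static RefCountedAclCache deserialize(DataInputStream in) throws IOException {
        RefCountedAclCache c = new RefCountedAclCache();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            long idx = in.readLong();
            int len = in.readInt();
            List<String> acl = new ArrayList<>();
            for (int j = 0; j < len; j++) acl.add(in.readUTF());
            List<String> copy = Collections.unmodifiableList(acl);
            c.longKeyMap.put(idx, copy);
            c.aclKeyMap.put(copy, idx);
            c.nextIndex = Math.max(c.nextIndex, idx + 1);
        }
        int m;
        try {
            m = in.readInt();
        } catch (EOFException oldFormat) {
            c.countersEnabled = false; // no third map: fall back to never evicting
            return c;
        }
        for (int i = 0; i < m; i++) c.refCounts.put(in.readLong(), in.readLong());
        return c;
    }
}
```

The important design point is the fallback: if counters are only started after loading an old snapshot, znodes that already use an ACL would not be counted and the entry could be evicted while still in use, so the sketch disables eviction entirely until a snapshot with counters exists.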

I also think it would be good to create a JIRA issue for this so that we can track 
this discussion and propose patches.

-Flavio

> On 26 Feb 2015, at 13:13, Karol Dudzinski <karoldudzin...@gmail.com> wrote:
> 
> Hi Flavio,
> 
> We've done some more analysis using the snapshot formatter and a heap dump 
> and have found the source of the snapshot bloat.
> 
> What is taking the majority of the space is the longKeyMap from DataTree.  
> In the heap dump, aclKeyMap has as many entries as longKeyMap (which is to be 
> expected given how the maps are used) and takes up an equally large amount of 
> space, though at least aclKeyMap isn't serialised to the snapshot.
> 
> We use a custom authentication provider but because the 
> AuthenticationProvider.matches method does not provide the path being 
> operated on, we end up sticking the path in the ACL id.  Some of our apps end 
> up generating a lot of paths for one-time use and consequently we end up with 
> lots of unique ACLs.
> 
> The two ACL maps in DataTree seem to be an optimisation so that repeated 
> usage of ACLs does not result in the full list being stored multiple times.  
> However, entries are never removed from these two maps, so if ACLs are unique, 
> these maps (and the snapshot) grow forever.
> 
> We're quite keen on fixing this as it's causing us lots of issues, and we're 
> happy to provide a patch, but we'd like your opinion on the various options:
> - create a third map holding a reference count for each ACL, updated as 
> needed when creating, deleting or setting an ACL.  When the reference count 
> reaches 0, remove the entry from all the maps
> - use weak references in some shape or form though this is made harder by the 
> fact that ACL optimisation essentially needs a bidirectional index (hence the 
> two maps).  We've given this one lots of thought but it would really require 
> something like a ConcurrentWeakBiHashMap which just sounds wrong and over 
> engineered :)
> 
> The other fix that could be made is to pass the path being operated on to the 
> AuthenticationProvider.  However, doing that in a backwards compatible 
> fashion is not trivial and even though it would fix my problem (by allowing 
> me to remove the path from the ACL id) it wouldn't fix the general problem 
> with this optimisation.
> 
> Looking forward to hearing your thoughts on this.
> 
> Thanks,
> Karol
> 
>> On 22 Feb 2015, at 14:55, Flavio Junqueira <fpjunque...@yahoo.com.INVALID> 
>> wrote:
>> 
>> Hi Karol,
>> 
>> It's odd that you have such large snapshots and so little data in the data 
>> tree. Are you creating lots of sessions? Right now I can't think of a good 
>> reason; I suggest you use the snapshot formatter to inspect the 
>> snapshot. 
>> 
>> -Flavio
>> 
>>> On 22 Feb 2015, at 14:23, Karol Dudzinski <karoldudzin...@gmail.com> wrote:
>>> 
>>> Hi Flavio,
>>> 
>>> Yes, one of our clients had a bug which caused it to go into a tight 
>>> create/delete loop with zero net effect (i.e. it was deleting what it 
>>> had just created). After we stopped the client, the snapshot never shrank, 
>>> so are the deletes in there permanently?
>>> 
>>> Thanks,
>>> Karol
>>> 
>>> 
>>>> On 22 Feb 2015, at 14:05, Flavio Junqueira <fpjunque...@yahoo.com.INVALID> 
>>>> wrote:
>>>> 
>>>> Hi there,
>>>> 
>>>> Perhaps a lot of data has been deleted? In any case, you may want to use 
>>>> the SnapshotFormatter to check what is in the large snapshot.
>>>> 
>>>> -Flavio
>>>> 
>>>>> On 22 Feb 2015, at 10:44, Karol Dudzinski <karoldudzin...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I was under the impression that the snapshot contained essentially an 
>>>>> on-disk copy of all the data.  However, one of our clusters has a 
>>>>> snapshot which is over 1GB while the mntr four letter word reports an 
>>>>> approximate data size in the hundreds of KB and a node count in the low 
>>>>> thousands.  So what else goes into the snapshot and how can I slim it 
>>>>> down?
>>>>> 
>>>>> Thanks,
>>>>> Karol
>> 
