Hi Karol,

Using reference counters might be a good way around it. To keep it backward compatible, I think we can use the counters optionally, only if the third map is present in the snapshot. Would that work?
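As a rough illustration of the proposal discussed in this thread (a third map of reference counts next to DataTree's longKeyMap and aclKeyMap, used only when a snapshot carries it), here is a minimal Java sketch. The class name `AclRefCountCache`, its methods, and the modelling of an ACL list as `List<String>` are hypothetical assumptions for illustration, not ZooKeeper's actual DataTree API. When a snapshot has no counts map, the sketch falls back to the existing never-evict behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ZooKeeper's DataTree code. An ACL list is modelled
// as List<String> so the example compiles without ZooKeeper on the classpath.
public class AclRefCountCache {
    // Mirrors DataTree.longKeyMap: id -> ACL list (the map that is serialised).
    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    // Mirrors DataTree.aclKeyMap: ACL list -> id (reverse index for reuse).
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    // The proposed third map: id -> number of nodes currently using that ACL.
    private final Map<Long, Long> refCounts = new HashMap<>();
    private long nextId = 1;
    // False after loading an old snapshot without counts: never evict.
    private boolean countingEnabled = true;

    /** Node created or ACL set: reuse or allocate an id and bump its count. */
    public synchronized long addUsage(List<String> acl) {
        Long id = aclKeyMap.get(acl);
        if (id == null) {
            id = nextId++;
            List<String> copy = new ArrayList<>(acl);
            longKeyMap.put(id, copy);
            aclKeyMap.put(copy, id);
        }
        refCounts.merge(id, 1L, Long::sum);
        return id;
    }

    /** Node deleted or ACL replaced: drop the entry once nothing uses it. */
    public synchronized void removeUsage(long id) {
        if (!countingEnabled) {
            return; // legacy behaviour: entries are kept forever
        }
        Long count = refCounts.get(id);
        if (count == null) {
            return;
        }
        if (count <= 1) {
            // Last reference gone: remove the entry from all three maps.
            refCounts.remove(id);
            List<String> acl = longKeyMap.remove(id);
            if (acl != null) {
                aclKeyMap.remove(acl);
            }
        } else {
            refCounts.put(id, count - 1);
        }
    }

    /** Snapshot load; counts is null when the snapshot predates the third map. */
    public synchronized void load(Map<Long, List<String>> acls, Map<Long, Long> counts) {
        longKeyMap.clear();
        aclKeyMap.clear();
        refCounts.clear();
        nextId = 1;
        for (Map.Entry<Long, List<String>> e : acls.entrySet()) {
            List<String> copy = new ArrayList<>(e.getValue());
            longKeyMap.put(e.getKey(), copy);
            aclKeyMap.put(copy, e.getKey());
            nextId = Math.max(nextId, e.getKey() + 1);
        }
        countingEnabled = counts != null;
        if (countingEnabled) {
            refCounts.putAll(counts);
        }
    }

    public synchronized int size() {
        return longKeyMap.size();
    }

    public static void main(String[] args) {
        AclRefCountCache cache = new AclRefCountCache();
        // A per-path ACL id, as with the custom auth provider described in the thread.
        List<String> acl = Collections.singletonList("custom:/tmp/job-1:cdrwa");
        long first = cache.addUsage(acl);
        long second = cache.addUsage(acl); // same ACL list, same id
        cache.removeUsage(first);
        System.out.println(cache.size()); // 1: still referenced once
        cache.removeUsage(second);
        System.out.println(cache.size()); // 0: entry evicted
    }
}
```

Disabling eviction for old snapshots is only one option; rebuilding the counts by walking the data nodes at load time would be another way to handle snapshots written without the third map.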
I also think it would be good to create a JIRA for this so that we can track this discussion and propose patches.

-Flavio

> On 26 Feb 2015, at 13:13, Karol Dudzinski <karoldudzin...@gmail.com> wrote:
>
> Hi Flavio,
>
> We've done some more analysis using the snapshot formatter and a heap dump
> and have found the source of the snapshot bloat.
>
> What is taking the majority of the space is the longKeyMap from DataTree.
> In the heap dump, aclKeyMap has as many entries (which is to be expected given
> how the maps are used) and is also taking an equally large amount of space,
> though at least aclKeyMap isn't serialised to the snapshot.
>
> We use a custom authentication provider, but because the
> AuthenticationProvider.matches method does not receive the path being
> operated on, we end up sticking the path in the ACL id. Some of our apps end
> up generating a lot of paths for one-time use, and consequently we end up with
> lots of unique ACLs.
>
> The two ACL maps in DataTree seem to be an optimisation so that repeated
> usage of ACLs does not result in the full list being stored multiple times.
> However, nothing is ever removed from these two maps, so if an ACL is unique,
> these maps (and the snapshot) grow forever.
>
> We're quite keen on fixing this as it's causing us lots of issues, and we're
> happy to provide a patch but will need your opinion on the various options:
> - create a third map holding a reference count for each ACL, which can
> be updated as needed when creating, deleting or setting an ACL. When the
> reference count reaches 0, remove the entry from all the maps.
> - use weak references in some shape or form, though this is made harder by the
> fact that the ACL optimisation essentially needs a bidirectional index (hence the
> two maps).
> We've given this one lots of thought, but it would really require
> something like a ConcurrentWeakBiHashMap, which just sounds wrong and
> over-engineered :)
>
> The other fix that could be made is to pass the path being operated on to the
> AuthenticationProvider. However, doing that in a backwards-compatible
> fashion is not trivial, and even though it would fix my problem (by allowing
> me to remove the path from the ACL id), it wouldn't fix the general problem
> with this optimisation.
>
> Looking forward to hearing your thoughts on this.
>
> Thanks,
> Karol
>
>> On 22 Feb 2015, at 14:55, Flavio Junqueira <fpjunque...@yahoo.com.INVALID>
>> wrote:
>>
>> Hi Karol,
>>
>> It's odd that you have such large snapshots and so little data in the data
>> tree. Are you creating lots of sessions? Right now I can't think of a good
>> reason; I suggest you use the snapshot formatter to inspect the
>> snapshot.
>>
>> -Flavio
>>
>>> On 22 Feb 2015, at 14:23, Karol Dudzinski <karoldudzin...@gmail.com> wrote:
>>>
>>> Hi Flavio,
>>>
>>> Yes, one of our clients had a bug which caused it to go into a tight
>>> create/delete loop with zero net effect (i.e. it was deleting what it
>>> had just created). After we stopped the client, the snapshot never shrank,
>>> so are the deletes in there permanently?
>>>
>>> Thanks,
>>> Karol
>>>
>>>
>>>> On 22 Feb 2015, at 14:05, Flavio Junqueira <fpjunque...@yahoo.com.INVALID>
>>>> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> Perhaps a lot of data has been deleted? In any case, you may want to use
>>>> the SnapshotFormatter to check what is in the large snapshot.
>>>>
>>>> -Flavio
>>>>
>>>>> On 22 Feb 2015, at 10:44, Karol Dudzinski <karoldudzin...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I was under the impression that the snapshot contained essentially an
>>>>> on-disk copy of all the data.
>>>>> However, one of our clusters has a
>>>>> snapshot which is over 1 GB, while the mntr four-letter word reports an
>>>>> approximate data size in the hundreds of KB and a node count in the low
>>>>> thousands. So what else goes into the snapshot, and how can I slim it
>>>>> down?
>>>>>
>>>>> Thanks,
>>>>> Karol