Re: [Trac-dev] [git] persistent_cache

Peter Stuge Fri, 18 Aug 2017 08:12:36 -0700

Peter Suter wrote:
> > For background I started writing a new git backend for Trac based on
> > libgit2/pygit2 years ago, but didn't complete it because of the mismatch
> > between Trac repository datamodel requirements/expectations and the Git
> > data model. :\
> >
> > My WIP is still at http://git.stuge.se/?p=trac.git;a=commitdiff;h=ea5437b
> 
> Are you aware of / do you have any thoughts on TracPygit2Plugin?
> https://trac-hacks.org/wiki/TracPygit2Plugin
> https://trac.edgewall.org/ticket/10606


I wasn't aware of it. I think that development may have started after
my work.

I haven't followed this topic since then, so I haven't seen it.

As for thoughts: I think that the aggressive caching of repo
information into the Trac database is fundamentally broken, and
that all repo plugin implementations will suffer from that.

I would expect TracPygit2Plugin to be better than my WIP, because Jun
is much more familiar with Trac internals, but I am unable to do a
proper comparison.


> I don't know Git or the Trac VCS model all that well.

The primary mismatch is that Trac expects to traverse commit
history equally easily in both directions, while Git only stores
history in reverse chronological order: each commit records its
parents, never its children.
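
Answering "what are the children of X" therefore means scanning
history once and inverting the parent links. A minimal sketch, with a
toy dict standing in for the repository (build_child_index and the
commit names are made up for illustration, they are not Trac or
pygit2 API):

```python
from collections import defaultdict


def build_child_index(parents_of):
    """Invert a commit -> parents mapping into parent -> children."""
    children_of = defaultdict(list)
    for commit, parents in parents_of.items():
        for parent in parents:
            children_of[parent].append(commit)
    return dict(children_of)


# Toy history: "d" is a merge of "b" and "c", which both descend from "a".
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
children = build_child_index(history)
```

The index has to be rebuilt or extended whenever new commits arrive,
which is exactly the bookkeeping that has to live somewhere.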

Another smaller point is that Git allows commit histories outside of
branches. A tag or even a plain commit id can also be the end point
for a series of commits.
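
As a rough illustration of that second point, walking only from
branch heads can miss commits that a tag keeps alive. This is a
sketch over toy data; the ref names, commit names and the reachable
helper are hypothetical:

```python
def reachable(tips, parents_of):
    """Return every commit reachable from the given tips via parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents_of.get(commit, []))
    return seen


# "x" forks off "a" and is referenced only by a tag, not by any branch.
parents_of = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
branches = {"master": "c"}
tags = {"experiment": "x"}

from_branches = reachable(branches.values(), parents_of)
from_all_refs = reachable(
    list(branches.values()) + list(tags.values()), parents_of
)
```

Here "x" only shows up in from_all_refs, so a model that assumes every
commit belongs to some branch would never see it.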


> Just from reading 
> the source it seems that enabling the persistent_cache leads to storing 
> a reference to this (in-memory) revision cache in a Python class 
> variable (StorageFactory.__dict_nonweak).
> Presumably the goal is to reuse that cache across requests to improve 
> performance.

For anything but minimal repositories on private Trac instances, the
fact that git_fs spawns a git process is absolutely outrageous - let
alone the large number of processes that were spawned when I looked
at this.

I think the proper solution is to move some of what Trac tries to do
with metadata caching out of Trac and closer to the repo, and have
the Trac repo plugin take advantage both of pygit2 and of that
external parent/child index. Trac should already have access to all
the information it needs and should not have to build caches; Trac is
just the wrong place for that, IMHO.


> So the trade-off would be better performance vs. higher memory usage 
> plus, as you pointed out, a higher risk for something to go wrong due to 
> the increased complexity.
> Also it only helps if the same Python interpreter is used for another 
> request (not the case in e.g. CGI), right?

Anything but the most trivial instances will use long-running
processes, typically either tracd or a WSGI deployment.
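
To make the interpreter-reuse point concrete, here is a minimal sketch
of a cache held in a class attribute, going only by the description
quoted above. It is not the actual Trac code; RevisionCacheFactory,
handle_request, the path and the cached value are all invented for
illustration:

```python
class RevisionCacheFactory:
    # Class attribute: shared by all instances for the life of the process.
    _caches = {}

    def __init__(self, repo_path, persistent=True):
        if persistent:
            # Reuse the cache kept alive by the class, creating it once.
            self.cache = self._caches.setdefault(repo_path, {})
        else:
            # Fresh cache that is dropped together with this instance.
            self.cache = {}


def handle_request(repo_path, persistent):
    """Simulate one request; return True if the cache was already warm."""
    factory = RevisionCacheFactory(repo_path, persistent)
    hit = "HEAD" in factory.cache
    factory.cache["HEAD"] = "deadbeef"  # stand-in for an expensive lookup
    return hit


first = handle_request("/srv/repo.git", True)    # cold cache
second = handle_request("/srv/repo.git", True)   # same process: warm
cold = handle_request("/srv/repo.git", False)    # non-persistent: cold
```

Under CGI every request starts a new interpreter, so each one would
look like the first call; under tracd or WSGI the later requests in
the same process benefit.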


> I'm not sure if it makes sense to use both this in-memory cache and 
> Trac's DB CachedRepository at the same time or if that's redundant and 
> pointless.
> Any thoughts on that?

I get the impression that the persistent_cache is written to
persistent storage, and not just kept in memory - but I may be wrong
about this.


Thanks

//Peter
