> Folks,
First of all, thanks for the detailed report :)
> In light of recent comments about slide running slowly when we have 100+
> documents, I thought I'd take a look into it. I'd say that there isn't a
> lot we could do with indices in the database,
Really? I thought queries on the children and permissions tables (among
others) would get a big boost if they were indexed.
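For instance, indices on the columns used in the WHERE clauses of the
selects quoted below might look like this (I'm guessing at the exact table
and column names, so treat this as a sketch):

```sql
-- Hypothetical indices on the columns the PROPFIND selects filter on;
-- the actual table and column names depend on the schema in use.
CREATE INDEX idx_children_uri       ON children (uri);
CREATE INDEX idx_links_linkto       ON links (linkto);
CREATE INDEX idx_permissions_object ON permissions (object);
```

Whether these help would need to be measured against the query plans of
whichever database is actually being used.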
> but rather that we're being very
> inefficient in our SQL. I did some measurements and found that on doing a
> PROPFIND in a scope containing two text files, slide issued 61 select
> statements (approx 1000ms). With 32 files, a PROPFIND resulted in 438
> selects (4400ms). The startup time on a query is pretty significant and so
> we can expect this to have a dramatic effect with any number of documents.
>
> There is scope for "JOIN" to reduce the number of selects, but I suspect
> that this is not the best solution on its own.
>
> For example the sequence
>
> select * from objects where uri='/users/root/foo'
> select * from children where uri='/users/root/foo'
> select * from links where linkto='/users/root/foo'
> select * from permissions where object='/users/root/foo'
>
> select * from objects where uri='/users/root/bar'
> select * from children where uri='/users/root/bar'
> select * from links where linkto='/users/root/bar'
> select * from permissions where object='/users/root/bar'
>
> [etc]
>
> accounts for a lot of the work.
>
> Surely a far better way to do it would be to perform a _single_ search on
> a scope which returns a set of records corresponding to its children with
> fields along the lines of:
>
> uri classname linkcount permission
Ok. The thing is, I have no idea which particular db schema would yield the
best performance. This probably also depends on the db used.
The current implementation of a JDBC store is meant to be mostly a
"reference implementation", used as an example to develop new stores. I
expect the performance of JDBC stores tweaked for a particular database to
be a lot better.
Basically, I would suggest leaving the JDBC store as is, and putting any
optimised store in another package.
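Just to make the idea concrete, a single query per scope might look
something like the following. The join columns and the classname,
linkcount and permission fields are assumptions on my part; an optimised
store would use whatever the real schema provides:

```sql
-- Hypothetical single query returning one record per child of a scope
-- (with one row per permission), replacing four selects per child.
-- All table and column names here are guesses at the schema.
SELECT o.uri,
       o.classname,
       COUNT(l.linkto) AS linkcount,
       p.subject,
       p.action
FROM children c
JOIN objects o      ON o.uri    = c.childuri
LEFT JOIN links l       ON l.linkto = o.uri
LEFT JOIN permissions p ON p.object = o.uri
WHERE c.uri = '/users/root'
GROUP BY o.uri, o.classname, p.subject, p.action;
```

That would turn the 4-selects-per-child pattern quoted above into one
round trip per scope, which is where most of the startup cost seems to go.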
> there could be several of these for a uri, corresponding to different
> permissions (one per role for example), or better yet, compute the target
> before issuing the select, passing the role/person in question as a part
> of the "WHERE" clause.
>
> Now... all this is really nice but it would involve a little more than
> simply reworking the schema. What I propose is that we extend the store's
> cache for the JDBC case and build it so that it gets populated by the
> children of the current node using a single search.
The child store (here the JDBC store) is free to override the default
caching when it makes sense. Here, I think it would be an excellent idea to
do that, since hopefully it would reduce the number of queries needed.
> Thinking initially about a depth=1 PROPFIND, if we work on the statistical
> likelihood of an initial request for a uri being followed by requests for
> its children, we can save time by dragging all of them from the store at
> once. To prevent the cache growing too fast, there is plenty of scope for
> reducing its size by altering the schema - the classname could be a
> pointer to a lookup table and permissions could be expressed using lookup
> tables as well. We might reasonably be able to expect to make a typical
> record around 32 bytes.
>
> Now... for the situation where we're talking about recursive PROPFINDs I
> can see a few ways to proceed - but let's say we pass the depth with the
> initial request and have the store retrieve the entire relevant part of
> the tree. This is probably OK sizewise and is perhaps the simplest way to
> do it programmatically.
The SlideToken could hold a recommended caching policy field which would be
set by the client. The store could use it to fill out its cache in a more
intelligent fashion, knowing what's coming next.
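For an unlimited-depth request, for example, the store could grab the whole
subtree in one go with a prefix match (assuming uris are stored as plain
path strings, which may not hold for every schema):

```sql
-- Hypothetical one-shot retrieval of an entire subtree for a deep
-- PROPFIND, keyed off the caching hint in the SlideToken.
SELECT * FROM objects
WHERE uri = '/users/root' OR uri LIKE '/users/root/%';
```

A depth-limited hint could instead fetch only the immediate children, and
the store would fall back to per-uri selects when no hint is present.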
> There is no need for testing for nodes being "clean" or "dirty" for now -
> that could be a future extension, but for now all we're talking about is
> condensing all the searches pertaining to a *single* PROPFIND command
> ("snapshot") into one and using a cache during the execution of it. We
> could scrub the cache at the end of the PROPFIND for now and still ought
> to see a significant performance boost.
Yes, this makes a lot of sense.
Remy