JCR 1.0 SQL queries with JackRabbit

Øyvind Stegard Wed, 12 Mar 2008 12:57:50 -0700

Hello,

We are currently in the process of evaluating JackRabbit as a backend
content repository for our CMS, using the JCR 1.0 API (JackRabbit is
version 1.4.1).


I have some questions regarding repository queries. I am using the SQL
syntax.


1) 
I am having difficulties with some queries which are based on the
jcr:path pseudo property:

Descendant or self (similar example found on page 305 of the JCR 1.0
spec):
SELECT * FROM nt:base WHERE jcr:path = '/foo' OR jcr:path LIKE '/foo/%'

Here I get the following error: Invalid combination of jcr:path clauses

The problem seems to be general, in that I am unable to combine multiple
jcr:path clauses in a single query (with a few exceptions, see question
2). Even a simple expression like "jcr:path = '/foo' OR jcr:path =
'/bar'" fails.

If this is a known limitation in JackRabbit 1.4, will such queries be
possible in JCR 2.0, using the SQL2 syntax or QOM ? I would certainly
prefer using a qom directly.


2)
Is there any feasible way of querying for path depth using the jcr:path
pseudo property ? When talking about "path depth", I am only concerned
about absolute node paths, where "/" has depth 0, "/foo" has depth 1 and
so on. It is the same as the depth concept in
org.apache.jackrabbit.spi.Path#getDepth(), but only for absolute paths. 

For instance, for a query that lists file/folder nodes at exactly depth
1, i.e. the children of the root node, one would think something like
this works**:

SELECT * FROM nt:base WHERE (jcr:primaryType = 'nt:file' OR
jcr:primaryType = 'nt:folder') AND jcr:path LIKE '/%' AND NOT jcr:path
LIKE '/%/%'

**nt:file and nt:folder only used as example, in reality we have our own
custom types representing the file/folder concept.

And in fact, this particular query works as expected (even though it
contains multiple jcr:path clauses). It is just like the "Child" SQL
query example on page 305 of the JCR 1.0 spec.

Generalising this approach for depth-N, however, seems to not work, for
instance for depth 2:

SELECT * FROM nt:base WHERE (jcr:primaryType = 'nt:file' OR
jcr:primaryType = 'nt:folder') AND jcr:path LIKE '/%/%' AND NOT jcr:path
LIKE '/%/%/%'

This one returns a result set containing every nt:file/nt:folder-node in
the repository with depth greater than 1 (we only want exactly 2). Is
this kind of query possible at all when based on jcr:path ?

I could add an integer property containing the path depth for every node
which represents a resource in our CMS, and use that property in
queries. But that property value would be a direct function of jcr:path,
and would need updating whenever jcr:path changes, that's something I do
*not* want to do.

Supporting path depth clauses in queries might be considered somewhat
eccentric, but it is something we currently support in our CMS with our
own backend query implementation. Some general background information
about our CMS is provided at the end of this email.


3)
It seems that logical inversion of LIKE queries on jcr:path is ignored
by JackRabbit.

Two examples:

SELECT * FROM nt:base WHERE jcr:path NOT LIKE '/foo/%'
and
SELECT * FROM nt:base WHERE NOT (jcr:path LIKE '/foo/%')

returns exactly the same result set as:

SELECT * FROM nt:base WHERE jcr:path LIKE '/foo/%'

So the "NOT" part seems to be completely ignored. I suspect this
limitation is because of performance-concerns and the Lucene-based query
backend ?


4)
It seems that "NOT LIKE .." generally does not invert matching for any
kind of JCR property ? However, I could work around that by wrapping a
regular "LIKE" inside a NOT(...) expression, i.e. "NOT (foo:bar LIKE
'baz%')". Perhaps that's not very wise, performance-wise (pun
intended) ?


Am I right in assuming that many of these limitations are present
because of the relation to XPath, and mapping of SQL queries to an
XPath-based query model internally in JackRabbit 1.4 ?



Some background information
Our CMS uses a strictly hierarchical resource model, with
file-resources, collection-resources and properties. It is implemented
using the Spring framework and provides normal web access, in addition
to access through the WebDAV protocol. At the bottom, we have our own
repository/backend implementation which handles persistence of all
resources and a custom Lucene-based search/query implementation. This is
where JCR comes into the picture, we are experimenting with using JCR
for both of these tasks (persistence and query). 

We have our own query language, parser and query object model. I am
currently working on mapping this to equivalent JCR 1.0 SQL syntax for
our experimental JCR-based backend. In our language, we have support for
a few constructs which are based around URIs (which is the hierarchical
identifier we use for all resources in our CMS). These URIs are directly
mapped to corresponding JCR node paths, hence all the questions about
queries on jcr:path :). Our own Lucene-based query implementation has
special fields for optimizing the performance of queries which are based
on the hierarchy of resources. 

What should we expect of support for these types of queries in future
versions of JackRabbit, JCR 2.0, etc ?

Thanks for any answers in advance :)

Regards,
Øyvind Stegard
IT-department, University of Oslo, Norway
-- 
< Øyvind Stegard ~ oyvind stegard at usit uio no
 < USIT, UiO

JCR 1.0 SQL queries with JackRabbit

Reply via email to