Lei Zhou wrote:
> Thanks Marcel!
> So it seems that due to the limitation of JCR (no aggregation query
> support), it would be much slower to support this type of application than
> with an RDBMS.
> Is that a correct assessment?
An RDBMS certainly provides a wider range of operations through SQL than JCR
does with its current XPath and SQL syntax. Depending on your needs, some of
the queries won't be possible in JCR, but others will simply be unnecessary.
E.g. in JCR you don't have to execute a query to follow a reference; you
simply call the method Property.getNode().
> Also, to articulate: if I have to present users with a query result
> view that is categorized (or grouped) by ProductName, I'd have to do the
> following:
> 1. Run query #1
>    //element(*, Document)[EMAIL PROTECTED] = 'Manual' and
>    jcr:contains(@description, 'maintenance')]
> 2. Iterate through the entire RowIterator (which may have thousands of
>    entries) and use Java code to build an aggregated collection of
>    ProductName/ProductReference pairs (since JCR doesn't have this type of
>    query).
> 3. No "order by" clause is used because the ProductReferences won't be in
>    the same order as the ProductNames; manual sorting is required in Java
>    post-processing.
The same can be achieved in one step:
//element(*, Document)[EMAIL PROTECTED] = 'Manual' and
  jcr:contains(@description, 'maintenance')]/jcr:deref(@ProductReference, *)
  order by @ProductName
This returns the referenced product nodes, ordered by name, for all products
that have matching documents.
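For comparison, the Java post-processing that steps 2 and 3 describe can be sketched with plain collections. This is a minimal sketch, not Jackrabbit code: the list of UUIDs stands in for the ProductReference values read from the RowIterator, and the hypothetical `nameOf` map stands in for dereferencing each UUID to its product node and reading its ProductName. A `TreeMap` keeps the keys sorted, which takes care of the manual sorting from step 3.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of steps 2-3: collapse the query rows into distinct
// ProductName -> ProductReference pairs, sorted by ProductName.
final class ProductGrouping {

    // productRefs: the ProductReference UUID of each matching Document row.
    // nameOf: hypothetical in-memory stand-in for dereferencing a UUID to its
    // Product node and reading its ProductName.
    static SortedMap<String, String> groupByProductName(
            List<String> productRefs, Map<String, String> nameOf) {
        SortedMap<String, String> byName = new TreeMap<>(); // keys stay sorted
        for (String ref : productRefs) {
            String name = nameOf.get(ref);
            if (name != null) {
                byName.putIfAbsent(name, ref); // repeated rows collapse to one entry
            }
        }
        return byName;
    }
}
```

Two distinct products that share a name would collapse into one entry here; keying by UUID and sorting by name afterwards avoids that if it matters.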
> 4. Depending on which category the user has selected to expand, run
>    query #2, limiting results to that single product category:
>    (query #2)
>    //element(*, Document)[EMAIL PROTECTED] = 'Manual' and
>    jcr:contains(@description, 'maintenance') and
>    @ProductReference = '<uuid-of-Product-#1>']
Correct.
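Deriving query #2 from query #1 is plain string manipulation. The helper below is a hypothetical sketch: it assumes the query ends with the predicate to extend (its last `]`) and that the value is a UUID, which contains only hex digits and hyphens and therefore needs no quote escaping. The example base query uses only the jcr:contains() condition.

```java
// Hypothetical helper: turns query #1 into query #2 by adding an
// @ProductReference test to the query's last predicate.
final class CategoryQuery {

    static String restrictToProduct(String query, String productUuid) {
        // Only hex digits and hyphens are accepted, so the value can be
        // placed inside single quotes as-is.
        if (!productUuid.matches("[0-9a-fA-F-]+")) {
            throw new IllegalArgumentException("not a UUID: " + productUuid);
        }
        int close = query.lastIndexOf(']');
        if (close < 0) {
            throw new IllegalArgumentException("query has no predicate: " + query);
        }
        return query.substring(0, close)
                + " and @ProductReference = '" + productUuid + "'"
                + query.substring(close);
    }
}
```

For example, `//element(*, Document)[jcr:contains(@description, 'maintenance')]` with UUID `1234-abcd` becomes `//element(*, Document)[jcr:contains(@description, 'maintenance') and @ProductReference = '1234-abcd']`.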
> 5. Again, product names have to be dereferenced manually, and ordering has
>    to be moved from the query to the Java post-processing.
This step I don't understand. What's the purpose of this step and why is it
needed? Isn't all the information already available?
> I'm fairly new to JCR and Jackrabbit. I've found them very helpful in many
> aspects of content management. But I do feel that certain improvements
> could make Jackrabbit a better choice for enterprise use.
> #1. In many years of enterprise application development, I've seen a lot
> of our content-based applications in need of support for complicated
> search, e.g., search by an arbitrary combination of document properties,
> and grouping of search results (it is not uncommon to see two or even
> three levels of nested grouping).
> -- Aggregations and joins are definitely a big plus for querying a
> complicated content model.
Such requirements are also discussed in the expert group of JSR 283. You can
comment on the current spec and post enhancement wishes to [EMAIL PROTECTED]
> I've seen posts mentioning the use of node references to compensate for
> the lack of SQL joins, but what if I need to perform a search like the one
> below (ProductNames, Regions and AvailableFors would most likely be
> categories that are referenced by all documents):
> FIND all manuals
> THAT (ProductName is 'TV' or 'VCR' or 'DVD')
> and (Region is 'North America' or 'Europe')
> and (AvailableFor is 'distributor' or 'repairHouse')
> GROUP BY Region, ProductName
Such a query is certainly not possible with the current XPath or SQL syntax
in JCR. You would have to break it up into multiple queries, e.g. retrieve
the UUIDs of the products named 'TV', 'VCR' and 'DVD' and use those UUIDs in
a query on the documents. The same applies to Region and AvailableFor.
IMO XQuery would be a nice fit for those requirements.
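With the current query languages, the GROUP BY Region, ProductName part still ends up in Java. A sketch of that two-level grouping with nested `TreeMap`s, assuming each matching document has already been resolved to a path, a region name and a product name (the `Doc` record is hypothetical, not part of any JCR API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Client-side substitute for "GROUP BY Region, ProductName":
// nested TreeMaps keep both grouping levels sorted.
final class NestedGrouping {

    // One matching document after its references have been resolved to names.
    record Doc(String path, String region, String productName) {}

    static SortedMap<String, SortedMap<String, List<String>>> groupByRegionThenProduct(
            List<Doc> docs) {
        SortedMap<String, SortedMap<String, List<String>>> groups = new TreeMap<>();
        for (Doc d : docs) {
            groups.computeIfAbsent(d.region(), r -> new TreeMap<>())
                  .computeIfAbsent(d.productName(), p -> new ArrayList<>())
                  .add(d.path()); // document paths keep their query order
        }
        return groups;
    }
}
```

A third grouping level would add one more nested map and one more `computeIfAbsent` call.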
> #2. For the RDBMS-based repository, the current DB schema is not very
> convincing for large enterprise-level applications. A more normalized
> schema might help both performance and #1, but yes, more DB-level code may
> be needed (for performance's sake), and that may limit the portability of
> the product.
I'm not sure that's really the case. Usually a normalized schema means lower
performance. There were attempts to create a persistence manager using a
normalized schema, but in the end the currently used schema turned out to be
the most practical one.
regards
marcel