Hello,
I'm totally new to Apache Jackrabbit/Oak, so I hope my question hasn't been
answered previously.
In order to learn how it works, I am trying to set up a simple JCR
repository which can store information about, let's say, books, to be
combined with a list of users who can mark books as read or starred.
The goal is then to be able to let a user list the books:
- she hasn't already read
- she has previously starred
Here's the model I first came up with:
/users
/user1
name (STRING)
/books
/book1
title (STRING)
readBy (REFERENCE) MULTIPLE
starredBy (REFERENCE) MULTIPLE
The "readBy" and "starredBy" properties would then contain references to
the user node UUID.
Query for unread books:
SELECT book.*
FROM [book:book] AS book
WHERE book.[book:readBy] <> '<user1-uuid>'
Query for starred books:
SELECT book.*
FROM [book:book] AS book
WHERE book.[book:starredBy] = '<user1-uuid>'
I don't really like mixing book information and user information on the
same node, so I thought of this 2nd model:
/users
/user1
name (STRING)
/books
/book1
title (STRING)
/userStatus
readBy (REFERENCE) MULTIPLE
starredBy (REFERENCE) MULTIPLE
That means introducing a JOIN in the queries.
Query for unread books:
SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] <> '<user1-uuid>'
Query for starred books:
SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE book.[book:starredBy] = '<user1-uuid>'
I then thought of a 3rd solution, using a LEFT OUTER JOIN for queries:
/users
/user1
name (STRING)
/userStatuses
/userStatus1
book (REFERENCE)
read (BOOLEAN)
starred (BOOLEAN)
/books
/book1
title (STRING)
Query for unread books:
SELECT book.*, userStatus.*
FROM [book:book] AS book
LEFT OUTER JOIN [book:userStatus] AS userStatus
ON userStatus.[book:book] = book.[jcr:uuid]
WHERE (ISCHILDNODE(userStatus, '/users/$userUuid/userStatuses') AND
userStatus.[book:read] = FALSE) OR userStatus.[book:read] IS NULL
Query for starred books:
SELECT book.*, userStatus.*
FROM [book:book] AS book
LEFT OUTER JOIN [book:userStatus] AS userStatus
ON userStatus.[book:book] = book.[jcr:uuid]
WHERE ISCHILDNODE(userStatus, '/users/$userUuid/userStatuses') AND
userStatus.[book:starred] = TRUE
But this model introduces an additional complexity due to userStatus nodes
being present in the tree only when a book is first marked as read or
starred by the user.
I have implemented model #2 with Oak 1.9.1 (i.e. unstable version, but I
thought that it was applicable for an experiment).
My goal is support 100k+ books and several hundred users, so I looked at
indexes in Oak.
Lucene indexes appeared more complicated to set up that property indexes,
and the synchronous aspect of the property indexes appealed to me actually.
Here's the Kotlin pseudocode that I used to initialize the repository:
@Bean
fun contentRepository() : ContentRepository {
val mongoDocumentNodeStoreBuilder =
MongoDocumentNodeStoreBuilder.newMongoDocumentNodeStoreBuilder()
.setMongoDB("mongodb://localhost:27017/", "books", 16)
return Jcr(Oak(mongoDocumentNodeStoreBuilder.build())
.with(RepositoryInitializer {
val oakIndexNodeBuilder = IndexUtils.getOrCreateOakIndex(it)
run {
val indexName = "book-userStatus_book-readBy"
if (!oakIndexNodeBuilder.hasChildNode(indexName)) {
IndexUtils.createIndexDefinition(oakIndexNodeBuilder, indexName, true,
false,
listOf("book:readBy"),
listOf("book:userStatus"))
}
}
it.nodeState
})).createRepository()
}
I see that Oak is building the index at startup, traversing almost 2
million nodes in my test case (the actual model is more extended than what
I presented here).
But when I run a query, it doesn't seem to use the index, given how long it
takes (and that it eventually times out):
2018-05-23 13:59:35.656 DEBUG 12937 --- [nio-8080-exec-8]
o.a.j.oak.query.QueryEngineImpl : Parsing JCR-SQL2 statement:
SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc'
2018-05-23 13:59:35.657 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost using filter
Filter(query=SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc',
path=*)
2018-05-23 13:59:35.657 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for reference is Infinity
2018-05-23 13:59:35.657 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for property is Infinity
2018-05-23 13:59:35.658 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for nodeType is 204704.0
2018-05-23 13:59:35.658 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for traverse is 1.0E8
2018-05-23 13:59:35.658 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost using filter
Filter(query=SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc',
path=*, property=[book:readBy=[60b02015-f09b-42a0-8a6a-d47d40194cfc]])
2018-05-23 13:59:35.658 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for reference is Infinity
2018-05-23 13:59:35.663 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for property is 3.0
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for nodeType is 102304.0
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for traverse is 1.0E8
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost using filter
Filter(query=SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc',
path=* && //parent/of/join)
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for reference is Infinity
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for property is Infinity
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for nodeType is 204704.0
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : cost for traverse is 1.0E8
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : query execute SELECT book.*,
userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc'
2018-05-23 13:59:35.664 DEBUG 12937 --- [nio-8080-exec-8]
o.apache.jackrabbit.oak.query.QueryImpl : query plan [book:userStatus] as
[userStatus] /* property book-userStatus_book-readBy =
60b02015-f09b-42a0-8a6a-d47d40194cfc where [userStatus].[book:readBy] =
'60b02015-f09b-42a0-8a6a-d47d40194cfc' */ inner join [book:book] as [book]
/* nodeType Filter(query=SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc',
path=* && //parent/of/join) */ on ischildnode([userStatus], [book])
2018-05-23 13:59:39.563 WARN 12937 --- [nio-8080-exec-8]
a.j.o.p.i.p.s.ContentMirrorStoreStrategy : Index-Traversed 10000 nodes
(2285 index entries) using index jcr:primaryType with filter
Filter(query=SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc',
path=/books/book1)
2018-05-23 13:59:43.467 WARN 12937 --- [nio-8080-exec-8]
a.j.o.p.i.p.s.ContentMirrorStoreStrategy : Index-Traversed 20000 nodes
(4259 index entries) using index jcr:primaryType with filter
Filter(query=SELECT book.*, userStatus.*
FROM [book:book] AS book
INNER JOIN [book:userStatus] AS userStatus
ON ISCHILDNODE(userStatus, book)
WHERE userStatus.[book:readBy] = '60b02015-f09b-42a0-8a6a-d47d40194cfc',
path=/books/book1)
...
So what am I doing wrong here?
Did I miss something obvious (in relation with JOINs in particular)?
Should I look into Lucene indexes? What will be the benefits / drawbacks in
my use case?
And what about my model by the way, does it make any sense to organize the
repository as I did?
Thanks a lot for your help!
--
Damiano Albani