Re: Scalability

FolDeRol Fri, 13 Apr 2007 02:00:03 -0700

David,

Sounds optimistic. Some follow-up questions...


What are BundlePersistenceManagers? I could not find them in the source code
of JR 1.2.3.

It seems to me that synchronizing repository changes over a network file
system is neither reliable nor fast enough. What do you think: could this
approach be extended to reuse the cluster-wide object replication features
of JBoss/JGroups, and how difficult would that be? We would probably
contribute to implementing this feature if it is feasible.

Another question: in which cases does Jackrabbit decide that the indices are
inconsistent and should be rebuilt from the persistence storage? I did not
notice this operation being performed every time the server starts. Is it
performed on the whole set of data, or can it cover a specified set of
items? Once, my database was purged but the local indices remained intact,
and I kept seeing warning messages on the console that nodes with
particular ids were not found. These messages continued to appear even after
restarting the server, until I deleted the indices too.
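A minimal Java sketch of the mismatch described above, assuming a hypothetical StaleIndexCheck class in which the index is reduced to a set of node ids and the database to a map keyed by id (this is not Jackrabbit's API or its actual consistency check):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the symptom: the index still lists node ids whose
// rows were purged from the database, so resolving those hits fails.
public class StaleIndexCheck {

    // Returns the indexed ids that no longer exist in the persistence store.
    static Set<String> findStaleIds(Set<String> indexedIds, Map<String, byte[]> store) {
        Set<String> stale = new LinkedHashSet<>();
        for (String id : indexedIds) {
            if (!store.containsKey(id)) {
                stale.add(id); // this is where a "node not found" warning would surface
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Set<String> indexed = new LinkedHashSet<>(Arrays.asList("n1", "n2"));
        Map<String, byte[]> store = new HashMap<>();
        store.put("n1", new byte[0]); // "n2" was purged from the database
        System.out.println(findStaleIds(indexed, store)); // prints [n2]
    }
}
```

As long as the index is never rebuilt, every lookup of such an id fails again, which would match the warnings persisting across restarts.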

Regards

On 4/13/07, David Nuescheler <[EMAIL PROTECTED]> wrote:

Hi,

The good news first ;) :
Jackrabbit is designed to cluster a number of nodes backed by
a single RDBMS.
Please find more information on how to configure this here:

http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200611.mbox/[EMAIL PROTECTED]

I would also like to comment on your observations:

(1) How the data is stored in the database largely depends
on the persistence manager used. The
BundlePersistenceManagers (which are the ones that I
would recommend for a bigger DB-backed installation) store
the representation of a node and its properties in a compressed
binary format in the database.
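To illustrate the idea behind (1), here is a minimal Java sketch of storing a node and all of its properties as one compressed binary record. The class NodeBundleSketch, its methods and the record layout are hypothetical and do not reflect Jackrabbit's actual bundle format; only string-valued properties are handled, and the node id is returned under an assumed reserved key "@id".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration of the "bundle" idea: one node plus all of its
// properties serialized into a single compressed blob (e.g. one DB row).
// NOT Jackrabbit's real bundle format.
public class NodeBundleSketch {

    // Writes the node id and its string properties, deflate-compressed.
    static byte[] writeBundle(String nodeId, Map<String, String> props) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeUTF(nodeId);
            out.writeInt(props.size());
            for (Map.Entry<String, String> e : props.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } // closing the stream finishes the deflate data
        return bytes.toByteArray();
    }

    // Reads a bundle back; the node id is stored under the assumed key "@id".
    static Map<String, String> readBundle(byte[] blob) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(blob)))) {
            result.put("@id", in.readUTF());
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                result.put(in.readUTF(), in.readUTF());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "nt:unstructured");
        props.put("title", "Scalability");
        Map<String, String> back = readBundle(writeBundle("cafebabe-0001", props));
        System.out.println(back.get("@id") + " " + back.get("title")); // cafebabe-0001 Scalability
    }
}
```

In this sketch a node is loaded or saved with a single read or write of one blob, but the database itself cannot see inside it, which is why such a layout leaves querying to the repository rather than to SQL.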

(2) To satisfy the requirements of a content repository as specified
by JCR, I think it is not possible to rely on the database index alone
anyway. This is particularly true for features like inheritance, full-text
search, or fine-grained searching of unstructured information.
This is why Jackrabbit (just like any other repository implementation
that I am aware of) keeps an additional index.
This additional index is synchronized through clustering and
does not need to be backed up, since it can be rebuilt from
the information in the database in a recovery scenario.
So a Jackrabbit instance can be cloned or restored entirely
just by restoring the database and supplying the repository.xml.
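The point that the additional index is derived data can be shown with a toy Java sketch: a hypothetical IndexRebuildSketch builds a simple inverted index (word to node ids) purely from what is persisted, so the index can be discarded and recreated at any time. It is only a stand-in for the real full-text index, not Jackrabbit's implementation, and it assumes each node is reduced to an id and a plain-text string.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: the search index holds no data of its own, so it can
// be rebuilt from the authoritative store (node id -> text) in a recovery scenario.
public class IndexRebuildSketch {

    // Scans every persisted node and builds the inverted index from scratch.
    static Map<String, Set<String>> rebuild(Map<String, String> persistedNodes) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> node : persistedNodes.entrySet()) {
            for (String word : node.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.isEmpty()) continue; // split can yield a leading empty token
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(node.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> db = new TreeMap<>();
        db.put("n1", "Jackrabbit clustering");
        db.put("n2", "Clustering with an RDBMS");
        System.out.println(rebuild(db).get("clustering")); // [n1, n2]
    }
}
```

Because rebuild() reads only the store, restoring the database is enough to regenerate the index, at the cost of a full scan.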

regards,
david

On 4/13/07, FolDeRol <[EMAIL PROTECTED]> wrote:
> Dear team,
>
> Could anybody clarify the situation with Jackrabbit's scalability for me?
>
> We are considering Jackrabbit as a back-end for a large application with
> a high level of data flow in a clustered environment. When I started
> evaluating Jackrabbit, having read that it could employ an RDBMS as a
> persistence layer, I thought that we could set up a number of cluster
> nodes using Model 2 of deployment, which would use the same logical
> instance (probably clustered) of the database and thus be scalable. I
> could not find any details on this, and decided to study the database
> schema and trace the JDBC calls to estimate the performance.
>
> Imagine my surprise when I learned the truth. The data is stored in the
> RDBMS as serialized Java objects, and query operations are not handled by
> the RDBMS at all but rather directly by the Jackrabbit engine on indices
> stored on the file system. Now I'm seriously alarmed that Jackrabbit
> might be an inappropriate solution for our goal.
>
> Could someone please confirm or deny my assumptions?
>
> Regards
>
