David,

Sounds optimistic. Some follow-up questions...

What are BundlePersistenceManagers? I could not find them in the source code of JR 1.2.3.

It seems to me that synchronizing repository changes through a network file system is neither reliable nor fast enough. Do you think this approach could be extended to reuse the JBoss/JGroups features for cluster-wide object replication, and how difficult might that be? We would probably contribute to implementing this feature if it is feasible.

Another question: in which cases does Jackrabbit decide that the indices are inconsistent and should be rebuilt from the persistent storage? I have not noticed this operation being performed every time the server starts. Is it performed on the whole data set, or can it cover a specified set of items? Once, my database was purged but the local indices remained intact, and I kept seeing warning messages on the console that nodes with particular ids were not found. These messages continued to appear even after a server restart, until I deleted the indices too.

Regards

On 4/13/07, David Nuescheler <[EMAIL PROTECTED]> wrote:
Hi,

The good news first ;) : Jackrabbit is designed to cluster a number of nodes backed by a single RDBMS. Please find more information on how to configure this here:
http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200611.mbox/[EMAIL PROTECTED]

I would also like to comment on your observations:

(1) How the data is stored in the database largely depends on the persistence manager used. The BundlePersistenceManagers (which are the ones that I would recommend for a bigger DB-backed installation) store the representation of a node and its properties in a compressed binary format in the database.

(2) To satisfy the requirements of a content repository as specified by JCR, I think it is not possible to use just the database index anyway, in particular for features like inheritance, fulltext, or searching unstructured information in a fine-grained fashion. This is why Jackrabbit (just like any other repository implementation that I am aware of) keeps an additional index. This additional index is synced through clustering and does not need to be backed up, since it can be rebuilt from the information in the database in a recovery scenario.

So a Jackrabbit instance can be cloned or restored entirely by just restoring the database and supplying the repository.xml.

regards,
david

On 4/13/07, FolDeRol <[EMAIL PROTECTED]> wrote:
> Dear team,
>
> Could anybody clarify the situation with Jackrabbit's scalability for me?
>
> We are considering Jackrabbit as a back end for a large application with a
> high level of data flow in a clustered environment. When I started
> evaluating Jackrabbit, having read that it could employ an RDBMS as a
> persistence layer, I thought that we could set up a number of cluster nodes
> using deployment Model 2, all of which would use the same logical instance
> (probably clustered) of the database and thus be scalable. I could not find
> any details on this, and decided to study the database schema and trace JDBC
> calls in order to estimate the performance.
>
> Imagine my surprise when I learned the truth. The data is stored in the
> RDBMS as serialized Java objects, and query operations are not handled by
> the RDBMS at all but rather directly by the Jackrabbit engine on indices
> stored on the file system. Now I'm seriously alarmed that Jackrabbit might
> be an inappropriate solution for our goal.
>
> Please, could someone confirm or deny my assumptions?
>
> Regards
>
