Hi,

We have a Hippo CMS deployment, where Jackrabbit is configured using a
database journal.
I'm looking at switching to a file journal on NFS (EC2 EFS), and was
referred here from the Hippo mailing list.

I've got the file journal configuration working, and can import a
small amount of content (say, 50 news articles) at a reasonable speed.

The problem I have is that the time to insert new nodes increases
significantly as I add more. There are ~4000 news items to import into
a year/month/item structure, and we import these in batches of 50.
Each batch takes longer and longer to import, starting at a few
seconds per batch of 50, but increasing to many minutes.

In contrast, using the database journal, news items can be imported in
batches that take (more or less) constant time, say a few seconds per
batch.

I expected performance wouldn't be quite as good with the file
journal, but the difference in time is significant enough that I
suspect something may be wrong.

As far as I can see from FileRecordLog, Jackrabbit should only be
reading the headers from most of the journal files every time the
repository is synced. Using strace to monitor file accesses, I do see
patterns of 128 bytes being read from the start of journal files.

However, I also see regular, sizeable reads of the journal log file,
which I wouldn't expect if Jackrabbit were only checking for new
content. Could these be what makes inserts get so much slower?
Using a larger maximum file size for the journal seems to improve
things a lot, but it still appears to be doing too much work.
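One way to see how much header reading each sync implies is to count the journal files in the shared directory. This is a minimal sketch, not anything from Jackrabbit itself. It assumes the files are named with the default basename "journal" and a ".log" suffix (journal.log, journal.1.log, ...). It also assumes one 128-byte header read per file per sync, based only on the strace pattern above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class JournalDirSummary {

    // Assumption: size of the header read per file, taken from the 128-byte reads seen with strace.
    static final int HEADER_READ_BYTES = 128;

    /** Lists regular files in dir matching the glob basename*.log (assumed journal file naming). */
    public static List<Path> journalFiles(Path dir, String basename) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, basename + "*.log")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Journal directory and basename from the command line; "journal" is an assumed default.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String basename = args.length > 1 ? args[1] : "journal";
        List<Path> files = journalFiles(dir, basename);
        long totalBytes = 0;
        for (Path p : files) {
            totalBytes += Files.size(p);
        }
        System.out.println("journal files: " + files.size());
        System.out.println("total size (bytes): " + totalBytes);
        // Rough estimate only: one header read per file on every sync.
        System.out.println("header bytes per sync (estimate): " + (long) files.size() * HEADER_READ_BYTES);
    }
}
```

Running this against the journal directory before and after an import shows how the file count grows with content. It also shows how a larger maximum file size reduces the number of files each sync has to touch.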

I've noticed that the revision.log file in the local workspace
sometimes has a plausible revision number, and sometimes has 8
null/zero bytes (denoting the long value, 0L), but we see something
similar with the database journal, so I'm not sure what to expect
there. This doesn't seem to happen on other machines in the cluster -
they all have sensible-looking (non-zero) revision numbers.
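To compare revision.log across the cluster nodes, a small reader like the one below can help. It is a sketch that assumes the file holds a single 8-byte big-endian long, as written by `RandomAccessFile.writeLong`. This matches the observation that eight zero bytes mean 0L, but it is not taken from the Jackrabbit source.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RevisionLogReader {

    /**
     * Reads the revision stored in a revision.log file.
     * Assumption: the file holds one 8-byte big-endian long (RandomAccessFile.writeLong format),
     * so eight zero bytes decode to 0L.
     */
    public static long readRevision(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < 8) {
                throw new IOException("Expected at least 8 bytes in " + file + ", found " + raf.length());
            }
            return raf.readLong(); // big-endian, per the RandomAccessFile contract
        }
    }

    public static void main(String[] args) throws IOException {
        // Path to a node's local revision.log, passed on the command line.
        Path file = Paths.get(args.length > 0 ? args[0] : "revision.log");
        System.out.println(file + ": revision " + readRevision(file));
    }
}
```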

Is such a slow down expected for bulk inserts?

Are there any logs or diagnostics I could provide that would help pin down the issue?

Thanks
