On Tue, May 15, 2018 at 10:09 AM, Martin Ellis <[email protected]> wrote:
> Hi,
>
> We have a Hippo CMS deployment, where Jackrabbit is configured using a
> database journal.
> I'm looking at switching to a file journal on NFS (EC2 EFS), and was
> referred here from the Hippo mailing list.
>
> I've got the file journal configuration working, and can import a
> small amount of content (say, 50 news articles) at a reasonable speed.
>
> The problem I have is that the time to insert new nodes increases
> significantly as I add more. There are ~4000 news items to import into
> a year/month/item structure, and we import these in batches of 50.
> Each batch takes longer and longer to import, starting at a few
> seconds per batch of 50, but increasing to many minutes.
>
> In contrast, using the database journal, news items can be imported in
> batches that take (more or less) constant time, say a few seconds per
> batch.
>
> I expected performance wouldn't be quite as good with the file
> journal, but the difference in time is significant enough that I
> suspect something may be wrong.
>
> As far as I can see from FileRecordLog, Jackrabbit should only be
> reading the headers from most of the journal files every time the
> repository is synced. Using strace to monitor file accesses, I do see
> patterns of 128 bytes being read from the start of journal files.
>
> However, I also see regular and significant sized reads of the journal
> log file, that I wouldn't expect to see if checking for new content. I
> wonder if these are responsible for inserts getting so much slower?
> Using a larger maximum file size for the journal seems to improve
> things a lot, but it still appears to be doing too much work.

Perhaps the rotation [1], which ends up calling
java.io.File.renameTo(File), takes longer than expected on that file
system? That would also explain why increasing the maximum file size
helps: rotations happen less often.

[1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/journal/FileJournal.java#L216-L219
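
One way to test this suspicion outside of Jackrabbit is to time a chain of renames directly on the EFS mount. The following is only a rough sketch, not Jackrabbit code: the class name RenameTimer, the method timeRenameChain and the "shift every file up by one suffix" scheme are assumptions made for illustration, and the real rotation may differ in detail.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper (not part of Jackrabbit): measures how long a chain of
// File.renameTo calls takes in a given directory, e.g. one on an NFS mount.
public class RenameTimer {

    /**
     * Creates `count` small files named base.1 .. base.count in dir, then
     * renames each one to the next higher suffix (highest first), and
     * returns the time spent in the renames in nanoseconds.
     */
    public static long timeRenameChain(File dir, String base, int count) throws IOException {
        for (int i = 1; i <= count; i++) {
            Files.write(new File(dir, base + "." + i).toPath(), new byte[128]);
        }
        long start = System.nanoTime();
        for (int i = count; i >= 1; i--) {
            File from = new File(dir, base + "." + i);
            File to = new File(dir, base + "." + (i + 1));
            if (!from.renameTo(to)) {
                throw new IOException("rename failed: " + from + " -> " + to);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // First argument: directory to test in (defaults to the temp dir).
        File parent = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        File dir = Files.createTempDirectory(parent.toPath(), "rename-test").toFile();
        for (int n : new int[] {10, 100, 1000}) {
            long nanos = timeRenameChain(dir, "journal" + n + ".log", n);
            System.out.println(n + " renames: " + (nanos / 1_000_000) + " ms");
        }
    }
}
```

Running it once against a local directory and once against the EFS mount should show whether renames alone are expensive enough to matter.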

>
> I've noticed that the revision.log file in the local workspace
> sometimes has a plausible revision number, and sometimes has 8
> null/zero bytes (denoting the long value, 0L), but we see something
> similar with the database journal, so I'm not sure what to expect
> there. This doesn't seem to happen on other machines in the cluster -
> they all have sensible-looking (non-zero) revision numbers.
>
> Is such a slow down expected for bulk inserts?
>
> Are there any useful logs or diagnostics I could provide to diagnose the 
> issue?

Unfortunately, there is no logging around the rotation code that I
currently suspect. As FileJournal has not changed for years, you might
consider shadowing it in your project with a copy that adds some
logging.

Regards,

Woonsan

>
> Thanks
