Ok - so it was the reduced heap size.

Good thing TDB uses out-of-heap space for indexes then! This space (mapped files) flexes under OS control. It's the filing system cache mapped into the user process.

The TDB union graph takes zero space - it's calculated.

What you could do is one transaction per HTTP request, assuming an HTTP request is not long-running. If you can identify which requests are reads and which are writes, you can start a READ or WRITE transaction accordingly.
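In Jena that choice maps to dataset.begin(ReadWrite.READ) vs dataset.begin(ReadWrite.WRITE). Here is a minimal stdlib-only sketch of the read/write classification; the class and method names are illustrative, not Jena API:

```java
import java.util.Set;

// Sketch of the "one transaction per HTTP request" idea. The class and
// method names are illustrative, not Jena API; in Jena the result would
// select dataset.begin(ReadWrite.READ) vs dataset.begin(ReadWrite.WRITE).
public class TxnPerRequest {

    private static final Set<String> READ_METHODS = Set.of("GET", "HEAD", "OPTIONS");

    // Decide whether a request only needs a READ transaction.
    // Caveat: a SPARQL query can arrive by POST, so a real classifier
    // may also need to inspect the request body or the endpoint.
    static boolean isReadRequest(String httpMethod) {
        return READ_METHODS.contains(httpMethod.toUpperCase());
    }
}
```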

Single-triple transactions (c.f. autocommit) are generally a bad idea above small scale. The overhead costs mount up - fully compatible with SQL here!
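To see why the overhead mounts up, here is a toy cost model; the numbers are made up for illustration, only the shape of the comparison matters:

```java
// Toy cost model of why autocommit (one transaction per triple) gets
// expensive: every commit pays a fixed journal/sync overhead. All numbers
// are hypothetical; only the shape of the comparison matters.
public class CommitCost {

    static final int COMMIT_OVERHEAD = 100; // hypothetical fixed cost per commit
    static final int PER_TRIPLE = 1;        // hypothetical cost per triple written

    // N single-triple transactions: the overhead is paid N times.
    static int autocommitCost(int triples) {
        return triples * (COMMIT_OVERHEAD + PER_TRIPLE);
    }

    // One transaction of N triples: the overhead is paid once.
    static int batchedCost(int triples) {
        return COMMIT_OVERHEAD + triples * PER_TRIPLE;
    }
}
```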

(TDB2 - any add/delete outside a transaction is a truly autocommitted transaction)

Thanks for the update - sharing experiences really is a great help,

        Andy



On 07/02/16 09:16, Jean-Marc Vanel wrote:
I applied Andy's hints:
- TransactionManager.DEBUG = true
- TransactionManager.QueueBatchSize = 5

Before each transaction, it prints the size of the list of Transactions in
commitedAwaitingFlush. I also printed the size of the file TDB/journal.jrnl.
So I see both sizes grow until QueueBatchSize is reached, and then both
sizes return to 0.
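The behaviour observed above can be modelled with a toy queue (an illustration only, not the Jena TransactionManager code; QueueBatchSize mirrors the TDB setting, but the class itself is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the observed behaviour (not the Jena code): committed
// transactions queue up in memory until QueueBatchSize is reached, then
// the whole batch is flushed to the main database and the queue (and
// journal) drop back to zero.
public class FlushQueueModel {

    final int queueBatchSize;
    final List<byte[]> committedAwaitingFlush = new ArrayList<>();
    int flushes = 0;

    FlushQueueModel(int queueBatchSize) {
        this.queueBatchSize = queueBatchSize;
    }

    void commit(byte[] txnData) {
        committedAwaitingFlush.add(txnData);
        if (committedAwaitingFlush.size() >= queueBatchSize) {
            // Flush: write the batch to the base tables, truncate the journal.
            committedAwaitingFlush.clear();
            flushes++;
        }
    }
}
```

Note that peak heap use grows with QueueBatchSize times the average transaction size, which matches the OOM seen with big transactions and a small heap.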

What happened in my case was that the transactions were rather big, several
MB each, and the memory allotted to the application was rather small, 200 MB.
So, with the default of QueueBatchSize = 10, depending on the inputs (in
the case causing the OOM exception, the inputs were DBpedia URLs), the
available memory was sooner or later exceeded. This was certainly made
worse by having one or two instances of the union graph around.

I did not set QueueBatchSize to 0 for now, because this list of commits
awaiting flush optimizes disk usage, but maybe I should set it to the
maximum number of transactions during an HTTP request. Note that I also
have transactions as small as a single triple (corresponding to user
input).

The instrumented TransactionManager.java is here:
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/app/controllers/TransactionManager.java

Thanks all for your help!


2016-02-04 12:50 GMT+01:00 Rob Vesse <rve...@dotnetrdf.org>:


On 04/02/2016 10:41, "Jean-Marc Vanel" <jeanmarc.va...@gmail.com> wrote:

The journal workspace you mention is on disk, isn't it? My problem is not
on disk at all.

No

The journal is both in-memory and on-disk, as it is a write-ahead log for
failure recovery purposes: the disk portion preserves the data for failure
recovery, while the in-memory portion provides data to the live instance.

If there is a non-empty journal on disk, then there is a corresponding
amount of memory within the JVM heap used to store the latest state of the
data for subsequent transactions, while not overwriting the old state of
the data which ongoing transactions may still be accessing.
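That dual in-memory/on-disk journal can be sketched as a minimal write-ahead log (illustrative only; this is not TDB's journal format, and the class and record layout are invented for the example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Minimal write-ahead-log sketch (illustrative, not TDB's journal format):
// every write is appended to the on-disk journal first, for crash
// recovery, and also mirrored in heap memory so later transactions see
// the latest state without touching the base data that older readers
// may still be using.
public class MiniJournal {

    private final Path journalFile;
    private final Map<String, String> inMemoryState = new HashMap<>();

    MiniJournal(Path journalFile) throws IOException {
        this.journalFile = journalFile;
        if (!Files.exists(journalFile)) Files.createFile(journalFile);
    }

    void write(String key, String value) throws IOException {
        // 1. Durable first: append the record to the on-disk journal.
        Files.writeString(journalFile, key + "=" + value + "\n",
                StandardOpenOption.APPEND);
        // 2. Then mirror it in memory for the live instance.
        inMemoryState.put(key, value);
    }

    // After a crash, replay the on-disk journal to rebuild the heap state.
    static MiniJournal recover(Path journalFile) throws IOException {
        MiniJournal j = new MiniJournal(journalFile);
        for (String line : Files.readAllLines(journalFile)) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                j.inMemoryState.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return j;
    }

    String get(String key) {
        return inMemoryState.get(key);
    }
}
```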

Rob







