Hi. Sorry for the delay. That worked: commits in batches of 10k now take 
~2 seconds. This increases to ~5 seconds when the TDB is almost 1TB, but 
it's predictable...
I'm looking at TDB2 in a test env...
Thank you.


Dick

-------- Original message --------
From: Andy Seaborne <a...@apache.org> 
Date: 09/08/2016  13:34  (GMT+00:00) 
To: users@jena.apache.org 
Subject: Re: Stall when committing a write transaction. 

On 08/08/16 12:33, Dick Murray wrote:
> Hello.
>
> Looking for ideas and if anyone else has come across this...
>
> I have a bulk load (same as the previous OOME question) which auto commits
> after 25k quads have been added then begins a new write transaction. All of
> the commits average 2 seconds but one takes 42 seconds. ~500K quads are
> added with ~500MB on disk storage. I've changed the underlying storage from
> HDD to SSD, to USB MS and I still get the same symptoms.
>
> Different files give different stalls, some have multiple stalls, typically
> around 40 seconds but some are 2 minutes. iotop is not showing anything
> "odd" and the GC isn't stressing. I can repeat this with a new TDB and a
> 25M quad TDB.
>
> Is the TDB having to copy write new "blocks" to balance its storage at
> some point? Whilst it will stall at some point the point is not always the
> same.
>
> Jena 3.1, Ubuntu 16.04, 8 cores 16GB RAM, JVM Xmx 4GB G1GC.
>
> Log below shows consistent ~2 second commits bar one.
>
> TIA Dick.

Hi there,

The burstiness might be due to the commit batching, though interactions 
with the OS file system are also possible.

Try setting
   TransactionManager.QueueBatchSize
to 0, 2, and a few other small integers (the default is 10).

If you could try that, it would give more data about what is happening.

The batching is there to amalgamate small commits. It would be better to 
factor in the size of commits, but it doesn't (the size of the journal is 
easy to determine, so a simple threshold there could work).
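
A minimal sketch of the journal-size threshold idea mentioned above, in 
plain Java. This is not Jena code: the class JournalFlushPolicy, its 
method onCommit, and both limits are hypothetical names made up for 
illustration, and the exact point at which the real batch setting triggers 
a flush is an assumption here. The sketch flushes when either too many 
commits are queued or the journal has grown past a byte limit.

```java
// Hypothetical illustration only; not part of Jena's TransactionManager.
final class JournalFlushPolicy {
    private final int maxQueuedCommits;   // count-based limit (assumed semantics)
    private final long maxJournalBytes;   // size-based limit (the suggested addition)
    private int queuedCommits = 0;        // commits queued since the last flush
    private long journalBytes = 0;        // bytes appended to the journal since the last flush

    JournalFlushPolicy(int maxQueuedCommits, long maxJournalBytes) {
        if (maxQueuedCommits < 0 || maxJournalBytes < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        this.maxQueuedCommits = maxQueuedCommits;
        this.maxJournalBytes = maxJournalBytes;
    }

    /**
     * Record one commit that appended commitBytes to the journal.
     * Returns true if the journal should be flushed to the main database now;
     * in that case the counters are reset.
     */
    boolean onCommit(long commitBytes) {
        queuedCommits++;
        journalBytes += commitBytes;
        boolean flush = queuedCommits > maxQueuedCommits
                || journalBytes >= maxJournalBytes;
        if (flush) {
            queuedCommits = 0;
            journalBytes = 0;
        }
        return flush;
    }
}
```

With a policy like this, a run of small commits is still amalgamated, while 
a few large commits trigger a flush early, so the work done in any single 
flush is bounded by the byte limit rather than by the commit count alone.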


Have you had a moment to try TDB2?  It will behave differently here - 
the updates to the database happen as the transaction proceeds so they 
happen once and have OS-level write buffering going on, rather than 
happening exactly when told to.  And they only write once, not once to 
the journal and once in a random access pattern to the main DB which is 
also potentially nasty.

The only issue with TDB2 at the moment is that the database grows. It 
has all generations of the database available for all time.  It needs a 
process to reclaim old space (a GC problem) although access to temporal 
versions could be considered an advantage as well.

     Andy
