>  It needs a process to reclaim old space (a GC problem) although access to 
> temporal versions could be considered an advantage as well.

The single most important use case I have in hand would benefit enormously from 
this kind of persistence (although really on a transaction granularity). So at 
least some people out here would consider it an advantage! {grin}

---
A. Soroka
The University of Virginia Library

> On Aug 9, 2016, at 8:34 AM, Andy Seaborne <a...@apache.org> wrote:
> 
> On 08/08/16 12:33, Dick Murray wrote:
>> Hello.
>> 
>> Looking for ideas and if anyone else has come across this...
>> 
>> I have a bulk load (same as the previous OOME question) which auto commits
>> after 25k quads have been added then begins a new write transaction. All of
>> the commits average 2 seconds but one takes 42 seconds. ~500K quads are
>> added with ~500MB on disk storage. I've changed the underlying storage from
>> HDD to SSD, to USB MS and I still get the same symptoms.
>> 
>> Different files give different stalls, some have multiple stalls, typically
>> around 40 seconds but some are 2 minutes. iotop is not showing anything
>> "odd" and the GC isn't stressing. I can repeat this with a new TDB and a
>> 25M quad TDB.
>> 
>> Is the TDB having to copy write new "blocks" to balance its storage at
>> some point? Whilst it will stall at some point the point is not always the
>> same.
>> 
>> Jena 3.1, Ubuntu 16.04, 8 cores 16GB RAM, JVM Xmx 4GB G1GC.
>> 
>> Log below shows consistent ~2 second commits bar one.
>> 
>> TIA Dick.
> 
> Hi there,
> 
> The burstiness might be due to the commit batching, though interactions with 
> the OS file system are also possible.
> 
> Try setting
>  TransactionManager.QueueBatchSize
> to 0, 2, and a few other small integers (the default is 10).
> 
> If you could try that, it would be more data as to what is happening.
> 
> This is to amalgamate small commits - it would be better to factor in the 
> size of commits but it doesn't (the size of the journal is easy to determine 
> so a simple threshold there could work).
> 
> 
> Have you had a moment to try TDB2?  It will behave differently here - the 
> updates to the database happen as the transaction proceeds so they happen 
> once and have OS-level write buffering going on, rather than happening 
> exactly when told to.  And they only write once, not once to the journal and 
> once in a random access pattern to the main DB which is also potentially 
> nasty.
> 
> The only issue with TDB2 at the moment is that the database grows. It has all 
> generations of the database available for all time.  It needs a process to 
> reclaim old space (a GC problem) although access to temporal versions could 
> be considered an advantage as well.
> 
>    Andy
> 
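
One way to gather the "more data" asked for above is to time every batch commit and flag the ones that stand out from the rest. What follows is only a minimal sketch, not anything from Jena itself: the CommitTimer class and its methods are hypothetical names made up for illustration, and it uses plain JDK classes so it runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: records commit durations and flags the slow ones. */
public class CommitTimer {
    private final List<Long> millis = new ArrayList<>();

    /**
     * Runs one commit action and records how long it took, in milliseconds.
     * In a real bulk load the action would be the dataset commit (assumption).
     */
    public long time(Runnable commit) {
        long start = System.nanoTime();
        commit.run();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        millis.add(elapsed);
        return elapsed;
    }

    /** Records a duration measured elsewhere, e.g. taken from an existing log. */
    public void record(long elapsedMillis) {
        millis.add(elapsedMillis);
    }

    /** Median duration (upper median for an even count), or 0 if nothing is recorded. */
    public long median() {
        if (millis.isEmpty()) {
            return 0;
        }
        List<Long> sorted = new ArrayList<>(millis);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    /** Zero-based indexes of commits that took more than factor times the median. */
    public List<Integer> stalls(double factor) {
        long m = median();
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < millis.size(); i++) {
            if (millis.get(i) > factor * m) {
                out.add(i);
            }
        }
        return out;
    }
}
```

In an actual run one would presumably assign TransactionManager.QueueBatchSize early in the program, wrap each 25k-quad commit in time(...), and compare the output of stalls(...) across the different batch-size settings. How the load loop is wired up is an assumption left to the reader.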
