> It needs a process to reclaim old space (a GC problem) although access to
> temporal versions could be considered an advantage as well.

The single most important use case I have in hand would benefit enormously from this kind of persistence (although really at transaction granularity). So at least some people out here would consider it an advantage! {grin}

---
A. Soroka
The University of Virginia Library

> On Aug 9, 2016, at 8:34 AM, Andy Seaborne <a...@apache.org> wrote:
>
> On 08/08/16 12:33, Dick Murray wrote:
>> Hello.
>>
>> Looking for ideas and if anyone else has come across this...
>>
>> I have a bulk load (same as the previous OOME question) which auto commits
>> after 25k quads have been added then begins a new write transaction. All of
>> the commits average 2 seconds but one takes 42 seconds. ~500K quads are
>> added with ~500MB on disk storage. I've changed the underlying storage from
>> HDD to SSD, to USB MS and I still get the same symptoms.
>>
>> Different files give different stalls, some have multiple stalls, typically
>> around 40 seconds but some are 2 minutes. iotop is not showing anything
>> "odd" and the GC isn't stressing. I can repeat this with a new TDB and a
>> 25M quad TDB.
>>
>> Is the TDB having to copy write new "blocks" to balance its storage at
>> some point? Whilst it will stall at some point the point is not always the
>> same.
>>
>> Jena 3.1, Ubuntu 16.04, 8 cores 16GB RAM, JVM Xmx 4GB G1GC.
>>
>> Log below shows consistent ~2 second commits bar one.
>>
>> TIA Dick.
>
> Hi there,
>
> The burstiness might be due to the commit batching though interactions with
> the OS file system are also possible.
>
> Try setting
>   TransactionManager.QueueBatchSize
> to 0, 2, and a few other small integers (the default is 10).
>
> If you could try that, it would be more data as to what is happening.
>
> This is to amalgamate small commits - it would be better to factor in the
> size of commits but it doesn't (the size of the journal is easy to determine
> so a simple threshold there could work).
>
>
> Have you had a moment to try TDB2? It will behave differently here - the
> updates to the database happen as the transaction proceeds so they happen
> once and have OS-level write buffering going on, rather than happening
> exactly when told to. And they only write once, not once to the journal and
> once in a random access pattern to the main DB which is also potentially
> nasty.
>
> The only issue with TDB2 at the moment is that the database grows. It has all
> generations of the database available for all time. It needs a process to
> reclaim old space (a GC problem) although access to temporal versions could
> be considered an advantage as well.
>
> Andy
>
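
For anyone trying to reproduce the stalls described in the quoted thread, here is a minimal sketch of the "commit every N quads and time each commit" loop, plus a helper that flags commits well above the median. It does not use Jena at all: BatchSink and CommitStallSketch are hypothetical names made up for illustration, and with TDB the sink would presumably wrap the dataset's begin/commit calls. The median-times-factor rule is just one assumed way to define a stall.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a transactional store; NOT a Jena interface. */
interface BatchSink {
    void begin();
    void add(String quad);
    void commit();
}

final class CommitStallSketch {
    /**
     * Adds every quad, committing after each batchSize additions and once
     * more for any partial final batch. Returns the duration of each
     * commit in milliseconds, in commit order.
     */
    static List<Long> load(Iterable<String> quads, BatchSink sink, int batchSize) {
        List<Long> commitMillis = new ArrayList<>();
        int pending = 0;
        for (String quad : quads) {
            if (pending == 0) {
                sink.begin(); // start a new write transaction lazily
            }
            sink.add(quad);
            if (++pending == batchSize) {
                commitMillis.add(timedCommit(sink));
                pending = 0;
            }
        }
        if (pending > 0) {
            commitMillis.add(timedCommit(sink)); // partial final batch
        }
        return commitMillis;
    }

    private static long timedCommit(BatchSink sink) {
        long start = System.nanoTime();
        sink.commit();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Indices of commits that took longer than factor times the median. */
    static List<Integer> stalls(List<Long> commitMillis, double factor) {
        List<Integer> out = new ArrayList<>();
        if (commitMillis.isEmpty()) {
            return out;
        }
        List<Long> sorted = new ArrayList<>(commitMillis);
        Collections.sort(sorted);
        long median = sorted.get(sorted.size() / 2); // upper median for even sizes
        for (int i = 0; i < commitMillis.size(); i++) {
            if (commitMillis.get(i) > median * factor) {
                out.add(i);
            }
        }
        return out;
    }
}
```

With commit times shaped like the ones in the thread (2000, 2000, 42000, 2000 ms) and a factor of 5.0, stalls() returns [2], the index of the 42 second commit. Recording the same list for each QueueBatchSize value tried would give a like-for-like comparison of where the stalls land.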