;-) Yes I did. But then I switched to the actual files I need to import and
they produce ~3.5M triples...

Using stock Jena 3.1 (i.e. no special context symbols set), committing
after every 100k triples lets me import the file 10 times, with the [B
(byte[]) footprint varying between ~2GB and ~4GB. I'm currently testing a
20-instance pass.

A batched commit works for this bulk load because if the load fails after
a batch commit I can remove the partially loaded graph and retry (sketched
below).
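
For reference, the batched load is shaped roughly like this (a minimal
sketch against the TDB1 transaction API; the triple iterator, directory
and graph name are placeholders standing in for my line-pair reader):

    import java.util.Iterator;
    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.NodeFactory;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.tdb.TDBFactory;

    public class BatchedLoad {
        static final int BATCH = 100_000; // commit every 100k triples

        public static void load(String tdbDir, String graphUri,
                                Iterator<Triple> triples) {
            Dataset ds = TDBFactory.createDataset(tdbDir);
            DatasetGraph dsg = ds.asDatasetGraph();
            Node graph = NodeFactory.createURI(graphUri);
            long n = 0;
            ds.begin(ReadWrite.WRITE);
            try {
                while (triples.hasNext()) {
                    Triple t = triples.next();
                    dsg.add(graph, t.getSubject(), t.getPredicate(),
                            t.getObject());
                    if (++n % BATCH == 0) {
                        ds.commit();               // end this batch
                        ds.begin(ReadWrite.WRITE); // start the next one
                    }
                }
                ds.commit();
            } catch (RuntimeException e) {
                ds.abort();
                // Drop the partially loaded graph so the file can be retried.
                ds.begin(ReadWrite.WRITE);
                dsg.deleteAny(graph, Node.ANY, Node.ANY, Node.ANY);
                ds.commit();
                throw e;
            } finally {
                ds.end();
            }
        }
    }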

For my understanding: TDB holds the triples/blocks/journal in heap until
commit is called? But that doesn't account for the [B not being cleared
after a commit of 3.5M triples. It takes another pass plus ~2M uncommitted
triples before I get an OOME.

Digging around, there are references to DirectByteBuffers (DBBs) causing
issues. IBM (
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/excessive_native_memory_usage_by_directbytebuffers?lang=en
) pins the problem down as follows:

Essentially the problem boils down to either:

   1. There are too many DBBs being allocated (or they are too large),
   and/or
   2. The DBBs are not being cleared up quickly enough.


and recommends setting -XX:MaxDirectMemorySize=1024m, so that hitting the
limit forces a System.gc() which reclaims unreferenced DBBs. Not sure if
G1GC helps here, given its new heap model...
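
To see whether the DBBs themselves are growing (as opposed to the on-heap
[B), the standard java.lang.management API reports the JVM's buffer pools;
a small probe I can call between passes (nothing Jena-specific, and the
assumption is that the "direct" pool is the interesting one):

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;

    public class BufferPoolProbe {
        // Prints count/used/capacity for the "direct" and "mapped" pools.
        public static void dump() {
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: count=%d used=%d capacity=%d%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }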

Would it be possible for Jena to write its uncommitted triples to disk and
then commit them into TDB? OK, it's slower than RAM, but until they are
committed only the writing thread has visibility anyway? The staging could
be directed at a different disk as well...
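
As a workaround in that direction (and close to the "write the output to
disk" experiment suggested below), the parse stage could stream triples to
an N-Triples file and a separate step could load that file inside one
transaction; a sketch using the stock riot APIs, with the staging path as
a placeholder:

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.riot.system.StreamRDF;
    import org.apache.jena.riot.system.StreamRDFWriter;

    public class StagedLoad {
        public static void stageAndLoad(Dataset ds, String graphUri,
                Iterable<Triple> triples, String stagingFile) throws Exception {
            // Stage: stream parsed triples to disk (could be another disk).
            try (OutputStream out = new FileOutputStream(stagingFile)) {
                StreamRDF writer =
                    StreamRDFWriter.getWriterStream(out, Lang.NTRIPLES);
                writer.start();
                for (Triple t : triples)
                    writer.triple(t);
                writer.finish();
            }
            // Load: one transaction that only does the TDB part.
            ds.begin(ReadWrite.WRITE);
            try {
                RDFDataMgr.read(ds.getNamedModel(graphUri), stagingFile,
                                Lang.NTRIPLES);
                ds.commit();
            } finally {
                ds.end();
            }
        }
    }

This doesn't change what TDB keeps in heap during the load transaction,
but it does separate the parsing from the loading.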

Just before hitting send: I'm at pass 13 and the [B maxed at just over 4GB
before dropping back to 2GB.

Dick.



On 27 July 2016 at 11:47, Andy Seaborne <a...@apache.org> wrote:

> On 27/07/16 11:22, Dick Murray wrote:
>
>> Hello.
>>
>> Something doesn't add up here... I've run repeated tests with the
>> following MWE on a 16GB machine with -Xms8g -Xmx8g, and I always get an
>> OOME.
>>
>> What I don't understand is that the size of [B increases with each pass
>> until the OOME is thrown. The exact same process is run 5 times with a
>> new graph for each set of triples.
>>
>> There are ~3.5M triples added within the transaction, read in line
>> pairs from a "simple" text-based file (30MB).
>>
>
> Err - you said 200k quads earlier!
>
> Set
>
> TransactionManager.QueueBatchSize=0 ;
>
> and break the load into small units for now and see if that helps.
>
> One experiment would be to write the output to disk and load from a
> program that only does the TDB part.
>
>     Andy
>
>
>
>> I've tested sequential loads of the same text file (i.e. file x, 5
>> times) and of different text files loaded sequentially (i.e. file x,
>> file y, file ...), and the same result is exhibited.
>>
>> If I reduce -Xmx to 6g it will fail earlier.
>>
>> Changing the GC using -XX:+UseG1GC doesn't alter the outcome.
>>
>> I'm running on Ubuntu 16.04 with Java 1.8 and I can replicate this on
>> Centos 7 with Java 1.8.
>>
>> Any ideas?
>>
>> Regards Dick.
>>
>>
>
>
