Re: arq:spillToDiskThreshold issue

Stephen Allen Fri, 18 Mar 2016 20:44:58 -0700

On Fri, Mar 18, 2016 at 2:20 PM, Andy Seaborne <[email protected]> wrote:


> On 18/03/16 09:16, Dominique Vandensteen wrote:
>
>> Hi,
>> I'm having problems handling "big" graphs (50M to 100M triples at current
>> stage) in my Fuseki servers using SPARQL.
>> The 2 actions I need to do are "DROP GRAPH <...>" and "MOVE <...> TO
>> <...>".
>> Doing these actions with these graphs I get OutOfMemory errors. Some
>> investigation pointed me to http://markmail.org/message/hjisrglx4eicrxyt
>> and
>>
>> http://mail-archives.apache.org/mod_mbox/jena-users/201504.mbox/%3ccaj+mtwad1vfcnjaro37xkiwgyj7mrnillzvmsx1_nrj+rrf...@mail.gmail.com%3E
>>
>> Using this config:
>> <#yourdatasetname> rdf:type tdb:DatasetTDB ;
>>     ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ; ja:cxtValue "mapped" ] ;
>>     ja:context [ ja:cxtName "arq:spillToDiskThreshold" ; ja:cxtValue 10000 ] .
>> This solves my problem but brings up another one: my temp folder fills
>> up with JenaTempByteBuffer-...UUID...tmp files until my disk is full.
>> These files remain locked, so I cannot delete them.
>> The files seem to be created by
>> org.apache.jena.tdb.base.file.BufferAllocatorMapped but are for some
>> reason not released.
>> Is there any way to work around this issue?
>>
>> I'm using
>> -fuseki 2.3.1
>> -jvm 1.8.0_25 64bit
>> -windows 10
>>
>
> mapped + Windows => files don't go away until the JVM exits [1] and even
> then it does not seem to be reliable according to some reports.
>
> I thought BufferAllocatorDirect was supposed to get round this but it
> allocates on direct memory (AKA malloc).
>
> It would need a spill to plain file implementation of BufferAllocator
> which we don't seem to have.
>
>         Andy
>
> [1]
> http://bugs.java.com/view_bug.do?bug_id=4724038
> and others.
>
>>
>>
You can use the off-heap memory that Andy mentions by changing "mapped" to
"direct" in your config file.  That behaves much like a memory mapped file,
except that you are limited by the amount of memory that you have (if you
have enough virtual memory, there should be no problem).
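
As a rough sketch, the config from the original message with only that one
value changed might look like the fragment below.  The prefix declarations
are the usual Jena assembler ones and are my addition, the dataset name is
the placeholder from the original message, and other properties your real
file has (such as the dataset location) are left out here.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Placeholder dataset name, as in the original message.
<#yourdatasetname> rdf:type tdb:DatasetTDB ;
    # Was "mapped": keep unwritten blocks in direct (off-heap) memory
    # instead of a memory mapped temp file.
    ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ; ja:cxtValue "direct" ] ;
    # Unchanged from the original message.
    ja:context [ ja:cxtName "arq:spillToDiskThreshold" ; ja:cxtValue 10000 ] .
```

Treat this as an untested starting point, not a verified configuration.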

That first setting (tdb:transactionJournalWriteBlockMode) only controls how
TDB stores unwritten blocks.  Separately, when you do large updates, Jena
needs to temporarily store all of the tuples generated by the WHERE clause,
by default in memory, before applying them in the update.  This is where
spillToDiskThreshold comes in: it serializes those temporary tuples to a
regular file on disk instead of holding them in an in-memory array.  That
file is not memory mapped, so there should be no problem with removing it
after the update is complete.

So basically, if "direct" works for you, then go with that (or use a
different OS like Linux for the memory mapped approach).

-Stephen
