Re: TDB triple storage

Andy Seaborne Tue, 26 Jul 2016 04:43:01 -0700

On 26/07/16 12:08, Chao Wang wrote:

Changed the code to use RDFFormat.TURTLE_BLOCKS and set -Xmx8192m on a 16 GB i7 laptop.
Still getting an out-of-memory error after running for a while. Any suggestions?

A complete, minimal example. That is, something someone else can run, and just large enough to illustrate the issue.


Also details of which version of Jena, and which OS.

The reasoner setup is probably a factor.

        Andy





On 7/25/16, 4:41 PM, "Andy Seaborne" <a...@apache.org> wrote:

On 25/07/16 21:14, Chao Wang wrote:

Hi Dave,
As you suggested, I have computed the closure in memory, totaling over 4 
million triples, and I am now trying to serialize it.
Is there a direct API to serialize the whole model into TDB?
I tried to serialize it to a file but keep running out of memory. What are the typical 
resource needs for a model of this size?


If you are getting problems as you write out the file, try using one of
the streaming formats.  The default format for RDF/XML or Turtle is
"pretty" and takes a significant amount of working space for analysis
before writing.

Some streaming output formats are:

Lang.NTRIPLES
RDFFormat.TURTLE_BLOCKS

https://jena.apache.org/documentation/io/rdf-output.html

Or does it fail during writing, after some output?
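
A minimal sketch of writing a model with one of the streaming formats named above. It assumes Jena 3.x package names; the class name, the example triple, and the output file name closure.ttl are hypothetical and only for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class StreamingWrite {

    // Write the model with a streaming Turtle writer instead of the
    // default "pretty" writer, which analyses the model before writing.
    // An N-Triples alternative would be:
    //   RDFDataMgr.write(out, model, org.apache.jena.riot.Lang.NTRIPLES);
    static void writeStreaming(Model model, OutputStream out) {
        RDFDataMgr.write(out, model, RDFFormat.TURTLE_BLOCKS);
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in model; in the thread this would be the computed closure.
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://example.org/s")
             .addProperty(model.createProperty("http://example.org/p"), "o");

        // "closure.ttl" is a hypothetical output file name.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("closure.ttl"))) {
            writeStreaming(model, out);
        }
    }
}
```

If the model being written is an inference model, the reasoner still does its work while the statements are listed, so a streaming writer only removes the writer's own working space.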

    Andy

________________________________________
From: Dave Reynolds [dave.e.reyno...@gmail.com]
Sent: Thursday, July 21, 2016 9:09 AM
To: users@jena.apache.org
Subject: Re: TDB triple storage

On 21/07/16 13:45, Chao Wang wrote:

Thanks Dave,
So my Fuseki configuration uses TDB with an OWL reasoner. I preloaded the TDB 
store with tdbloader, then started up Fuseki.
My question is: when Fuseki starts up, does it load all triples, including 
inferred triples, into memory?


Yes. It's actually slightly worse than that. All the inferences will be
in memory (including intermediate state), which will be bigger than the
source data. But the data itself isn't loaded explicitly, which means
that the reasoner is going back to TDB for each query, which is a further
slowdown.

Using a lighter reasoner config (OWL Micro if you are not already using
it) may help.

Otherwise, if your data is stable, then as I say, compute the closure
once in memory, offline. Store that in TDB. Then have your Fuseki
configuration use that precomputed closure with no runtime inference.
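
A rough sketch of the offline step: build the closure in memory with the OWL Micro reasoner and copy it into a TDB dataset inside a write transaction. It assumes Jena 3.x package names (org.apache.jena.tdb); the class name, the input file data.ttl, and the directory closure-tdb are hypothetical.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

public class StoreClosure {

    // Compute the inference closure over an in-memory source model and
    // add every statement (asserted and inferred) to the target dataset's
    // default graph.
    static void storeClosure(Model source, Dataset target) {
        InfModel closure = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), source);
        target.begin(ReadWrite.WRITE);
        try {
            target.getDefaultModel().add(closure);
            target.commit();
        } finally {
            target.end();
        }
    }

    public static void main(String[] args) {
        // Hypothetical input file and TDB directory.
        Model source = RDFDataMgr.loadModel("data.ttl");
        Dataset target = TDBFactory.createDataset("closure-tdb");
        try {
            storeClosure(source, target);
        } finally {
            target.close();
        }
    }
}
```

A Fuseki service pointed at the resulting TDB directory can then serve the stored triples as a plain dataset, with no reasoner in its configuration.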

Dave

I am experiencing a hanging SPARQL query. It works fine with a small dataset. I am 
hoping reasoning is not done at query time...
________________________________________
From: Dave Reynolds [dave.e.reyno...@gmail.com]
Sent: Thursday, July 21, 2016 3:35 AM
To: users@jena.apache.org
Subject: Re: TDB triple storage

On 21/07/16 02:09, Chao Wang wrote:

A newbie question:
Does Jena store the inferred triples in TDB? If yes, when?


No. The current reasoners operate in memory.

If you wish you can take the results of inference (either the entire
closure or the results of some selective queries) and store those back
in TDB yourself. A common pattern would be to use separate named graphs for
the original data and for the inference closure and use union-default.
All this is under your control but is not automatically done for you.
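
As an illustration of the named-graph pattern, here is a minimal sketch that stores asserted and inferred triples in two named graphs and queries them through TDB's union default graph. It assumes Jena 3.x package names; the class name and both graph URIs are hypothetical.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDB;

public class UnionGraphs {

    // Hypothetical graph names, chosen for this example only.
    static final String DATA_GRAPH = "http://example.org/graph/data";
    static final String INFERRED_GRAPH = "http://example.org/graph/inferred";

    // Keep asserted and inferred triples apart so that either graph can be
    // dropped and rebuilt without touching the other.
    static void store(Dataset ds, Model data, Model inferred) {
        ds.begin(ReadWrite.WRITE);
        try {
            ds.getNamedModel(DATA_GRAPH).add(data);
            ds.getNamedModel(INFERRED_GRAPH).add(inferred);
            ds.commit();
        } finally {
            ds.end();
        }
    }

    // Run an ASK query with the default graph treated as the union of all
    // named graphs (TDB's union-default setting, applied to this query only).
    static boolean askUnion(Dataset ds, String askQuery) {
        ds.begin(ReadWrite.READ);
        try (QueryExecution qexec = QueryExecutionFactory.create(askQuery, ds)) {
            qexec.getContext().set(TDB.symUnionDefaultGraph, true);
            return qexec.execAsk();
        } finally {
            ds.end();
        }
    }
}
```

Setting the symbol per query keeps the choice local; TDB also allows it to be set on the dataset context or in the assembler configuration.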

There is also some support for generating a partial RDFS inference
closure at the time you load TDB.

Dave
