Hi David,
after quite some work today, I rewrote part of the Refactor Engine so that it no longer creates useless graphs.
Many of them were blank ontologies created along with the SEO scope; these are no longer created.
Many of the other graphs you see come from the engine merging the entity signatures into an OntoNet session: each such signature ends up as its own ontology, and therefore as a graph in Clerezza/TDB.
I have not modified this second behaviour, but I have made sure that the refactor engine now destroys its own session *and its contents* once computeEnhancements() completes. This means a lot of space is occupied during analysis, but it is freed right afterwards.
It's more brutal than I wanted it to be, but a better implementation will come once I add a couple of new features to OntoNet that should make the process more reasonable.
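[Editor's note: the teardown described above amounts to a try/finally pattern around the enhancement computation. The sketch below is a minimal, self-contained illustration of that pattern only; the Session class and every name in it are hypothetical stand-ins, not the actual Stanbol/OntoNet API.]

```java
// Illustrative sketch: destroy the session (and everything it holds) in a
// finally block, so the storage is reclaimed even if enhancement fails.
// All names here are hypothetical, not the real Stanbol/OntoNet API.
import java.util.ArrayList;
import java.util.List;

public class SessionTeardownSketch {

    /** Stand-in for an OntoNet session holding one ontology per signature. */
    static class Session {
        final List<String> ontologies = new ArrayList<>();
        boolean destroyed = false;

        void addOntology(String id) { ontologies.add(id); }

        /** Destroys the session *and* its contents. */
        void destroy() {
            ontologies.clear();
            destroyed = true;
        }
    }

    /** Mimics computeEnhancements(): use the session, then tear it down.
     *  The session is returned here only so a caller can verify teardown. */
    static Session computeEnhancements(List<String> entitySignatures) {
        Session session = new Session();
        try {
            for (String sig : entitySignatures) {
                // each signature becomes its own ontology (and TDB graph)
                session.addOntology("ontonet::inputstream::" + sig);
            }
            // ... refactoring work would happen here ...
        } finally {
            // reclaim the space as soon as the computation completes
            session.destroy();
        }
        return session;
    }

    public static void main(String[] args) {
        Session s = computeEnhancements(List.of("ontology889", "ontology1041"));
        System.out.println(s.destroyed);         // true
        System.out.println(s.ontologies.size()); // 0
    }
}
```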
On the upside, the engine code is now smaller by some 250 lines.
It would be super if you could update and try it out.
Thanks
Alessandro
P.S. Now I'm glad I added the "ontonet" prefix to those graph names...
On 3/16/12 12:40 PM, David Riccitelli wrote:
From what I've seen so far, yes. But it could depend on your engine
configuration using a richer set of rules.
Same thing happens when we use the default rules set (seo_rules.sem) from
SVN.
We did not customize any other part of the installation with the exception
of loading a local DBpedia index in sling/datafiles.
David
On Fri, Mar 16, 2012 at 12:27 PM, Alessandro Adamou <[email protected]> wrote:
On 3/16/12 11:16 AM, David Riccitelli wrote:
Is this issue happening to us only?
From what I've seen so far, yes. But it could depend on your engine
configuration using a richer set of rules.
Alessandro
On Fri, Mar 16, 2012 at 12:12 PM, Alessandro Adamou <[email protected]> wrote:
One thing that would be great to do is to detect the ontology ID *before* creating the TripleCollection in Clerezza, so that any mappings could be done before storing.
But I don't know how this can be done without too much code.
Perhaps creating an IndexedGraph, inspecting its content, then creating the Graph in the TcManager with the same content and the right graph name, and finally clearing the IndexedGraph could work.
But it still means having twice the resource usage (disk + memory) for a period.
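[Editor's note: the two-phase idea above can be sketched generically, stripped of the Clerezza API. In the sketch below the Triple record, the in-memory "staging graph", and the store are all simplified stand-ins, assuming the ontology ID can be found as the subject of an rdf:type owl:Ontology triple.]

```java
// Two-phase staging sketch: load triples into a throwaway in-memory
// graph, find the ontology ID, then store them under the right graph
// name. The types below are simplified stand-ins, not the Clerezza API.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StagedGraphLoading {

    static final String RDF_TYPE =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_ONTOLOGY =
        "http://www.w3.org/2002/07/owl#Ontology";

    record Triple(String subject, String predicate, String object) {}

    /** Scan the staged triples for "<id> rdf:type owl:Ontology". */
    static String findOntologyId(Set<Triple> staged) {
        for (Triple t : staged) {
            if (RDF_TYPE.equals(t.predicate()) && OWL_ONTOLOGY.equals(t.object())) {
                return t.subject();
            }
        }
        return null; // no declared ontology ID
    }

    /**
     * Stage the parsed triples in memory first (the "IndexedGraph" role),
     * detect the ontology ID, create the persistent graph under that name
     * (the "TcManager" role), then clear the staging copy. Twice the
     * resource usage, but only briefly.
     */
    static String storeWithDetectedId(Set<Triple> parsed,
                                      Map<String, Set<Triple>> store) {
        Set<Triple> staging = new HashSet<>(parsed);
        String id = findOntologyId(staging);
        String graphName = (id != null) ? id : "ontonet::inputstream::anonymous";
        store.put(graphName, new HashSet<>(staging));
        staging.clear(); // free the staging copy
        return graphName;
    }

    public static void main(String[] args) {
        Set<Triple> parsed = Set.of(
            new Triple("http://example.org/onto", RDF_TYPE, OWL_ONTOLOGY),
            new Triple("http://example.org/A", RDF_TYPE,
                       "http://www.w3.org/2002/07/owl#Class"));
        Map<String, Set<Triple>> store = new HashMap<>();
        System.out.println(storeWithDetectedId(parsed, store));
        // http://example.org/onto
    }
}
```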
Alessandro
On 3/16/12 10:56 AM, Alessandro Adamou wrote:
Hi David,
well, I guess that depends pretty much on how heavy the usage of OntoNet
is in your Stanbol installation.
Those are graphs created when OntoNet has to load an ontology from its
content rather than from a Web URI, so it cannot know the ontology ID
earlier.
This happens e.g. by POSTing the ontology as the payload or by passing a
GraphContentInputSource to the Java API.
Now I do not know why these graphs are created (perhaps the refactor
engine could be loading some), but I do know that a Clerezza graph in
Jena
TDB occupies a LOT of disk space.
Suffice it to say that my bundle had stored nine graphs of <100 triples each. Their disk space was about 1.8 GB, yet when I made a zip file out of it, it came out at about 2 MB!
Alessandro
On 3/16/12 10:30 AM, David Riccitelli wrote:
Dears,
As I ran into disk issues, I found that this folder:
sling/felix/bundleXXX/data/tdb-data/mgraph
where XXX is the bundle number of:
Clerezza - SCB Jena TDB Storage Provider
org.apache.clerezza.rdf.jena.tdb.storage
took almost 70 GB of disk space (at which point the disk space was exhausted).
These are some of the files I found inside:
193M ./ontonet%3A%3Ainputstream%3Aontology889
193M ./ontonet%3A%3Ainputstream%3Aontology1041
193M ./ontonet%3A%3Ainputstream%3Aontology395
193M ./ontonet%3A%3Ainputstream%3Aontology363
193M ./ontonet%3A%3Ainputstream%3Aontology661
193M ./ontonet%3A%3Ainputstream%3Aontology786
193M ./ontonet%3A%3Ainputstream%3Aontology608
193M ./ontonet%3A%3Ainputstream%3Aontology213
193M ./ontonet%3A%3Ainputstream%3Aontology188
193M ./ontonet%3A%3Ainputstream%3Aontology602
Any clues?
Thanks,
David Riccitelli
****************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network: http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
****************************************************************
--
M.Sc. Alessandro Adamou
Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy
"I will give you everything, so long as you do not demand anything."
(Ettore Petrolini, 1930)
Not sent from my iSnobTechDevice