Hi David,
after quite some work today, I rewrote part of the Refactor Engine so that it no longer creates useless graphs.
Many of them were blank ontologies created along with the SEO scope; these are no longer created.
Many of the other graphs you see come from the engine merging the entity signatures into an OntoNet session: each such signature ends up as its own ontology, and therefore as a graph in Clerezza/TDB.
I have not modified this second behaviour, but I have made sure that the refactor engine now destroys its own session *and its contents* once computeEnhancements() completes. This means a lot of space is occupied during analysis, but it is freed right afterwards.
It's more brutal than I wanted it to be, but a better implementation will come once I add a couple of new features to OntoNet that should make the process more reasonable.
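[Editor's note: the teardown described above amounts to a try/finally pattern around the enhancement computation. The sketch below is a minimal, self-contained illustration of that pattern only; the Session class and every name in it are hypothetical stand-ins, not the actual Stanbol/OntoNet API.]

```java
// Illustrative sketch: destroy the session (and everything it holds) in a
// finally block, so the storage is reclaimed even if enhancement fails.
// All names here are hypothetical, not the real Stanbol/OntoNet API.
import java.util.ArrayList;
import java.util.List;

public class SessionTeardownSketch {

    /** Stand-in for an OntoNet session holding one ontology per signature. */
    static class Session {
        final List<String> ontologies = new ArrayList<>();
        boolean destroyed = false;

        void addOntology(String id) { ontologies.add(id); }

        /** Destroys the session *and* its contents. */
        void destroy() {
            ontologies.clear();
            destroyed = true;
        }
    }

    /** Mimics computeEnhancements(): use the session, then tear it down.
     *  The session is returned here only so a caller can verify teardown. */
    static Session computeEnhancements(List<String> entitySignatures) {
        Session session = new Session();
        try {
            for (String sig : entitySignatures) {
                // each signature becomes its own ontology (and TDB graph)
                session.addOntology("ontonet::inputstream::" + sig);
            }
            // ... refactoring work would happen here ...
        } finally {
            // reclaim the space as soon as the computation completes
            session.destroy();
        }
        return session;
    }

    public static void main(String[] args) {
        Session s = computeEnhancements(List.of("ontology889", "ontology1041"));
        System.out.println(s.destroyed);         // true
        System.out.println(s.ontologies.size()); // 0
    }
}
```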
On the upside, the engine code is now smaller by some 250 lines.
It would be super if you could update and try it out.
Thanks
Alessandro
P.S. Now I'm glad I added the "ontonet" prefix to those graph names...
On 3/16/12 12:40 PM, David Riccitelli wrote:
From what I've seen so far, yes. But it could depend on your engine
configuration using a richer set of rules.
Same thing happens when we use the default rules set (seo_rules.sem) from
SVN.
We did not customize any other part of the installation with the exception
of loading a local DBpedia index in sling/datafiles.
David
On Fri, Mar 16, 2012 at 12:27 PM, Alessandro Adamou <[email protected]> wrote:
On 3/16/12 11:16 AM, David Riccitelli wrote:
Is this issue happening to us only?
From what I've seen so far, yes. But it could depend on your engine
configuration using a richer set of rules.
Alessandro
On Fri, Mar 16, 2012 at 12:12 PM, Alessandro Adamou <[email protected]> wrote:
One thing that would be great to do is to detect the ontology ID *before* creating the TripleCollection in Clerezza, so that any mappings could be done before storing.
But I don't know how this can be done without too much code.
Perhaps creating an IndexedGraph, inspecting its content, then creating the Graph in the TcManager with the same content and the right graph name, and finally clearing the IndexedGraph could work.
But it still means having twice the resource usage (disk + memory) for a period.
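[Editor's note: the two-phase idea above can be sketched generically, stripped of the Clerezza API. In the sketch below the Triple record, the in-memory "staging graph", and the store are all simplified stand-ins, assuming the ontology ID can be found as the subject of an rdf:type owl:Ontology triple.]

```java
// Two-phase staging sketch: load triples into a throwaway in-memory
// graph, find the ontology ID, then store them under the right graph
// name. The types below are simplified stand-ins, not the Clerezza API.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StagedGraphLoading {

    static final String RDF_TYPE =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_ONTOLOGY =
        "http://www.w3.org/2002/07/owl#Ontology";

    record Triple(String subject, String predicate, String object) {}

    /** Scan the staged triples for "<id> rdf:type owl:Ontology". */
    static String findOntologyId(Set<Triple> staged) {
        for (Triple t : staged) {
            if (RDF_TYPE.equals(t.predicate()) && OWL_ONTOLOGY.equals(t.object())) {
                return t.subject();
            }
        }
        return null; // no declared ontology ID
    }

    /**
     * Stage the parsed triples in memory first (the "IndexedGraph" role),
     * detect the ontology ID, create the persistent graph under that name
     * (the "TcManager" role), then clear the staging copy. Twice the
     * resource usage, but only briefly.
     */
    static String storeWithDetectedId(Set<Triple> parsed,
                                      Map<String, Set<Triple>> store) {
        Set<Triple> staging = new HashSet<>(parsed);
        String id = findOntologyId(staging);
        String graphName = (id != null) ? id : "ontonet::inputstream::anonymous";
        store.put(graphName, new HashSet<>(staging));
        staging.clear(); // free the staging copy
        return graphName;
    }

    public static void main(String[] args) {
        Set<Triple> parsed = Set.of(
            new Triple("http://example.org/onto", RDF_TYPE, OWL_ONTOLOGY),
            new Triple("http://example.org/A", RDF_TYPE,
                       "http://www.w3.org/2002/07/owl#Class"));
        Map<String, Set<Triple>> store = new HashMap<>();
        System.out.println(storeWithDetectedId(parsed, store));
        // http://example.org/onto
    }
}
```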
Alessandro
On 3/16/12 10:56 AM, Alessandro Adamou wrote:
Hi David,
well, I guess that depends pretty much on how heavy the usage of OntoNet
is in your Stanbol installation.
Those are graphs created when OntoNet has to load an ontology from its
content rather than from a Web URI, so it cannot know the ontology ID
earlier.
This happens e.g. by POSTing the ontology as the payload or by passing a
GraphContentInputSource to the Java API.
Now I do not know why these graphs are created (perhaps the refactor
engine could be loading some), but I do know that a Clerezza graph in
Jena
TDB occupies a LOT of disk space.
Suffice it to say that my bundle had stored nine graphs of <100 triples each. Their disk space was about 1.8 GB, yet when I made a zip file out of it, it came out at about 2 MB!
Alessandro
On 3/16/12 10:30 AM, David Riccitelli wrote:
Dears,
As I ran into disk issues, I found that this folder:
sling/felix/bundleXXX/data/tdb-data/mgraph
where XXX is the bundle number of:
Clerezza - SCB Jena TDB Storage Provider
org.apache.clerezza.rdf.jena.tdb.storage
took almost 70 GB of disk space (at which point the disk space was exhausted).
These are some of the files I found inside:
193M ./ontonet%3A%3Ainputstream%3Aontology889
193M ./ontonet%3A%3Ainputstream%3Aontology1041
193M ./ontonet%3A%3Ainputstream%3Aontology395
193M ./ontonet%3A%3Ainputstream%3Aontology363
193M ./ontonet%3A%3Ainputstream%3Aontology661
193M ./ontonet%3A%3Ainputstream%3Aontology786
193M ./ontonet%3A%3Ainputstream%3Aontology608
193M ./ontonet%3A%3Ainputstream%3Aontology213
193M ./ontonet%3A%3Ainputstream%3Aontology188
193M ./ontonet%3A%3Ainputstream%3Aontology602
Any clues?
Thanks,
David Riccitelli
****************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network: http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
****************************************************************
--
M.Sc. Alessandro Adamou
Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy
"I will give you everything, so long as you do not demand anything."
(Ettore Petrolini, 1930)
Not sent from my iSnobTechDevice