Re: [Virtuoso-users] DBpedia-Live & Reification or Unique Triple IDs for adding metadata

Kingsley Idehen Mon, 01 Jun 2009 16:27:42 +0000

Jens Lehmann wrote:

Hello,
Sebastian Hellmann wrote:
Hello,
[...]
1. Use of RDF Reification
It is a clean solution, as we could add even more metadata to triples, like which extractor they come from or a confidence value. The drawback is that they basically need 4 extra triples + the metadata, which not only raises the total triple count, but also the number of updates and queries to keep updates consistent. (DBpedia could break the billion triple border with this.)
For those of you not familiar with OWL 2 Axiom Annotations (similar to RDF Reification), let me give a short explanation:
Assume you have a triple $s $p $o. To make an annotation about this triple/axiom, you need to add the following (in Turtle syntax):
$a rdf:type owl:Axiom;
    owl:subject $s;
    owl:predicate $p;
    owl:object $o .
The purpose of this construct is that we now have an identifier $a for our triple. We can then annotate it, for instance:
$a extractedBy extractors:InfoboxExtractor;
    extractedFromTemplate templates:city;
    extractedOn "2009-10-25T04:00:00-05:00"^^xsd:dateTime .
    (maybe more meta information, e.g. confidence value, what led to the
    modification e.g. page change, template change)
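As a rough illustration of the bookkeeping involved (a sketch only, not part of the original message or of the DBpedia extraction code), the following Python snippet builds the annotation triples for one statement and counts them. The function name annotate, the identifier ex:axiom1 and the sample subject, predicate and object are hypothetical; the property names are the ones used in the Turtle above, and the 300 million base figure is the estimate quoted in this thread.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def annotate(s: str, p: str, o: str, axiom_id: str,
             metadata: List[Tuple[str, str]]) -> List[Triple]:
    """Return the original triple, the 4 axiom triples and the metadata triples."""
    triples: List[Triple] = [
        (s, p, o),                            # the statement itself
        (axiom_id, "rdf:type", "owl:Axiom"),  # 4 triples that identify it
        (axiom_id, "owl:subject", s),
        (axiom_id, "owl:predicate", p),
        (axiom_id, "owl:object", o),
    ]
    # One extra triple per piece of metadata attached to the axiom identifier.
    triples.extend((axiom_id, key, value) for key, value in metadata)
    return triples


# Hypothetical statement; the metadata mirrors the example above.
example = annotate(
    "ex:s", "ex:p", "ex:o", "ex:axiom1",
    [
        ("extractedBy", "extractors:InfoboxExtractor"),
        ("extractedFromTemplate", "templates:city"),
        ("extractedOn", '"2009-10-25T04:00:00-05:00"^^xsd:dateTime'),
    ],
)

triples_per_statement = len(example)
base_triples = 300_000_000  # estimate quoted in this thread
total_triples = base_triples * triples_per_statement
print(triples_per_statement, total_triples)
```

With three metadata properties, each statement costs eight triples in total, which is consistent with an estimate of roughly 2.4 billion triples for 300 million base triples.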
An advantage of this approach is that we make the meta information explicit and conform to OWL 2 and RDF. It could be queried and (without too much effort) also made available via the Linked Data interface. It would also allow us to create regular dumps from our live extraction. The annotations can be used by the DBpedia live extraction as Sebastian explained. A disadvantage is that we need a lot more triples compared to the current situation. Assuming a full extraction would currently require 300 million triples, storing additional annotations this way would require 2.4 billion triples for DBpedia.
The specific questions we have are:
1.) Do you consider the increase in triple count problematic?

Since this is going to be V6 based, the size of DBpedia doesn't really matter. For instance, we have 4.5+ Billion (maybe 5+ now) on: http://lod.openlinksw.com. This is the kind of cluster setup we are going to use for DBpedia realtime once ready.

2.) How are SPARQL SELECT queries (not involving annotations) affected? Can we expect roughly the same performance (could be the case if Virtuoso recognizes annotations), slightly worse performance, or much worse performance?

I don't expect performance problems.

When we implement OWL2 inference enhancements it will get better. But even right now I don't see the SPARQL performance as an issue.

3.) SPARUL: Sebastian mentioned that 6 million triples will need to be changed per day by the live extraction. Using annotations, this would rise by a factor of three (estimated). Can approx. 20 million triple updates per day be handled by the Virtuoso server(s) running DBpedia?

Since this is going to be loads and deletes, it shouldn't be too much trouble, but we should test and see what happens, and where issues arise we can make specific tweaks etc.

Of course, we cannot expect any precise answers here, but educated guesses are very welcome. :-)

Sure.

Kingsley

Kind regards,

Jens



--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen

President & CEO
OpenLink Software     Web: http://www.openlinksw.com
