Re: Proposal for Integration of Linked Media Framework in Apache Stanbol

Sebastian Schaffert Tue, 26 Jul 2011 12:19:05 -0700

Thanks for the support, I would like to go there too. ;-)

But many of the suggestions are already very good; I would really like to get 
away from Hibernate if there is a clean way to do so ...


Best regards
Sebastian

On 26.07.2011 at 18:59, Rupert Westenthaler wrote:

> Hi
> 
> I think we should investigate whether it would make sense to implement
> the Clerezza APIs on top of the "Kiwi" triple store. This would allow
> any Clerezza-based application - including Stanbol - to use this triple
> store implementation.
> 
> WDYT
> Rupert
> 
> On Tue, Jul 26, 2011 at 5:54 PM, Sebastian Schaffert
> <[email protected]> wrote:
>> Dear Florent,
>> 
>> On 26.07.2011 at 16:46, florent andré wrote:
>>> 
>>>> 
>>>> The dependency to Hibernate is mostly for the triple store, not for CMS 
>>>> capabilities. And this is something I don't see how to avoid in the near 
>>>> future because we need to store additional information about triples for 
>>>> reasoning and versioning.
>>>> 
>>>> Versioning is also of triples, not of content. As such it is probably also 
>>>> interesting to the Stanbol community.
>>> 
>>> I would be interested in a short explanation of the way you store the 
>>> version / history of triples.
>> 
>> We use a purely relational approach actually:
>> - a table "KIWINODE" stores RDF nodes (unified table for literals, blank 
>> nodes and resources)
>> - a table "TRIPLES" stores triples with id, subject, predicate, object, 
>> context, marker for deleted, marker for inferred, timestamp, creator 
>> (subject, predicate, object, context, creator are references to KIWINODE)
>> - a table "VERSION" stores version ID, timestamp, creator
>> - join tables "VERSION_ADDEDNODES", "VERSION_REMOVEDNODES", 
>> "VERSION_ADDEDTRIPLES", "VERSION_REMOVEDTRIPLES" store references to added 
>> and removed nodes and to added and removed triples; for deleted triples and 
>> nodes, the boolean marker is set to true, for added triples and nodes it is 
>> false
>> 
>> Versioning is thus a simple database operation. "Active" (undeleted) triples 
>> can be easily filtered using the boolean marker. Undoing simply means 
>> reversing the operations (add and remove) on triples and nodes.
>> 
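
A minimal sketch of the relational scheme described above, using Python's built-in sqlite3 module rather than Hibernate. The table names come from the message; every column name, the helper functions, and the reduction to triples only (no context, creator, inferred marker, or node versioning) are assumptions made for illustration, not the actual Kiwi schema.

```python
import sqlite3

# Table names follow the message; all column names are assumptions.
SCHEMA = """
CREATE TABLE KIWINODE (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE TRIPLES (
    id        INTEGER PRIMARY KEY,
    subject   INTEGER REFERENCES KIWINODE(id),
    predicate INTEGER REFERENCES KIWINODE(id),
    object    INTEGER REFERENCES KIWINODE(id),
    deleted   INTEGER NOT NULL DEFAULT 0  -- boolean marker for deleted triples
);
CREATE TABLE VERSION (id INTEGER PRIMARY KEY,
                      created TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE VERSION_ADDEDTRIPLES (version_id INTEGER, triple_id INTEGER);
CREATE TABLE VERSION_REMOVEDTRIPLES (version_id INTEGER, triple_id INTEGER);
"""

def open_store():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def node_id(db, value):
    """Return the id of an RDF node, creating the row if needed."""
    row = db.execute("SELECT id FROM KIWINODE WHERE value = ?", (value,)).fetchone()
    if row:
        return row[0]
    return db.execute("INSERT INTO KIWINODE (value) VALUES (?)", (value,)).lastrowid

def new_version(db):
    return db.execute("INSERT INTO VERSION DEFAULT VALUES").lastrowid

def add_triple(db, s, p, o):
    """Insert a triple and record it as added in a new version."""
    ids = tuple(node_id(db, v) for v in (s, p, o))
    triple = db.execute(
        "INSERT INTO TRIPLES (subject, predicate, object) VALUES (?, ?, ?)", ids
    ).lastrowid
    version = new_version(db)
    db.execute("INSERT INTO VERSION_ADDEDTRIPLES VALUES (?, ?)", (version, triple))
    return version, triple

def remove_triple(db, triple):
    """Set the deleted marker and record the removal in a new version."""
    db.execute("UPDATE TRIPLES SET deleted = 1 WHERE id = ?", (triple,))
    version = new_version(db)
    db.execute("INSERT INTO VERSION_REMOVEDTRIPLES VALUES (?, ?)", (version, triple))
    return version

def undo(db, version):
    """Reverse one version: hide what it added, restore what it removed."""
    db.execute(
        "UPDATE TRIPLES SET deleted = 1 WHERE id IN "
        "(SELECT triple_id FROM VERSION_ADDEDTRIPLES WHERE version_id = ?)",
        (version,))
    db.execute(
        "UPDATE TRIPLES SET deleted = 0 WHERE id IN "
        "(SELECT triple_id FROM VERSION_REMOVEDTRIPLES WHERE version_id = ?)",
        (version,))

def active_triples(db):
    """Return all undeleted triples as (subject, predicate, object) values."""
    return db.execute(
        """SELECT s.value, p.value, o.value FROM TRIPLES t
           JOIN KIWINODE s ON s.id = t.subject
           JOIN KIWINODE p ON p.id = t.predicate
           JOIN KIWINODE o ON o.id = t.object
           WHERE t.deleted = 0 ORDER BY t.id""").fetchall()
```

In this sketch a removal never deletes a row, so undoing a version only flips the marker on the triples that version references.
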
>> 
>>> 
>>> I have started thinking about that (but only thinking for now :) ), and 
>>> about the possible help of big tables (e.g. HBase) for this...
>>> 
>>> HBase is a (kind of) 3-dimensional database:
>>> - 1 is column
>>> - 1 is row
>>> - 1 is timestamp
>> 
> I think there is currently a lot of work on how to handle Graph
> Structures in this kind of data stores. I am definitely interested in
> this topic but currently I do not have the time to investigate it in
> more detail.
> 
>> I really don't see the point. A relational database is already n-dimensional 
>> ;-)
>> 
> 
> As long as you can handle the number of triples on a single machine, it
> is for sure more efficient and easier to implement this with a
> relational database.
> I think there is also a new TripleStore implementation around that
> uses Solr/Lucene to store triples. Someone mentioned it in Paris,
> but I have forgotten the name of the project.
> 
>> 
>>> 
>>> So, for my 100 feet idea :
>>> - each triple is a row
>>> - ?s, ?p, ?o each a column (or a column family)
>>> 
>>> And so, the history of each triple is stored in the 3rd dimension: 
>>> timestamps.
>>> 
>>> This could lead to a really clean and easy design... provided no strong 
>>> technical/integration restrictions come up...
>> 
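
A small in-memory model of the idea above, assuming one row per triple, one column each for s, p and o, and a timestamp on every cell. This is plain Python and does not use any HBase client API; the class VersionedTable and the function triple_at are hypothetical names chosen for illustration.

```python
class VersionedTable:
    """Toy model of a (row, column, timestamp) -> value store."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, ts):
        versions = self._cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0])  # keep oldest first

    def get(self, row, column, ts=None):
        """Return the newest value written at or before ts (latest if ts is None)."""
        versions = self._cells.get((row, column), [])
        if ts is not None:
            versions = [v for v in versions if v[0] <= ts]
        return versions[-1][1] if versions else None


def triple_at(table, row, ts=None):
    """Read the (s, p, o) state of one triple row as of a timestamp."""
    return tuple(table.get(row, column, ts) for column in ("s", "p", "o"))


table = VersionedTable()
for column, value in (("s", "ex:a"), ("p", "ex:knows"), ("o", "ex:b")):
    table.put("t1", column, value, ts=1)
table.put("t1", "o", "ex:c", ts=2)  # the object changes at timestamp 2
```

Reading the row as of timestamp 1 returns the original object, while reading without a timestamp returns the current one, which is the history-in-the-third-dimension effect the proposal relies on.
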
>> I am not really convinced, but maybe you can offer some more details and 
>> convince me.;-) I am not familiar with these kinds of databases.
>> 
>> My thought is that relational databases are really well suited for the task 
>> because this is what they have been designed for (triples are really purely 
>> relational data), with one (minor) exception: expensive join operations 
>> happen frequently when querying RDF, and there is almost no chance to 
>> materialize them in advance. This can be compensated a bit by proper 
>> indexing and configuration of the database, however.
>> 
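
To illustrate the join cost mentioned above: in a single triples table, every additional triple pattern in a query becomes another self-join. The sketch below assumes a simplified table triples(s, p, o) with plain text values in SQLite; the table layout, the index names, and the ex: identifiers are illustrative only.

```python
import sqlite3

def build_store(triples):
    """Create an in-memory triple table with two covering indexes."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    # Indexes are the 'proper indexing' part: they let each side of the
    # self-join be resolved by an index lookup instead of a table scan.
    db.execute("CREATE INDEX idx_pso ON triples (p, s, o)")
    db.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return db

def friends_of_friends(db, person):
    """Pattern: person ex:knows ?y . ?y ex:knows ?z (two patterns, one join)."""
    rows = db.execute(
        """SELECT t2.o
           FROM triples AS t1
           JOIN triples AS t2 ON t2.s = t1.o
           WHERE t1.s = ? AND t1.p = 'ex:knows' AND t2.p = 'ex:knows'
           ORDER BY t2.o""",
        (person,))
    return [row[0] for row in rows]

db = build_store([
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:b", "ex:knows", "ex:d"),
    ("ex:c", "ex:knows", "ex:e"),
])
```

A query with n triple patterns needs n - 1 such joins, and because the patterns are only known at query time, the joins cannot be materialized ahead of time.
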
> 
> Yago2 uses a special n-triple model that includes subject, predicate,
> object, temporal, spatial and full text. For spatial and full text
> they use the corresponding extensions of the relational databases. That
> way they can greatly reduce the number of joins for requests for
> event-like data.
> 
> Again this discussion is very related to the work of Fabian on the Factstore!
> 
> best
> Rupert
> 
> -- 
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen

Sebastian
-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
