Re: [Tracker] tracker and RDF

Jamie McCracken Mon, 30 Oct 2006 16:11:36 -0800

Eyal Oren wrote:
> Hi Jamie,
> 
> I've just caught up with the huge number of emails on d-d-l. It was a 
> great discussion to follow, even though I feel you were being treated a 
> bit harshly.


I expected to get a much harsher reaction from some of the Novell and 
pro-beagle lads, but apart from a few grumblings from them it passed off 
quite well. My sources tell me tracker was generally well received 
amongst Gnome devs.

> I loved the discussion on metadata specs and RDF schema,
> given that I am a semantic web researcher (see http://www.activerdf.org).

I see tracker as being a lot simpler, lighter and faster than a full 
blown semantic web thingy, but of course anything we can learn from them 
would be useful, so feel free to point things out :)

> 
> As you may remember, I tried to point out to you earlier that using RDF 
> (triples) as metadata format would be much more flexible than fixed 
> database tables, and using RDFS rules (such as subPropertyOf) would be 
> great to allow applications to deal with new data without them needing 
> to adjust their database queries.

Yes, it is more useful (but slower that way), so some tuning will be required.

> Now seeing the course of the discussion, allow me to pitch in here. You 
> were discussing using librdf to do some RDF query answering. For your 
> information, librdf does not do any RDFS reasoning!!  librdf has one 
> single table called "triple(s,p,o)" in which it stores all triples, and 
> then does query rewriting from SPARQL or RDQL to this relational table.  
> However, the algorithms for query rewriting do not consider any RDFS 
> statements (e.g.  evolution:workEmail subPropertyOf tracker:email) so 
> librdf actually only provides "pure" RDF answers, without the benefits 
> of RDFS.

Yeah, the performance of that looks terrible, and the SQL it needs is 
pretty horrendous as well.
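
To make the cost concrete, here is a rough sketch of the single-table 
layout described in the quoted text, using Python's sqlite3 module. The 
table name follows the "triple(s,p,o)" description above; the sample 
data, property names and the find_subjects helper are made up for 
illustration and are not taken from librdf or tracker. Every extra 
condition adds one more alias of the same table, i.e. one more self-join.

```python
import sqlite3

# Hypothetical single-table triple store, as in "triple(s,p,o)".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("file:a.mp3", "artist", "Miles Davis"),
        ("file:a.mp3", "year", "1959"),
        ("file:b.mp3", "artist", "Miles Davis"),
        ("file:b.mp3", "year", "1970"),
    ],
)


def find_subjects(conn, conditions):
    """Return subjects matching every (predicate, object) pair.

    Each condition after the first joins the triple table to itself
    once more, which is why this layout gets slow on larger queries.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    joins = ["triple AS t0"]
    where = ["t0.p = ? AND t0.o = ?"]
    for i in range(1, len(conditions)):
        joins.append(f"JOIN triple AS t{i} ON t{i}.s = t0.s")
        where.append(f"t{i}.p = ? AND t{i}.o = ?")
    from_clause = " ".join(joins)
    where_clause = " AND ".join(where)
    sql = (
        f"SELECT DISTINCT t0.s FROM {from_clause} "
        f"WHERE {where_clause} ORDER BY t0.s"
    )
    # Placeholders appear in condition order, so flatten in that order.
    params = [value for cond in conditions for value in cond]
    return [row[0] for row in conn.execute(sql, params)]


result = find_subjects(conn, [("artist", "Miles Davis"), ("year", "1959")])
```

A query with two conditions already scans the table twice; with five or 
six conditions the generated SQL grows accordingly.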

> 
> As James Hendringe pointed out, the flexibility of RDF (we only have 
> triples) allows one to store arbitrary metadata, but querying with a 
> naive implementation (one single relational table) is quite slow, since 
> the database is doing a self-join for each where clause.
> 
> I've recently worked on a simple RDF store based on sqlite3 (I call it 
> rdflite). The incentive came from the buggy and complex state of 
> existing RDF stores: I wanted something simple and lightweight that my 
> users can easily deploy and use (hence the choice for sqlite3). In the 
> course of building that, I've got quite some experience with query 
> rewriting from rdf-to-sql.

We already have an rdf query implementation, so that just needs a bit of 
tinkering to support automatic searching of all child metadata types, 
and it's not difficult.
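
As an illustration of what "automatic searching of all child metadata 
types" could look like, here is a minimal sketch in Python with sqlite3. 
It is not tracker's actual query code: the sub_property table, the 
sample contact data and both helper functions are assumptions made for 
the example. The property names come from the evolution example quoted 
earlier. A query on one property is rewritten into a UNION over that 
property and everything declared below it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE triple (s TEXT, p TEXT, o TEXT);
    -- Hypothetical table holding "child subPropertyOf parent" statements.
    CREATE TABLE sub_property (child TEXT, parent TEXT);
    """
)
conn.execute(
    "INSERT INTO sub_property VALUES ('evolution:workEmail', 'tracker:email')"
)
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("contact:1", "tracker:email", "home@example.org"),
        ("contact:1", "evolution:workEmail", "work@example.org"),
        ("contact:1", "tracker:name", "Example Person"),
    ],
)


def expand_property(conn, prop):
    """Return prop followed by all of its direct and indirect children."""
    seen = [prop]
    queue = [prop]
    while queue:
        parent = queue.pop()
        rows = conn.execute(
            "SELECT child FROM sub_property WHERE parent = ?", (parent,)
        ).fetchall()
        for (child,) in rows:
            if child not in seen:  # also guards against cycles
                seen.append(child)
                queue.append(child)
    return seen


def query_values(conn, prop):
    """Rewrite a query on prop into a UNION over prop and its children."""
    props = expand_property(conn, prop)
    sql = " UNION ".join("SELECT s, o FROM triple WHERE p = ?" for _ in props)
    return sorted(conn.execute(sql, props))


emails = query_values(conn, "tracker:email")
```

Asking for tracker:email here returns both addresses, while asking for 
evolution:workEmail returns only the work one.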

> 
> For the record: my datastore does not currently do any RDFS reasoning 
> either, but that would not be too hard to implement: we need to adjust 
> the query rewriting to take some RDFS rules into account. In the evolution 
> example: if you ask for all emails, we need to rewrite the query, to 
> also consider (in a union) all those properties that have been defined 
> to be subproperties of email (in this case, work-emails).
> 
> Let me get to the point (sorry for the long story here): I'm very happy 
> that Wouter Bolsterlee and others showed you the advantage of RDFS and 
> RDF here, and I would be very very happy to explore it with you in the 
> context of tracker. If you want to roll your own rdf solution, I can 
> help you with my current rdf-in-sqlite experience; if you want to use 
> librdf I can help with my experience of using it and programming against 
> it; and if you want to brainstorm a bit about the (dis)advantages of 
> rdf, I'm happy to do that too.

I can't use librdf directly; as I said on d-d-l, it's too abstracted and 
not optimised.

Currently we use two tables for metadata storage (one for the service 
and one for the metadata value - these are further split into 
string/numeric tables and indexes), so to support a triple store I should 
just need one more table inserted between the existing two - this will 
allow for metadata to have any number of values, at the cost of slower 
inserts and perhaps a bit more disk space.

For the metadata types themselves we already have a table for storing 
them, and we just need one more flattened table to store the 
relationships between the various types. This should in effect give us 
the sub property type stuff.
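
One way to read "flattened table" is a table that stores every 
(type, ancestor) pair directly, so that finding all child types is a 
single lookup with no recursion at query time. The sketch below shows 
that idea in Python with sqlite3. The table names metadata_type and 
type_ancestor, the type names, and the helpers are all hypothetical; 
this is not tracker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical type table plus a flattened relationship table.
    CREATE TABLE metadata_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE type_ancestor (
        type_id INTEGER,
        ancestor_id INTEGER,
        PRIMARY KEY (type_id, ancestor_id)
    );
    """
)


def add_type(conn, name, parent=None):
    """Register a type and flatten its ancestry into type_ancestor."""
    cur = conn.execute("INSERT INTO metadata_type (name) VALUES (?)", (name,))
    type_id = cur.lastrowid
    # Every type counts as its own ancestor, so one lookup covers the
    # type itself as well as its children.
    conn.execute("INSERT INTO type_ancestor VALUES (?, ?)", (type_id, type_id))
    if parent is not None:
        # Copy all of the parent's ancestor rows (parent included).
        conn.execute(
            "INSERT INTO type_ancestor "
            "SELECT ?, ancestor_id FROM type_ancestor "
            "WHERE type_id = (SELECT id FROM metadata_type WHERE name = ?)",
            (type_id, parent),
        )
    return type_id


def descendants(conn, name):
    """Return the named type and all of its child types, sorted by name."""
    sql = """
        SELECT t.name
        FROM type_ancestor AS a
        JOIN metadata_type AS t ON t.id = a.type_id
        JOIN metadata_type AS anc ON anc.id = a.ancestor_id
        WHERE anc.name = ?
        ORDER BY t.name
    """
    return [row[0] for row in conn.execute(sql, (name,))]


add_type(conn, "email")
add_type(conn, "workEmail", parent="email")
add_type(conn, "workEmailPrimary", parent="workEmail")
children = descendants(conn, "email")
```

The trade-off is the usual one for this kind of table: lookups are 
cheap, while adding a type costs one row per ancestor.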

It's not difficult to do - just a lot of work plumbing it in to our 
existing framework and altering all our stored procedures and queries.

We don't need any rdf syntax/magic here, as it's pretty simple stuff and 
we can expose a nice dbus api for managing the metadata types and their 
inter-relationships.

Our dbus api would also need to change so that "get metadata" returns an 
array instead of a single value, as each entry can now have multiple 
metadata values etc. So unless there is anything I have missed, it's just 
elbow grease really :)
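
For illustration only, here is a small Python/sqlite3 sketch of a 
multi-valued "get metadata" backed by a link table sitting between the 
service table and the value table. The schema, the column names and the 
add_metadata/get_metadata helpers are assumptions made up for this 
example; they are not the real tracker tables or the dbus api, and the 
string/numeric split is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical three-table layout: service, link table, value.
    CREATE TABLE service (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    CREATE TABLE metadata_value (
        id INTEGER PRIMARY KEY,
        type_name TEXT,
        value TEXT
    );
    CREATE TABLE service_metadata (service_id INTEGER, value_id INTEGER);
    """
)


def add_metadata(conn, uri, type_name, value):
    """Attach one more value of type_name to the service at uri."""
    conn.execute("INSERT OR IGNORE INTO service (uri) VALUES (?)", (uri,))
    service_id = conn.execute(
        "SELECT id FROM service WHERE uri = ?", (uri,)
    ).fetchone()[0]
    value_id = conn.execute(
        "INSERT INTO metadata_value (type_name, value) VALUES (?, ?)",
        (type_name, value),
    ).lastrowid
    conn.execute(
        "INSERT INTO service_metadata VALUES (?, ?)", (service_id, value_id)
    )


def get_metadata(conn, uri, type_name):
    """Return all values of type_name for uri as a list (possibly empty)."""
    sql = """
        SELECT v.value
        FROM service AS s
        JOIN service_metadata AS l ON l.service_id = s.id
        JOIN metadata_value AS v ON v.id = l.value_id
        WHERE s.uri = ? AND v.type_name = ?
        ORDER BY v.id
    """
    return [row[0] for row in conn.execute(sql, (uri, type_name))]


add_metadata(conn, "file:///music/a.mp3", "artist", "First Artist")
add_metadata(conn, "file:///music/a.mp3", "artist", "Second Artist")
artists = get_metadata(conn, "file:///music/a.mp3", "artist")
```

Callers that used to expect one value would get a list back, with an 
empty list when nothing is stored for that type.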

-- 
Mr Jamie McCracken
http://jamiemcc.livejournal.com/

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
