On Thu, Apr 11, 2013 at 12:20 PM, Philip Van Hoof <phi...@codeminded.be> wrote:

> Hi Vishesh,
> It's always a good idea to pose these questions on the Tracker public
> mailing list, so I replied with the mailing list in CC:
> So you have a denormalized schema in SQL where each multi value field in
> Nepomuk is represented by a table, and each RDF class is represented by
> a table with the exception of some of the xsd primitive ones (which are
> implied because SQLite knows how to handle these things, for example
> xsd:int, xsd:string, etc).

I'm curious as to how you handle -

1. Type Inference - Say something like 'select ?r where { ?r a
nco:Contact . }'. Let's say one has some 10 nco:Contacts and some 15
nco:PersonContacts. In this case one would have to iterate over both
tables. Does Tracker do that?
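
To make the question concrete, here is a minimal sketch of the two-table
scan I have in mind. It uses Python's sqlite3 module, and the table names
and layout are my own assumptions for illustration, not Tracker's actual
schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one table per RDF class, keyed by resource id.
conn.executescript("""
    CREATE TABLE "nco:Contact" (id INTEGER PRIMARY KEY);
    CREATE TABLE "nco:PersonContact" (id INTEGER PRIMARY KEY);
""")
conn.executemany('INSERT INTO "nco:Contact" VALUES (?)',
                 [(i,) for i in range(1, 11)])        # 10 plain contacts
conn.executemany('INSERT INTO "nco:PersonContact" VALUES (?)',
                 [(i,) for i in range(11, 26)])       # 15 person contacts

# Naive translation of: select ?r where { ?r a nco:Contact . }
# Each subclass table has to be visited in addition to the class table.
rows = conn.execute("""
    SELECT id FROM "nco:Contact"
    UNION
    SELECT id FROM "nco:PersonContact"
""").fetchall()
result_count = len(rows)
```

The alternative I can imagine is writing a row into every superclass table
at insert time, so that a single table scan would do. I do not know which
of the two applies here.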

2. Property Inference - Do you handle cases such as 'select ?r ?l where {
?r nao:prefLabel ?l . }', where 'nao:prefLabel' has not been explicitly
set? Let's assume that 'nie:title' has been set.

nie:title is an rdfs:subPropertyOf nao:prefLabel.

Does Tracker handle cases like this? This is a rather common use case
in Nepomuk, where we want to fetch a good label for the resource and do
not want to query specific properties.
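
As an illustration of what I mean, here is a rough sketch of the
sub-property expansion done by hand in SQL. The single "resource" table
with one column per property is an assumption made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one column per single-valued property.
conn.execute('CREATE TABLE resource ('
             'id INTEGER PRIMARY KEY, "nie:title" TEXT, "nao:prefLabel" TEXT)')
conn.executemany(
    "INSERT INTO resource VALUES (?, ?, ?)",
    [
        (1, "Holiday photo", None),    # only nie:title is set
        (2, None, "Explicit label"),   # only nao:prefLabel is set
        (3, None, None),               # no label at all
    ],
)

# select ?r ?l where { ?r nao:prefLabel ?l . } with rdfs:subPropertyOf
# expanded manually: the super-property column plus each sub-property column.
labels = conn.execute("""
    SELECT id, "nao:prefLabel" AS label FROM resource
        WHERE "nao:prefLabel" IS NOT NULL
    UNION
    SELECT id, "nie:title" AS label FROM resource
        WHERE "nie:title" IS NOT NULL
    ORDER BY id
""").fetchall()
```

Resource 1 is returned through its nie:title even though nao:prefLabel was
never stored for it, which is the behaviour we rely on in Nepomuk.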

3. I read that you have some support for graphs - How is that implemented?
From what I understand from your db schema each property has its own
column, so I'm not sure where you would store the graph related info.

Also, does Tracker use graphs for any purpose?

In the Nepomuk KDE world we generally use graphs to group triples based on
which application has stored the information. This is especially useful in
the case of indexing. When a file has been modified and needs to be
re-indexed, we need to throw away the previous data and index it again. The
file could in this case have both indexed data and non-indexed data, such
as tags and ratings. So we remove only the statements that were added by
the indexer and then re-index the file.

This seems like a very common use case. I'm curious as to how Tracker
solves this problem.
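
For reference, the re-indexing flow described above looks roughly like
this when sketched against a generic quad table. The table layout, the
graph names and the reindex() helper are hypothetical and only meant to
show the idea; they are not Nepomuk's or Tracker's real storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic quad store: the graph column records who stored the statement.
conn.execute("CREATE TABLE quads "
             "(graph TEXT, subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO quads VALUES (?, ?, ?, ?)", [
    ("graph:indexer", "file:///a.txt", "nie:title", "Old title"),
    ("graph:indexer", "file:///a.txt", "nie:mimeType", "text/plain"),
    ("graph:user", "file:///a.txt", "nao:numericRating", "8"),
    ("graph:user", "file:///a.txt", "nao:hasTag", "tag:work"),
])


def reindex(conn, subject, new_statements, indexer_graph="graph:indexer"):
    """Drop only the indexer's statements about subject, then store new ones."""
    conn.execute("DELETE FROM quads WHERE graph = ? AND subject = ?",
                 (indexer_graph, subject))
    conn.executemany(
        "INSERT INTO quads VALUES (?, ?, ?, ?)",
        [(indexer_graph, subject, p, o) for p, o in new_statements],
    )


reindex(conn, "file:///a.txt", [("nie:title", "New title")])
remaining = conn.execute(
    "SELECT graph, predicate, object FROM quads ORDER BY graph, predicate"
).fetchall()
```

After the call, the user's tag and rating are untouched and only the
indexer's statements have been replaced.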

> Then the libtrackersparql makes a WAL SQLite connection, parses your
> SPARQL and generates on the fly SQL for that. This happens often using
> subqueries, and without building an AST first - making the
> parse-translate phase relatively fast and resource friendly, which is a
> design-choice as Tracker is intended to run on devices with few
> resources.
> This code does that:
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-query.vala
> That design-choice of course has a drawback in that the queries have to
> be manually optimized very often and/or to get things fast enough we often
> had to store data in a redundant way. We did this with domain specific
> indexes which I explained in this blog item:
> http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions
> http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index
> http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes
> This function is the code that translates a statement (from the SPARQL
> UPDATE we make RDF statements and then we process those) to SQL inserts:
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-update.c#n732
> A lot of the SPARQL UPDATE part is here:
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala
> Note that we have a buffer where we eliminate duplicates like:
> <a> a Class.
> <a> Prop1 Value1.
> <a> a Class.
> <a> Prop2 Value2.
> <a> Prop1 Value3.
> We translate that to:
> <a> a Class; Prop1 Value; Prop2 Value3.
> Except when we utilize the null support:
> http://pvanhoof.be/blog/index.php/2011/08/09/support-for-null-with-trackers-insert-or-replace-feature
> http://pvanhoof.be/blog/index.php/2011/08/15/null-support-for-insert-or-replace-available-in-master
> The second blog item and my own comments in the first illustrate the
> problem with optimizing statement-sets like above just like that.
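
If I read the buffering correctly, the duplicate elimination step could be
sketched as below. This is only my interpretation: dedupe_statements() is a
made-up helper, and it handles exact duplicates only, not the merging of
values per property or the null handling mentioned above.

```python
def dedupe_statements(statements):
    """Drop exact duplicate (subject, predicate, object) triples.

    The first occurrence of each triple is kept and order is preserved.
    """
    seen = set()
    unique = []
    for triple in statements:
        if triple not in seen:
            seen.add(triple)
            unique.append(triple)
    return unique


# The statements from the example, in the order they were buffered.
buffered = [
    ("<a>", "a", "Class"),
    ("<a>", "Prop1", "Value1"),
    ("<a>", "a", "Class"),      # exact duplicate of the first statement
    ("<a>", "Prop2", "Value2"),
    ("<a>", "Prop1", "Value3"),
]
unique = dedupe_statements(buffered)
```

Only the repeated "<a> a Class." statement is dropped here; both Prop1
values are kept, since the sketch does not know whether Prop1 is single or
multi valued.
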
> Then we also added a few SQLite functions which are used by the
> SPARQL->SQL translation and for most of our own SPARQL extensions (we
> ship with a bunch of SPARQL extensions that allow query writers to make
> queries faster):
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-interface-sqlite.c#n826
> The data-manager is mostly about creating the SQL tables and
> preparation. It can also handle a limited amount of ontology changes
> (adding and removing of classes and properties and their extensions like
> domain specific indexes):
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-manager.c
Thanks for all this information.

> On Thu, 2013-04-11 at 05:18 +0530, Vishesh Handa wrote:
> > Hey Philip
> >
> > I'm one of the KDE Nepomuk developers. I've been looking into the
> > tracker project for some time now, since it's good to know how other
> > people internally implement things. However, it has been very hard for
> > me to find any documentation on the inner workings of Tracker.
> >
> >
> > I'm specifically interested in how the database schema is designed. I
> > did find this "Semantic Social Desktop and Mobile Devices"
> > presentation [1] which gives a very rough overview of how each class
> > has its own table.
> >
> >
> > Could you perhaps point me to some internal documentation? It would be
> > most helpful. Otherwise, could I ask you some detailed questions
> > about the tracker internals?
> >
> > I have looked at the source code, but it's a little hard to understand
> > for a newcomer.
> >
> > [1] https://live.gnome.org/Tracker/Documentation/Presentations
> >
Vishesh Handa
