Re: [Tracker] Tracker Internal Documentation

Philip Van Hoof Thu, 11 Apr 2013 05:29:38 -0700

On Thu, 2013-04-11 at 15:53 +0530, Vishesh Handa wrote:

> 
> I'm curious as to how you handle -
> 
> 
> 1. Type Inference: say something like 'select ?r where { ?r a
> nco:Contact . }'. Let's say one has some 10 nco:Contacts and some 15
> nco:PersonContacts. In this case one would have to iterate over both
> tables. Does tracker do that?


This gets translated to something like:

SELECT Uri FROM "nco:Contact"

Translating nco:Contact to the SQL table "nco:Contact" is something
Tracker does internally. I think there is an environment variable that
you can turn on to print the SQLite statements the first time they are
prepared (there is an LRU cache of SQLite statements). These statements
include the query.

Type inference itself (select ?p ?o { nco:Contact ?p ?o }) works
because Tracker stores its own ontology in itself, and an ontology is
just a bunch of RDF statements.
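To illustrate the point about the ontology being plain RDF statements, here is a sketch that keeps a few ontology triples in a list and answers `nco:Contact ?p ?o` by pattern matching. The triples are an illustrative subset written for this example, not copied from Tracker's ontology files, and the match() helper is hypothetical.

```python
# Illustrative ontology triples (a subset written for this example).
ONTOLOGY = [
    ("nco:Contact", "rdf:type", "rdfs:Class"),
    ("nco:PersonContact", "rdf:type", "rdfs:Class"),
    ("nco:PersonContact", "rdfs:subClassOf", "nco:Contact"),
    ("nco:fullname", "rdf:type", "rdf:Property"),
    ("nco:fullname", "rdfs:domain", "nco:Contact"),
]

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts like a SPARQL variable."""
    return [(ts, tp, to) for ts, tp, to in ONTOLOGY
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# select ?p ?o { nco:Contact ?p ?o }
contact_pairs = [(p, o) for _, p, o in match(s="nco:Contact")]

# The same statements answer schema questions, e.g. direct subclasses
# (?c rdfs:subClassOf nco:Contact) or properties whose domain is nco:Contact.
subclasses = [s for s, _, _ in match(p="rdfs:subClassOf", o="nco:Contact")]
domain_props = [s for s, _, _ in match(p="rdfs:domain", o="nco:Contact")]
```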

> 2. Property Inference: do you handle cases such as 'select ?r ?l
> where { ?r nao:prefLabel ?l . }', where 'nao:prefLabel' has not
> been explicitly set? Let's assume that 'nie:title' has been
> set.

> nie:title is an rdfs:subPropertyOf nao:prefLabel.

I don't remember.

> Does tracker handle cases like this? I ask because this is a rather
> common use case in Nepomuk, where we want to fetch a good label for
> the resource and do not want to query specific properties.

AFAIK, yes.

> 3. I read that you have some support for graphs. How is that
> implemented? From what I understand from your db schema, each property
> has its own column, so I'm not sure where you would store the
> graph-related info.

The graph support is limited. More explanation on its limitations here:
https://live.gnome.org/Tracker/Documentation/SparqlFeatures#Named_Graphs

> Also, does tracker use graphs for any purpose?

Only for storing the origin of a statement (which was the only required
use-case for the N9, which we were mainly targeting while developing
Tracker's SPARQL endpoint and Nepomuk ontology support).
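The question above asks where graph information could live in a column-per-property schema. One plausible layout, assumed here rather than taken from this thread, is a companion "<property>:graph" column next to each property column that stores the ID of the graph the value was inserted into (the statement's origin). The Graph table and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of graph URIs (the statement origins).
conn.execute("CREATE TABLE Graph (ID INTEGER PRIMARY KEY, Uri TEXT UNIQUE)")
# Assumed layout: each property column has a companion "<property>:graph"
# column holding the ID of the graph the value was inserted into.
conn.execute(
    'CREATE TABLE "nie:InformationElement" ('
    'ID INTEGER PRIMARY KEY, "nie:title" TEXT, "nie:title:graph" INTEGER)'
)

conn.execute("INSERT INTO Graph (ID, Uri) VALUES (1, 'urn:graph:indexer')")
conn.execute(
    'INSERT INTO "nie:InformationElement" (ID, "nie:title", "nie:title:graph") '
    "VALUES (?, ?, ?)",
    (100, "Holiday.jpg", 1),
)

# Roughly: select ?g ?t { GRAPH ?g { <resource 100> nie:title ?t } }
row = conn.execute(
    'SELECT Graph.Uri, ie."nie:title" FROM "nie:InformationElement" AS ie '
    'JOIN Graph ON Graph.ID = ie."nie:title:graph" WHERE ie.ID = 100'
).fetchone()
```

A layout like this records one origin per stored value, which fits the "origin of a statement" use case; it is not a general mechanism for grouping triples by application.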

> In the Nepomuk KDE world we generally use graphs to group triples
> based on which application has stored the information. This is
> especially useful in the case of indexing. When a file has been
> modified and needs to be re-indexed, we need to throw away the
> previous data and re-index it. The file could in this case have both
> indexed and non-indexed data such as tags and ratings. So, we only
> remove the statements that were added by the indexer and then reindex
> the file.

This isn't supported by Tracker.

> This seems like a very common use case. I'm curious as to how tracker
> solves this problem.

It doesn't. Full graph support was not a design consideration, only
limited support for it was.


Kind regards,

Philip

>         Then libtracker-sparql makes a WAL SQLite connection, parses your
>         SPARQL and generates SQL for it on the fly. This often involves
>         subqueries, and it happens without building an AST first. That
>         makes the parse-translate phase relatively fast and resource
>         friendly, which is a design choice, as Tracker is intended to run
>         on devices with few resources.
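The single-pass idea can be sketched on a tiny SPARQL subset: the function below reads the tokens of `SELECT ... WHERE { triples }` once and emits SQL fragments as each triple pattern completes, without building a syntax tree. This only illustrates the approach. The PROPERTY_TABLE map, the table-per-class layout and returning IDs instead of URIs are assumptions, and tracker-sparql-query.vala handles far more of the language.

```python
import re

# Hypothetical: which class table stores each (single-valued) property.
PROPERTY_TABLE = {"nco:fullname": "nco:Contact"}

# Variables, prefixed names/keywords, and punctuation.
TOKEN = re.compile(r"\?\w+|[\w:]+|[{}.]")

def translate(sparql):
    """Translate 'SELECT ?v... WHERE { triple . triple }' in one pass.

    SQL fragments are produced as soon as each triple pattern is read;
    no syntax tree is built. Only variables are supported as subjects/objects.
    """
    tokens = TOKEN.findall(sparql)
    pos = 0

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    if take().upper() != "SELECT":
        raise ValueError("expected SELECT")
    projection = []
    while tokens[pos].upper() != "WHERE":
        projection.append(take())
    take()  # WHERE
    if take() != "{":
        raise ValueError("expected '{'")

    tables, conditions, bound = [], [], {}

    def bind(var, column):
        if not var.startswith("?"):
            raise ValueError("only variables are supported in this sketch")
        if var in bound:
            # A repeated variable becomes a join condition.
            conditions.append(f"{bound[var]} = {column}")
        else:
            bound[var] = column

    while tokens[pos] != "}":
        s, p, o = take(), take(), take()
        alias = f"t{len(tables)}"
        if p == "a":
            # '?s a Class' scans the class table.
            tables.append(f'"{o}" AS {alias}')
            bind(s, f"{alias}.ID")
        else:
            # '?s prop ?o' reads the property column of its domain table.
            tables.append(f'"{PROPERTY_TABLE[p]}" AS {alias}')
            bind(s, f"{alias}.ID")
            bind(o, f'{alias}."{p}"')
        if tokens[pos] == ".":
            take()

    sql = "SELECT " + ", ".join(bound[v] for v in projection)
    sql += " FROM " + ", ".join(tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql
```

Because nothing looks at the whole query before SQL is emitted, the redundant t0/t1 self-join produced for `?r a nco:Contact . ?r nco:fullname ?n` is not optimized away, which is the kind of cost the next paragraph refers to.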
>         
>         This code does that:
>         
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-query.vala
>         
>         That design choice of course has a drawback: queries very often
>         have to be optimized manually, and/or, to get things fast enough,
>         we often had to store data in a redundant way. We did this with
>         domain-specific indexes, which I explained in these blog items:
>         
>         
> http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions
>         
> http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index
>         
> http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes
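A rough sketch of what such a domain-specific index buys, using stdlib sqlite3: a property that lives in one class table (here nfo:fileLastModified on nfo:FileDataObject) is also copied into another class table (nmm:MusicPiece) and indexed there, so sorting music by modification time no longer needs a join. The class/property pairing and the insert_music_piece() helper are hypothetical; in this sketch the helper simply writes both copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "nfo:FileDataObject" (ID INTEGER PRIMARY KEY, "nfo:fileLastModified" INTEGER);
-- Hypothetical domain index: a redundant copy of nfo:fileLastModified
-- stored in the nmm:MusicPiece table itself, plus an index on it.
CREATE TABLE "nmm:MusicPiece" (ID INTEGER PRIMARY KEY, "nmm:trackNumber" INTEGER,
                               "nfo:fileLastModified" INTEGER);
CREATE INDEX "nmm:MusicPiece_nfo:fileLastModified"
    ON "nmm:MusicPiece" ("nfo:fileLastModified");
""")

def insert_music_piece(rid, track, mtime):
    # The price of the speed-up: every write touches both copies.
    conn.execute('INSERT INTO "nfo:FileDataObject" VALUES (?, ?)', (rid, mtime))
    conn.execute('INSERT INTO "nmm:MusicPiece" VALUES (?, ?, ?)', (rid, track, mtime))

for rid, track, mtime in [(1, 3, 300), (2, 1, 100), (3, 2, 200)]:
    insert_music_piece(rid, track, mtime)

# Without the copy: a join is needed to sort music pieces by modification time.
joined = conn.execute(
    'SELECT m.ID FROM "nmm:MusicPiece" AS m '
    'JOIN "nfo:FileDataObject" AS f ON f.ID = m.ID '
    'ORDER BY f."nfo:fileLastModified"'
).fetchall()

# With the copy: one table, and the ORDER BY can be served by the index.
direct = conn.execute(
    'SELECT ID FROM "nmm:MusicPiece" ORDER BY "nfo:fileLastModified"'
).fetchall()
```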
>         
>         
>         This function is the code that translates a statement to SQL
>         inserts (from a SPARQL UPDATE we first make RDF statements, and
>         then we process those):
>         
>         
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-update.c#n732
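A simplified sketch of turning one RDF statement into SQL. It assumes a single-valued property maps to a column of its domain's class table and a multi-valued property maps to a side table named "<class>_<property>" (a naming chosen for this example). PROPERTIES and apply_statement() are hypothetical, objects are plain strings, and the subject row is assumed to exist already.

```python
import sqlite3

# Hypothetical ontology subset: property -> (domain class table, multi-valued?)
PROPERTIES = {
    "nie:title": ("nie:InformationElement", False),
    "nie:keyword": ("nie:InformationElement", True),
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "nie:InformationElement" (ID INTEGER PRIMARY KEY, "nie:title" TEXT);
CREATE TABLE "nie:InformationElement_nie:keyword" (ID INTEGER, "nie:keyword" TEXT);
""")

def apply_statement(subject_id, predicate, obj):
    """Turn one (subject, predicate, object) statement into SQL."""
    table, multi_valued = PROPERTIES[predicate]
    if multi_valued:
        # One row per value in a side table.
        conn.execute(
            f'INSERT INTO "{table}_{predicate}" (ID, "{predicate}") VALUES (?, ?)',
            (subject_id, obj))
    else:
        # A single value lives in a column of the class table.
        conn.execute(
            f'UPDATE "{table}" SET "{predicate}" = ? WHERE ID = ?',
            (obj, subject_id))

conn.execute('INSERT INTO "nie:InformationElement" (ID) VALUES (1)')
apply_statement(1, "nie:title", "Report")
apply_statement(1, "nie:keyword", "work")
apply_statement(1, "nie:keyword", "urgent")
```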
>         
>         A lot of the SPARQL UPDATE part is here:
>         
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala
>         
>         Note that we have a buffer where we eliminate duplicates like:
>         
>         <a> a Class.
>         <a> Prop1 Value1.
>         <a> a Class.
>         <a> Prop2 Value2.
>         <a> Prop1 Value3.
>         
>         We translate that to:
>         
>         <a> a Class; Prop1 Value3; Prop2 Value2.
>         
>         Except when we utilize the null support:
>         
>         
> http://pvanhoof.be/blog/index.php/2011/08/09/support-for-null-with-trackers-insert-or-replace-feature
>         
> http://pvanhoof.be/blog/index.php/2011/08/15/null-support-for-insert-or-replace-available-in-master
>         
>         The second blog item and my own comments in the first
>         illustrate the problem with naively optimizing statement sets
>         like the one above.
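The statement buffering described above can be sketched as a per-subject merge. Assumptions made here: the SINGLE_VALUED set is hypothetical, a later value replaces an earlier one for single-valued properties (INSERT OR REPLACE style), and None stands in for the null from the blog items. The null is kept as an explicit reset marker rather than merged away, because it is assumed to also clear values already in the store. This is an illustration, not Tracker's buffer code.

```python
# Hypothetical: properties that allow only one value per subject.
SINGLE_VALUED = {"Prop1", "Prop2"}

def coalesce(statements):
    """Merge a batch of (subject, predicate, object) statements per subject.

    Exact duplicates are dropped and, for single-valued properties, the
    last value wins. An object of None models "null": it is kept as an
    explicit reset marker, with any later values appended after it.
    """
    merged = {}
    for subject, predicate, obj in statements:
        props = merged.setdefault(subject, {})
        if obj is None:
            props[predicate] = [None]   # reset; later values append
        elif predicate in SINGLE_VALUED:
            props[predicate] = [obj]    # last value wins
        else:
            values = props.setdefault(predicate, [])
            if obj not in values:       # drop duplicates
                values.append(obj)
    return merged
```

Run on the five statements above with Prop1 and Prop2 treated as single-valued, this yields <a> a Class; Prop1 Value3; Prop2 Value2.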
>         
>         We also added a few SQLite functions, which are used by the
>         SPARQL->SQL translation and by most of our own SPARQL
>         extensions (we ship with a bunch of SPARQL extensions that allow
>         query writers to make queries faster):
>         
>         
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-interface-sqlite.c#n826
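As an example of the kind of helper meant here: SQLite has no built-in regular expression implementation, so something has to supply one before SPARQL's REGEX() can be translated. The sketch registers one through Python's sqlite3 `create_function()`. The name SparqlRegex and the flag handling (only "i") are assumptions, and Python's re syntax is not identical to the XPath regex syntax that SPARQL specifies.

```python
import re
import sqlite3

def sparql_regex(text, pattern, flags):
    """Subset of SPARQL REGEX(text, pattern, flags): only the 'i' flag."""
    if text is None:
        return None
    re_flags = re.IGNORECASE if flags and "i" in flags else 0
    return 1 if re.search(pattern, text, re_flags) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("SparqlRegex", 3, sparql_regex)
conn.execute('CREATE TABLE "nie:InformationElement" (ID INTEGER PRIMARY KEY, "nie:title" TEXT)')
conn.executemany('INSERT INTO "nie:InformationElement" VALUES (?, ?)',
                 [(1, "Holiday photos"), (2, "Tax report"), (3, "holiday plan")])

# FILTER(REGEX(?title, "^holiday", "i")) could translate to:
rows = conn.execute(
    'SELECT ID FROM "nie:InformationElement" '
    "WHERE SparqlRegex(\"nie:title\", '^holiday', 'i') ORDER BY ID"
).fetchall()
```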
>         
>         The data manager is mostly about creating the SQL tables and
>         related preparation. It can also handle a limited amount of
>         ontology changes (adding and removing classes and properties,
>         and their extensions such as domain-specific indexes):
>         
> https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-manager.c
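A minimal sketch of the data manager's two jobs described above: creating one table per class from an ontology description, and applying an additive change (a new property) with ALTER TABLE ... ADD COLUMN. The dict-based ontology format, the property names and the helper names are hypothetical.

```python
import sqlite3

def create_schema(conn, ontology):
    """ontology maps each class to its single-valued properties (hypothetical format)."""
    for cls, props in ontology.items():
        columns = "".join(f', "{p}"' for p in props)
        conn.execute(f'CREATE TABLE "{cls}" (ID INTEGER PRIMARY KEY{columns})')

def add_property(conn, cls, prop):
    """An additive ontology change can be applied in place."""
    conn.execute(f'ALTER TABLE "{cls}" ADD COLUMN "{prop}"')

def column_names(conn, table):
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

conn = sqlite3.connect(":memory:")
create_schema(conn, {"nco:Contact": ["nco:fullname"], "nco:PersonContact": []})
add_property(conn, "nco:Contact", "nco:nickname")
```

Removing a property is harder: SQLite only gained ALTER TABLE ... DROP COLUMN in version 3.35.0 (2021), so with the SQLite of that era it meant rebuilding the table.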
>         
> 
> 
> Thanks for all this information
> 
>  
> 
>         Kind regards,
>         
>         Philip
>         
>         
>         On Thu, 2013-04-11 at 05:18 +0530, Vishesh Handa wrote:
>         > Hey Philip
>         >
>         > I'm one of the KDE Nepomuk developers. I've been looking into
>         > the tracker project for some time now, since it's good to know
>         > how other people implement things internally. However, it has
>         > been very hard for me to find any documentation on the inner
>         > workings of tracker.
>         >
>         >
>         > I'm specifically interested in how the database schema is
>         > designed. I did find the "Semantic Social Desktop and Mobile
>         > Devices" presentation [1], which gives a very rough overview of
>         > how each class has its own table.
>         >
>         >
>         > Could you perhaps point me to some internal documentation? It
>         > would be most helpful. Otherwise, could I ask you some detailed
>         > questions about the tracker internals?
>         >
>         > I have looked at the source code, but it's a little hard to
>         > understand for a newcomer.
>         >
>         > [1] https://live.gnome.org/Tracker/Documentation/Presentations
>         >
>         > --
>         > Vishesh Handa
>         >
>         
>         
>         
> 
> 
> 
> -- 
> Vishesh Handa
> 

-- 
Philip Van Hoof
Software developer
Codeminded BVBA - http://codeminded.be

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list
