Re: [Tracker] Performance issue in Tracker 0.7.x due to the sparql to sql translator

Philip Van Hoof Fri, 15 Jan 2010 03:29:28 -0800

On Fri, 2010-01-15 at 11:43 +0100, Florent Viard wrote:
> Hi,
> 
> We are working on using Tracker and faced some performance issues.
> The objective is to retrieve multiple informations about audio files.
> So we do a "big" Sparql query on a test base almost 10 000 files and
> we face a really slow reply causing dbus to timeout.


DBus timeouts are always for the client to handle. Using this API in
dbus-glib:

dbus_g_proxy_set_default_timeout


> First we thought it come from the a limit of the complexity of the
> Sparql query causing a "parser stack overflow" error that is described
> here:
> http://mail.gnome.org/archives/tracker-list/2009-December/msg00106.html
> But after simplifying the query we discover that the problem may come
> from the conversion of the Sparql to SQL query in
> tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala
> It seems that the SQL generated query is not totally efficient and may
> be improved.
> 
Okay, I'll leave this part for Jürg

> After further investigations, it appears that
> tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala is the
> source of the problem. There is an issue with the way the SQL query is
> generated.
> 

You can often use function variables instead of optionals. They are
faster.

For example

SELECT nie:title (?thing) { ?thing a nie:InformationElement) }

Perhaps try rewriting some of them to use that?

> Here is the Sparql query that we did:
> 
> "SELECT ?url ?title ?len ?creadate ?artnam ?sbpm ?genre ?bpm ?albtit ?ntrack 
> ?com ?lastaddmod WHERE " //?nbtack
>                 "{ ?x a nmm:MusicPiece . "
>                 "optional { ?x nie:isStoredAs ?url } ."
>                 "optional { ?x nie:title ?title } ."
>                 "optional { ?x nie:comment ?com } ."
>                 "optional { ?x nie:contentCreated ?creadate } ."
>                 "optional { ?x tracker:added ?lastaddmod } ."
>                 "optional { ?x nfo:genre ?genre } ."
>                 "optional { ?x nfo:averageBitrate ?sbpm} ."
>                 "optional { ?x nfo:sampleRate ?bpm } ."
>                 "optional { ?x nmm:length ?len } ."
>                 "optional { ?x nmm:performer ?artist . ?artist
> nmm:artistName ?artnam } ."
>                 "optional { ?x nmm:musicAlbum ?album . ?album
> nmm:albumTitle ?albtit } ."
>                 "optional { ?x nmm:trackNumber ?ntrack } ."
>         //       "optional { ?x
> nmm:albumTrackCount ?nbtrack} ."  //Removed because of the parser
> limited stack size
>                 " } LIMIT %d OFFSET %d";
> 
> And attached to this mail [ugly-tracker-sql-sparql.txt] is the
> automatically generated SQL (Reformatted to be human readable).
> Trying directly on SQLite this query with the 11k files dataset takes
> around 2 / 3 minutes to return a result.
> 
> In the second attached file [good-tracker-sql.txt] is the same SQL
> request, simplified by me, that is like tracker is supposed to
> generate the SQL from the Sparql query.
> That query executed with the same 11k files dataset on SQLite, gave a
> result in only 4 / 5 seconds. (And there, the complexity of the Sparql
> query could be increased).
> 
> I tried to modify myself the source code for this function, but it was
> too difficult for me because of lack of knowledge on the variables
> used by the Tracker's internal Sparql parser.
> 
> So I think you should modify the code in tracker-sparql-query.vala to
> generate a query like this by changing the creation of LEFT JOIN by
> the creation of a SELECT sub-query in the function
> translate_group_graph_pattern.
> 
> I know the common SQL "tips" on Internet that say that LEFT (and
> RIGHT) JOIN are more efficient than sub-queries. But here, always
> having ID as an index in all the tables could explain the difference.
> 
> A good idea could be to use SELECT sub-queries inside the main SELECT
> for the optionals of a same subject.
> 
> Regards,
> 


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list

Re: [Tracker] Performance issue in Tracker 0.7.x due to the sparql to sql translator

Reply via email to