On Fri, 2010-01-15 at 11:43 +0100, Florent Viard wrote: > Hi, > > We are working on using Tracker and faced some performance issues. > The objective is to retrieve multiple informations about audio files. > So we do a "big" Sparql query on a test base almost 10 000 files and > we face a really slow reply causing dbus to timeout.
DBus timeouts are always for the client to handle. Using this API in dbus-glib: dbus_g_proxy_set_default_timeout > First we thought it come from the a limit of the complexity of the > Sparql query causing a "parser stack overflow" error that is described > here: > http://mail.gnome.org/archives/tracker-list/2009-December/msg00106.html > But after simplifying the query we discover that the problem may come > from the conversion of the Sparql to SQL query in > tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala > It seems that the SQL generated query is not totally efficient and may > be improved. > Okay, I'll leave this part for Jürg > After further investigations, it appears that > tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala is the > source of the problem. There is an issue with the way the SQL query is > generated. > You can often use function variables instead of optionals. They are faster. For example SELECT nie:title (?thing) { ?thing a nie:InformationElement) } Perhaps try rewriting some of them to use that? > Here is the Sparql query that we did: > > "SELECT ?url ?title ?len ?creadate ?artnam ?sbpm ?genre ?bpm ?albtit ?ntrack > ?com ?lastaddmod WHERE " //?nbtack > "{ ?x a nmm:MusicPiece . " > "optional { ?x nie:isStoredAs ?url } ." > "optional { ?x nie:title ?title } ." > "optional { ?x nie:comment ?com } ." > "optional { ?x nie:contentCreated ?creadate } ." > "optional { ?x tracker:added ?lastaddmod } ." > "optional { ?x nfo:genre ?genre } ." > "optional { ?x nfo:averageBitrate ?sbpm} ." > "optional { ?x nfo:sampleRate ?bpm } ." > "optional { ?x nmm:length ?len } ." > "optional { ?x nmm:performer ?artist . ?artist > nmm:artistName ?artnam } ." > "optional { ?x nmm:musicAlbum ?album . ?album > nmm:albumTitle ?albtit } ." > "optional { ?x nmm:trackNumber ?ntrack } ." > // "optional { ?x > nmm:albumTrackCount ?nbtrack} ." //Removed because of the parser > limited stack size > " } LIMIT %d OFFSET %d"; > > And attached to this mail [ugly-tracker-sql-sparql.txt] is the > automatically generated SQL (Reformatted to be human readable). > Trying directly on SQLite this query with the 11k files dataset takes > around 2 / 3 minutes to return a result. > > In the second attached file [good-tracker-sql.txt] is the same SQL > request, simplified by me, that is like tracker is supposed to > generate the SQL from the Sparql query. > That query executed with the same 11k files dataset on SQLite, gave a > result in only 4 / 5 seconds. (And there, the complexity of the Sparql > query could be increased). > > I tried to modify myself the source code for this function, but it was > too difficult for me because of lack of knowledge on the variables > used by the Tracker's internal Sparql parser. > > So I think you should modify the code in tracker-sparql-query.vala to > generate a query like this by changing the creation of LEFT JOIN by > the creation of a SELECT sub-query in the function > translate_group_graph_pattern. > > I know the common SQL "tips" on Internet that say that LEFT (and > RIGHT) JOIN are more efficient than sub-queries. But here, always > having ID as an index in all the tables could explain the difference. > > A good idea could be to use SELECT sub-queries inside the main SELECT > for the optionals of a same subject. > > Regards, > -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be _______________________________________________ tracker-list mailing list tracker-list@gnome.org http://mail.gnome.org/mailman/listinfo/tracker-list