On 13 July 2013 09:29, Philip Van Hoof <phi...@codeminded.be> wrote:
> Ivan Frade wrote on 12/07/2013 18:52:
>
> Hi guys,
>
> I plan to write a more detailed guide to improving insert performance when I
> have more time. This weekend I'm very busy with moving from my gf's
> apartment to my newly renovated house ;), so I'll keep it short.

I hope the move goes well! That guide you're mentioning sounds very interesting
indeed. I'll keep an eye out for it!

> Important is to use the INSERT OR REPLACE feature instead of DELETE+INSERT,
> another thing you can do is increase the LRU cache size and tweak the
> various buffer sizes we have in tracker-data-update.c.

Interesting, I will have a look at this.

> Finally changing the ontology could help. But because of the decomposed
> schema you wouldn't touch tables of ontology domains that aren't related to
> your insert of data a lot.
>
> Except indeed when there are hierarchies. So if a total ontology rewrite is
> fine, try to reduce inheritance. Aggregation over inheritance ('has a'
> instead of 'is a') in the ontology will often be faster, but it also depends
> on a variety of things for which you should study the insert queries that we
> generate for a given insert vs. ontology situation. Aggregation will often
> make your SELECT queries more complicated (and if you need the data,
> probably slower too). We optimized first for read speed, then write speed.

It would be very unfortunate to kill the reading speed in favor of writing. I
will have a good look at reworking some ontologies, however, and see how
speeds are affected overall; perhaps it is possible to find a good balance. If
I come up with anything interesting I will of course let you guys know.

> The inserting, updating and deleting on the SQL layer itself is by the way
> not the only thing that influences insert performance. The SPARQL parsing,
> buffering and grouping into transactions among other things (like IPC
> overhead) also play a role.
> Although I must say that after so many years of
> being plagued by Nokians who didn't like Tracker because it was Not Invented
> Here (not by their own team) and somewhat enforced upon them, we did ensure
> that it's really really optimized (and teams were challenged to find
> performance improvements and open bugs on them, instead of making empty
> arguments that it's not). It would surprise me if you'd find a single strdup
> or malloc that shouldn't be there, for example. But I'll be more than happy
> if you eliminate one.
>
> Next you have indexes and domain-specific indexes that'll slow down
> inserting. And you have the signals on changes that you can turn off on a
> class, which will have a memory usage and performance impact while inserting
> (not doing something is always faster than doing something).

Also interesting. I assume these indexes are crucial for good lookup speeds,
however? I will definitely have a look at them and again try to find a good
balance here.

> If you don't need FTS, then disabling FTS should make a huge performance
> improvement. For the same reason (a lot of things won't be done anymore,
> which is always faster than doing them. But FTS is also a nice feature to
> have. So make your choice).

Good point! I remember this being mentioned earlier on the list. It might be
possible to work around FTS not being available if performance is motivating
enough.

Thanks for all the tips!

-- 
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list