On 13 July 2013 09:29, Philip Van Hoof <phi...@codeminded.be> wrote:
> Ivan Frade wrote on 12/07/2013 18:52:
> Hi guys,
> I plan to write a more detailed guide to improving insert performance when I
> have more time. This weekend I'm very busy with moving from my gf's
> apartment to my newly renovated house ;), so I'll keep it short.

I hope the move goes well! That guide you're mentioning sounds very
interesting indeed. I'll keep an eye out for it!

> It is important to use the INSERT OR REPLACE feature instead of DELETE+INSERT.
> Another thing you can do is increase the LRU cache size and tweak the
> various buffer sizes we have in tracker-data-update.c.

Interesting, I will have a look at this.
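
A rough sketch of the difference, assuming Tracker's SPARQL update syntax and a
single-valued property such as nie:title. The helper names and the example URN
are made up for illustration, and the values are not escaped:

```python
# Hypothetical helpers that build the two kinds of SPARQL update text.
# Assumption: the property is single-valued, so replacing makes sense.
# Note: `value` is not escaped here; real code must escape string literals.

def delete_then_insert(subject: str, predicate: str, value: str) -> str:
    """Two statements: drop whatever value exists, then insert the new one."""
    return (
        f"DELETE {{ <{subject}> {predicate} ?old }} "
        f"WHERE {{ <{subject}> {predicate} ?old }}\n"
        f'INSERT {{ <{subject}> {predicate} "{value}" }}'
    )


def insert_or_replace(subject: str, predicate: str, value: str) -> str:
    """One statement: the store overwrites the existing value itself."""
    return f'INSERT OR REPLACE {{ <{subject}> {predicate} "{value}" }}'


if __name__ == "__main__":
    print(delete_then_insert("urn:example:song1", "nie:title", "New title"))
    print(insert_or_replace("urn:example:song1", "nie:title", "New title"))
```

The second form sends one statement instead of two, which means less to parse
and no separate lookup for the old value on the client side.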

> Finally, changing the ontology could help. But because of the decomposed
> schema, you wouldn't touch the tables of ontology domains that aren't
> related to the data you insert very much.
> Except indeed when there are hierarchies. So if a total ontology rewrite is
> fine, try to reduce inheritance. Aggregation over inheritance ('has a'
> instead of 'is a') in the ontology will often be faster, but it also depends
> on a variety of things for which you should study the insert queries that we
> generate for a given insert v. ontology situation. Aggregation will often
> make your SELECT queries more complicated (and if you need the data,
> probably slower too). We optimized first for read speed, then write speed.

It would be very unfortunate to kill the reading speed in favor of
writing. I will have a good look at reworking some ontologies,
however, and see how speeds are affected overall. Perhaps it is
possible to find a good balance. If I come up with anything
interesting I will of course let you guys know.
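
To make the inheritance cost concrete, here is a toy model, not Tracker's real
schema: it assumes one table per class and one row per class in the chain for
every new resource. The class names and helper functions are hypothetical.

```python
import sqlite3

# Hypothetical hierarchy: child class -> parent class (None marks the root).
PARENTS = {
    "Resource": None,
    "Document": "Resource",
    "TextDocument": "Document",
    "Report": "TextDocument",   # deep "is a" chain
    "Attachment": "Resource",   # flat class, linked by "has a" instead
}


def ancestry(cls):
    """Return the class followed by all of its ancestors, root last."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENTS[cls]
    return chain


def make_db():
    """Create one table per class in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for table in PARENTS:
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY)')
    return conn


def insert_instance(conn, cls, resource_id):
    """Insert one resource; return how many tables had to be written."""
    touched = ancestry(cls)
    for table in touched:
        conn.execute(f'INSERT INTO "{table}" (id) VALUES (?)', (resource_id,))
    return len(touched)


if __name__ == "__main__":
    conn = make_db()
    print(insert_instance(conn, "Report", 1))      # four tables written
    print(insert_instance(conn, "Attachment", 2))  # two tables written
```

In this model a deep class costs one write per level, while a flat class costs
two, but reading the combined data back then needs an extra join.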

> The inserting, updating and deleting on the SQL layer itself is by the way
> not the only thing that influences insert performance. The SPARQL parsing,
> buffering and grouping into transactions among other things (like IPC
> overhead) also play a role. Although I must say that after so many years of
> being plagued by Nokians who didn't like Tracker because it was Not Invented
> Here (not by their own team) and somewhat enforced upon them, we did ensure
> that it's really really optimized (and teams were challenged to find
> performance improvements and open bugs on them, instead of making empty
> arguments that it's not). It would surprise me if you'd find a single strdup
> or malloc that shouldn't be there, for example. But I'll be more than happy
> if you eliminate one.
> Next you have indexes and domain specific indexes that'll slow down
> inserting. And you have the signals on changes that you can turn off on a
> class, which will have a memory usage and performance impact while inserting
> (not doing something is always faster than doing something).

Also interesting. I assume these indexes are crucial for good lookup
speeds however? I will definitely have a look at them and again try to
find a good balance here.
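
A small experiment along these lines, using plain SQLite rather than Tracker
itself: the table, index, and function names are made up, and the timings will
vary by machine, so they are only returned, not interpreted.

```python
import sqlite3
import time


def build(with_index: bool, rows: int = 20000):
    """Fill a toy table in one transaction; return (connection, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
    if with_index:
        # Every insert below must now also update this index.
        conn.execute("CREATE INDEX item_title ON item (title)")
    start = time.perf_counter()
    with conn:  # group the whole batch into a single transaction
        conn.executemany(
            "INSERT INTO item (title) VALUES (?)",
            ((f"title-{i}",) for i in range(rows)),
        )
    return conn, time.perf_counter() - start


def lookup_plan(conn) -> str:
    """Return SQLite's query plan text for a lookup by title."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM item WHERE title = ?",
        ("title-42",),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in plan)


if __name__ == "__main__":
    for flag in (False, True):
        conn, seconds = build(flag)
        print(flag, f"{seconds:.4f}s", lookup_plan(conn))
```

Without the index the lookup is a full table scan. With it the lookup is an
index search, paid for by extra work on every insert.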

> If you don't need FTS, then disabling FTS should make a huge performance
> improvement, for the same reason: a lot of things won't be done anymore,
> which is always faster than doing them. But FTS is also a nice feature to
> have, so make your choice.

Good point! I remember this being mentioned earlier on the list. It
might be possible to work around FTS not being available if
performance is motivating enough. Thanks for all the tips!

Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden