On 13 July 2013 09:29, Philip Van Hoof <phi...@codeminded.be> wrote:
> Ivan Frade wrote on 12/07/2013 18:52:
>
> Hi guys,
>
> I plan to write a more detailed guide to improving insert performance when I
> have more time. This weekend I'm very busy with moving from my gf's
> apartment to my newly renovated house ;), so I'll keep it short.

I hope the move goes well! That guide you're mentioning sounds very
interesting indeed. I'll keep an eye out for it!

>
> The important thing is to use the INSERT OR REPLACE feature instead of
> DELETE+INSERT. Another thing you can do is increase the LRU cache size and
> tweak the various buffer sizes we have in tracker-data-update.c.

Interesting, I will have a look at this.
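
To check that I understand the difference, here is a minimal sketch of the
two update styles as SPARQL strings built from Python. The helper names, the
urn:example: URN and the choice of nie:title are only my assumptions for
illustration; INSERT OR REPLACE is the Tracker extension mentioned above, and
the exact behaviour for multi-valued properties should be checked against the
Tracker documentation.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for a SPARQL string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def delete_then_insert(urn: str, prop: str, value: str) -> str:
    """Two-step update: remove any old value, then insert the new one."""
    lit = escape_literal(value)
    return (
        f"DELETE {{ <{urn}> {prop} ?old }} WHERE {{ <{urn}> {prop} ?old }}\n"
        f'INSERT {{ <{urn}> {prop} "{lit}" }}'
    )


def insert_or_replace(urn: str, prop: str, value: str) -> str:
    """Single-statement update using Tracker's INSERT OR REPLACE extension."""
    lit = escape_literal(value)
    return f'INSERT OR REPLACE {{ <{urn}> {prop} "{lit}" }}'


# Hypothetical resource and value, for illustration only.
print(delete_then_insert("urn:example:song1", "nie:title", "New title"))
print(insert_or_replace("urn:example:song1", "nie:title", "New title"))
```

The second form is one statement instead of two, which I assume is where the
saving comes from.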

>
> Finally, changing the ontology could help. But because of the decomposed
> schema, an insert mostly won't touch tables of ontology domains that aren't
> related to the data you're inserting.
>
> Except indeed when there are hierarchies. So if a total ontology rewrite is
> fine, try to reduce inheritance. Aggregation over inheritance ('has a'
> instead of 'is a') in the ontology will often be faster, but it also depends
> on a variety of things for which you should study the insert queries that we
> generate for a given insert v. ontology situation. Aggregation will often
> make your SELECT queries more complicated (and if you need the data,
> probably slower too). We optimized first for read speed, then write speed.

It would be very unfortunate to kill the reading speed in favor of
writing. I will have a good look at re-working some ontologies,
however, and see how speeds are affected overall. Perhaps it is
possible to find a good balance. If I come up with anything
interesting I will of course let you guys know.
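
As a rough way to picture why deep inheritance could cost more on insert,
here is a toy model in Python with sqlite3. It assumes one table per class
and one row per resource in the table of every class in its hierarchy; that
is only my reading of "decomposed schema", not Tracker's actual table layout,
and all class names are hypothetical.

```python
import sqlite3

# Hypothetical class hierarchy: child -> parent (None marks the root).
PARENTS = {
    "Resource": None,
    "InformationElement": "Resource",
    "Media": "InformationElement",
    "MusicPiece": "Media",      # deep 'is a' chain
    "FlatTrack": "Resource",    # shallow class that would use 'has a' links
}


def class_chain(cls):
    """Return cls followed by all of its ancestors, up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENTS[cls]
    return chain


def create_schema(db):
    """Create one table per class, keyed by resource ID."""
    for cls in PARENTS:
        db.execute(f'CREATE TABLE "{cls}" (id INTEGER PRIMARY KEY)')


def insert_resource(db, resource_id, cls):
    """Insert one resource and return the number of tables written to."""
    chain = class_chain(cls)
    for table in chain:
        db.execute(f'INSERT INTO "{table}" (id) VALUES (?)', (resource_id,))
    return len(chain)


db = sqlite3.connect(":memory:")
create_schema(db)
deep = insert_resource(db, 1, "MusicPiece")  # writes to every ancestor table
flat = insert_resource(db, 2, "FlatTrack")   # writes to fewer tables
db.commit()
print(deep, flat)
```

In this model the deep class writes to four tables and the shallow one to
two, but reading the shallow data back would need extra joins across the
'has a' links, which matches the read/write trade-off described above.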

>
> The inserting, updating and deleting on the SQL layer itself is by the way
> not the only thing that influences insert performance. The SPARQL parsing,
> buffering and grouping into transactions among other things (like IPC
> overhead) also play a role. Although I must say that after so many years of
> being plagued by Nokians who didn't like Tracker because it was Not Invented
> Here (not by their own team) and somewhat enforced upon them, we did ensure
> that it's really really optimized (and teams were challenged to find
> performance improvements and open bugs on them, instead of making empty
> arguments that it's not). It would surprise me if you'd find a single strdup
> or malloc that shouldn't be there, for example. But I'll be more than happy
> if you eliminate one.
>
> Next you have indexes and domain specific indexes that'll slow down
> inserting. And you have the signals on changes, which have a memory usage
> and performance impact while inserting; you can turn them off on a class
> (not doing something is always faster than doing something).

Also interesting. I assume these indexes are crucial for good lookup
speeds however? I will definitely have a look at them and again try to
find a good balance here.
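
To illustrate the trade-off I am asking about, here is a small sqlite3 sketch
in Python. The table and index names are made up and this is plain SQLite,
not Tracker's schema. It inserts a batch inside a single transaction and then
compares the query plan for a lookup before and after adding an index.

```python
import sqlite3


def query_plan(db, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT)")

# Group the whole batch into one transaction instead of one per row.
with db:
    db.executemany(
        "INSERT INTO song (title) VALUES (?)",
        ((f"title {i}",) for i in range(1000)),
    )

lookup = "SELECT id FROM song WHERE title = ?"
plan_without = query_plan(db, lookup, ("title 42",))

# Every index is extra data that must be updated on each later insert.
db.execute("CREATE INDEX idx_song_title ON song (title)")
plan_with = query_plan(db, lookup, ("title 42",))

print("without index:", plan_without)
print("with index:   ", plan_with)
```

Without the index the lookup has to scan the table; with it, the plan
mentions idx_song_title. So dropping an index should only help inserts if
nothing important reads through it.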

>
> If you don't need FTS, then disabling it should give a huge performance
> improvement, for the same reason: a lot of things won't be done anymore,
> which is always faster than doing them. But FTS is also a nice feature to
> have, so make your choice.
>
Good point! I remember this being mentioned earlier on the list. It
might be possible to work around FTS not being available if
performance is motivating enough. Thanks for all the tips!


--
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list
