Hi Jamie,

On 10/12/06, Jamie McCracken <[EMAIL PROTECTED]> wrote:
> I've noticed when indexing *large* amounts of data that a lot of disk
> thrashing is taking place which is greatly slowing down performance of
> both tracker and the system in general.
>
> Also the nice +10 is not throttling enough (I don't have ionice in my
> kernel so I don't know how good a job that does), so I will probably add
> some sleeping intervals to smooth things out and keep CPU usage low
> (with a --turbo command line option to disable this for those who want
> faster indexing).
>
> The cause of the slow down is heavy fragmentation of the file based hash
> table.
>
> Having indexed 30GB of stuff, the optimization routine shrank the full
> text index from nearly 300MB to 20MB which means a massive 280MB of
> fragmentation had occurred - this is obscene!

Yep, with the new tracker I reindexed all my files - the index was
about 630 MB before optimization, and 31 MB (!) after. I observed up
to 80% iowait. It gets even worse when indexing more files than that -
my machine stands still every now and then!

> I note other indexers do not update the hash table directly but cache
> the data in memory and then bulk upload it to reduce fragmentation and
> lessen the performance hit. The disadvantage of this is that searches for
> newly indexed content won't appear until the cache is uploaded to the
> hash table. (We could upload every 10-15 mins or something - infrequent
> words should be updated more quickly though.)

Even 5 minutes should be fine I think.
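
Just to illustrate what such a write-behind cache could look like, here
is a rough Python sketch (all names hypothetical - tracker itself is C,
and the real bulk write would go to the file-based hash table, which the
`upload_postings` stub only stands in for):

```python
import time

uploaded = []

def upload_postings(word, file_ids):
    # Stand-in for the real bulk write to the file-based hash table.
    uploaded.append((word, file_ids))

class PostingsCache:
    """In-memory word -> file-id postings, flushed in bulk so the on-disk
    hash table is not fragmented by word-by-word updates."""

    def __init__(self, flush_interval=300):   # seconds; 5 min as suggested
        self.cache = {}
        self.flush_interval = flush_interval
        self.last_flush = time.monotonic()

    def add(self, word, file_id):
        self.cache.setdefault(word, []).append(file_id)
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        # One sequential pass over the whole cache - this is the bulk
        # upload step.
        for word, file_ids in sorted(self.cache.items()):
            upload_postings(word, file_ids)
        self.cache.clear()
        self.last_flush = time.monotonic()
```

The trade-off from the quoted mail shows up directly: anything still in
`self.cache` is invisible to searches until `flush()` runs.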

> As we are memory conservative, I am planning to do something similar
> but using sqlite (instead of precious memory) to cache new files and
> then bulk upload. We could easily cache the data for many thousands of
> files before uploading them.
>
> We can actually do better than others here because firstly we are not
> using any more RAM so can therefore have much bigger caches and secondly
> unlike other indexers, which upload all at once (which often causes a
> CPU spike), we can do it incrementally in sqlite.
>
> And no, sqlite will not fragment as it's btree-based and not a hash table
> (btrees are much faster to update than hashes), and we will use a
> separate db file which can be deleted when finished.
>
> Will be experimenting on this tonight. There will be a few race
> conditions to handle with this, but it's nothing too complex.
>
> I am determined to get tracker running as smooth as a baby's bottom!

Yes, please :)
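
For the sqlite staging idea, something along these lines might work - a
minimal sketch, assuming a made-up `pending` table and an `upload` hook
(the real code would write each batch into the main index and could
sleep between batches to stay smooth):

```python
import sqlite3

def open_staging_db(path=":memory:"):
    # Separate, throwaway staging db (":memory:" here just for the
    # sketch); on disk it could simply be deleted once uploaded.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS pending"
                 " (word TEXT, file_id INTEGER)")
    return conn

def stage(conn, word, file_id):
    # Cheap append-only insert; btree updates, no hash-table thrashing.
    conn.execute("INSERT INTO pending (word, file_id) VALUES (?, ?)",
                 (word, file_id))

def flush_incrementally(conn, upload, batch_size=1000):
    # Upload in fixed-size batches rather than all at once, so the
    # indexer can yield between batches and avoid a CPU spike.
    cur = conn.execute(
        "SELECT word, file_id FROM pending ORDER BY word, file_id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        upload(batch)   # hypothetical hook writing to the main index
    conn.execute("DELETE FROM pending")
    conn.commit()
```

Batching is what makes the upload incremental: each `fetchmany` chunk is
a unit of work that can be interleaved with normal indexing.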



Best regards and good luck, Marcus
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
