Hi Jamie,

On 10/12/06, Jamie McCracken <[EMAIL PROTECTED]> wrote:
> I've noticed when indexing *large* amounts of data that a lot of disk
> thrashing is taking place, which is greatly slowing down the performance
> of both tracker and the system in general.
>
> Also, the nice +10 is not throttling enough (I don't have ionice in my
> kernel, so I don't know how good a job that does), so I will probably add
> some sleeping intervals to smooth things out and keep CPU usage low
> (with a --turbo command line option to disable this for those who want
> faster indexing).
>
> The cause of the slowdown is heavy fragmentation of the file-based hash
> table.
>
> Having indexed 30GB of stuff, the optimization routine shrank the full
> text index from nearly 300MB to 20MB, which means a massive 280MB of
> fragmentation had occurred - this is obscene!
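[The sleeping-interval throttle described above could be sketched roughly as follows; the function names, batch size, and pause length are illustrative assumptions, not Tracker's actual code:]

```python
import time

def index_files(files, index_one, turbo=False, batch_size=50, pause=0.5):
    """Index files in small batches, sleeping between batches to keep
    CPU and I/O load low. A --turbo flag would map to turbo=True,
    which skips the sleeps for faster indexing."""
    for i, path in enumerate(files, start=1):
        index_one(path)
        # After each batch, yield the CPU/disk unless turbo mode is on.
        if not turbo and i % batch_size == 0:
            time.sleep(pause)
```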
Yep, with the new tracker I reindexed all my files - the index was about
630MB before optimization and 31MB (!) after... I observed up to 80%
iowait. It is even worse when indexing more files than that - my machine
stands still every now and then!

> I note other indexers do not update the hash table directly but cache
> the data in memory and then bulk upload it to reduce fragmentation and
> lessen the performance hit. The disadvantage of this is that searches
> for newly indexed content won't appear until the cache is uploaded to
> the hash table. (We could upload every 10-15 mins or something -
> infrequent words should be updated more quickly though.)

Even 5 minutes should be fine, I think.

> As we are memory conservative, I am planning to do something similar
> but using sqlite (instead of precious memory) to cache new files and
> then bulk upload. We could easily cache the data for many thousands of
> files before uploading them.
>
> We can actually do better than others here because, firstly, we are not
> using any more RAM and can therefore have much bigger caches, and
> secondly, unlike other indexers which upload all at once (which often
> causes a CPU spike), we can do it incrementally in sqlite.
>
> And no, sqlite will not fragment, as it is btree-based and not a hash
> table (btrees are much faster to update than hashes), and we will use a
> separate db file which can be deleted when finished.
>
> Will be experimenting on this tonight. There will be a few race
> conditions to handle with this, but it's nothing too complex.
>
> I am determined to get tracker running as smooth as a baby's bottom!

Yes, please :)

Best regards and good luck,
Marcus

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
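[A minimal sketch of the sqlite-backed cache idea quoted above - queue new postings in a separate, throwaway database, then bulk-upload them incrementally, one word at a time, rather than in a single spike. The table name, schema, and `upload_word` callback are illustrative assumptions, not Tracker's actual design:]

```python
import sqlite3

# A separate, disposable cache db (deleted when finished), as proposed;
# ":memory:" stands in for a real file path here.
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE pending (word TEXT, file_id INTEGER, score INTEGER)")

def cache_hits(hits):
    """Queue (word, file_id, score) tuples in the cache instead of
    updating the main file-based hash table directly."""
    cache.executemany("INSERT INTO pending VALUES (?, ?, ?)", hits)
    cache.commit()

def flush_cache(upload_word):
    """Bulk-upload the cached postings word by word (incrementally,
    avoiding one big CPU spike), then empty the cache."""
    words = [w for (w,) in cache.execute("SELECT DISTINCT word FROM pending")]
    for word in words:
        postings = cache.execute(
            "SELECT file_id, score FROM pending WHERE word = ?",
            (word,)).fetchall()
        upload_word(word, postings)  # write one word's postings to the main index
    cache.execute("DELETE FROM pending")
    cache.commit()
```

[Since sqlite stores this queue in a btree rather than a hash table, the cache file itself should not fragment the way the main index does.]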