Jamie McCracken wrote:
> I've noticed when indexing *large* amounts of data that a lot of disk
> thrashing is taking place, which is greatly slowing down performance of
> both tracker and the system in general.
>
> Also, the nice +10 is not throttling enough (I don't have ionice in my
> kernel, so I don't know how good a job that does), so I will probably add
> some sleeping intervals to smooth things out and keep CPU usage low
> (with a --turbo command line option to disable this for those who want
> faster indexing).
>
> The cause of the slowdown is heavy fragmentation of the file-based hash
> table.
>
> Having indexed 30GB of stuff, the optimization routine shrank the full
> text index from nearly 300MB to 20MB, which means a massive 280MB of
> fragmentation had occurred - this is obscene!
>
> I note other indexers do not update the hash table directly but cache
> the data in memory and then bulk upload it, to reduce fragmentation and
> lessen the performance hit. The disadvantage of this is that searches for
> newly indexed content won't appear until the cache is uploaded to the
> hash table. (We could upload every 10-15 mins or something - infrequent
> words should be updated more quickly though.)
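The sleeping-intervals idea above could look roughly like this - a minimal sketch, not actual tracker code, with all function and parameter names made up for illustration:

```python
import argparse
import time

def index_files(files, turbo=False, batch_size=50, pause=0.5):
    """Index files in small batches, sleeping between batches so the
    indexer yields the CPU; turbo disables the pauses entirely."""
    indexed = 0
    for i, f in enumerate(files, 1):
        # a real index_one(f) call would go here; we just count files
        indexed += 1
        if not turbo and i % batch_size == 0:
            time.sleep(pause)  # smooth out CPU usage between batches
    return indexed

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--turbo", action="store_true",
                        help="disable throttling for faster indexing")
    args, _ = parser.parse_known_args()
    print(index_files([f"file{i}" for i in range(100)], turbo=args.turbo))
```

The batch size and pause length would need tuning against real workloads; the point is only that throttling lives in the indexing loop and a single flag turns it off.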
Why not hold back updates, but force a flush to disk if a search is called?

> As we are memory conservative, I am planning to do something similar
> but using sqlite (instead of precious memory) to cache new files and
> then bulk upload. We could easily cache the data for many thousands of
> files before uploading them.

If I remember correctly, sqlite3 has some built-in cache settings; you
might want to tweak the standard values a bit.

> We can actually do better than others here because firstly we are not
> using any more RAM, so can therefore have much bigger caches, and
> secondly, unlike other indexers which upload all at once (which often
> causes a CPU spike), we can do it incrementally in sqlite.
>
> And no, sqlite will not fragment, as it's btree based and not a hash
> table (btrees are much faster to update than hashes), and we will use a
> separate db file which can be deleted when finished.
>
> Will be experimenting on this tonight. There will be a few race
> conditions to handle with this, but it's nothing too complex.

Looking forward to it :)

> I am determined to get tracker running as smooth as a baby's bottom!

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
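Combining the sqlite write-behind cache with the flush-on-search suggestion could look roughly like this - a minimal sketch under assumed names (the `pending` table, `FLUSH_THRESHOLD`, and the in-memory stand-in for the main index are all hypothetical, not tracker's actual schema):

```python
import sqlite3

FLUSH_THRESHOLD = 1000  # hypothetical batch size before a bulk upload

class WriteBehindCache:
    """Cache new postings in a throwaway sqlite db, then bulk-flush them
    into the main index instead of fragmenting it on every update."""

    def __init__(self):
        # a real implementation would use a separate db file that can be
        # deleted when finished; in-memory here to keep the sketch small
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE pending (word TEXT, file_id INTEGER)")
        self.flushed = []  # stand-in for the real on-disk index

    def add(self, word, file_id):
        self.db.execute("INSERT INTO pending VALUES (?, ?)", (word, file_id))
        (count,) = self.db.execute("SELECT COUNT(*) FROM pending").fetchone()
        if count >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # bulk upload: one ordered batch write instead of many scattered ones
        rows = self.db.execute(
            "SELECT word, file_id FROM pending ORDER BY word").fetchall()
        self.flushed.extend(rows)
        self.db.execute("DELETE FROM pending")

    def search(self, word):
        # force a flush first so newly indexed content is visible
        self.flush()
        return [fid for w, fid in self.flushed if w == word]

cache = WriteBehindCache()
cache.add("hello", 1)
cache.add("world", 1)
print(cache.search("hello"))  # flush happens before the lookup
```

Flushing in sorted word order is what keeps the writes to the main index sequential rather than scattered; an incremental variant would simply cap how many `pending` rows each flush drains.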