Hi all,

Things are going well on the indexer-split branch. However, it occurred to me that the daemon has some of its work duplicated in the indexer. We need to decide where responsibility lies between the daemon and the indexer.

The Modules
===========

First, the modules. They all share a common API, which includes functions to:

- Index content
- Get directories
- Know if a file or directory should be ignored

The modules include:

- applications
- files
- gaim-conversations
- firefox-history

The idea is that each module knows how to index, locate and ignore the particular files and directories pertaining to its specific area (i.e. instant messaging, browsing, applications, etc).

The Daemon
==========

I recently finished writing the daemon code that crawls the file system and queues ALL files in $HOME, or wherever the config says we should index files from. The daemon also sets up monitors for each directory it finds along the way. This is all done using the new GIO functions and works nicely. The files found are then sent in chunks to the indexer to process. This includes monitor updates to files.

The Indexer
===========

The indexer process works like a state machine with 3 queues for:

- Files
- Directories
- Modules

The files queue has the highest priority. Individual files are stored here, waiting for metadata extraction, etc. Files are taken one by one, in order, to be processed. When this queue is empty, a single token from the _next_ queue is processed.

The directories queue is the _next_ queue. Directories wait here for inspection. When a directory is checked, the files and directories it contains are prepended to their respective queues. When this queue is empty, a single token from the _next_ queue is processed.

The last queue, and again the _next_ queue after the directories queue, is the modules queue. When all files from the previous module have been inspected, the next module does its part, and this continues until all modules are finished. At this point the indexer quits.

It should be noted that the indexer is an impermanent entity: it only survives to process work given to it.

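To make the queue priorities concrete, here is a rough Python sketch of the three-queue state machine described above. It is only an illustration, not the actual Tracker code: the names Indexer, list_dir and process_file are hypothetical, and it assumes that taking a module off the modules queue simply appends that module's root directories to the directories queue.

```python
from collections import deque


class Indexer:
    """Toy model of the three-queue indexer (names are hypothetical)."""

    def __init__(self, modules, list_dir, process_file):
        # modules: iterable of (name, [root directories]) pairs (assumed shape)
        self.files = deque()
        self.directories = deque()
        self.modules = deque(modules)
        self.list_dir = list_dir          # callable: path -> (files, subdirs)
        self.process_file = process_file  # callable: path -> None

    def step(self):
        """Process a single token; return False once every queue is empty."""
        if self.files:
            # Highest priority: one file at a time, in order.
            self.process_file(self.files.popleft())
        elif self.directories:
            # Next: inspect one directory and prepend what it contains.
            files, subdirs = self.list_dir(self.directories.popleft())
            self.files.extendleft(reversed(files))
            self.directories.extendleft(reversed(subdirs))
        elif self.modules:
            # Last: start the next module by queueing its directories.
            _name, roots = self.modules.popleft()
            self.directories.extend(roots)
        else:
            return False
        return True

    def run(self):
        # The indexer is impermanent: it stops when there is no work left.
        while self.step():
            pass


# A small in-memory "file system" standing in for real directory listings.
TREE = {
    "/home": (["/home/a.txt"], ["/home/docs"]),
    "/home/docs": (["/home/docs/b.txt", "/home/docs/c.txt"], []),
    "/apps": (["/apps/x.desktop"], []),
}

processed = []
indexer = Indexer(
    modules=[("files", ["/home"]), ("applications", ["/apps"])],
    list_dir=lambda path: TREE[path],
    process_file=processed.append,
)
indexer.run()
```

In this sketch the second module's directories are only queued once the files and directories queues have drained, which matches the "single token from the _next_ queue" rule.
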
The Problem
===========

The question is, should the daemon do some of this work?

The issue for the daemon is that what it does is highly specific to "files" only. It doesn't know anything about instant messaging files, locations, what should be ignored, what should be monitored, etc.

When running the indexer right now, it sits at about 25%-33% in the background while indexing files (on my laptop). On my desktop, it can index my 140k files in about 130 seconds using no throttling, and the system is very usable during this time (and we haven't optimised anything yet either).

The daemon, however, does absolutely nothing after the initial 10-15 seconds (which is how long it takes to set up 6500 monitors and get all 140k files in my home directory, 30k of which have been ignored as being unsuitable).

So the statistics look good, but the daemon can do more, and should be doing things like monitoring the desktop file directory so we know when applications are added, removed or updated. To that end, we have been thinking about how best to divide the work load between the indexer and the daemon so it is most efficient.

The How
=======

After speaking with Carlos some more about this, the basic idea we had was to make the indexer JUST index. This means the modules need to be shared, so that the indexer can get each module to index files the way it knows how, and so that the daemon can ask each module for locations to monitor and crawl.

The idea is that the daemon crawls the file system and sends all files and directories to the indexer (we currently don't send directories, just files). The indexer needs both files and directories in order to add them to the database.

We can take this one step further. We can even have the daemon check the database before sending files to the indexer, to make sure we are not generating extra work unnecessarily. This is something we don't do at all yet, but it is planned.

The Conclusion
==============

This work is mostly done right now.
It is merely a case of moving the architecture around a bit and moving code between processes. But is this the right approach? What do you think?

Comments welcome!

-- 
Regards,
Martyn
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list