Hi all,

We (the contributors from Nokia) have been revisiting the daemon/indexer architecture. It has proven to be a much better approach, both conceptually and code-wise, but we still feel there are some issues left.

Current situation
=================

Tracker is now split into the daemon (which handles requests, does file monitoring, etc.) and the indexer (which collects metadata from files and saves it in the Tracker database). As things stand, the indexer has little reason to stay alive once indexing has finished; file operations (which aren't that frequent in normal user interaction) would trigger the indexer again to do its job.

What we think could be the future
=================================

Metadata could come from virtually any source (emails, RSS feeds, ...), and it could arrive at any time, not necessarily as a result of user interaction. The more metadata sources there are, the busier Tracker would get. This makes the separation between indexer and daemon less useful, since the indexer would be active more and more of the time, as it's also responsible for writing data to the databases. Also, as the indexer is asked to handle more information, the DBus communication between both processes might become an issue in this scenario.

Solution
========

We feel it would be better to move the write functionality of the indexer back into the daemon, and have the indexer just hand the metadata over to the daemon so that the daemon can store it. The communication medium between the daemon and the indexer has to be compact and fast, so sending Turtle files through a pipe sounds quite optimal.

This architecture change would also ensure that the part responsible for storing metadata and monitoring external sources is always kept alive.
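
To make the pipe idea a bit more concrete, here is a minimal sketch of the hand-off, not actual Tracker code. Everything named in it is hypothetical: the functions export_turtle() and import_turtle(), the ex:title and ex:mimeType predicates, and the file URI. It assumes one subject per block and string-valued properties only, leaves out the @prefix declarations a real Turtle document would need, and uses JSON string escaping as a stand-in for Turtle literal escaping. Both ends of the pipe live in one process here purely so the example is self-contained.

```python
import json
import os


def export_turtle(uri, properties):
    """Indexer side (hypothetical): serialise one resource as a Turtle-like block."""
    if not properties:
        return ""
    lines = [f"<{uri}>"]
    items = list(properties.items())
    for i, (predicate, value) in enumerate(items):
        # ';' continues the same subject, '.' closes the block.
        end = " ." if i == len(items) - 1 else " ;"
        # Assumption: JSON escaping stands in for Turtle literal escaping.
        literal = json.dumps(value, ensure_ascii=False)
        lines.append(f"    {predicate} {literal}{end}")
    return "\n".join(lines) + "\n"


def import_turtle(lines):
    """Daemon side (hypothetical): parse blocks written by export_turtle().

    This is a naive line-based reader, not a general Turtle parser.
    """
    store = {}
    subject = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            subject = line[1:-1]
            store.setdefault(subject, {})
            continue
        body = line[:-1].rstrip()  # drop the trailing ';' or '.'
        predicate, _, literal = body.partition(" ")
        store[subject][predicate] = json.loads(literal)
    return store


def demo():
    """Send one block through an OS pipe and read it back."""
    read_fd, write_fd = os.pipe()
    # The payload is far smaller than the pipe buffer, so writing before
    # reading in a single process does not block.
    with os.fdopen(write_fd, "w", encoding="utf-8") as out:
        out.write(export_turtle(
            "file:///home/user/song.mp3",
            {"ex:title": "Some Song", "ex:mimeType": "audio/mpeg"},
        ))
    with os.fdopen(read_fd, "r", encoding="utf-8") as inp:
        return import_turtle(inp)


if __name__ == "__main__":
    print(demo())
```

In the proposed design the writer would be tracker-indexer and the reader trackerd, running as separate processes, with the daemon doing the actual database insert where this sketch fills a dictionary.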

So, summing up, these would be the roles:

trackerd:
 - Database reads and writes
 - File system monitoring
 - Importing metadata into the database using the TTL file format
 - Handling remote/virtual data

tracker-indexer:
 - File system crawling
 - Exporting metadata to the TTL file format for the daemon

This solution would (again) imply plenty of changes in the code base, and it would take some time to get it ready for inclusion in trunk, but we think it could be beneficial enough to be worth the pain. Some pros could be:

 - More efficient communication than DBus.
 - A smaller overall memory footprint: there's no need to duplicate strings on the indexer and daemon sides for every file we crawl.
 - Faster response times for user requests when inserting and then looking up virtual data (based on various ontologies).
 - A much more lightweight daemon than we have now.
 - Less code duplication (i.e. indexer/daemon crawling).

Opinions? Issues?

Regards,
  Carlos