Hey,

As of now, embedded metadata extraction works in a pretty simple manner: tracker-extract processes files one by one in the main thread, and tracker-miner-fs both controls the request rate and handles extraction errors.

When handling errors, tracker-miner-fs relies heavily on current tracker-extract behavior: it immediately discards a file if there's suspicion that it's causing trouble in the extractor (mostly timeout/no_reply dbus conditions). The heuristic is that the oldest file in the current request must be the one tracker-extract was working on.

So, let's suppose tracker-extract goes threaded for extraction and distinguishes between extract modules that are safe to spawn in different threads, those that must run in a single thread only, and even those that must run in the main thread only (drifting off topic here, but it's also something I'd like to see done eventually).

If that's the case, error handling could get rather complicated. no_reply dbus errors very often mean a crash, and would be reported for every ongoing request. Timeouts often mean that an extractor module has gone wild, be it because of a bug there, corrupt data... Timeouts could apply to just a single file, or cascade to a whole mimetype family, depending on the extractor module behind it.

So I think the lowest common denominator would be to have tracker-miner-fs enter some "failsafe extraction" mode, where on extractor error:

1) the file is added to a "failed extraction" list
2) the miner is paused
3) it waits for all ongoing extractor replies
   3a) if extraction failed, those files are added to the list too
4) it goes through the list, requesting metadata again, one by one
   4a) if the request fails again, it gives up on the file

This does make sense for tracker-extract crashes, but could still potentially take a long time in timeout conditions for the "extractor module in single thread" case, so, thoughts?

Cheers,
Carlos
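
P.S. To make the "failsafe extraction" steps above a bit more concrete, here is a rough Python sketch of the flow. It is not tracker-miner-fs code: `failsafe_retry`, `pending_replies` and `extract_one` are hypothetical names made up for illustration, pausing the miner is left to the caller, and the ongoing extractor replies are assumed to have been collected already as a "file -> succeeded" mapping.

```python
from typing import Callable, Dict, List


def failsafe_retry(first_failed: str,
                   pending_replies: Dict[str, bool],
                   extract_one: Callable[[str], bool]) -> List[str]:
    """Return the files that are given up on after a one-by-one retry.

    first_failed    -- the file whose request triggered the error
    pending_replies -- hypothetical map of file -> True if its ongoing
                       request eventually succeeded, False otherwise
    extract_one     -- hypothetical callable that requests metadata for a
                       single file and returns True on success
    """
    # The file that triggered the error starts the "failed extraction" list.
    failed = [first_failed]

    # (The miner is assumed to be paused by the caller at this point.)

    # Every other ongoing request that also failed joins the list.
    for path, succeeded in pending_replies.items():
        if not succeeded and path not in failed:
            failed.append(path)

    # Retry strictly one file at a time; a second failure means giving up.
    given_up = []
    for path in failed:
        if not extract_one(path):
            given_up.append(path)
    return given_up
```

Called with a stand-in extractor that only chokes on one file, e.g. `failsafe_retry("a.jpg", {"bad.mp3": False, "ok.png": True}, lambda p: p != "bad.mp3")`, only `bad.mp3` ends up in the returned list. Files that merely got caught in a crash are recovered on the second pass. The sequential retry is also where the long wait in the timeout case would come from.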