On Wed, Apr 29, 2009 at 03:31:52PM -0700, William Morgan wrote:
> (All this rigamarole about ordinals and blah blah blah is necessary
> because I don't want Sup to rescan the entire Maildir unless absolutely
> necessary. One day I'll convert my mbox to a Maildir with 250k files in
> it, and a rescan will kill me, especially at Ruby speed.)
How are we defining 'rescan' here? I can think of a couple of possible meanings, and I'm not sure which one you have in mind:

1. Open every file in the maildir and read from it, every poll.
2. Visit every filename in the maildir to consider whether it is a new file, every poll.
3. Something else I'm not thinking of this instant.

If 1, would it suffice to preserve a list of which messages you've already added to the index? (Compare against the database, which would contain the filenames for all the messages.)

If 2:

> a) Sort files by timestamp, and then by something else (maybe name), and

Doesn't this effectively require visiting every filename in the maildir?

I suspect I'm not entirely clear on what we're optimizing for, or I'm missing something about the relative costs of operations.

...

I have just looked into maildir again, because I thought I remembered something from the last time I looked at it: maildir filenames are of the form 'time.pid.host:info' and are supposedly unique. If the desired name is already taken, the MDA sleeps for 2s and then tries again. [see man maildir]

So the only way you're going to get a timestamp collision (on the filename timestamp, perhaps not on the actual ctime) is if you have multiple processes delivering mail simultaneously, or if you're synchronizing mail delivered on multiple hosts. In the latter case, a timestamp-based heuristic for finding new messages isn't going to work.

Would it suffice to keep track of the filename of the most recent message added for each maildir source, and then check every file whose filename time portion is equal to or greater than that message's for whether it needs to be added? [although see above about whether that gains anything]

It seems you're "supposed" to move things from new/ to cur/ after you've indexed them, to solve exactly this problem, but I know this clashes with the Sup design philosophy.
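To make the per-source "most recent filename" suggestion concrete, here is a minimal sketch in Python. It is not Sup code (Sup is Ruby), and it rests on a few assumptions. Filenames are taken to follow the 'time.pid.host:info' form described above. The caller is assumed to already have a directory listing and a set of the unique parts (everything before the ':') that are already indexed. `parse_maildir_name` and `new_messages` are hypothetical helpers.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple


def parse_maildir_name(name: str) -> Optional[Tuple[int, str]]:
    """Split a maildir filename of the (assumed) form 'time.pid.host:info'.

    Returns (time, unique_part), where unique_part is everything before the
    ':info' suffix, or None if the name has no leading integer timestamp.
    """
    unique = name.split(":", 1)[0]      # drop the ':info' flags, if present
    head = unique.split(".", 1)[0]      # the leading 'time' field
    if not re.fullmatch(r"[0-9]+", head):
        return None
    return int(head), unique


def new_messages(names: Iterable[str],
                 last_added: Optional[str],
                 already_indexed: Set[str]) -> List[str]:
    """Hypothetical helper: filenames that may still need indexing.

    Only files whose time field is >= that of `last_added` are candidates.
    Ties within the same second are resolved by checking `already_indexed`,
    which holds unique parts (the filename without its ':info' suffix).
    """
    watermark = 0
    if last_added is not None:
        parsed = parse_maildir_name(last_added)
        if parsed is not None:
            watermark = parsed[0]

    candidates = []
    for name in names:  # note: this still visits every filename in the listing
        parsed = parse_maildir_name(name)
        if parsed is None:
            continue  # not a maildir-style name; ignore it
        t, unique = parsed
        if t >= watermark and unique not in already_indexed:
            candidates.append((t, name))
    # Return oldest first; the name breaks ties so the order is deterministic.
    return [name for _, name in sorted(candidates)]
```

The loop still walks every filename, so this only avoids opening files (meaning 1 above), not listing the directory (meaning 2). Keying the index on the part before ':' is a deliberate choice, because the info suffix changes whenever flags change, so the full filename is not a stable identifier. As noted above, a time-based watermark also breaks down when mail is synchronized from hosts whose clocks differ.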
Having looked into this more closely, I'm starting to seriously reconsider whether maildir is really what I want to be using for storing my mail. There are some weird timing issues.
_______________________________________________
sup-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/sup-talk
