On Wed, Apr 29, 2009 at 03:31:52PM -0700, William Morgan wrote:
> (All this rigamarole about ordinals and blah blah blah is necessary
> because I don't want Sup to rescan the entire Maildir unless absolutely
> necessary. One day I'll convert my mbox to a Maildir with 250k files in
> it, and a rescan will kill me, especially at Ruby speed.)

How are we defining 'rescan' here?  I can think of a couple of possible
meanings, and I'm not sure which one you mean:

1. Open every file in the maildir and read from it, every poll.
2. Visit every filename in the maildir to consider whether it is a new
file, every poll.
3. Something else I'm not thinking of this instant.

If 1, would it suffice to keep a list of the messages you've already
added to the index, and compare each filename against it? (The database
would already contain the filenames for all the messages.)
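
To make option 1 concrete, here is a rough sketch in Python (not Sup's
actual code) of comparing a directory listing against a set of
already-indexed names. The names `unique_part` and `find_unindexed` are
made up for illustration, and I'm assuming the index stores the part of
each filename before the ':' so that a flag change doesn't look like new
mail. It lists every filename but opens no message files.

```python
import os


def unique_part(filename):
    """Strip the ':info' suffix so flag changes don't look like new mail."""
    return filename.split(":", 1)[0]


def find_unindexed(maildir, indexed):
    """Return paths of messages whose unique name is not in `indexed`.

    `indexed` is a set of unique names (hypothetical index contents).
    Only directory entries are listed; no message file is opened.
    """
    fresh = []
    for sub in ("new", "cur"):
        subdir = os.path.join(maildir, sub)
        for name in sorted(os.listdir(subdir)):
            if unique_part(name) not in indexed:
                fresh.append(os.path.join(subdir, name))
    return fresh
```

With a set, each membership check is cheap, so the cost per poll is
dominated by listing the directory rather than by reading messages.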

If 2:

> a) Sort files by timestamp, and then by something else (maybe name), and

Doesn't this effectively require visiting every filename in the maildir?

I suspect I'm not entirely clear on what we're optimizing for, or I'm
missing something about the relative costs of operations.

...

I just looked into maildir again, and it matches something I half
remembered from the last time I looked at it: maildir filenames are of
the form 'time.pid.host:info', and are supposed to be unique.  If the
desired name is already taken, the MDA sleeps for 2s and then tries
again. [see man maildir]
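
For illustration, a minimal Python sketch of pulling such a name apart.
It assumes the simple 'time.pid.host:info' form described above;
`MaildirName` and `parse_maildir_name` are hypothetical names, and a
filename that doesn't follow that form will raise ValueError.

```python
from typing import NamedTuple


class MaildirName(NamedTuple):
    time: int   # delivery time in seconds, as encoded in the filename
    pid: str    # middle component (kept as a string)
    host: str   # delivering host; may itself contain dots
    info: str   # text after ':' (empty if there is none)


def parse_maildir_name(filename):
    """Split 'time.pid.host:info' into its components."""
    unique, _, info = filename.partition(":")
    # Split at most twice so that dots in the hostname are preserved.
    time_part, pid, host = unique.split(".", 2)
    return MaildirName(int(time_part), pid, host, info)
```

Note that the time here comes from the filename, not from the file's
ctime, which is the distinction that matters below.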

So the only way you're going to get a timestamp collision (on the
filename timestamp, perhaps not on the actual ctime) is if you have
multiple processes delivering mail simultaneously, or if you're
synchronizing mail delivered on multiple hosts.  In the latter case, a
timestamp-based heuristic for finding new messages isn't going to work.

Would it suffice to keep track of the filename of the most recent
message added for each maildir source, and check everything with a time
portion of the filename equal to or greater than that message for
whether it needs to be added?  [although see above about whether that
gains anything]
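
A sketch of that idea in Python, under the assumption that every name in
the directory starts with an integer time followed by a '.';
`filename_time` and `candidates_since` are hypothetical names. It returns
the names that would still need to be checked against the index,
including the last-added message itself and anything sharing its
timestamp.

```python
import os


def filename_time(filename):
    """Return the leading time component of a maildir filename."""
    return int(filename.split(".", 1)[0])


def candidates_since(subdir, last_added):
    """List names whose filename time is >= that of `last_added`.

    Results are ordered by (time, name). The directory is still listed
    in full; only the follow-up check per message is narrowed.
    """
    cutoff = filename_time(last_added)
    names = [n for n in os.listdir(subdir) if filename_time(n) >= cutoff]
    return sorted(names, key=lambda n: (filename_time(n), n))
```

As written this still visits every filename via the directory listing, so
it only helps if the expensive part is the check done per candidate.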

It seems you're "supposed" to move things from new/ to cur/ after you've
indexed them to solve exactly this problem, but I know this clashes with
the sup design philosophy.

Having looked into this more closely, I'm starting to seriously
reconsider whether maildir is really what I want to be using for storing
my mail.  There are some weird timing issues.

