Re: [sup-talk] Possible problem with maildir ID generation

Marc Hartstein Mon, 04 May 2009 09:54:29 -0700

On Mon, May 04, 2009 at 09:10:23AM -0700, William Morgan wrote:
> Rescanning the entire directory to check for new mail is unavoidable in
> Maildir. What I *do* have some control over, i.e. can try to optimize,
> is the following operation: given a file in the directory, does it
> represent a new message? One way to do this would be to maintain a set
> of all filenames representing read messages, and to check for existence
> within the set, for every file. Sup doesn't do this; instead it defines
> an ordering on filenames, such that newer filenames are "larger" than
> older filenames under that ordering, and maintains a threshold
> representing the dividing point between new and old. The idea being that
> performing that comparison is quick, and that storing the set of read
> messages is also quick.
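
A minimal Python sketch of the threshold idea described above, for illustration only. The sort key (timestamp, filename) and the function name `find_new` are assumptions, not Sup's actual ordering or code. Only one key has to be stored between scans, and each file costs one comparison.

```python
from typing import Iterable, List, Tuple

# Hypothetical sort key for a message: (timestamp, filename).  Tuples compare
# element by element, so a later timestamp is always "larger"; the filename
# only breaks ties.  This key is an assumption, not Sup's real ordering.
Key = Tuple[int, str]


def find_new(files: Iterable[Tuple[str, int]], threshold: Key) -> Tuple[List[str], Key]:
    """Return the files whose key is above `threshold`, plus the new threshold.

    `files` yields (filename, timestamp) pairs, e.g. gathered with os.scandir().
    Only the single threshold key needs to be persisted between scans.
    """
    new: List[str] = []
    high = threshold
    for name, ts in files:
        key = (ts, name)
        if key > threshold:  # one comparison per file, no per-message state
            new.append(name)
            high = max(high, key)
    return new, high


files = [("1241452400.a", 1241452400), ("1241452460.b", 1241452460)]
new, threshold = find_new(files, (1241452400, "1241452400.a"))
print(new)        # ['1241452460.b']
print(threshold)  # (1241452460, '1241452460.b')
```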


Ok.  We're on the same page, then.

It seems to me that comparing a filename to a stored 'threshold' filename
is about as fast as checking a hash for the existence of a key, and that
inserting into a hash is similarly fast...the main thing you lose is that
you have to keep a potentially large structure [500k filenames * ~30 bytes
each, roughly 15 MB] in memory for it to work.
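
For comparison, a minimal Python sketch of the naive set-membership approach (the function name `find_new_by_set` is hypothetical). Lookups and inserts are average O(1) hash operations; the cost is keeping every filename in memory, and the arithmetic at the end only reproduces the raw-byte estimate above, since real per-object overhead is considerably higher.

```python
from typing import Iterable, List, Set


def find_new_by_set(names: Iterable[str], seen: Set[str]) -> List[str]:
    """Return names not yet in `seen`, recording them as seen."""
    new: List[str] = []
    for name in names:
        if name not in seen:  # average O(1) hash lookup
            seen.add(name)    # average O(1) hash insert
            new.append(name)
    return new


seen: Set[str] = set()
first = find_new_by_set(["msg-a", "msg-b"], seen)
second = find_new_by_set(["msg-a", "msg-b", "msg-c"], seen)
print(first, second)  # ['msg-a', 'msg-b'] ['msg-c']

# Raw filename bytes for the estimate in the text (500k names, ~30 bytes each).
# Python's per-object and hash-table overhead makes the real figure larger.
RAW_BYTES = 500_000 * 30
print(f"{RAW_BYTES:,} bytes")  # 15,000,000 bytes
```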

Further, I don't know what database access speeds look like (or whether
some subset of them can be optimized for speed), but you *already* have
to store the filename somewhere so you can retrieve the messages when
they're accessed, no?

So I'm wondering whether you're really gaining all that much from the
thresholding heuristic over the naive approach, and whether it's worth
the risk that messages fail to appear because they violate one of the
heuristic's assumptions.  [For instance, what if a user does something
unusual like running an rsync cron job to copy mail delivered on another
host over to the host running sup?  You could easily end up with
messages that appear older than the last-scanned timestamp.]
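
A small Python illustration of the rsync scenario, under the assumption that the ordering key is something rsync carries over unchanged, such as the delivery timestamp embedded in a maildir filename. The filenames, timestamps, and both helper functions are hypothetical.

```python
# Hypothetical scenario: a cron-driven rsync copies in a message whose
# timestamp predates the last scan.  A pure timestamp threshold skips it;
# a set of known filenames does not.


def new_by_threshold(files, threshold):
    return [name for name, ts in files if ts > threshold]


def new_by_set(files, seen):
    return [name for name, _ in files if name not in seen]


last_scan = 200                    # threshold saved after the previous scan
seen = {"local-1", "local-2"}      # filenames known at that scan
files = [
    ("local-1", 150),
    ("local-2", 200),
    ("synced-from-other-host", 180),  # copied in late, but stamped 180
]

print(new_by_threshold(files, last_scan))  # []
print(new_by_set(files, seen))             # ['synced-from-other-host']
```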

> I think the collisions we are seeing are due to timestamp granularity.
> A little research suggests that e.g. ext2fs ctimes are 1-second
> granular. It seems quite easy to get more than one email per second.
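
A Python sketch of how 1-second granularity can lose a message with a strict `>` test, assuming the saved threshold is simply the largest timestamp seen so far. The `scan` helper and the message names are hypothetical.

```python
# Hypothetical illustration: two messages delivered within the same second
# get identical 1-second timestamps.  If a scan runs between the two
# deliveries, a strict ">" test against the saved timestamp never reports
# the second message.


def scan(files, threshold):
    """Return (new names, new threshold) using a strict > comparison."""
    new = [name for name, ts in files if ts > threshold]
    return new, max([threshold] + [ts for _, ts in files])


files = [("msg-a", 1241452469)]
new1, t = scan(files, 0)              # sees msg-a; threshold = 1241452469
files.append(("msg-b", 1241452469))   # delivered later in the same second
new2, t = scan(files, t)

print(new1)  # ['msg-a']
print(new2)  # []  (msg-b is never reported)
```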

It does, but as I mentioned somewhere in that email, the reference
implementation of maildir (qmail) says that it will only deliver one
message per second.  However, I definitely concede that relying on this
behavior would be setting things up for something to go wrong down the
line.

> Yes, I think that's the solution. Basically switch from > to >= when
> comparing timestamps, and realize that you're going to get some dupes.
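
A Python sketch of the `>=` variant, continuing the same-second example: the late arrival is now found, and the rescanned message comes back as a duplicate that has to be filtered. The `indexed` set is a hypothetical stand-in for whatever the index already knows about; it is not Sup's mechanism.

```python
# Hypothetical sketch of the ">=" fix: messages stamped in the same second
# as the saved threshold are rescanned, so already-indexed ones show up
# again and must be dropped.


def scan_inclusive(files, threshold):
    new = [name for name, ts in files if ts >= threshold]
    return new, max([threshold] + [ts for _, ts in files])


indexed = set()  # stand-in for the existing index (e.g. keyed by message id)

files = [("msg-a", 1241452469)]
new1, t = scan_inclusive(files, 0)
indexed.update(new1)

files.append(("msg-b", 1241452469))   # same second as msg-a
new2, t = scan_inclusive(files, t)
print(new2)   # ['msg-a', 'msg-b']  (msg-a is a dupe)

fresh = [name for name in new2 if name not in indexed]
print(fresh)  # ['msg-b']
```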

I think the solution might be to move messages from new/ to cur/, as Ben
Walton suggested.  As I understand it, that feature of maildir is
intended *exactly* to cover this case.  I understand the "don't touch
anything" philosophy, but are non-clobbering moves within a single filesystem
sufficiently safe?  Other MUAs know that they need to check that some
competing MUA didn't move things out of new/ when they weren't running;
we might want to remind people that maildir can be problematic if more
than one MUA is running simultaneously.
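
A minimal Python sketch of a non-clobbering move from new/ to cur/, assuming POSIX hard links. It follows the Maildir convention that a file moved into cur/ is renamed from `uniq` to `uniq:info`, here `:2,` with no flags set. `move_to_cur` is a hypothetical helper, not something Sup provides. Linking first and unlinking second means an existing file in cur/ is never overwritten, unlike a plain `rename()`.

```python
import os
import tempfile


def move_to_cur(maildir: str, name: str) -> str:
    """Move new/<name> to cur/<name>:2, without overwriting anything.

    os.link() raises FileExistsError if the destination already exists, so
    an existing file in cur/ is never clobbered; the original is unlinked
    only after the link succeeds.  If another MUA already moved the file
    out of new/, os.link() raises FileNotFoundError instead.  Both paths
    must be on the same filesystem for the hard link to work.
    """
    src = os.path.join(maildir, "new", name)
    dst = os.path.join(maildir, "cur", name + ":2,")  # ":2," = info, no flags
    os.link(src, dst)
    os.unlink(src)
    return dst


# Demo on a throwaway maildir.
with tempfile.TemporaryDirectory() as md:
    for sub in ("tmp", "new", "cur"):
        os.mkdir(os.path.join(md, sub))
    with open(os.path.join(md, "new", "1241452469.M1P2.host"), "w") as f:
        f.write("Subject: test\n\nhello\n")
    move_to_cur(md, "1241452469.M1P2.host")
    new_listing = os.listdir(os.path.join(md, "new"))
    cur_listing = os.listdir(os.path.join(md, "cur"))

print(new_listing, cur_listing)  # [] ['1241452469.M1P2.host:2,']
```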

If we go this route, manual polls should also rescan cur/.  That way, if
the user knows they may have invalidated things by running another MUA,
they can get sup to see everything.
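
A Python sketch of such a manual poll, assuming messages are tracked by the unique part of the filename (everything before the `:`), so a file another MUA moved from new/ to cur/, or whose flags changed, is not reported twice. `full_poll` and the `known` set are hypothetical.

```python
import os
import tempfile
from typing import List, Set


def full_poll(maildir: str, known: Set[str]) -> List[str]:
    """Hypothetical manual poll: scan new/ and cur/, report unknown messages."""
    found: List[str] = []
    for sub in ("new", "cur"):
        for name in sorted(os.listdir(os.path.join(maildir, sub))):
            uniq = name.split(":", 1)[0]  # drop the ":2,<flags>" info part
            if uniq not in known:
                known.add(uniq)
                found.append(os.path.join(sub, name))
    return found


# Demo: msg-b was already indexed (and moved to cur/ by some MUA);
# msg-a is genuinely new.
with tempfile.TemporaryDirectory() as md:
    for sub in ("tmp", "new", "cur"):
        os.mkdir(os.path.join(md, sub))
    open(os.path.join(md, "new", "msg-a"), "w").close()
    open(os.path.join(md, "cur", "msg-b:2,S"), "w").close()
    known = {"msg-b"}
    result = full_poll(md, known)

print(result)  # ['new/msg-a'] on POSIX
```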

> Well, mbox has the advantage of being nice and linear so this problem
> goes away, but is horribly broken in other ways (cf. the recent thread
> about the "From problem"). IMAP is a ridiculous PITA to deal with (at
> least as a client), so you're kinda screwed in every direction here. If
> anything, Maildir is less broken than the other alternatives.

I was kind of hoping that after all these years someone had come up
with something that sucks less.  I saw a reference to maildir+; I
wonder whether it actually solves anything (and whether anything
supports it).  I picked maildir because I knew mbox was flawed, and I
never even considered IMAP.

> Plus Maildir works really well with git. :)

I noticed that.

_______________________________________________
sup-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/sup-talk
