On Mon, May 04, 2009 at 09:10:23AM -0700, William Morgan wrote:
> Rescanning the entire directory to check for new mail is unavoidable in
> Maildir. What I *do* have some control over, i.e. can try to optimize,
> is the following operation: given a file in the directory, does it
> represent a new message? One way to do this would be to maintain a set
> of all filenames representing read messages, and to check for existence
> within the set, for every file. Sup doesn't do this; instead it defines
> an ordering on filenames, such that newer filenames are "larger" than
> older filenames under that ordering, and maintains a threshold
> representing the dividing point between new and old. The idea being that
> performing that comparison is quick, and that storing the set of read
> messages is also quick.

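For concreteness, here is a rough sketch of the two approaches described in the quoted text. It is not Sup's actual code: the function names, the sample filenames, and the use of the leading Unix timestamp as a stand-in for Sup's filename ordering are all assumptions made for illustration.

```python
# Hypothetical sketch; neither function is taken from Sup.

def new_by_threshold(filenames, threshold):
    """Treat a file as new if its leading timestamp is above the threshold."""
    # Assumes Maildir-style names that begin with "<unix time>.".
    return [f for f in filenames if int(f.split(".", 1)[0]) > threshold]


def new_by_seen_set(filenames, seen):
    """Treat a file as new if its name is not in the set of known names."""
    return [f for f in filenames if f not in seen]


# Made-up filenames for two messages that have already been scanned.
seen = {"1241450000.M1P100.host", "1241450100.M2P101.host"}
threshold = 1241450100  # timestamp of the newest message scanned so far

# Two files have appeared since the last scan. The second one carries a
# timestamp older than the threshold (e.g. it was copied in from another host).
listing = sorted(seen) + [
    "1241450200.M3P102.host",
    "1241450050.M9P200.otherhost",
]

print(new_by_threshold(listing, threshold))
print(new_by_seen_set(listing, seen))
```

With this made-up data, the threshold check reports only the first of the two new files, while the set lookup reports both. The cost is that the set has to hold every known filename.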
Ok. We're on the same page, then.

It seems to me that comparing a filename to a stored 'threshold' filename is about the same speed as checking a hash for the existence of a key, and that a hash is also similar in speed for writes. The main thing you lose is that you have to store a potentially large construct [500k * ~30 bytes] in memory for it to work. Further, I don't know how database access speeds look (or if some subset of them can be speed optimized), but you *already* have to store the filename somewhere so you can go retrieve the messages when they're accessed, no?

So I'm wondering if you're really gaining all that much from using the thresholding heuristic over doing it the naive way, and if it's worth the risk that messages fail to appear because they violate one of the assumptions of the heuristic. [For instance, what if a user does something weird like using an rsync cronjob to copy email delivered to another host over to the host running sup: you could easily have messages appearing older than the last-scanned timestamp.]

> I think the collisions we are seeing are due to timestamp granularity.
> A little research suggests that e.g. ext2fs ctimes are 1-second
> granular. It seems quite easy to get more than one email per second.

It does, but as I mentioned somewhere in that email, the reference implementation of maildir (qmail) says that it will only deliver one message per second. However, I definitely concede that relying on this behavior would be setting things up for something to go wrong down the line.

> Yes, I think that's the solution. Basically switch from > to >= when
> comparing timestamps, and realize that you're going to get some dupes.

I think the solution might be to move messages from new/ to cur/, as Ben Walton suggested. As I understand it, that feature of maildir is intended *exactly* to cover this case. I know the "don't touch anything" philosophy, but are non-clobbering moves within a filesystem sufficiently safe?

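As a rough illustration of what a non-clobbering move from new/ to cur/ could look like, here is a minimal sketch. The function name `move_to_cur` is hypothetical and this is not how Sup does anything today. It assumes a POSIX filesystem that supports hard links, with new/ and cur/ on the same filesystem, and it appends the ":2," info suffix that Maildir uses for files in cur/.

```python
import os


def move_to_cur(maildir, name):
    """Move new/<name> to cur/<name>:2, without overwriting an existing file."""
    src = os.path.join(maildir, "new", name)
    dst = os.path.join(maildir, "cur", name + ":2,")
    # os.link raises FileExistsError if dst already exists, whereas
    # os.rename would silently replace an existing file on POSIX systems.
    os.link(src, dst)
    # Only drop the old name once the new one is safely in place.
    os.unlink(src)
    return dst
```

If the link step fails, the message is left where it was in new/, so the worst case is a message that stays "new" rather than one that is lost.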
Other MUAs know that they need to check that some competing MUA didn't move things out of new/ when they weren't running; we might want to remind people that maildir can be problematic if more than one MUA is running simultaneously. If going this route, manual polls should also rescan cur/. That way, if the user knows they may have invalidated things by running another MUA, they can get sup to see everything.

> Well, mbox has the advantage of being nice and linear so this problem
> goes away, but is horribly broken in other ways (c.f. the recent thread
> about the "From problem"). IMAP is a ridiculous PITA to deal with (at
> least as a client), so you're kinda screwed in every direction here. If
> anything, Maildir is less broken than the other alternatives.

I was kind of hoping that in the last many years someone had come up with something that sucks less by now. I saw reference to maildir+; I wonder if it actually solves anything (and if it's supported by anything). I picked maildir because I was aware mbox is flawed and have never even considered IMAP.

> Plus Maildir works really well with git. :)

I noticed that.
