Re: [Tracker] index evolution mail is broken

Laurent Aguerreche Sun, 30 Nov 2008 14:30:11 -0800

On Sunday, 30 November 2008 at 22:47 +0100, Philip Van Hoof wrote:
> On Sun, 2008-11-30 at 19:13 +0100, Laurent Aguerreche wrote:
> > On Friday, 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:
> > > On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> > > > Evolution 2.24 has migrated to using SQLite to store mail summaries.
> > > > 
> > > > So the parser, which is based on parsing the summary file, is broken.
> > 
> > What I do not understand here is why Evo hackers have only replaced
> > summary files without including e-mail contents...
> 
> Because storing the content of E-mail in a database, for example as a
> BLOB, makes relatively little sense (as in: no sense whatsoever). Storing
> the metadata about the content of E-mail in a database does make some
> sense, though.
> 
> A lot of E-mail servers (like Dovecot, Cyrus, Isode's MBox, etc) store
> E-mail content in files too. An IMAP server is actually just a database
> with a frontend that happens to talk a specific RFC. Yet these groups of
> IMAP server developers still don't store things in relational databases.
> 
> 
> For an E-mail client:
> 
> IMHO the ideal would be to store E-mail messages as directories, with each
> MIME part of the E-mail being a separate file within that directory, and
> all other data stored in a database (headers could be stored as triples
> in an RDF triple store, like 3store).
> 
> rename() on the directory name can be used for the flags, just like Maildir
> does with file names, which keeps the format easy to reuse and back up.
> And instead of having to parse or download the entire message, you can
> store individual parts of interest as individual files, which reduces
> format complexity. Having to MIME-parse Maildir and nearly all of the
> other local formats is among the reasons why I dislike most of the local
> formats.
> 
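
A minimal sketch of the directory-per-message layout described above, in
Python with only the standard library. The layout
(<root>/<name>/<index>-<filename>), the "part.bin" fallback name and both
function names are assumptions made for illustration, not anything Evolution
or Camel implements; the ":2," flag suffix is borrowed from Maildir.

```python
import email
import os
from email import policy


def store_message_as_directory(raw_bytes, root, name):
    """Write each leaf MIME part of a message to its own decoded file.

    Hypothetical layout: <root>/<name>/<index>-<filename>.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    msg_dir = os.path.join(root, name)
    os.makedirs(msg_dir, exist_ok=True)
    written = []
    index = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        index += 1
        # decode=True undoes base64 / quoted-printable transfer encoding,
        # so an indexer can read the stored file as-is.
        payload = part.get_payload(decode=True) or b""
        filename = os.path.basename(part.get_filename() or "part.bin")
        path = os.path.join(msg_dir, f"{index}-{filename}")
        with open(path, "wb") as fh:
            fh.write(payload)
        written.append(path)
    return msg_dir, written


def set_flags(msg_dir, flags):
    """Encode flags in the directory name with a single rename()."""
    parent, leaf = os.path.split(msg_dir)
    base = leaf.split(":2,")[0]  # drop any previous flag suffix
    new_dir = os.path.join(parent, base + ":2," + "".join(sorted(flags)))
    os.rename(msg_dir, new_dir)
    return new_dir
```

Calling set_flags(msg_dir, "S") would rename the directory so that it ends in
":2,S"; the part files inside are untouched, so marking a message as seen
costs one rename() and no rewriting of content.
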
> If a client wants to reduce disk-space, it can remove attachments of
> E-mails that are cached locally and available remotely (like with IMAP)
> easily: the client would just have to unlink() a bunch of files in a
> bunch of directories.
> 
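
The disk-space point can be sketched the same way. This assumes the
per-message directories from the layout above and uses a size threshold
(min_size, a hypothetical parameter) as a stand-in for "this part is an
attachment that can be fetched again from the IMAP server"; a real client
would decide that from its own metadata.

```python
import os


def prune_cached_parts(msg_dirs, min_size=64 * 1024):
    """Unlink part files of at least min_size bytes; return bytes freed."""
    freed = 0
    for msg_dir in msg_dirs:
        # Snapshot the listing first so we never unlink while iterating.
        with os.scandir(msg_dir) as it:
            entries = list(it)
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            size = entry.stat(follow_symlinks=False).st_size
            if size >= min_size:
                os.unlink(entry.path)
                freed += size
    return freed
```
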
> Meanwhile you want to store the BODYSTRUCT and the ENVELOPE of E-mails
> and of MIME parts that are stored inside of E-mails (RFC822 forwarded
> messages) in a database or in an easily accessible format.
> 
> The reason for that is that ENVELOPE and BODYSTRUCT are precisely what a
> piece of software like Tracker needs to 'index' E-mail folders.
> 
> You might also want to decode the MIME parts (base64, etc.) before storing
> them, as this makes it easier for indexers to index/scan the files. Which
> encoding is used depends on the e-mail, and you don't want to needlessly
> require indexer software to understand your format and to find out how
> each MIME-part file is encoded; you instead want to store the original
> data as-is. This also makes sense from the POV of how you read (and
> store) XMP/Exif/etc. data: if it is encoded or compressed, you need to use
> more memory before you can read (or write) that data. While hard disk
> space is nearly unlimited nowadays, I/O access speed ain't.
> 
> > > A better idea would be, instead of trying to parse Evolution's files
> > > ourselves, to make an Evolution plugin that pumps its data over to us
> > > via IPC, shmem, pipe() or whatever.
> > 
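
To make the "plugin pumps its data over a pipe" idea concrete, here is a
rough sketch in Python. The record fields and the newline-delimited JSON
framing are assumptions chosen for illustration; no such Evolution plugin or
wire format is implied to exist.

```python
import json
import os


def push_records(write_fd, records):
    """Producer side (the hypothetical plugin): one JSON object per line."""
    with os.fdopen(write_fd, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def read_records(read_fd):
    """Consumer side (the indexer): read until the producer closes the pipe."""
    with os.fdopen(read_fd) as inp:
        return [json.loads(line) for line in inp]
```

The consumer never opens the producer's cache files; it only sees what the
producer chooses to send, so a change in the on-disk format does not break it.
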
> > The last time I read Beagle's code, I found out it was also trying to
> > parse Evo's internal caches. Abstract access to Evo's caches is a good
> > idea but it can't be just a Tracker plugin, it has to be something used
> > by any program that wants to access Evo's data (and Evolution itself).
> 
> Sure
> 
> > It seems that Evo hackers are trying to replace Bonobo code in favour
> > of DBus. Would it be possible for us to also use DBus there, or is it
> > only for Evo-internal things? See:
> > http://mail.gnome.org/archives/evolution-hackers/2008-November/msg00009.html
> 
> The remote IPC (CORBA and now increasingly DBus) APIs that Evolution
> exposes have nothing to do with its E-mail functions. Only with
> calendaring, contact and TODO lists.
> 
> Evolution Data Server is not, what I call, "E-mail as a service".
> Everything about E-mail, in Evolution, happens in-process of the
> "evolution" process (that's the UI, the shell if you prefer that name). 
> 
> It's just a marketing stunt that "camel" is included in the EDS
> (Evolution Data Server) package. The "camel" library is technically not
> part of EDS. The Evolution shell dynamically links with it, and runs its
> code completely in-process of itself. That's unlike the services that
> provide the Calendar, Tasks, Todo and Contact data. Which are provided
> by the 'actual' Evolution Data Server.
> 
> Evolution Data Server "does not" serve E-mail. Don't let anybody tell
> you that, because it doesn't. (it's also not really a secret, just a
> misconception that a lot of people seem to have about Evolution).
> 
> Camel is indeed the library that Evolution uses for its E-mail
> abstraction, but Camel is a normal shared library that runs in-process,
> not a service that gets communicated with from another process (like the
> shell).
> 
> Even worse: two processes can't use Camel on top of the same "cache dir",
> as both would then write the same summary files, and the same E-mail
> content cache files, in parallel.
> 
> Which would result in data corruption. There are also no fcntl or flock
> locks placed on the files in question. Camel's design probably wouldn't
> cope with such locks either, unless you rewrite quite a bit of it first,
> or if you invent a recursive file-lock-ish thing.
> 
> Evolution's use of Camel would probably make the UI hang each time Camel
> would hit such a flock() lock, as not everything in Camel happens in a
> thread created by Evolution.
> 
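
For readers unfamiliar with the locks mentioned here, a small Python sketch
of an advisory flock() taken in non-blocking mode. It only illustrates the
primitive: as stated above, Camel places no such locks, and try_lock is a
hypothetical helper name.

```python
import fcntl


def try_lock(fh):
    """Try to take an exclusive advisory lock without blocking.

    Returns True if the lock was acquired, False if another open file
    description already holds it.
    """
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
```

Without LOCK_NB the call simply waits until the other holder releases the
lock, which is exactly the wait that would freeze a UI if it happened on the
thread that redraws it.
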
> UIs that 'hang' (don't get redrawn) for some seconds, because another
> process is doing something, ain't the kind of behaviour people like in
> an E-mail client.


Thank you for this very interesting answer!

If I understand your message correctly, Evo would need to stop using Camel
directly and use an "e-mail service" instead, one able to deal with
concurrent reads and writes. More precisely, this service should allow Evo
to read or write anything ASAP while a "spy program" (like Tracker)
would be able to read data without blocking Evo's accesses...
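
One possible shape for the storage side of such a service, sketched in
Python under a loud assumption: the service owns its own SQLite store and
runs it in WAL journal mode, where a reader keeps seeing the last committed
snapshot while a writer is in the middle of a transaction. The table and the
function names are hypothetical; this is not how Evolution works today.

```python
import sqlite3


def open_store(path):
    """Writer connection (the hypothetical e-mail service)."""
    # isolation_level=None: autocommit, transactions are started explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summary (uid TEXT PRIMARY KEY, subject TEXT)"
    )
    return conn


def open_reader(path):
    """Reader connection (the "spy program"); refuses to modify data."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA query_only=ON")
    return conn
```

With this setup a SELECT on the reader connection returns immediately even
while the writer holds an open "BEGIN IMMEDIATE" transaction; it just does not
see the uncommitted rows yet.
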

It seems to me that this is a task for Evo hackers and that it will take
a long time to implement. Any opinions?


Laurent.

> > > Right now trying to parse Evolution's hideous file formats is quite
> > > crazy, and each time they change their format we will have to fix our
> > > code too.
> > > 
> > > It's also "not correct" to read Evolution's internal cache files.
> > > Evolution is not designed to cope with another process tampering
> > > with its caches, nor will it care about the other process at all.
> > > 
> > > If you guys at Sun want to join the fun, such an Evolution plugin would be
> > > an excellent contribution indeed. Perhaps also one for Thunderbird and
> > > some other E-mail clients ... and we can safely enter 2009!
> 
>

signature.asc
Description: This is a digitally signed message part

