Hi,

On Fri, 2006-11-24 at 12:25 +0100, Jean-Francois Dockes wrote:
> If we don't find an appropriate established language, I see at least two
> options for a more structured approach:
> - No query language: use a data structure representing the parsed query tree.
> - Use an xml-based approach for more structure and extensibility.

I strongly agree with the principle behind both of these. We use data
structures (which happen to be serialized to XML when sent from client to
daemon) to represent our queries at the lowest level. By default all query
parts are ANDed together, although you can then nest OR blocks. We do have
a query language that is accepted when text is typed in, but in the end
typed queries are converted to the query data structure before the daemon
processes them.

> Query language again:
> - Phrases: I see no reason to make phrases unoptionally
> case-sensitive. Case sensitivity should be an option for any query
> part. Case-sensitivity is a very expensive proposition for an indexer,
> and I don't think that Recoll is the only one not supporting it at all
> (same for diacritic marks by the way).

Case sensitivity is almost never useful and almost never what the user
wants. Look at any usability study on this. Beagle is always case
insensitive (as it's handled by the analysis code).

> API:
> - Documents and files are not the same thing (think email message inside an
> Inbox, Knotes). Both have their uses on the client side though (document
> identifier to request a snippet, or a text preview, file to, well, do
> something with the file). I don't know of a standard way to designate a
> message inside an mbox file, this is a tricky issue. We can probably see
> the document identifier as opaque, and interpreted only in the
> backend.

As you mention, there is no standard for this or almost anything else that
isn't a file or web resource. What we've done is use URIs as our identifier
at index time. On the client side, we pass them by default to standard
handlers (xdg-open, desktop-launch, gnome-open) or pass specialized URIs to
individual programs that understand them. (For example, Evolution mails are
indexed with the email:/// URI scheme that only Evo understands. Ditto
Evolution's contact and calendar items.)

> The file identifier needs to be visible. Or is there a standard
> way to separate the File and Subdoc parts in what the draft calls uris ?

Here we've used the URI fragment to indicate these, but they are very
Beagle-specific. The first part of a multipart email might be
email://[EMAIL PROTECTED]/Inbox?uri=1234#0. Evolution doesn't support
opening attachments directly, so the client UI knows to interpret these
URIs and open the containing mail instead.

This is a bit of an issue for archives, because a tarball might contain yet
another tarball, resulting in a URI like

"file:///home/joe/tarball1.tar.gz#junk/tarball2.tar.gz#foo/bar"

I'm not sure that two fragment parts are valid.

> - Using the query string as a query identifier is certainly feasible (ie
> for repeated calls to Query() with successive offsets), but it somehow
> doesn't feel right. Shouldn't there be some kind of specific query
> identifier ? Query strings can be quite big (ie, after expansion by some
> preprocessor).

Agreed. Unique D-Bus object paths per query seem to make more sense to me.
(And this is what Beagle used back when it used D-Bus.)

Joe
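
To make the nested-archive case concrete, here is a minimal Python sketch of
how a client might take such a URI apart. It assumes that every '#' adds one
more level of nesting, as in the tarball example; that convention and the
function name split_nested_uri are illustrative only, not Beagle's actual
code. As far as I can tell, RFC 3986 does not allow a raw '#' inside a
fragment, so a strictly valid form would have to percent-encode the second
one as %23.

```python
from typing import List, Tuple


def split_nested_uri(uri: str) -> Tuple[str, List[str]]:
    """Split a URI into its container part and a list of nested member paths.

    Assumption (hypothetical convention): the text before the first '#'
    names the outermost file, and each further '#' descends one level
    into an archive found at the previous level.
    """
    # Everything before the first '#' is the ordinary URI of the container.
    container, _, rest = uri.partition("#")
    # Whatever follows is split on every remaining '#', one entry per level.
    members = rest.split("#") if rest else []
    return container, members


if __name__ == "__main__":
    uri = "file:///home/joe/tarball1.tar.gz#junk/tarball2.tar.gz#foo/bar"
    container, members = split_nested_uri(uri)
    print(container)
    for depth, member in enumerate(members, start=1):
        print("  " * depth + member)
```

A generic URI parser will not do this split for you: Python's
urllib.parse.urlsplit, for instance, cuts at the first '#' and hands back
"junk/tarball2.tar.gz#foo/bar" as a single fragment, so the second level has
to be interpreted by whoever defined the convention.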
