On Sun, 19 Nov 2006 12:19:45 +0100 "Jos van den Oever" <[EMAIL PROTECTED]> wrote:
> Hi Mikkel,
>
> Yes, the common dbus api is still something we need. I wanted to start
> on the metadata standardization first, but we can do the searching api
> in parallel. You make a good start in listing the available engines.
> There might even be more. To coordinate we need a process that lists
> the available search engines over dbus. An application should be able
> to say: I want to search using a particular interface with the
> available search engines.
>
> The attached archive contains an effort to do two things:
> - propose a very simple, common api for search engines
> - implement such a coordinating daemon
> The code contains the daemon, a demo search application and a python
> client to access it by finding the search engine over the
> searchmanager.
>
> The proposal for the search api is _very_ simple and I call for
> application developers to see if the function calls in there are
> sufficient.
> Here I paste them for convenience:
>
>
> interface org.freedesktop.search.simple
>
> method startConfiguration ( )
>   Open a graphical interface for configuring of search tool.
>
> method countHits ( in s query , out i count )
>   Count the number of instances of a file that match a particular query.
>   Input:
>     query
>       The query being performed.
>   Output:
>     count
>       The number of documents that match the query.
>
> method query ( in s query, in i offset, in i limit , out as hits )
>   Perform a query and return a list of files that match the query.
>   Input:
>     query
>       The query being performed.
>     offset
>       The offset in the result list for the first returned result.
>     limit
>       The maximum number of results that should be returned.
>
>   Output:
>     hits
>       A list of filenames that are the result of the query.
>
> method getProperties ( in as files,in a(sa(sas)) properties )
>   Get properties for the given files.
>   Input:
>     files
>       A list of files for which properties should be returned.
>     properties
>       The properties belonging to each file.
>       Each property is a name associated with a list of string values.
>       The index of each property map in the list corresponds to the
>       index of the filename in the list of files.

I have constructed an in-house application which does pretty much
exactly what you describe (it doesn't speak dbus yet, only CORBA and
SOAP). Sadly I'm not allowed to release the source of this application,
but at least I can share some of my experience. (I haven't looked
closely at your source yet, so I might have misunderstood some things.)

If several search engines are available, the search manager lets the
client know of each search engine according to your proposal (right?).
I think it would be a better idea to present a list of indexes to
search in (each search engine might provide several), but by default
search in all of them (if appropriate).

Instead of registering the search engine I think it's better to think
in terms of creating a session (which might still do exactly the same
thing), because this should affect all appropriate search engines
transparently, and because it might be desirable to alter some options
for the session (language, fuzziness, search contexts and such).

In addition to this session object I have found it suitable to also
have a search object (created from a query), because applications
might construct very complicated queries. This object can then be
passed to countHits and used for getting the hits, and also for getting
attributes of each hit (matching document, score, language and such).
(Note that a hit is not equivalent to a document.)

Daemon or no daemon, that is the question. It will without doubt come
up (it always does). First we need to clarify that there is a
difference between a daemon doing the indexing of documents (or rather
detecting new documents that need to be indexed) and a daemon
performing the search (and possibly merging several searches).
Most search engines I use don't have a daemon for doing the searches
(instead they only provide a library), because that is seldom
considered necessary. Indexes are read-only when searching, so the
common problems daemons are used to solve are not present.

My solution (which took me quite a while to develop) might seem overly
complicated at first, but I think it really isn't. It was to implement
all functionality (including caching and merging of searches) in a
library. That library can be used by an application to do everything.
Or the application can use it just to contact a daemon (which of course
also uses the very same library for everything it does). This also has
the nice side effect that daemons can be chained, so searches can span
several computers (if the library supports at least one
network-transparent communication mechanism).

I think it would also be a good idea for the library to support plugins
for different search engines and communication mechanisms. One of the
plugins would be the one using the dbus search interface. Other plugins
could be made for existing search engines like Lucene, Swish(++|E),
mnoGoSearch, Xapian, ht://Dig, Datapark, (hyper)estraier, Glimpse,
Namazu, Sherlock Holmes and all the others. That would surely be a lot
easier than convincing each of them to implement a daemon which
provides a dbus interface.

One thing that English users seldom consider is the use of several
languages. Which language is being used is important to know in order
to decide what stemming rules to use, and which stop-words to use (in
English "the" is a stop-word, while in Swedish it means tea and is
something that is perfectly reasonable to search for). People using
other languages are very often multilingual (using English as well).
Therefore it is interesting to know which language the query is in
(search engines might also be able to translate queries in order to
search in documents written in different languages).
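
To make the session / search object / plugin idea a bit more concrete,
here is a minimal Python sketch. It is only an illustration under my
own assumptions: all names (Session, Search, Backend, InMemoryBackend,
STOP_WORDS) are made up and are not part of the proposed dbus interface
or of any existing library. The toy backend just counts word matches in
memory, where a real plugin would wrap an engine or talk to a daemon.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

# Hypothetical per-language stop-word lists; real engines ship their own.
STOP_WORDS = {"en": {"the"}, "sv": set()}


@dataclass(frozen=True)
class Hit:
    """One match; note that a hit is not the same thing as a document."""
    document: str
    score: float
    language: str


class Backend(Protocol):
    """Hypothetical plugin interface: one plugin per engine or transport."""
    name: str

    def search(self, terms: list[str], language: str) -> list[Hit]: ...


@dataclass
class InMemoryBackend:
    """Toy plugin standing in for a real engine or a proxy to a daemon."""
    name: str
    documents: dict[str, tuple[str, str]]  # path -> (language, text)

    def search(self, terms: list[str], language: str) -> list[Hit]:
        hits = []
        for path, (lang, text) in self.documents.items():
            if lang != language:
                continue
            tokens = text.lower().split()
            matches = sum(tokens.count(term) for term in terms)
            if matches:
                hits.append(Hit(path, matches / len(tokens), lang))
        return hits


@dataclass
class Search:
    """Search object created from a query; results are cached after first use."""
    session: "Session"
    query: str
    _hits: Optional[list[Hit]] = None

    def _run(self) -> list[Hit]:
        if self._hits is None:
            language = self.session.language
            stop = STOP_WORDS.get(language, set())
            terms = [w for w in self.query.lower().split() if w not in stop]
            merged: list[Hit] = []
            if terms:
                # Ask every plugin and merge the results into one list.
                for backend in self.session.backends:
                    merged.extend(backend.search(terms, language))
            # Best score first; document name breaks ties deterministically.
            self._hits = sorted(merged, key=lambda h: (-h.score, h.document))
        return self._hits

    def count_hits(self) -> int:
        return len(self._run())

    def hits(self, offset: int = 0, limit: int = 10) -> list[Hit]:
        return self._run()[offset:offset + limit]


@dataclass
class Session:
    """Holds the options that apply to all backends transparently."""
    backends: list[Backend]
    language: str = "en"

    def create_search(self, query: str) -> Search:
        return Search(self, query)
```

With this shape, a Swedish session searching for "the" would ask every
backend and merge the hits, while an English session would drop "the"
as a stop-word and return nothing. A daemon could be just another
Backend implementation that forwards the call, which is what would make
chaining possible.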
