On Tue, 2008-05-06 at 16:57 +0200, Mikkel Kamstrup Erlandsen wrote:
> 2008/5/2 Mikkel Kamstrup Erlandsen <[EMAIL PROTECTED]>:
> > I have a handful of comments about this (Jos also asked about the
> > same on IRC recently).
> >
> > It was in fact a design decision, but I am writing this from my
> > mobile since I'm on holiday, so I'll elaborate when I get home
> > Tuesday.
> >
> > Cheers,
> > Mikkel
>
> As promised...
>
> Let's first establish some terminology. A Paged Model is one where you
> can request hits with an offset and a count. A Streaming Model is one
> like we have now, where you specify how many hits to read on each
> request and then read hits sequentially (like file reading without
> seeking).
>
> It should be noted that the Xesam Search spec is designed for desktop
> search (not generic search on a database or Google-style web search
> with millions of hits). Furthermore, it should be feasible to
> implement in a host of different backends, not just full-fledged
> search engines.
>
> There are basically three kinds of backends where a paged model can
> be problematic: web services, aggregated searches, and grep/find-like
> implementations.
>
> * Web services. While Google's GData Query API does allow paging, not
> all web services do. For example, the OAI-PMH [1] standard does not
> do paging, merely sequential reading. Of course OAI-PMH is a standard
> for harvesting metadata, but I could imagine a "search engine"
> extracting metadata from the OAI-PMH result on the fly.
>
> * Aggregated search. Consider a setup where the Xesam search engine
> is proxying a collection of other search engines. It is a classical
> problem to look up hits 1000-1010 in this setup. The search engine
> will have to retrieve the first 1010 hits from all sub-search engines
> to get it right. Maybe there is a clever algorithm to do this more
> efficiently, but I have not heard of it.
> This is of course also a problem in a streaming model, but it will
> not trick developers into believing that GetHits(s, 1000, 1010) is a
> cheap call.
>
> * Grep-like backends, or more generally backends where the search
> results roll in sequentially.
>
> I think it is a bad time to break the API like this. It is in fact a
> quite big break if you ask me, since our current approach has been
> stream-based and what you propose is changing the paradigm to a
> page-based model. It is also bad because making such an important
> change at the last minute sends the wrong signal.
>
> I see a few API-stable alternatives though.
>
> 1) Add a SeekHit(in s search, in i hit_id, out i new_pos). This
> basically adds a cursoring mechanism to the API.
> 2) In the style of 1) but lighter - add SkipHits(in s search,
> in i count, out i new_pos).
>
> These options also stay within the standard streaming terminology. We
> could make them optional by making them throw exceptions if the (new)
> session property vendor.paging is True.
>
> As Jos also points out later in the thread, GetHitData is actually
> paging, and the workaround he describes can be made very efficient
> since we already have the hit.fields.extended session prop to hint
> which properties we will fetch.
>
> Let me make it clear that I am not refusing the change to a paging
> model if that is what the majority rules. We should just make an
> informed decision that we are sure we agree on.
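To make the quoted aggregated-search point concrete, here is a minimal sketch (plain Python, not the Xesam D-Bus API; `aggregated_page` and the engine representation are illustrative assumptions) of why serving hits 1000-1010 from a proxy forces it to pull the first 1010 hits from every sub-engine before it can slice the merged result:

```python
import heapq
from itertools import islice

def aggregated_page(sub_engines, offset, count):
    # Each sub-engine is an iterator of (score, hit) pairs, best score
    # first. We cannot know which engine contributes merged hit #offset
    # until we have merged at least offset+count hits from all of them.
    needed = offset + count
    partials = [islice(engine, needed) for engine in sub_engines]
    # heapq.merge keeps the pre-sorted streams in order; negate the
    # score so "ascending" merge order means descending relevance.
    merged = heapq.merge(*partials, key=lambda pair: -pair[0])
    return list(islice(merged, offset, offset + count))

# Toy demo: two engines, ask for merged hits 4..5.
a = iter([(0.9, "a1"), (0.7, "a2"), (0.5, "a3"), (0.1, "a4")])
b = iter([(0.8, "b1"), (0.6, "b2"), (0.4, "b3"), (0.2, "b4")])
print(aggregated_page([a, b], 4, 2))  # [(0.5, 'a3'), (0.4, 'b3')]
```

The cost grows with the offset, not the page size, which is exactly why a paged GetHits can look deceptively cheap.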
I'm proposing adding a new API, not breaking existing ones. The
existing stuff can easily emulate paging if it lacks native support.

I would prefer a new API that takes a start-point param and a
count/length param so we have full random access.

jamie

_______________________________________________
Xesam mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xesam
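The claim that a stream-only backend can emulate a paged call can be sketched as follows (a hypothetical illustration, not the Xesam spec: `StreamingBackend` and `get_hits_at` are made-up names). The emulation simply reads and discards `start` hits, which also shows Mikkel's concern in practice: the call works, but its cost is O(start) on such a backend.

```python
class StreamingBackend:
    """A backend that can only hand out hits sequentially."""
    def __init__(self, hits):
        self._hits = hits
        self._pos = 0

    def get_hits(self, count):
        # The existing streaming-style call: return the next `count`
        # hits and advance the read position.
        chunk = self._hits[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

def get_hits_at(backend, start, count):
    # Emulated random access: rewind, skip `start` hits by reading and
    # discarding them, then read the requested page. A backend with
    # native paging could serve this directly instead.
    backend._pos = 0
    backend.get_hits(start)
    return backend.get_hits(count)

hits = [f"hit{i}" for i in range(10)]
print(get_hits_at(StreamingBackend(hits), 4, 3))  # ['hit4', 'hit5', 'hit6']
```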
