You mean pull results over dbus and then page at the client? That's inefficient - pulling 10,000 hits over dbus is insanely slow (even just the URIs).

Paging is a must-have in my book; otherwise the tracker API will have to
be used a lot instead of xesam whenever paged results are desired (more
likely we will add paged search to xesam on top of the standard).

jamie

On Tue, 2008-05-06 at 17:12 +0200, Mikkel Kamstrup Erlandsen wrote:
> 2008/5/6 Jamie McCracken <[EMAIL PROTECTED]>:
> > On Tue, 2008-05-06 at 16:57 +0200, Mikkel Kamstrup Erlandsen wrote:
> > > 2008/5/2 Mikkel Kamstrup Erlandsen <[EMAIL PROTECTED]>:
> > > > I have a handful of comments about this (Jos also asked about
> > > > the same on IRC recently).
> > > > It was in fact a design decision, but I am writing this from my
> > > > mobile since I'm on holiday, so I'll elaborate when I get home
> > > > Tuesday.
> > > >
> > > > Cheers,
> > > > Mikkel
> > >
> > > As promised...
> > >
> > > Let's first establish some terminology. A Paged Model is one where
> > > you can request hits with an offset and a count. A Streaming Model
> > > is one like we have now, where you specify how many hits to read
> > > on each request and then read hits sequentially (like file reading
> > > without seeking).
> > >
> > > It should be noted that the Xesam Search spec is designed for
> > > desktop search (and not generic search on a database or
> > > Google-style web search with millions of hits). Furthermore it
> > > should be feasible to implement in a host of different backends,
> > > not just full-fledged search engines.
> > >
> > > There are basically three backends where a paged model can be
> > > problematic: web services, aggregated searches, and Grep/Find-like
> > > implementations.
> > >
> > > * Web services. While Google's GData Query API does allow paging,
> > > not all web services do this. For example the OAI-PMH[1] standard
> > > does not do paging, merely sequential reading. Of course OAI-PMH
> > > is a standard for harvesting metadata, but I could imagine a
> > > "search engine" extracting metadata from the OAI-PMH result on the
> > > fly.
> > >
> > > * Aggregated search. Consider a setup where the Xesam search
> > > engine is proxying a collection of other search engines. It is a
> > > classical problem to look up hits 1000-1010 in this setup. The
> > > search engine will have to retrieve the first 1010 hits from all
> > > sub-search engines to get it right. Maybe there is an algorithm
> > > that does this more cleverly, but I have not heard of it. This is
> > > of course also a problem in a streaming model, but it will not
> > > trick developers into believing that GetHits(s, 1000, 1010) is a
> > > cheap call.
> > >
> > > * Grep-like backends or more generally backends where the search
> > > results will roll in sequentially.
> > >
> > > I think it is a bad time to break the API like this. It is in fact
> > > a quite big break if you ask me, since our current approach has
> > > been stream-based and what you propose is changing the paradigm to
> > > a page-based model. Also bad because it is the wrong signal to
> > > send with such an important change at the last minute.
> > >
> > > I see a few API-stable alternatives though.
> > >
> > > 1) Add a SeekHit(in s search, in i hit_id, out i new_pos). This
> > > basically adds a cursoring mechanism to the API.
> > > 2) In the style of 1) but lighter - add SkipHits(in s search,
> > > in i count, out i new_pos).
> > >
> > > These options also stay within the standard streaming terminology.
> > > We could make them optional by making them throw exceptions if the
> > > (new) session property vendor.paging is True.
> > >
> > > As Jos also points out later in the thread, GetHitData is actually
> > > paging, and the workaround he describes can actually be made very
> > > efficient since we already have the hit.fields.extended session
> > > prop to hint what properties we will fetch.
> > >
> > > Let me make it clear that I am not refusing the change to a paging
> > > model if that is what the majority rules. We should just make an
> > > informed decision that we are sure we agree on.
> >
> > I'm proposing adding new API, not breaking existing ones. The
> > existing stuff can easily emulate paging if it lacks native support.
> >
> > I would prefer new API that takes a start point param and a
> > count/length param so we have full random access.
>
> And how is GetHitData not good enough for that?
>
> Cheers,
> Mikkel

_______________________________________________
Xesam mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xesam
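
For illustration, the two ways of emulating paging that come up in this thread can be sketched in Python. This is only a rough model under stated assumptions: `StreamingSearch` is a toy in-memory stand-in, not the Xesam D-Bus interface, and `get_hits` / `skip_hits` are hypothetical methods that mimic sequential GetHits and the proposed SkipHits. The `hits_transferred` counter merely models how many hits would have to cross the bus.

```python
class StreamingSearch:
    """Toy stand-in for a streaming search session (hypothetical, not the Xesam API)."""

    def __init__(self, hits):
        self._hits = list(hits)
        self._pos = 0               # read cursor, advances sequentially
        self.hits_transferred = 0   # models hits sent to the client

    def get_hits(self, count):
        """Return the next `count` hits and advance the cursor (no seeking)."""
        chunk = self._hits[self._pos:self._pos + count]
        self._pos += len(chunk)
        self.hits_transferred += len(chunk)
        return chunk

    def skip_hits(self, count):
        """Advance the cursor without returning hits; returns the new position."""
        self._pos = min(self._pos + count, len(self._hits))
        return self._pos


def page_by_reading(search, offset, count, chunk=100):
    """Emulate a page on a fresh pure-streaming session: read and discard up to `offset`."""
    remaining = offset
    while remaining > 0:
        got = search.get_hits(min(chunk, remaining))
        if not got:                 # stream ended before the page started
            return []
        remaining -= len(got)
    return search.get_hits(count)


def page_by_skipping(search, offset, count):
    """Emulate a page on a fresh session using the cursor-style skip call."""
    search.skip_hits(offset)
    return search.get_hits(count)


if __name__ == "__main__":
    uris = [f"file:///doc/{i}" for i in range(10000)]
    reading, skipping = StreamingSearch(uris), StreamingSearch(uris)
    page_by_reading(reading, 1000, 10)
    page_by_skipping(skipping, 1000, 10)
    print(reading.hits_transferred, skipping.hits_transferred)
```

Both helpers return the same page; the difference in this model is only in how many hits are handed to the client on the way there. It says nothing about the cost inside the backend, which for aggregated or grep-like backends may still have to produce every skipped hit.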
