On Tue, 2008-05-06 at 17:26 +0200, Mikkel Kamstrup Erlandsen wrote:
> 2008/5/6 Jamie McCracken <[EMAIL PROTECTED]>:
> > you mean pull results over dbus and then page at client?
>
> No. The signature of GetHitData is (in s search_handle, in au hit_ids,
> in as fields, out aav hits)
>
> Ie you request which hit ids to fetch. To fetch a page pass
> [n, n+1, ..., n+page_size] as hit_ids.
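For concreteness, paging via GetHitData as described above could be sketched like this (plain Python standing in for the D-Bus call; SearchStub and its data are hypothetical, and sequential hit ids are assumed, which Jamie disputes below):

```python
# Hypothetical in-process stand-in for a Xesam engine's GetHitData.
# The method mirrors the spec signature:
# (in s search_handle, in au hit_ids, in as fields, out aav hits).
class SearchStub:
    def __init__(self, hits):
        # hits: dict mapping hit_id -> {field_name: value}
        self._hits = hits

    def GetHitData(self, search_handle, hit_ids, fields):
        # One row per requested hit id, one column per requested field.
        return [[self._hits[h].get(f) for f in fields] for h in hit_ids]


def get_page(search, handle, page, page_size, fields):
    # Paging via GetHitData: request ids [n, n+1, ..., n+page_size-1].
    start = page * page_size
    ids = list(range(start, start + page_size))
    return search.GetHitData(handle, ids, fields)


engine = SearchStub({i: {"uri": f"file:///doc{i}"} for i in range(100)})
print(get_page(engine, "search0", 2, 5, ["uri"]))
# page 2 with page_size 5 requests hit ids 10-14
```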
hit_ids are not sequential! We will use service_id for these, which will
be random in a search.

> > that's inefficient - pulling 10,000 hits over dbus is insanely slow
> > (even just the URI)
>
> Hmmm, how slow is "insanely slow"? I doubt that this is true (by my
> standards of insanely slow).
>
> > Paging is a must-have in my book, otherwise the tracker api will
> > have to be used a lot instead of xesam whenever paged results are
> > desired (more likely we will add paged search to xesam on top of
> > the standard)
>
> With a seekable API paging is easy to implement on the client.

We don't want to implement on the client! This is server-based paging.

> Cheers,
> Mikkel
>
> > On Tue, 2008-05-06 at 17:12 +0200, Mikkel Kamstrup Erlandsen wrote:
> > > 2008/5/6 Jamie McCracken <[EMAIL PROTECTED]>:
> > > > On Tue, 2008-05-06 at 16:57 +0200, Mikkel Kamstrup Erlandsen wrote:
> > > > > 2008/5/2 Mikkel Kamstrup Erlandsen <[EMAIL PROTECTED]>:
> > > > > > I have a handful of comments about this (Jos also asked about
> > > > > > the same on IRC recently). It was in fact a design decision,
> > > > > > but I am writing this from my mobile since I'm on holiday, so
> > > > > > I'll elaborate when I get home Tuesday.
> > > > > >
> > > > > > Cheers,
> > > > > > Mikkel
> > > > >
> > > > > As promised...
> > > > >
> > > > > Let's first establish some terminology. A Paged Model is one
> > > > > where you can request hits with an offset and a count. A
> > > > > Streaming Model is one like we have now, where you specify how
> > > > > many hits to read on each request and then read hits
> > > > > sequentially (like file reading without seeking).
> > > > >
> > > > > It should be noted that the Xesam Search spec is designed for
> > > > > desktop search (and not generic search on a database or
> > > > > Google-style web search with millions of hits). Furthermore,
> > > > > it should be feasible to implement in a host of different
> > > > > backends, not just full-fledged search engines.
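The two models defined above can be contrasted with a small sketch (hypothetical plain-Python interfaces, not the real Xesam D-Bus API):

```python
# Streaming model: hits are read sequentially, like a file with no seek.
class StreamingSearch:
    def __init__(self, results):
        self._results = results
        self._pos = 0  # read cursor, only moves forward

    def GetHits(self, count):
        batch = self._results[self._pos:self._pos + count]
        self._pos += len(batch)
        return batch


# Paged model: random access by offset and count.
class PagedSearch:
    def __init__(self, results):
        self._results = results

    def GetHitsPaged(self, offset, count):
        return self._results[offset:offset + count]


results = [f"hit{i}" for i in range(20)]

s = StreamingSearch(results)
s.GetHits(10)            # must read (and discard) hits 0-9 first...
print(s.GetHits(5))      # ...before reaching hits 10-14

p = PagedSearch(results)
print(p.GetHitsPaged(10, 5))  # jumps straight to offset 10
```

The streaming client has to pull everything before the page it wants over the bus; the paged client does not, which is exactly the cost difference under discussion.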
> > > > > There are basically three backends where a paged model can be
> > > > > problematic: web services, aggregated searches, and
> > > > > grep/find-like implementations.
> > > > >
> > > > > * Web services. While Google's GData Query API does allow
> > > > >   paging, not all web services do. For example, the OAI-PMH[1]
> > > > >   standard does not do paging, merely sequential reading. Of
> > > > >   course OAI-PMH is a standard for harvesting metadata, but I
> > > > >   could imagine a "search engine" extracting metadata from the
> > > > >   OAI-PMH result on the fly.
> > > > >
> > > > > * Aggregated search. Consider a setup where the Xesam search
> > > > >   engine is proxying a collection of other search engines. It
> > > > >   is a classical problem to look up hits 1000-1010 in this
> > > > >   setup. The search engine will have to retrieve the first
> > > > >   1010 hits from all sub-search engines to get it right. Maybe
> > > > >   there is a clever algorithm to do this more efficiently, but
> > > > >   I have not heard of one. This is of course also a problem in
> > > > >   a streaming model, but there it will not trick developers
> > > > >   into believing that GetHits(s, 1000, 1010) is a cheap call.
> > > > >
> > > > > * Grep-like backends, or more generally backends where the
> > > > >   search results roll in sequentially.
> > > > >
> > > > > I think it is a bad time to break the API like this. It is in
> > > > > fact quite a big break if you ask me, since our current
> > > > > approach has been stream-based and what you propose is
> > > > > changing the paradigm to a page-based model. It is also bad
> > > > > because it sends the wrong signal to make such an important
> > > > > change at the last minute.
> > > > >
> > > > > I see a few API-stable alternatives though.
> > > > >
> > > > > 1) Add a SeekHit(in s search, in i hit_id, out i new_pos).
> > > > >    This basically adds a cursoring mechanism to the API.
> > > > > 2) In the style of 1) but lighter - add SkipHits(in s search,
> > > > >    in i count, out i new_pos).
> > > > >
> > > > > These options also stay within the standard streaming
> > > > > terminology. We could make them optional by making them throw
> > > > > exceptions if the (new) session property vendor.paging is
> > > > > True.
> > > > >
> > > > > As Jos also points out later in the thread, GetHitData is
> > > > > actually paging, and the workaround he describes can actually
> > > > > be made very efficient since we already have the
> > > > > hit.fields.extended session prop to hint which properties we
> > > > > will fetch.
> > > > >
> > > > > Let me make it clear that I am not refusing the change to a
> > > > > paging model if that is what the majority rules. We should
> > > > > just make an informed decision that we are sure we agree on.
> > > >
> > > > I'm proposing adding new api, not breaking existing ones. The
> > > > existing stuff can easily emulate paging if it lacks native
> > > > support.
> > > >
> > > > I would prefer new api that takes a start-point param and a
> > > > count/length param so we have full random access.
> > >
> > > And how is GetHitData not good enough for that?
> > >
> > > Cheers,
> > > Mikkel

_______________________________________________
Xesam mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xesam
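Option 2 quoted above (SkipHits) would let clients build paging on top of the streaming model without breaking the existing API. A rough sketch with hypothetical names, again in plain Python rather than over D-Bus:

```python
class StreamingSearch:
    """Hypothetical stand-in for a streaming Xesam search session."""

    def __init__(self, results):
        self._results = results
        self._pos = 0

    def GetHits(self, search, count):
        # Standard streaming read: return the next `count` hits.
        batch = self._results[self._pos:self._pos + count]
        self._pos += len(batch)
        return batch

    def SkipHits(self, search, count):
        # The proposed addition: advance the cursor without
        # transferring any hit data over the bus.
        self._pos = min(self._pos + count, len(self._results))
        return self._pos


def read_page(search, handle, page, page_size):
    # Emulated paging: skip ahead, then read one page. This simplified
    # sketch only supports a forward seek from a fresh cursor.
    search.SkipHits(handle, page * page_size)
    return search.GetHits(handle, page_size)


s = StreamingSearch([f"hit{i}" for i in range(50)])
print(read_page(s, "search0", 3, 10))  # hits 30-39
```

The skip itself is cheap because no hit data crosses the bus, but whether the backend can skip without computing the skipped hits is exactly the backend-dependent question raised earlier in the thread.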
