On 29 Sep 2014, at 8:12 AM, Stefan Arentz <[email protected]> wrote:

> I don’t understand the logic here. Why doesn’t the client use the limit and 
> offset parameters to grab all history in three requests?

You’re exercising an old code path that was intended for use by mobile.

Nobody has stripped this out, because the risk of introducing bugs is higher 
than the benefit of simplification.

_processIncoming does this (sketched in rough code after this list):

* It fetches records up to the downloadLimit (no limit by default, a very low 
limit on mobile).
* If we fetched that many items, there are presumably more. Switch into 
batching mode. Fetch downloadLimit IDs. On mobile, this bumps from a very low 
limit up to 5000, so we expect to get mostly new records.
* Fetch batches of those items by ID (mobileGUIDFetchBatchSize, 
guidFetchBatchSize).
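
In rough JavaScript, the flow looks something like this. The helper names
(fetchRecords, fetchIDs, chunk, applyIncoming) are stand-ins for
illustration, not the real engines.js internals:

    function processIncoming(engine) {
      // First pass: full records, up to the download limit.
      let records = fetchRecords({ newer: engine.lastSync,
                                   limit: engine.downloadLimit });
      applyIncoming(records);

      if (engine.downloadLimit &&
          records.length == engine.downloadLimit) {
        // We probably left records behind. Switch to batching: grab a
        // large list of GUIDs, then fetch those records in small chunks.
        // On mobile the limit is bumped (to 5000) for this ID fetch.
        let guids = fetchIDs({ newer: engine.lastSync,
                               limit: engine.downloadLimit });
        for (let batch of chunk(guids, engine.guidFetchBatchSize)) {
          applyIncoming(fetchRecords({ ids: batch }));
        }
      }
    }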

Obviously that looks weird if you hit downloadLimit in the first request and
you’re not on mobile. There’s no good way to say “give me a large number of
relevant IDs, but not the records I just pulled”. One could modify the system
to first figure out how many records would be fetched, and then either fetch
them directly or grab IDs (sketched below), but I doubt anyone is motivated
to do so.
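
If someone were motivated, the reshaped logic might look like this. This is
entirely hypothetical; in particular, fetchRecordCount assumes the server
can cheaply report how many records match, which nothing here relies on
today:

    // Decide up front whether to fetch records directly or batch by ID.
    let count = fetchRecordCount({ newer: lastSync });  // hypothetical
    if (count <= downloadLimit) {
      applyIncoming(fetchRecords({ newer: lastSync })); // one pass, done
    } else {
      // Skip the doomed first fetch entirely; go straight to batching.
      let guids = fetchIDs({ newer: lastSync });
      for (let batch of chunk(guids, guidFetchBatchSize)) {
        applyIncoming(fetchRecords({ ids: batch }));
      }
    }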


Sync doesn’t use limit and offset for several reasons:

* Limit and offset don’t really make sense on the server. This is not a
transactional system; it’s stateless, with an arbitrary amount of time between
successive page fetches. The client can’t trust that paging will yield every
record, so it doesn’t use paging at all. (A toy illustration of the failure
mode follows this list.)

* Server (and client!) writes could be occurring during this paging behavior. 
We don’t want to make Sync’s lack of safety worse by deliberately 
fast-forwarding past new records. You can see in engines.js that we set the 
timestamp before we start fetching batches.

* This batching persists across restarts. Paging would not (and it wouldn’t
make sense for it to, for the aforementioned reasons). This is/was important
for XUL mobile.
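
To make the first two points concrete, here’s a toy illustration (plain
JavaScript, nothing to do with the actual Sync code) of offset paging
silently skipping a record when the collection changes between page fetches:

    // The server’s view of a collection, oldest first.
    let server = ["A", "B", "C", "D", "E"];

    let page1 = server.slice(0, 2);            // ["A", "B"]

    // Another device deletes "A" before we ask for page two.
    server = server.filter(id => id != "A");   // ["B", "C", "D", "E"]

    let page2 = server.slice(2, 4);            // ["D", "E"]: "C" is never seen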


Sync uses lots of requests for that last step (fetching records by ID) because
we run into URL length limitations: the GUIDs travel in the query string.
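
Roughly, and with illustrative numbers rather than the real constants:

    function chunk(guids, size) {
      let batches = [];
      for (let i = 0; i < guids.length; i += size) {
        batches.push(guids.slice(i, i + size));
      }
      return batches;
    }

    // 5000 twelve-character GUIDs in one query string would be a ~65KB
    // URL, which servers and proxies reject. Chunked at 50, it becomes
    // 100 reasonably sized requests instead.
    for (let batch of chunk(guids, 50)) {
      let url = collectionURL + "?full=1&ids=" + batch.join(",");
      // ... GET url, apply the records ...
    }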

Most of this logic is thoroughly commented in _processIncoming in engines.js, 
if you want to dig a little deeper.


In a much better sync system, a client would do this (aping Git), or something
equivalent to it; a hypothetical sketch in code follows the list:

* Pick an identified server state (HEAD -> hash)
* Incrementally fetch chunks of state — individually named — to be able to 
recreate the server state.
* Merge locally to create a new head.
* Push data for that head to the server.
* Fast-forward the server to that head.
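
In hypothetical code, with every name invented for illustration (no such
server API exists today):

    async function gitLikeSync(local, server) {
      let head = await server.getHead();            // HEAD -> hash
      for (let id of await server.chunkIDsFor(head)) {
        if (!local.has(id)) {                       // chunks are individually
          local.store(id, await server.fetch(id));  // named, so this resumes
        }
      }
      let newHead = local.mergeAgainst(head);       // merge locally
      await server.pushChunks(local.chunksFor(newHead));
      await server.fastForward(head, newHead);      // fails if HEAD moved
    }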

We don’t have that system, alas, so instead it blindly grabs records and hopes 
for the best.


_______________________________________________
Sync-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/sync-dev
