[twitter-dev] Re: Twitter, Please Explain How Cursors Work

John Kalucki Wed, 07 Oct 2009 09:24:36 -0700

First you have to assume no changes to the set. Users with any
significant following will see constant churn. Factoring out natural
churn then:


Ideally, the results are the same. Practically, the results are the
same. In a very few corner cases they are not. For the next several
weeks, for edges that were created over ~2 weeks ago, there will be,
very very rarely, issues with cursor jitter: In theory and in practice
there will be some over-delivery -- the last userid, or so, in a block
may be duplicated in the first rows of a subsequent block. In theory
there might be similar under-delivery, but we haven't found an actual
case of under-delivery yet. You may need to deduplicate your results
if your app is very sensitive to duplication. In any case, new edges
no longer suffer from this jitter, and we're going to repair the whole
graph in a few weeks. I think this will require several megawatt-hours
of computation.
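
A minimal sketch of that deduplication, in Python. It assumes you have
already collected the blocks as lists of user ids, in the order they were
returned. `dedupe_blocks` is a hypothetical helper name, not part of any
Twitter library.

```python
def dedupe_blocks(blocks):
    """Yield each user id once, keeping first-seen order across blocks.

    `blocks` is an iterable of lists of ids, one list per cursored call.
    (Hypothetical helper for illustration only.)
    """
    seen = set()
    for block in blocks:
        for user_id in block:
            if user_id not in seen:
                seen.add(user_id)
                yield user_id
```

For example, `list(dedupe_blocks([[5, 4, 3], [3, 2, 1]]))` gives
`[5, 4, 3, 2, 1]`: the id repeated at the block boundary is kept once.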

Your first two statements are correct. I don't understand your third
statement. But I think it is a false assertion. Could you briefly
restate?

An aside: There may be some signal in the cursors. Especially in the
most significant bytes. They're references into the edge-creation-time
index after all. I don't know how much obfuscation there is,
especially in the lsb's, but the cursors ideally should be treated as
opaque tokens. While unlikely, we may change their format at some time
in the future. And then various acts of derring-do could break.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

On Oct 7, 6:57 am, Jeffrey Greenberg <jeffreygreenb...@gmail.com>
wrote:
> John, please clarify this scenario. If one makes a complete set of calls
> starting from cursor -1 until the end at one moment, and then another set of
> the same calls later, is there any invariance?  If so, what?
>
> From the statements above I understand:
> - always 5000 followers are returned (if the user has more than 5000, and
> the last call will have less)
> - the order is the same: it's the time order that users followed this
> account
>
> And thus:
> - there is no correlation in the API between a particular cursor and a set
> of returned values (followers)
>
> Is that it?
>
> On Tue, Oct 6, 2009 at 4:12 PM, John Kalucki <jkalu...@gmail.com> wrote:
>
> > I described, in some detail, the reasons for cursors here:
>
> > http://groups.google.com/group/twitter-development-talk/msg/badfb7b60...
>
> > If the details are uninteresting, the high-level summary is this: The
> > paged API was designed in a previous era. Paging is simply too
> > expensive and totally impractical to provide with the current
> > following counts. Also the QoS had deteriorated to the point where
> > some doubted that anyone was seriously using the methods. Paging is
> > going away and paging is not coming back.
>
> > The cursored approach allows us to continue to provide access to the
> > social graph via the REST API. As a benefit, QoS has been dramatically
> > improved and data quality is now pretty close to perfect.
>
> > If the implementation details and invariants described are confusing,
> > then stick to the well worn part of the path: Request the first block
> > with a cursor of -1. Keep requesting forward until you get a cursor of
> > 0.
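
The well-worn path described above can be sketched in Python as follows.
`fetch_page` is a hypothetical callable that you supply; it is assumed to
take a cursor and return a pair of (list of ids, next cursor). The actual
HTTP request and response parsing are deliberately left out. The `max_calls`
guard is also an assumption, added so that a misbehaving server cannot make
the loop run forever.

```python
def walk_cursors(fetch_page, max_calls=10_000):
    """Collect ids by starting at cursor -1 and stopping at cursor 0.

    `fetch_page(cursor)` is a caller-supplied function (hypothetical here)
    returning (ids, next_cursor). Cursors are passed back unchanged and
    never inspected, apart from the start value -1 and the end value 0.
    """
    cursor = -1
    ids = []
    for _ in range(max_calls):
        page_ids, cursor = fetch_page(cursor)
        ids.extend(page_ids)
        if cursor == 0:
            return ids
    raise RuntimeError("gave up before reaching cursor 0")
```

Because the cursor is only compared against 0 and otherwise handed straight
back, this keeps working if the cursor format changes.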
>
> > -John Kalucki
> > http://twitter.com/jkalucki
> > Services, Twitter Inc.
>
> > On Oct 6, 11:06 am, Jesse Stay <jesses...@gmail.com> wrote:
> > > I said the same thing in the last thread about this - still no clue what
> > > Twitter is doing with cursors and how it is any different than the previous
> > > paging methods.
> > > Jesse
>
> > > On Tue, Oct 6, 2009 at 10:22 AM, Dewald Pretorius <dpr...@gmail.com> wrote:
>
> > > > Thanks John. However, I will be the first to put up my hand and say
> > > > that I have no clue what you said.
>
> > > > Can someone please translate John's answer into easy to understand
> > > > language, with specific relation to the questions I asked?
>
> > > > Dewald
>
> > > > On Oct 5, 1:17 am, John Kalucki <jkalu...@gmail.com> wrote:
> > > > > I haven't looked at all the parts of the system, so there's some
> > > > > chance that I'm missing something.
>
> > > > > The method returns the followers in the reverse chronological order of
> > > > > edge creation. Cursor A will have the most recent 5,000 edges, by
> > > > > creation time, B the next most recent 5,000, etc. The last cursor will
> > > > > have the oldest edges.
>
> > > > > Each cursor points to some arbitrary edge. If you go back and retrieve
> > > > > cursor B, you should receive N edges created just before the
> > > > > edge-pointed-to-by-B was created. I don't recall if N is always 5000,
> > > > > generally 5000 or if it's at most 5000. This detail shouldn't matter,
> > > > > other than, on occasion, you'll make an extra API call.
>
> > > > > In any case, retrieving cursor B will never return edges created after
> > > > > the edge-pointed-to-by-B was created. All edges returned by cursor B
> > > > > will be no-newer-than, and generally older than, the
> > > > > edge-pointed-to-by-B.
>
> > > > > So, all future sets returned by cursor B are always disjoint from the
> > > > > set originally returned by cursor A. In your example, if you refetched
> > > > > both A and B, the result sets wouldn't be disjoint as there are no
> > > > > longer 5,000 edges between cursor A and cursor B.
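
The invariant described above can be illustrated with a toy model in Python.
This is only a sketch under stated assumptions, not Twitter's
implementation: a cursor is modeled as a plain creation-time bound, creation
times are distinct positive integers, and the block size is 5 rather than
5,000. The `fetch` function and the `edges` mapping are hypothetical names.

```python
def fetch(edges, cursor, page_size=5):
    """Toy model of one cursored call (illustration only).

    `edges` maps user_id -> created_at (distinct positive ints).
    A cursor of -1 means "start at the newest edge"; any other cursor is
    treated as a creation-time bound, so only edges no newer than the
    cursor are returned. Returns (ids newest-first, next_cursor), where
    next_cursor is 0 once no older edges remain.
    """
    newest_first = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)
    if cursor != -1:
        newest_first = [(uid, t) for uid, t in newest_first if t <= cursor]
    page = newest_first[:page_size]
    rest = newest_first[page_size:]
    next_cursor = rest[0][1] if rest else 0
    return [uid for uid, _ in page], next_cursor
```

In this model, with 15 edges split into blocks A, B and C, removing some
edges from A and B and then refetching with the old cursor B returns a full
block again: the survivors of B plus the newest edges that used to be in C.
That refetched block never overlaps the block originally returned for A, but
it can overlap a freshly refetched A.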
>
> > > > > I think this, in part, answers your question?
>
> > > > > -John Kalucki
> > > > > http://twitter.com/jkalucki
> > > > > Services, Twitter Inc.
>
> > > > > On Oct 4, 6:10 pm, Dewald Pretorius <dpr...@gmail.com> wrote:
>
> > > > > > For discussion purposes, let's assume I am cursoring through a very
> > > > > > volatile followers list of @veryvolatile. We have the following
> > > > > > cursors:
>
> > > > > > A = 5,000
> > > > > > B = 5,000
> > > > > > C = 5,000
>
> > > > > > I retrieve Cursor A and process it. Next I retrieve Cursor B and
> > > > > > process it. Then I retrieve Cursor C and process it.
>
> > > > > > While I am processing Cursor C, 200 of the people who were in Cursor A
> > > > > > unfollow @veryvolatile, and 400 of the people who were in Cursor B
> > > > > > unfollow @veryvolatile.
>
> > > > > > What do I get when I go back from C to B? Do I now get 4,600 ids in
> > > > > > the list?
>
> > > > > > Or, do I get 5,000 in B, which now includes a subset of 400 ids that
> > > > > > were previously in Cursor A?
>
> > > > > > Dewald
