[twitter-dev] Re: Search API: since_id is now unreliable

Brooks Bennett Tue, 21 Jul 2009 20:03:54 -0700

Thanks for posting this Chad!

Doug, please keep us updated on how things progress with this issue so
we can pass along guidance to our user-base. Hopefully the
improvements will come in the near-term.


Thanks for all that you guys do!

Brooks

On Jul 21, 3:45 pm, Doug Williams <d...@twitter.com> wrote:
> Chad,
>
> Your assessment is spot on.
>
> At the heart of search there are a number of data stores that accept queries
> (reads) while at the same time performing writes from an indexer. Heavy load --
> large numbers of queries, large numbers of writes, or both -- can
> cause the write replication between the indexer and various data stores to
> grow inconsistent when a particular data store is blocked on a read.
>
> Unfortunately there is no easy fix for this problem at the moment. The
> search team has grown considerably in the last couple of weeks so as they
> get up to speed, the feature set and stability of search should continue to
> improve.
>
> Thanks,
> Doug
>
>
>
> On Tue, Jul 21, 2009 at 11:57 AM, Chad Etzel <jazzyc...@gmail.com> wrote:
>
> > Hi API Team,
>
> > A few of us have been discussing off list a funky behavior we have
> > been noticing and now users are starting to notice.
>
> > There is a problem for sites/apps like TweetGrid and TweetChat which
> > auto-refresh tweets based on the Search API using the since_id. People
> > are noticing that these sites are "missing tweets" when compared to
> > the search.twitter.com results page for the same query.
>
> > We believe what is happening is that the search servers are not
> > indexing tweets in a serial manner, and so a tweet with a higher id
> > may sneak into a search server and be indexed first before a tweet
> > with a lower id. This means that when the since_id is sent back from
> > the query (or derived from the first result in the results array),
> > using that since_id to refresh the query will miss lower id tweets
> > when they finally do get indexed. So the illusion of "missing tweets"
> > is created. You can run TweetGrid and TweetChat in separate tabs using
> > the same query and see that sometimes the results don't match up
> > because of this.
>
> > I'll try to give an example to be clear.
>
> > Let's say for the sake of simplicity that I'm searching for "twitter"
> > and that every 10th tweet in the public timeline matches. So, all
> > tweets ending in 0 match my query.
>
> > Search server 1 may index:
>
> > 20
> > 30
> > 40
> > 60
> > 70
>
> > (notice missing 50)
>
> > At the same time, Search server 2 may index:
>
> > 20
> > 30
> > 40
> > 50
>
> > (notice hasn't indexed 60 or 70 yet)
>
> > I send a query and get a response from Server 1 and get a since_id of
> > 70.  On my next request I use that since_id=70 and I'll never see
> > tweet 50.  Thus the "missing tweets".
>
> > This is quite annoying, especially now that users are noticing and
> > complaining to us (the app devs) that our apps are broken.
>
> > I cannot think of a good work around for this that would be simple
> > enough to implement and be worth the effort.
>
> > Is this behavior something anyone else can confirm? Are tweets
> > supposed to be indexed/replicated serially by the search servers?
>
> > -Chad

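Chad's two-server example can be reproduced with a small simulation. This is only a sketch. The index contents come from his example, but `search` and `poll` are hypothetical helpers, not the Search API itself. It assumes, as the thread describes, that results come back newest first and that the client reuses the first (highest) id as its next since_id.

```python
# Hypothetical simulation of the since_id scenario from Chad's example.
# Index contents are taken from the example; helper names are illustrative only.

SERVER_1 = [20, 30, 40, 60, 70]            # has not yet indexed 50
SERVER_2 = [20, 30, 40, 50]                # has not yet indexed 60 or 70
FULLY_INDEXED = [20, 30, 40, 50, 60, 70]   # state after both servers catch up


def search(index, since_id=0):
    """Return matching ids greater than since_id, newest first."""
    return sorted((i for i in index if i > since_id), reverse=True)


def poll(responses):
    """Simulate a client that feeds the highest id it has seen back as since_id."""
    seen = []
    since_id = 0
    for index in responses:
        results = search(index, since_id)
        seen.extend(results)
        if results:
            since_id = results[0]  # newest result becomes the next since_id
    return sorted(seen)


# First request is answered by server 1, the next by a fully caught-up server.
received = poll([SERVER_1, FULLY_INDEXED])
missed = sorted(set(FULLY_INDEXED) - set(received))
print("received:", received)  # received: [20, 30, 40, 60, 70]
print("missed:", missed)      # missed: [50]
```

If the first request were answered by server 2 instead, the client would eventually see every tweet, because its highest id (50) is below the ids server 1 indexed early. The loss only appears when a server has indexed a higher id before a lower one.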