Thanks for posting this, Chad! Doug, please keep us updated on how things progress with this issue so we can pass along guidance to our user base. Hopefully the improvements will come in the near term.
Thanks for all that you guys do!

Brooks

On Jul 21, 3:45 pm, Doug Williams <d...@twitter.com> wrote:
> Chad,
> Your assessment is spot on.
>
> At the heart of search there are a number of data stores that accept queries
> (reads) while at the same time performing writes from an indexer. Heavy load --
> large numbers of queries, large numbers of writes, or both -- can
> cause the write replication between the indexer and various data stores to
> grow inconsistent when a particular data store is blocked on a read.
>
> Unfortunately there is no easy fix for this problem at the moment. The
> search team has grown considerably in the last couple of weeks, so as they
> get up to speed, the feature set and stability of search should continue to
> improve.
>
> Thanks,
> Doug
>
> On Tue, Jul 21, 2009 at 11:57 AM, Chad Etzel <jazzyc...@gmail.com> wrote:
>
> > Hi API Team,
> >
> > A few of us have been discussing off list a funky behavior we have
> > been noticing and now users are starting to notice.
> >
> > There is a problem for sites/apps like TweetGrid and TweetChat which
> > auto-refresh tweets based on the Search API using the since_id. People
> > are noticing that these sites are "missing tweets" when compared to
> > the search.twitter.com results page for the same query.
> >
> > We believe what is happening is that the search servers are not
> > indexing tweets in a serial manner, and so a tweet with a higher id
> > may sneak into a search server and be indexed before a tweet
> > with a lower id. This means that when the since_id is sent back from
> > the query (or derived from the first result in the results array),
> > using that since_id to refresh the query will miss lower-id tweets
> > when they finally do get indexed. So the illusion of "missing tweets"
> > is created. You can run TweetGrid and TweetChat in separate tabs using
> > the same query and see that sometimes the results don't match up
> > because of this.
> >
> > I'll try to give an example to be clear.
> >
> > Let's say for the sake of simplicity that I'm searching for "twitter"
> > and that every 10th tweet in the public timeline matches. So, all
> > tweets ending in 0 match my query.
> >
> > Search server 1 may index:
> >
> > 20
> > 30
> > 40
> > 60
> > 70
> >
> > (notice missing 50)
> >
> > At the same time, Search server 2 may index:
> >
> > 20
> > 30
> > 40
> > 50
> >
> > (notice it hasn't indexed 60 or 70 yet)
> >
> > I send a query, get a response from Server 1, and get a since_id of
> > 70. On my next request I use that since_id=70 and I'll never see
> > tweet 50. Thus the "missing tweets".
> >
> > This is quite annoying, especially now that users are noticing and
> > complaining to us (the app devs) that our apps are broken.
> >
> > I cannot think of a good workaround for this that would be simple
> > enough to implement and be worth the effort.
> >
> > Is this behavior something anyone else can confirm? Are tweets
> > supposed to be indexed/replicated serially by the search servers?
> >
> > -Chad
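The race in Chad's example can be reproduced with a small toy model. This is a hypothetical sketch, not Twitter's actual search code: each "search server" is assumed to be nothing more than a list of matching tweet ids it has indexed so far, and `search` and `next_since_id` are made-up helpers that mimic a client polling with `since_id` and taking the highest id from the first result.

```python
# Hypothetical toy model of the indexing race described in the thread.
# Each "search server" is just a list of matching tweet ids it has
# indexed so far; this is NOT how Twitter's search is actually built.
from typing import List


def search(index: List[int], since_id: int = 0) -> List[int]:
    """Return indexed ids newer than since_id, newest first."""
    return sorted((tweet_id for tweet_id in index if tweet_id > since_id), reverse=True)


def next_since_id(results: List[int], previous: int) -> int:
    """Derive the next since_id from the first (highest) result, as polling clients do."""
    return results[0] if results else previous


# Server 1 has indexed 60 and 70 but not 50 yet.
server1 = [20, 30, 40, 60, 70]
# Server 2 has indexed 50 but not 60 or 70 yet.
server2 = [20, 30, 40, 50]

# First poll is answered by server 1, so the client remembers since_id=70.
first = search(server1)
since_id = next_since_id(first, 0)

# Server 1 later catches up and finally indexes tweet 50.
server1.append(50)

# Next poll uses since_id=70, so tweet 50 (which is < 70) is filtered out.
second = search(server1, since_id)

shown_to_user = set(first) | set(second)
missed = sorted(set(server1) - shown_to_user)

print("first poll:", first)              # [70, 60, 40, 30, 20]
print("since_id:", since_id)             # 70
print("second poll:", second)            # []
print("never shown:", missed)            # [50]
print("server 2 view:", search(server2)) # [50, 40, 30, 20]
```

A client whose first poll happened to hit server 2 would instead show 50 but not yet 60 or 70, which is consistent with the mismatch Chad describes when running two apps side by side on the same query.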