Hi Brian,

My guess is that this is the same since_id/max_id pagination confusion we have always had. If you look at the next_page URL in our API responses, you'll notice that it does not contain the since_id. If you are searching with since_id and requesting multiple pages, you need to manually stop paginating once you find an id lower than your original since_id. I know this is a pain, but it gives us a large performance gain on the back end. I described this in an update a few weeks ago [1], when a warning message (twitter:warning in Atom and "warning" in JSON) was added to alert you that the since_id had been removed. Does that sound like the cause of your issue?
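
For illustration, the manual stop described above might look something like the sketch below. It is only a sketch under assumptions: fetch_page is a hypothetical callable you would supply (it takes the next_page query string, or None for the first request, and returns the parsed JSON), the "results" and "id" field names are assumed, and results are assumed to arrive newest first. It also stops on an id equal to since_id, since that tweet is already known.

```python
from typing import Any, Callable, Dict, List, Optional


def collect_new_tweets(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    since_id: int,
    max_pages: int = 15,
) -> List[Dict[str, Any]]:
    """Follow next_page links, stopping once results reach since_id.

    fetch_page is a hypothetical helper: given a next_page query string
    (or None for the first page) it returns the parsed JSON response.
    """
    new_tweets: List[Dict[str, Any]] = []
    next_page: Optional[str] = None  # None means "first page"
    for _ in range(max_pages):
        page = fetch_page(next_page)
        for tweet in page.get("results", []):
            if tweet["id"] <= since_id:
                # Assumes newest-first ordering: everything from here on
                # is already known, so stop paginating entirely.
                return new_tweets
            new_tweets.append(tweet)
        next_page = page.get("next_page")
        if not next_page:
            break  # no further pages offered
    return new_tweets
```

With a loop like this, the next page is never requested once a page contains an id at or below the original since_id.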

Thanks;
 – Matt Sanford / @mzsanford
     Twitter Dev

[1] - http://groups.google.com/group/twitter-development-talk/browse_frm/thread/6e80cb6eec3a16d3?tvc=1

On May 15, 2009, at 7:50 AM, briantroy wrote:


I've noticed this before but always tried to deal with it as a bug on
my side. It is, however, now clear to me that from time to time
Twitter Search API seems to ignore the since_id.

We track FollowFriday by polling Twitter Search every so often (the
process is throttled from 10 seconds to 180 seconds depending on how
many results we get). This works great 90% of the time. But on high
volume days (Fridays) I've noticed we get a lot of multi-page
responses causing us to make far too many requests to the Twitter API
(900/hour).
When attempting to figure out why we are making so many requests I
uncovered something very interesting. When we get a "tweet" we store
it in our database. That database has a unique index on the customer
id/Tweet Id. When we get multi-page responses from Twitter and iterate
through each page the VAST MAJORITY of the Tweets violate this unique
index. What does this mean? That we already have that tweet.
Today, I turned on some additional debugging and saw that the tweets
we were getting from Twitter Search were, in fact, prior to the
since_id we sent.
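
The polling throttle mentioned above (between 10 and 180 seconds depending on result volume) could be sketched roughly as follows. The linear mapping and the name next_poll_delay are assumptions made for illustration only, not the actual justSignal logic; page_size mirrors the rpp=100 used in the search.

```python
def next_poll_delay(
    result_count: int,
    page_size: int = 100,
    min_delay: float = 10.0,
    max_delay: float = 180.0,
) -> float:
    """Return seconds to wait before the next poll.

    Assumed policy: a full page of results means high volume, so poll
    again after min_delay; an empty response means low volume, so wait
    max_delay. Values in between are interpolated linearly.
    """
    # Fraction of the page that was filled, clamped to [0, 1].
    fullness = max(0.0, min(1.0, result_count / page_size))
    return max_delay - fullness * (max_delay - min_delay)
```

A caller would sleep for next_poll_delay(len(results)) seconds after each poll, counting only tweets newer than the since_id so that duplicates do not shorten the interval.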

This is causing us to POUND the API servers unnecessarily. There is,
however, really nothing I can do about it on my end.

Here is a snip of the log showing the failed inserts and the ID we are
working with. The last line shows you both the old max id and the new
max id (after processing the tweets). As you can see every tweet
violates the unique constraint (27 is the customer id). You can also
see that we've called the API for this one search 1016 times this
hour... which is WAY, WAY too much (about 16.9 times per minute):

NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
entry '27-1806522797' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
from_user_id, from_user, iso_language_code, profile_image_url,
created_at, bulk_svc_id) values('#<b>followfriday</b> edubloggers
@CoolCatTeacher @dwarlick @ewanmcintosh @willrich45 @larryferlazzo
@suewaters',1806522797, 0, '', 192010, 'WeAreTeachers', 'en', 'http://
s3.amazonaws.com/twitter_production/profile_images/52716611/
Picture_2_normal.png', 'Fri, 15 May 2009 14:41:51 +0000', 27)
NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
entry '27-1806522766' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
from_user_id, from_user, iso_language_code, profile_image_url,
created_at, bulk_svc_id) values('thx for the #<b>followfriday</b>
love, @brokesocialite &amp; @silveroaklimo.  Also thx to @diamondemory
&amp; @bmichelle for the RTs of FF',1806522766, 0, '', 1149953,
'lmdupont', 'en', 'http://s3.amazonaws.com/twitter_production/
profile_images/188591402/lisaann_normal.jpg', 'Fri, 15 May 2009
14:41:51 +0000', 27)
NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
entry '27-1806522760' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
from_user_id, from_user, iso_language_code, profile_image_url,
created_at, bulk_svc_id) values('Thx! RT @dpbkmb: #<b>followfriday</b>
@ifeelgod @americandream09 @DailyHappenings @MrMilestone @emgtay
@Nurul54 @mexiabill @naturallyknotty',1806522760, 0, '', 1303322,
'borgellaj', 'en', 'http://s3.amazonaws.com/twitter_production/
profile_images/58399480/img017_normal.jpg', 'Fri, 15 May 2009 14:41:51
+0000', 27)
NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
entry '27-1806522759' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
from_user_id, from_user, iso_language_code, profile_image_url,
created_at, bulk_svc_id) values('Morning my tweets!!! <b>follow
friday</b>! Dnt forget to RT me in need of followers LOL!',1806522759,
0, '', 11790458, 'Dae_Marie', 'en', 'http://s3.amazonaws.com/
twitter_production/profile_images/199283178/dae_babyyyy_normal.jpg',
'Fri, 15 May 2009 14:41:50 +0000', 27)
NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
entry '27-1806522752' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
from_user_id, from_user, iso_language_code, profile_image_url,
created_at, bulk_svc_id) values('<b>#ff</b> #<b>followfriday</b>
@dirtyert (he\'s started with scrap metal stories) and @soufron if you
speak French',1806522752, 0, '', 1704, 'vagredajr', 'en', 'http://
s3.amazonaws.com/twitter_production/profile_images/155241633/
_agreda_normal.jpg', 'Fri, 15 May 2009 14:41:50 +0000', 27)
NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
entry '27-1806522729' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
from_user_id, from_user, iso_language_code, profile_image_url,
created_at, bulk_svc_id) values('#<b>followfriday</b> @hootsuite
@FitnessMagazine @packagingdiva @MobileLifeToday',1806522729, 0, '',
11893419, 'ServiceFoods', 'it', 'http://s3.amazonaws.com/
twitter_production/profile_images/141678280/SF-shrunken_normal.bmp',
'Fri, 15 May 2009 14:41:50 +0000', 27)
Updating number for api hits for hour: 10 to: 1016
DEBUG: 10:45:37 AM on Fri May 15th Checking for next page... **?
page=11&max_id=1806554381&rpp=100&q=followfriday+OR+%22follow+friday
%22+OR+%23ff+OR+fastfollowfive**
DEBUG: 10:45:37 AM on Fri May 15th There is another page for this
search... doing the next page now...
DEBUG: 10:45:37 AM on Fri May 15th Old max: 1806554381 New max:
1806554381


I'd love to help you track this down... I think it has something to do
with high volume search queries (and perhaps the API server not all
having the same index of the tweets at the same time). Regardless -
this will cause undue load on the API servers... and serves NO
purpose for me...

Brian Roy

President and Founder - justSignal
