Matt - That took care of it... a minor change on my side with big resource savings. Where was the original announcement made that this had changed? (Wondering how I missed it.)
Thanks!
Brian

On May 15, 8:33 am, Matt Sanford <m...@twitter.com> wrote:
> Hi Brian,
>
> My guess is that this is the same since_id/max_id pagination
> confusion we have always had. If you look at the next_page URL in our
> API you'll notice that it does not contain the since_id. If you are
> searching with since_id and requesting multiple pages you need to
> manually stop pagination once you find an id lower than your original
> since_id. I know this is a pain but there is a large performance gain
> in it on our back end. There was an update a few weeks ago [1] where I
> talked about this and a warning message (twitter:warning in atom and
> "warning" in JSON) was added to alert you to the fact it had been
> removed. Does that sound like the cause of your issue?
>
> Thanks;
> – Matt Sanford / @mzsanford
> Twitter Dev
>
> [1] -http://groups.google.com/group/twitter-development-talk/browse_frm/th...
>
> On May 15, 2009, at 7:50 AM, briantroy wrote:
>
> > I've noticed this before but always tried to deal with it as a bug on
> > my side. It is, however, now clear to me that from time to time
> > Twitter Search API seems to ignore the since_id.
> >
> > We track FollowFriday by polling Twitter Search every so often (the
> > process is throttled from 10 seconds to 180 seconds depending on how
> > many results we get). This works great 90% of the time. But on high
> > volume days (Fridays) I've noticed we get a lot of multi-page
> > responses causing us to make far too many requests to the Twitter API
> > (900/hour).
> > When attempting to figure out why we are making so many requests I
> > uncovered something very interesting. When we get a "tweet" we store
> > it in our database. That database has a unique index on the customer
> > id/Tweet Id. When we get multi-page responses from Twitter and iterate
> > through each page the VAST MAJORITY of the Tweets violate this unique
> > index. What does this mean? That we already have that tweet.
> > Today, I turned on some additional debugging and saw that the tweets
> > we were getting from Twitter Search were, in fact, prior to the
> > since_id we sent.
> >
> > This is causing us to POUND the API servers unnecessarily. There is,
> > however, really nothing I can do about it on my end.
> >
> > Here is a snip of the log showing the failed inserts and the ID we are
> > working with. The last line shows you both the old max id and the new
> > max id (after processing the tweets). As you can see every tweet
> > violates the unique constraint (27 is the customer id). You can also
> > see that we've called the API for this one search 1016 times this
> > hour... which is WAY, WAY too much (16.9 times per minute):
> >
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522797' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('#<b>followfriday</b> edubloggers
> > @CoolCatTeacher @dwarlick @ewanmcintosh @willrich45 @larryferlazzo
> > @suewaters',1806522797, 0, '', 192010, 'WeAreTeachers', 'en', 'http://
> > s3.amazonaws.com/twitter_production/profile_images/52716611/
> > Picture_2_normal.png', 'Fri, 15 May 2009 14:41:51 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522766' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('thx for the #<b>followfriday</b>
> > love, @brokesocialite & @silveroaklimo. Also thx to @diamondemory
> > & @bmichelle for the RTs of FF',1806522766, 0, '', 1149953,
> > 'lmdupont', 'en', 'http://s3.amazonaws.com/twitter_production/
> > profile_images/188591402/lisaann_normal.jpg', 'Fri, 15 May 2009
> > 14:41:51 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522760' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('Thx! RT @dpbkmb: #<b>followfriday</b>
> > @ifeelgod @americandream09 @DailyHappenings @MrMilestone @emgtay
> > @Nurul54 @mexiabill @naturallyknotty',1806522760, 0, '', 1303322,
> > 'borgellaj', 'en', 'http://s3.amazonaws.com/twitter_production/
> > profile_images/58399480/img017_normal.jpg', 'Fri, 15 May 2009 14:41:51
> > +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522759' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('Morning my tweets!!! <b>follow
> > friday</b>! Dnt forget to RT me in need of followers LOL!',1806522759,
> > 0, '', 11790458, 'Dae_Marie', 'en', 'http://s3.amazonaws.com/
> > twitter_production/profile_images/199283178/dae_babyyyy_normal.jpg',
> > 'Fri, 15 May 2009 14:41:50 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522752' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('<b>#ff</b> #<b>followfriday</b>
> > @dirtyert (he\'s started with scrap metal stories) and @soufron if you
> > speak French',1806522752, 0, '', 1704, 'vagredajr', 'en', 'http://
> > s3.amazonaws.com/twitter_production/profile_images/155241633/
> > _agreda_normal.jpg', 'Fri, 15 May 2009 14:41:50 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522729' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('#<b>followfriday</b> @hootsuite
> > @FitnessMagazine @packagingdiva @MobileLifeToday',1806522729, 0, '',
> > 11893419, 'ServiceFoods', 'it', 'http://s3.amazonaws.com/
> > twitter_production/profile_images/141678280/SF-shrunken_normal.bmp',
> > 'Fri, 15 May 2009 14:41:50 +0000', 27)
> > Updating number for api hits for hour: 10 to: 1016
> > DEBUG: 10:45:37 AM on Fri May 15th Checking for next page... **?
> > page=11&max_id=1806554381&rpp=100&q=followfriday+OR+%22follow+friday
> > %22+OR+%23ff+OR+fastfollowfive**
> > DEBUG: 10:45:37 AM on Fri May 15th There is another page for this
> > search... doing the next page now...
> > DEBUG: 10:45:37 AM on Fri May 15th Old max: 1806554381 New max:
> > 1806554381
> >
> > I'd love to help you track this down... I think it has something to do
> > with high volume search queries (and perhaps the API servers not all
> > having the same index of the tweets at the same time). Regardless -
> > this will cause undue load on the API servers... and serves NO
> > purpose for me...
> >
> > Brian Roy
> >
> > President and Founder - justSignal
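
For anyone finding this thread later, here is a rough sketch of the fix Matt describes: keep following pages, but stop by hand as soon as a result's id is at or below the since_id you originally sent. This is not Brian's actual change. The function name, the `fetch_page` callable, the `max_pages` cap and the `"results"` / `"id"` keys are assumptions made for illustration; only `since_id` and `next_page` come from the thread. It also assumes results arrive newest first, as the descending ids in the log suggest.

```python
from typing import Callable, Dict, List

Page = Dict[str, object]


def collect_new_tweets(
    fetch_page: Callable[[int], Page],  # hypothetical: returns one parsed JSON page
    since_id: int,
    max_pages: int = 15,                # assumed safety cap, not an API constant
) -> List[dict]:
    """Gather results newer than since_id, stopping pagination manually."""
    collected: List[dict] = []
    for page_number in range(1, max_pages + 1):
        page = fetch_page(page_number)
        results = page.get("results", [])  # "results" key is an assumption
        if not results:
            break
        for tweet in results:
            # next_page does not carry since_id, so later pages can reach
            # back past it. The first id at or below since_id means that
            # everything after it is already stored.
            if int(tweet["id"]) <= since_id:
                return collected
            collected.append(tweet)
        if "next_page" not in page:
            break
    return collected
```

With this kind of check, a poll that has only a handful of new tweets ends after the first page or two instead of walking every page, which is where the request savings would come from.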