Thanks for the replies, Doug. By the way, as far as I can tell the ~52M integer user ids aren't fully sequential; some IDs are missing in between, which I'm assuming is done intentionally to make it slightly harder to infer exact user-growth trends (seems valid to me), or for some other system reason. There are definitely gaps in there, FYI.
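If it's useful to anyone, here is a minimal sketch (Python) of how one might measure that sparseness for a block of IDs that has already been probed end to end; `valid_ids` and the range bounds are hypothetical inputs, not anything I've actually collected:

```python
# Minimal sketch: measure how sparse a contiguous block of user IDs is,
# assuming every ID in [lo, hi] has already been probed and `valid_ids`
# holds the ones that resolved to real accounts (hypothetical input).
def gap_fraction(valid_ids, lo, hi):
    span = hi - lo + 1                                  # IDs probed in total
    present = sum(1 for i in set(valid_ids) if lo <= i <= hi)
    return (span - present) / span                      # share of IDs with no account

if __name__ == "__main__":
    # Toy data only: 6 of the 10 IDs in the block resolved to accounts.
    print(gap_fraction([101, 102, 104, 105, 108, 110], 101, 110))  # 0.4
```

Of course this only tells you about the block you probed; it says nothing about how the gaps are distributed elsewhere in the ID space.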
dave

On Jun 29, 2:04 pm, Doug Williams <d...@twitter.com> wrote:
> Please do not scrape our site. We have processes in place that will
> automatically block your spiders.
>
> If you feel that you have a compelling need for vast amounts of data, please
> email the API team [1] with a detailed description of your needs and the
> value you hope to create, and let's have a conversation.
>
> 1. http://apiwiki.twitter.com/Support
>
> Thanks,
> Doug
>
> On Mon, Jun 29, 2009 at 10:30 AM, Scott Haneda <talkli...@newgeo.com> wrote:
> > I don't think this is a matter of a workaround. This is a function of
> > Twitter having a good policy in place to prevent abuse.
> >
> > You can do what you want by incrementally querying the API. The API limits
> > will make it take too long. Even with multiple accounts it will be months
> > before you get a final list. Even then, I'm not sure you could keep on top
> > of new user registrations.
> >
> > Having access to this data could only be used for nefarious efforts. What
> > you want would be a spammer's dream.
> >
> > I think you would be better off, and faster, to build a crawl farm, crawl
> > all links on Twitter.com, and parse the users out, bypassing the API.
> >
> > Even with the API, as you add new records, the records you just added
> > will expire, be deleted, get banned, blocked, etc. There is no way you
> > could ever have a reconciled system.
> >
> > Consider that each username averages 10 bytes. You have 520,000,000 bytes
> > to download of just username data. Let's double that for HTTP overhead
> > and other misc data that will come over the wire: 1 billion bytes.
> >
> > That's, conservatively, a gigabyte of data that you would have to download
> > once a day and reconcile against the previous day. A gigabyte of just
> > usernames.
> >
> > Then you have all the CPU that you will need, network lag, and the time to
> > insert into your data source.
> >
> > This is not something that can be worked around. This is simply a
> > limitation of scale, one that cannot be overcome. You need a direct link
> > to Twitter's data sources, ideally from within their data center to reduce
> > network lag. This probably will not be approved :)
> > --
> > Scott
> > iPhone says hello.
> >
> > On Jun 29, 2009, at 9:06 AM, Arunachalam <arunachala...@gmail.com> wrote:
> > > Even if I have my account whitelisted, which allows 20,000 requests per
> > > hour, I would need to run for many days, which is not feasible. Is there
> > > any other workaround?
> > >
> > > Any other way to get around this request limit?
> > >
> > > Cheers,
> > > Arunachalam
> > >
> > > On Mon, Jun 29, 2009 at 7:01 PM, Abraham Williams <4bra...@gmail.com> wrote:
> > > > There have been over 52,000,000 profiles created. You could just start
> > > > at 1 and count up. Might take you a while though.
> > > >
> > > > Abraham
> > > >
> > > > On Mon, Jun 29, 2009 at 07:55, Arunachalam <arunachala...@gmail.com> wrote:
> > > > > Any idea how to implement the same using php / any other language?
> > > > > I'm confused about the implementation.
> > > > >
> > > > > Cheers,
> > > > > Arunachalam
> > > > >
> > > > > On Mon, Jun 29, 2009 at 5:57 PM, Cameron Kaiser <spec...@floodgap.com> wrote:
> > > > > > > I am looking to find the entire list of Twitter user ids.
> > > > > > >
> > > > > > > The social graph methods provide a way to fetch friend and
> > > > > > > follower ids, through which we can access a person's profile
> > > > > > > using the user method show. But this requires code to recursively
> > > > > > > crawl the list from any starting id, appending each person's
> > > > > > > follower and friend ids without duplicating.
> > > > > > >
> > > > > > > Do we have any other API to get the entire list? If not, are
> > > > > > > there any other ways, apart from crawling, to get the entire list?
> > > > > >
> > > > > > No, and no, there are no other ways.
> > > > > >
> > > > > > --
> > > > > > personal: http://www.cameronkaiser.com/
> > > > > > Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
> > > > > > -- Careful with that Axe, Eugene. -- Pink Floyd
> > > >
> > > > --
> > > > Abraham Williams | Community Evangelist | http://web608.org
> > > > Hacker | http://abrah.am | http://twitter.com/abraham
> > > > Project | http://fireeagle.labs.poseurtech.com
> > > > This email is: [ ] blogable [x] ask first [ ] private.
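For what it's worth, the scale numbers quoted above are easy to sanity-check. Here is a back-of-envelope sketch (Python) using only figures from the thread: ~52,000,000 profiles, a whitelisted limit of 20,000 requests/hour, and a 10-byte average username; the 2x HTTP-overhead factor is Scott's assumption, and one request per ID is mine.

```python
# Back-of-envelope sketch using the numbers quoted in the thread.
PROFILES = 52_000_000          # "over 52,000,000 profiles created"
RATE_LIMIT = 20_000            # whitelisted requests per hour
AVG_USERNAME_BYTES = 10        # assumed average username size
OVERHEAD_FACTOR = 2            # assumption: double for HTTP/misc overhead

# Walking every ID with one users/show-style call each:
hours = PROFILES / RATE_LIMIT
print(f"enumeration time: {hours:,.0f} hours ≈ {hours / 24:.0f} days")
# -> 2,600 hours ≈ 108 days, before retries or new signups

# Raw username payload plus overhead:
raw_bytes = PROFILES * AVG_USERNAME_BYTES
total_bytes = raw_bytes * OVERHEAD_FACTOR
print(f"username data: {raw_bytes / 1e6:.0f} MB raw, "
      f"~{total_bytes / 1e9:.1f} GB with overhead")
# -> 520 MB raw, ~1.0 GB with overhead (per full pass)
```

At one whitelisted account that works out to roughly three and a half months per full pass, which is the real blocker; the raw payload itself is only on the order of a gigabyte per day.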
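And on the original question of crawling the social graph "without duplicating", the usual approach is a breadth-first walk with a visited set. A minimal sketch, assuming a hypothetical `fetch_connection_ids(user_id)` callable that wraps whatever friends/followers-ids methods you are using; this is not a real Twitter client:

```python
from collections import deque

def crawl_user_ids(seed_id, fetch_connection_ids, limit=None):
    """Breadth-first walk of the follow graph starting from `seed_id`.

    `fetch_connection_ids(user_id)` is a hypothetical callable returning the
    combined friend + follower IDs for one user; the `seen` set is what keeps
    the crawl from revisiting or double-counting anyone.
    """
    seen = {seed_id}
    queue = deque([seed_id])
    while queue and (limit is None or len(seen) < limit):
        uid = queue.popleft()
        for other in fetch_connection_ids(uid):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

Note that this only ever reaches accounts connected to the seed; users with no friends or followers never appear, which is another reason a graph crawl cannot produce "the entire list" no matter how long it runs.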