My friend and I are doing research based on Twitter. We need to analyse each and every tweet in real time. Can you guide us on how to approach this?
There seem to be two ways of doing this without Firehose:

1) Fetch the Twitter public timeline repeatedly. Thankfully, Twitter's caching has not been a problem for me; every request seems to return new data. But this approach has a lot of limitations.

According to TweeSpeed.com:
- The rate of new tweets on the Twitter servers is right now (Wed Jul 17 11:47:02 - GMT) 9233 tweets/minute.
- It ranges between 7K and 20K tweets/minute on an average weekday.
- On June 26 (MJ's death) it reached 25K tweets/minute.

Now consider the limit on API requests per hour:
- The limit is currently 20K requests per hour.
- 1 request = 20 tweets.
- Keeping up with 20K tweets/minute needs 1K requests per minute = 60K requests per hour.

1K requests per minute works out to around 17 requests per second, but my server is able to process only 28-33 requests per minute. Is this the right way to proceed, or is my approach fundamentally wrong?

2) Get the follower network: fetch user profiles and then their statuses. How often to request each user's new status updates could be set according to how frequently they usually update. But this is the old, Google-like way of indexing things, which does not quite hold up in today's REAL TIME Twitter.

I do know Firehose is an option, but wouldn't that again be something like approach 1?

Please guide me on how to proceed.
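
For reference, the request-rate arithmetic in approach 1 can be checked with a rough sketch like the one below. It makes no API calls. The function name `required_request_rate` and the constant names are my own, and the figures (20 tweets per request, 20K requests per hour, the TweeSpeed rates) are simply the ones quoted in this post, not verified values.

```python
import math

# Figures quoted in the post; treat them as assumptions, not verified limits.
TWEETS_PER_REQUEST = 20
HOURLY_REQUEST_LIMIT = 20_000


def required_request_rate(tweets_per_minute, tweets_per_request=TWEETS_PER_REQUEST):
    """Return (requests/minute, requests/hour, requests/second) needed
    to fetch every tweet at the given tweet rate."""
    per_minute = math.ceil(tweets_per_minute / tweets_per_request)
    per_hour = per_minute * 60
    per_second = per_minute / 60
    return per_minute, per_hour, per_second


if __name__ == "__main__":
    # Tweet rates (tweets/minute) taken from the TweeSpeed numbers above.
    scenarios = [
        ("right now", 9_233),
        ("weekday low", 7_000),
        ("weekday high", 20_000),
        ("June 26 peak", 25_000),
    ]
    for label, rate in scenarios:
        per_minute, per_hour, per_second = required_request_rate(rate)
        over_limit = per_hour > HOURLY_REQUEST_LIMIT
        print(
            f"{label}: {per_minute} req/min, {per_hour} req/hour, "
            f"{per_second:.1f} req/sec, over hourly limit: {over_limit}"
        )
```

At 20K tweets/minute this gives 1,000 requests per minute, i.e. 60,000 requests per hour, which is three times the 20K per hour limit, so polling alone cannot keep up even before my server's own throughput is considered.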