I am creating a mathematical model based on results from Twitter's API, but I am missing one critical number: an estimate of the total number of tweets posted in the USA each day. The better the estimate and the fewer assumptions I make, the more useful the model will be (it will be published for the public to use). I have been told that this kind of information is important and is usually kept secret by Internet startups. With that in mind, I have come up with a workaround, but it is not yet accurate enough, so I am looking for your advice.
Idea: I gather data from Twitter's search API at least once an hour. I would store the first tweet ID I see each day and subtract the previous day's first ID from it to estimate the number of tweets per day. I have three problems here:
1. How are tweet IDs incremented? Do they increase in steps of 1, 2, 5, 10...?
2. I need an estimate of the number of private/protected users, assuming each private user's tweet also gets an ID number. This is required because I am sampling only the public tweets.
3. I need to estimate the number of tweets coming from overseas, since I am modeling the USA. This is less of a problem than the previous two.
Thanks for your time. Any help/advice is appreciated!
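To make the idea concrete, here is a minimal Python sketch of the arithmetic I have in mind. It only holds if tweet IDs really are handed out sequentially with a constant step, which is exactly what problem 1 asks. The function name and the parameters `id_step` and `us_share` are placeholders of my own, not anything from Twitter's API, and their values are unknowns I still need to estimate.

```python
def estimate_daily_us_tweets(first_id_today, first_id_yesterday,
                             id_step=1, us_share=1.0):
    """Rough estimate of US tweets per day from two daily tweet IDs.

    Assumptions (all unverified):
      - IDs are assigned sequentially with a constant increment `id_step`.
      - Protected tweets consume IDs too, so they are already counted
        in the ID gap.
      - `us_share` is the fraction of all tweets that originate in the USA.
    """
    if id_step <= 0:
        raise ValueError("id_step must be positive")
    if not 0.0 <= us_share <= 1.0:
        raise ValueError("us_share must be between 0 and 1")

    id_gap = first_id_today - first_id_yesterday
    if id_gap < 0:
        raise ValueError("today's ID should not be smaller than yesterday's")

    # Number of IDs handed out in roughly 24 hours, worldwide.
    total_tweets = id_gap / id_step

    # Scale down to the share assumed to come from the USA.
    return total_tweets * us_share


# Made-up numbers, purely to show the calculation.
print(estimate_daily_us_tweets(1_500_000, 1_000_000, id_step=1, us_share=0.5))
```

The estimate is only as good as the two samples: the "first tweet seen" on each day should be taken at the same time of day, otherwise the gap covers more or less than 24 hours.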