I don't get that big a discrepancy, but I do get different results from search and streaming. I use streaming for real-time delivery, and then either search or user timelines to backfill missing tweets. As long as the flow of tweets makes this possible within rate limits, this gets me the greatest number of results, but still not 100%. I accept that 100% ain't gonna happen. You should get within your desired 95%, though. That is a realistic goal.
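For illustration, here's a minimal Python sketch of the merge step. It assumes each tweet arrives as a dict with a numeric `id` field. That is a hypothetical shape, not a twitter4j or Twitter API object, and the function names are made up for the example. Collecting everything into one dict keyed by tweet ID means a tweet that shows up in both the stream and the backfill is counted only once. The same bookkeeping applies to Peter's counts quoted below. By inclusion-exclusion, search and streaming overlapped on 254 + 257 - 337 = 174 tweets. Tweets missed by both sources are not in C at all, so 254/337 and 257/337 are only upper bounds on each source's real coverage.

```python
from __future__ import annotations

from typing import Iterable


def merge_by_id(*sources: Iterable[dict]) -> dict[int, dict]:
    """Union several tweet sources (hypothetical dict shape), keyed by tweet id."""
    merged: dict[int, dict] = {}
    for source in sources:
        for tweet in source:
            # setdefault keeps the first copy seen, e.g. the streamed one,
            # and ignores later duplicates from the backfill
            merged.setdefault(tweet["id"], tweet)
    return merged


def overlap_from_counts(a: int, b: int, union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return a + b - union


def coverage(source_count: int, union_count: int) -> float:
    """Fraction of the combined set that one source captured (an upper bound)."""
    return source_count / union_count


streamed = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
backfill = [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}]
print(sorted(merge_by_id(streamed, backfill)))  # [1, 2, 3]

# Counts from the 3-hour run in the quoted message: A, B and C = A v B
search_n, stream_n, union_n = 254, 257, 337
print(overlap_from_counts(search_n, stream_n, union_n))  # 174
print(f"{coverage(search_n, union_n):.1%}")  # 75.4%
print(f"{coverage(stream_n, union_n):.1%}")  # 76.3%
```

The point of keying on ID rather than on text is that retweets and near-identical tweets stay distinct while true duplicates collapse.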
On Tue, Feb 15, 2011 at 6:36 AM, Karussell <[email protected]> wrote:
> Hi,
>
> this problem was already posted to the twitter4j mailing list [1]. Not
> sure if it is an issue with my code, twitter4j or an API issue... users
> reported similar problems in the past [2].
>
> First:
>
> I'm doing a 100 tweet search (without paging) every 5 minutes e.g.
> against 'twitter search'. I get a set of tweets A - excluding the
> duplicates, of course. I get approx 5 new tweets for every 5 minutes,
> so 100 tweets as pageSize should be perfectly sufficient to get all
> tweets.
>
> Second:
> When I'm doing a streaming filter request for the same terms 'twitter
> search' then I'm getting a set of tweets B.
>
> The problem is: combining A and B ('C=A v B') gives me a set C where
> the count of C is more than 10% larger than A or B, which means that
> neither with search nor streaming API I can catch a nearly complete
> set of tweets.
>
> E.g. doing this for 3 hours I'm getting 254 tweets (A) for the search
> and 257 tweets (B) for the streaming but the combined set C has 337
> tweets!
>
> Is this a bug in my code or could this be an API issue?
>
> BTW: I don't assume 100% correctness, I only want something above
> 90% :) especially for such relatively infrequent terms, where users
> can, should and have noticed it.
>
> Regards,
> Peter.
>
> [1]
> http://groups.google.com/group/twitter4j/msg/d959e6257ceb452f
>
> [2]
> http://groups.google.com/group/twitter-development-talk/browse_thread/thread/71ab5cc666113c9e
>
> http://blog.tweetsmarter.com/twitter-downtime/twitters-dirty-secret-they-dont-show-you-all-tweets/
>
> --
>
> http://jetwick.com Twitter Search without Noise
>
> --
> Twitter developer documentation and resources: http://dev.twitter.com/doc
> API updates via Twitter: http://twitter.com/twitterapi
> Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
> Change your membership to this group:
> http://groups.google.com/group/twitter-development-talk

--
Adam Green
Twitter API Consultant and Trainer
http://140dev.com
@140dev
