That wasn't the answer I was hoping for, but thanks for the
guidance. :)

We're adding a new process that uses the Streaming API to pull in
tweets, then applies a merge operation for phrases and multi-term
searches to produce output similar to what the Search API provided.
There's more work to do on our end, but so far it looks like it will
run on the existing hardware. I'll know for sure once we start pulling
in the full term feed.
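
For anyone curious, here is a rough sketch of what that merge step could
look like. It is not our production code: the names (tokenize,
contains_phrase, matching_queries) and the layout of the queries dict
are made up for illustration, and it assumes each saved search is just a
list of OR'd alternatives, where a multi-word alternative has to appear
as consecutive words in the tweet.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a consecutive run in tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False  # an empty phrase never matches
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def matching_queries(tweet_text, queries):
    """Return the ids of every saved search that the tweet satisfies.

    `queries` (hypothetical layout) maps a query id to a list of
    alternatives; a tweet matches if ANY alternative matches (OR), and a
    multi-word alternative must match as a whole phrase.
    """
    tokens = tokenize(tweet_text)
    return [qid for qid, alternatives in queries.items()
            if any(contains_phrase(tokens, tokenize(alt))
                   for alt in alternatives)]


# Example saved searches taken from this thread.
queries = {
    "dev": ["twitter development", "twitterdev", "twitter api"],
    "coke": ["coke", "coca cola"],
}
```

Each tweet coming off the stream gets passed through matching_queries,
and anything that returns an empty list is over-delivery we can drop.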

Looking forward to consuming more tweets with the Streaming API in the
near future. Thanks for the push in the right direction!

Jason (@jmstriegel)



On Jan 28, 10:50 pm, John Kalucki <j...@twitter.com> wrote:
> The track resource on the Streaming API is intended for just this sort
> of application. Yes, there will be some over-delivery, especially if
> you intend to logically AND low-frequency words with high-frequency
> words. In the end, this is a minor amount of additional bandwidth and
> processing cost. Processing 1, 10, or 100 per second costs about the
> same. You should be able to do this volume of post-processing at your
> end on a single core.
>
> Search results will be increasingly filtered and ranked for relevance,
> which doesn't sound like the results that you want. Whitelisting won't
> prevent this filtering.
>
> Additional track terms are not supported by opening additional
> connections to the Streaming API. Instead, you place more predicates
> on the same stream. The higher access levels support hundreds of
> thousands of predicates. Opening many connections to the Streaming API
> will appear like an attempt to circumvent existing rate limits and you
> are likely to be banned from all twitter.com access.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.
>
> On Thu, Jan 28, 2010 at 6:15 PM, Jason Striegel
> <jason.strie...@gmail.com> wrote:
> > We started running into rate limiting issues today with one of our
> > applications that uses the Search API (squawq.com). We're using it to
> > track user-defined queries for a bunch of folks and provide analytics
> > on those searches. It seems like developers are being asked to migrate
> > to the Streaming API, but I'm worried it's going to be _way_ less
> > efficient than how we're currently using the Search API.
>
> > Most of the terms we are tracking are relatively low volume and
> > contain complex search "AND" type keyword phrases. ex: ["twitter
> > development" OR twitterdev OR "twitter api"]. Most of these are low
> > volume and we can poll a couple times an hour very efficiently.
>
> > The problem is that as we gain more users, the number of these low-
> > volume terms increases. So a second user might be tracking [coke OR
> > "coca cola"], and a third user might track ["first lego league" OR
> > legoleague], and so on. To be able to support this with the Streaming
> > API we would either have to pull a gigantor amount of tweets in
> > through the firehose (assuming we had access) and implement another
> > layer of indexing, or we'd have to set up a stream for each search a
> > user has created, again pulling in way more data than we do currently,
> > but also requiring many concurrent connections and needing to do the
> > join behavior after the fact.
>
> > Long story short, I totally see how the Streaming API has made things
> > super efficient for a number of applications. For our Squawq app,
> > however, it seems to be the worst possible scenario: way more
> > bandwidth intensive, requiring more connections to support all the
> > different searches we are running on behalf of our users, and adding a
> > huge amount of processing, storage and software complexity to the
> > process. All for what seemed like a relatively lightweight, low-
> > bandwidth process with the Search API.
>
> > Anyone have any ideas for making the Streaming API work well in this
> > scenario? Can the Twitter team still whitelist Search API users that
> > have this sort of need?
>
> > Thanks in advance for any feedback or recommendations.
> > @jmstriegel
