Toby,

This is indeed a problem. The ultimate solution, irrespective of
practicality, is to arrange to take the entire firehose.

Your connections are indeed limited by IP address -- additional accounts,
past two, will not help much in this use case.

You can, if so motivated, average your new connections over a longer period,
which allows lower typical latency. Currently a 10 minute window is configured
for IP limiting, but we may change that period without notice. You can
connect quite a few times in a 10 minute window before getting limited by IP
-- again, subject to change. If you budget, say, 20 connections per 10
minutes, you can spend them at any velocity; once the budget is used up,
stop accepting new predicates, or only update once every 30 seconds, until
the 10 minute window rolls over. This should allow good liveness for many
modest arrival rates and temporal arrival probability distributions.
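
As a rough sketch of that policy in Python: the 20-connection budget, the
10 minute window and the 30 second fallback are only the illustrative
figures from above, not documented limits, and the class and method names
are made up for this example.

```python
import time
from collections import deque


class ReconnectBudget:
    """Sliding-window budget for stream reconnects (illustrative only).

    burst, window and throttled_interval are assumed example values;
    the real IP limits are not published and may change without notice.
    """

    def __init__(self, burst=20, window=600.0, throttled_interval=30.0,
                 clock=time.monotonic):
        self.burst = burst                            # reconnects allowed at any velocity
        self.window = window                          # length of the window, in seconds
        self.throttled_interval = throttled_interval  # spacing once the budget is spent
        self.clock = clock                            # injectable so it can be tested
        self.history = deque()                        # reconnect times still in the window

    def _expire(self, now):
        # Forget reconnects that have rolled out of the window.
        while self.history and now - self.history[0] >= self.window:
            self.history.popleft()

    def seconds_until_allowed(self):
        """Return 0.0 if a reconnect is fine now, else how long to wait."""
        now = self.clock()
        self._expire(now)
        if len(self.history) < self.burst:
            return 0.0
        # Budget used up: allow at most one reconnect per throttled_interval.
        since_last = now - self.history[-1]
        return max(0.0, self.throttled_interval - since_last)

    def record_reconnect(self):
        """Call this each time a new streaming connection is opened."""
        now = self.clock()
        self._expire(now)
        self.history.append(now)
```

When a predicate change arrives, call seconds_until_allowed(); if it
returns 0.0, reconnect right away and call record_reconnect(), otherwise
queue the change and fold it into the next permitted reconnect.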

We really need to support updating predicates on live streams to make this
use case generally practical, short of taking the firehose.

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.



On Tue, Apr 6, 2010 at 11:21 AM, Toby Phipps <tphi...@gmail.com> wrote:

> Hi,
>
> I've been reading a lot in the Twitter Streaming API doc and in this
> group about techniques to handle filter updates. I've got a good
> picture of the best practices, but I'm having a hard time applying them to
> my particular situation.
>
> In my case, I've got a filtered stream where the filters will be
> updated based on the current user activities on my site. The filter
> updates won't happen that frequently, but when they do, they have to
> happen with as little latency as possible. This isn't a big problem in
> probably 80% of the cases where the last filter update happened more
> than 2 minutes ago, as I can happily disconnect and reconnect
> immediately and stay within the rules.
>
> However, when multiple filter updates happen to arrive within the 2
> minutes, that's where I have an issue. The unlucky user whose request
> came in just after a previous update happened gets stuck waiting the
> full 2 minutes before anything happens for them. They'll get bored,
> and walk away!
>
> The approaches to filter updates in the doc and in this group mainly
> talk about two concurrent streams - one primary stream with an
> elevated role and a second interim stream with default elevation.
> This approach works well in allowing filter changes with
> minimal interruption to the high-volume stream, but it does little or
> nothing to reduce the update latency. The worst case update latency is
> still 2 minutes for the poor sucker who came in just after a reconnect
> on the second (default elevation) stream.
>
> Some of the ideas I'm considering are:
>
> 1. Running four concurrent streams under four different Twitter
> accounts and spreading the overall filter criteria between them all
> (without predicate overlap to prevent wastage). I round-robin any
> filter changes across the streams, so I should be able to average 4x
> less latency. This seems within the rules since I'm using four
> different accounts, but I'm concerned that unless I originate from
> four different IPs that it'll be seen as a grey area and I risk being
> banned.
>
> 2. Bending the rules a little and bringing my minimum time before
> reconnect down to 30 seconds, hoping that if 80% or more of the time I
> respect the 2 minute minimum reconnect interval (and actually stay
> connected a LOT longer in most cases), I can get away with
> reconnecting a little more often during edge cases.
>
> 3. Running a single stream, and when filter changes are needed and I'm
> still within the 2 minute reconnect window, faking a stream with
> multiple queries until the reconnect is allowable at which time I
> transition to the reconnected stream. While this might be strictly
> within the rules, I'm convinced that the multiple query hits while
> waiting for the reconnect window to open would have a higher impact on
> Twitter than an extra reconnect within the 2 minute window every now
> and then.
>
> Can anyone shed some light on which of these approaches is preferable,
> or propose a different/better one? The goal for me is being able to
> adapt the stream criteria to my current user load with the change
> taking effect as quickly as possible - I can probably wait 30 seconds
> for an update, but 2 minutes will be tough!
>
> Thanks,
> Toby.
>


