On Dec 15, 9:58 am, John Kalucki <j...@twitter.com> wrote:
> Bandwidth is likely to only be a small fraction of your total cost when
> consuming the firehose. If you want to focus on this small part and ignore
> all the other dominating costs, the prudent systems engineer would provision
> 2x to 3x daily peak to account for traffic spikes, growth, backlog
> retrieval, and to keep latency to a minimum. Not all have such requirements,
> though. So, somewhere between 5 and 15 Mbit/s, very very roughly. Your
> requirements will certainly vary.
>
> The filtered and sampled streams are where virtually everyone will wind up.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.

I'm using the sampled stream at the moment and it's doing most of what
I need. It's certainly more than enough for developing and testing the
algorithms. The filter stream, on the other hand, seems next to
useless to me compared with the stream coming out of Twitter search.

For one thing, I do a lot of location-based processing. I'm quite
interested in what's happening in Portland, Oregon, and not so much in
the rest of the world. As far as I can tell, there's no geocode
parameter for "filter". In addition, Twitter search lets me search
back in time; with filter, if I don't know what I'm looking for in
advance, it's going to go right by me. ;-)
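
To make the comparison concrete, here's roughly what a geocoded query
against the search API looks like (a sketch; the Portland coordinates,
radius, and page size are my own approximations):

    import json
    import urllib
    import urllib2

    # Rough center of Portland, OR; radius and page size are guesses.
    params = urllib.urlencode({
        "geocode": "45.5236,-122.6750,25mi",  # lat,lon,radius
        "rpp": 50,                            # results per page
    })
    url = "http://search.twitter.com/search.json?" + params

    response = json.load(urllib2.urlopen(url))
    for tweet in response["results"]:
        print "%s: %s" % (tweet["from_user"].encode("utf-8"),
                          tweet["text"].encode("utf-8"))

There's nothing like that "geocode" parameter on the streaming side,
as far as I can tell.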

But really, I'm much more concerned about legal issues with the
firehose than I am with technical ones. There are "resellers" of
firehose data now. They have an advantage over random developers like
myself, because they have a business relationship with Twitter and I
don't. I can't make a credible business plan without knowing what I
will and will not legally be able to do with firehose data, or what
access will cost me.
--
M. Edward (Ed) Borasky
http://borasky-research.net


"I've always regarded nature as the clothing of God." ~Alan Hovhaness
