I am using Python to implement a process which listens to the stream
and places all incoming data onto a message queue service.  A few
worker processes in the background work off the queue and store the
data.  The message queue is not fault tolerant at this time, but
swapping in an enterprise-grade MQ service would take care of that.
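
Roughly, the listener/worker split looks like the sketch below.  The
library choices (requests for the HTTP stream, a Redis list standing
in for the queue) and the names are just assumptions to illustrate the
shape of it; my actual setup isn't spelled out above.

    # Minimal sketch: stream listener pushing raw JSON lines onto a
    # queue, plus a background worker draining it.  requests/redis and
    # the "tweets" queue name are assumptions for illustration only.
    import requests
    import redis

    queue = redis.Redis()

    def listen(user, password):
        # Hold the streaming connection open and push each raw
        # JSON-per-line record onto a Redis list acting as the queue.
        resp = requests.get("http://stream.twitter.com/spritzer.json",
                            auth=(user, password), stream=True)
        for line in resp.iter_lines():
            if line:
                queue.rpush("tweets", line)

    def worker(outfile="tweets.jsonl"):
        # Background worker: block until a record is available, then
        # append it to a flat file (stand-in for whatever store you use).
        with open(outfile, "ab") as f:
            while True:
                _, raw = queue.blpop("tweets")
                f.write(raw + b"\n")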

You are essentially doing the same thing via some bash scripts and
flat files.  How are you parsing and indexing the data once it's
collected?

On May 25, 5:02 pm, "Brendan O'Connor" <breno...@gmail.com> wrote:
> spritzer is great!  well done folks.
> I'm wondering how other people are collecting the data.  I'm saving the
> json-per-line raw output to a flatfile, just using a restarting curl, then
> processing later.
>
> Something as simple as this seems to work for me:
>
> while true; do
>   date; echo "starting curl"
>   curl -s -u user:pass http://stream.twitter.com/spritzer.json >> tweets.$(date --iso)
>   sleep 1
> done |& tee curl.log
>
> ... and also, to force file rotation once in a while:
>
> while true; do
>   date; echo "forcing curl restart"
>   killall curl
>   sleep $((60*60*5))
> done |& tee kill.log
>
> anyone else?
>
> -Brendan
