Greetings all,

I've made a bit of progress and have found a fairly simple-to-use
program that is collecting tweets for me at a modest rate. I have
maybe 10-20 MB after about an hour or two, so I think Tom's estimates
are on target, if not a bit optimistic. :) What I am using is a
very simple to run PHP program called Phirehose
(http://code.google.com/p/phirehose/). It is pretty nice, and I found
it to be about the easiest to deal with of all the tools I encountered
for simply copying bunches of tweets. That said, if there are other
easy-to-use tools out there designed to just slurp off the firehose,
I'd be interested (by easy to use I mean you install them, submit a
command, and they start spitting out tweets without needing to do much
more than enter a userid and password).
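
For a rough sense of scale, here is a back-of-the-envelope sketch of
how long 100 GB would take at the rate above. It assumes the rate
stays constant and uses decimal units (1 GB = 1000 MB); the function
name days_to_collect and its parameters are made up for illustration,
and the numbers are only the loose figures quoted above, not
measurements.

```python
def days_to_collect(target_gb, mb_collected, hours_elapsed):
    """Days needed to reach target_gb at the observed average rate."""
    mb_per_hour = mb_collected / hours_elapsed
    target_mb = target_gb * 1000  # assumption: decimal units, 1 GB = 1000 MB
    return target_mb / mb_per_hour / 24


# Slowest and fastest readings of "10-20 MB after an hour or two"
slow = days_to_collect(100, mb_collected=10, hours_elapsed=2)
fast = days_to_collect(100, mb_collected=20, hours_elapsed=1)
print(f"{fast:.0f} to {slow:.0f} days")
```

So a single collector at this rate would need somewhere between
several months and a couple of years, which is part of why a
downloadable collection would be handy.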

I also think Twitter could probably reduce the load on some of
their streams by making a few collections available for download, so
folks like me don't just slurp away trying to build up a moderate to
large collection of tweets for (in this case) class assignments. I
suspect there are a fair number of users out there who would just like
some data to mess around with, without having to buy it from the
authorized reseller or go through the effort of figuring out the API
and collecting the data themselves. As it turns out, collecting the
data myself isn't too awful, but I have the time and the inclination
to do that....

In any case, I'm still interested in other sources of data like this,
or tools to collect it....

On my way to 100 GB. :)

Thanks,
Ted

On Sun, Mar 6, 2011 at 1:16 PM, Ted Pedersen <duluth...@gmail.com> wrote:
> I'd like to get somewhere around 100GB of tweets. It doesn't matter
> where they are from, when they were sent, etc. I'd just like to have a
> relatively large collection of data to use as assignment data for a
> class I'm teaching that uses Hadoop.
>
> Is such a collection available for download anywhere, or is there an
> existing program I could use to simply record twitter data for some
> period of time? (I've heard about both the firehose and the streaming
> API, but can't seem to find anything that is ready to run with that
> for this particular task....but I might not know where to look).
>
> Cordially,
> Ted
>
> ---
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>
> --
> Twitter developer documentation and resources: http://dev.twitter.com/doc
> API updates via Twitter: http://twitter.com/twitterapi
> Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
> Change your membership to this group: 
> http://groups.google.com/group/twitter-development-talk
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse