[twitter-dev] Re: Tweet Corpus creation for NLP research

Michele Zappavigna Thu, 16 Apr 2009 19:43:20 -0700

Hi Nick,

I am a linguist currently working on Twitter. I would be very interested
in using the corpus that you mention you have created.

I work in the area of Systemic Functional Linguistics and am looking
at how people use language to affiliate on Twitter. At the moment I am
working with a corpus of approximately 45,000 tweets (so rather small).

Many thanks,
Michele

On Apr 10, 2:22 am, Nick Arnett <nick.arn...@gmail.com> wrote:
> On Thu, Apr 9, 2009 at 7:13 AM, kanny <fruhl...@coolgoose.com> wrote:
>
> > Caching is something I will definitely be doing, but as I said, to do
> > something complex like semantic model generation, I need access to at
> > least a user's last 100,000 friends_timeline tweets. For a typical
> > user following 100 reasonably active people, this would take 2-3
> > months to build, which is too long to wait for the application to
> > be usable.
>
> I have about 2.3 million cached statuses for more than 10,000 users,
> gathered over the last couple of months for the analysis I do for TwURLed
> News (http://TwURLedNews.com).  There's a sampling bias in favor of people
> who have tended to cite URLs that became popular.
>
> I'm quite interested in the kind of analysis you're doing, so I'd be happy
> to share the data with you or anyone else who might want it for this sort
> of purpose.  It wouldn't be hard for me to export it in the format you want
> and make it available for download, though if a lot of people want it, that
> would become a problem... but then we can figure out somewhere other than my
> servers to put it on.
>
> So... would this be useful as a one-time offer?  Do you intend to share the
> results of your analysis?
>
> Nick
