Hi Nick, I am a linguist currently working on Twitter. I would be very interested in using the corpus that you mention you have created.
I work in the area of Systemic Functional Linguistics and am looking at how people use language to affiliate on Twitter. At the moment I am working with a corpus of approximately 45,000 tweets (so rather small).

Many thanks,
Michele

On Apr 10, 2:22 am, Nick Arnett <nick.arn...@gmail.com> wrote:
> On Thu, Apr 9, 2009 at 7:13 AM, kanny <fruhl...@coolgoose.com> wrote:
>
> > Caching is something i will definitely be doing, but as i said, to do
> > something complex like semantic model generation, i need access to a
> > user's last, at least 100,000 friends_timeline tweets. For a typical
> > user following 100 reasonably active persons, this would take 2-3
> > months to build, which is not practical to wait for the application to
> > be usable.
>
> I have about 2.3 million cached statuses for more than 10,000 users,
> gathered over the last couple of months for the analysis I do for TwURLed
> News (http://TwURLedNews.com). There's a sampling bias in favor of people
> who have tended to cite URLs that became popular.
>
> I'm quite interested in the kind of analysis you're doing, so I'd be happy
> to share the data with you or anyone else who might want it for this sort
> of purpose. It wouldn't be hard for me to export it in the format you want
> and make it available for download, though if a lot of people want it, that
> would become a problem... but then we can figure out somewhere other than my
> servers to put it on.
>
> So... would this be useful as a one-time offer? Do you intend to share the
> results of your analysis?
>
> Nick