It's unclear if you're looking for data that can be stored in Cassandra or an 
example of someone using Cassandra to store a network; I'm assuming the former.

You will have a hard time finding a free social network dataset with 
well-defined relationships.  I have seen crawls of Twitter before, but IIRC 
they sell for thousands of USD.

Try http://infochimps.org.

There's the Enron email dataset: http://www.cs.cmu.edu/~enron/

The reddit dataset is nice.  Maybe think beyond explicit connections and use 
voting commonality as links between users?  That dataset seems to meet your 
requirement of being sufficient to reconstruct a network of users.  You could 
have "friend" edges based on voting agreement and "shared interest" edges 
based on voting on stories from the same subreddits.
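
A rough sketch of how the "friend" edges could be derived, in Python.  It 
assumes the votes have already been parsed into (user, story, vote) tuples 
with vote being +1 or -1; that layout, the function name agreement_edges, and 
the two thresholds are my own placeholders, not something taken from the dump 
itself.

```python
from collections import defaultdict
from itertools import combinations


def agreement_edges(votes, min_shared=2, min_agreement=0.75):
    """Build "friend" edges between users who tend to vote the same way.

    votes: iterable of (user, story, vote) tuples (assumed layout),
           where vote is +1 or -1.
    Returns {(user_a, user_b): agreement_ratio} with user_a < user_b.
    """
    # Group votes by story: story -> {user: vote}
    by_story = defaultdict(dict)
    for user, story, vote in votes:
        by_story[story][user] = vote

    shared = defaultdict(int)  # pair -> number of stories both voted on
    agreed = defaultdict(int)  # pair -> number of stories with matching votes
    for user_votes in by_story.values():
        # Every pair of users who voted on this story shares it.
        for a, b in combinations(sorted(user_votes), 2):
            shared[(a, b)] += 1
            if user_votes[a] == user_votes[b]:
                agreed[(a, b)] += 1

    # Keep pairs with enough shared stories and a high enough agreement ratio.
    edges = {}
    for pair, n in shared.items():
        ratio = agreed[pair] / n
        if n >= min_shared and ratio >= min_agreement:
            edges[pair] = ratio
    return edges
```

Pairing up all voters on each story is quadratic in the number of voters per 
story, so popular stories would probably need sampling or a cap.  "Shared 
interest" edges could be built the same way by grouping on subreddit instead 
of story, given some story-to-subreddit mapping.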

On May 20, 2010, at 1:09 PM, Valerio Schiavoni wrote:

> Not strictly Facebook. 
> Any online social network is ok to me, as long as it has a reasonable number 
> of users and that it's built on top of a schema-less storage system.
> 
> 
> Are you looking for Facebook stuff? Good luck on getting a data set from any 
> real world model.
>  
> 
> Hello everyone,
> I'm a PhD student looking for a real-world dataset from any social network 
> built on top of a schema-less storage system. 
> The dataset should at least provide a means to reconstruct the graph of users.
> Since the dataset may contain sensitive information, it can be anonymized 
> if required; that is not important for my research.
> 
> Someone on #cassandra provided some dataset of reddit votes : 
> http://www.reddit.com/r/redditdev/comments/bubhl/csv_dump_of_reddit_voting_data/.
> This dataset is interesting, but it doesn't provide information about the 
> graph of users.
> 
