Hi hyqgod,

This is probably a better question for the Spark user list than the dev list (cc'ing user and bcc'ing dev on this reply).

To answer your question, though, Amazon's Public Datasets page is a nice place to start:

http://aws.amazon.com/datasets/

These work well with Spark because they're often stored on S3 (which Spark can read from natively), and it's very easy to spin up a Spark cluster on EC2 to begin experimenting with the data.

There's also a pretty good list of (mostly big) datasets that Google has released over the years here:

http://svonava.com/post/62186512058/datasets-released-by-google

- Evan

On Tue, Feb 25, 2014 at 6:33 PM, 黄远强 <hyq...@163.com> wrote:

> Hi all:
> I am a freshman in Spark community. i dream of being a expert in the field
> of big data. But i have no idea where to start after i have gone through
> the published documents in Spark website and examples in Spark source
> code. I want to know if there are some public data set in the internet
> that can be utilized to learn Spark and test my some new ideas base on
> Spark.
> Thanks a lot.
>
>
> ---------------------------
> Best regards
> hyqgod