Also, Amazon offers free public data sets at:
http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
On Tue, Jan 4, 2011 at 7:28 PM, Lance Norskog wrote:
> https://cwiki.apache.org/confluence/display/MAHOUT/Collections
>
> All the collections you can imagine.
>
> On Tue, Jan 4, 2011 at
How about http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1 ?
Just the first one (WestburyLab USENET corpus) is 40 GB. I suspect you can
find different formats and data sizes there.
Dave Viner
On Mon, Jan 3, 2011 at 11:31 PM, Adarsh Sharma wrote:
> Dear all,
>
>
Hi Sudhir,
Can you publish your findings on pricing, and how you calculated the
various aspects?
This is great information.
Thanks
Dave Viner
On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu <
sudhir.vallamko...@icrossing.com> wrote:
> We recently crossed this bridge and
, and look
for your distributed cache file in HDFS.
Would that work?
Dave Viner
On Sat, Oct 9, 2010 at 1:21 PM, Steve Lewis wrote:
> For development purposes I need to run some code in a mapper and / or
> reducer ( imagine I am trying to verify that files in distributed cache are
>