Re: Data for Testing in Hadoop

2011-01-04 Thread Dave Viner
Also, Amazon offers free public data sets at: http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
On Tue, Jan 4, 2011 at 7:28 PM, Lance Norskog wrote:
> https://cwiki.apache.org/confluence/display/MAHOUT/Collections
>
> All the collections you can imagine.
>
> On Tue, Jan 4, 2011 at

Re: Data for Testing in Hadoop

2011-01-03 Thread Dave Viner
How about http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1 ? Just the first one (WestburyLab USENET corpus) is 40GB. I suspect you can find different formats and data sizes there.
Dave Viner
On Mon, Jan 3, 2011 at 11:31 PM, Adarsh Sharma wrote:
> Dear all,
>
>

Re: Hadoop/Elastic MR on AWS

2010-12-27 Thread Dave Viner
Hi Sudhir,
Can you publish your findings around pricing, and how you calculated the various aspects? This is great information.
Thanks
Dave Viner
On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu <sudhir.vallamko...@icrossing.com> wrote:
> We recently crossed this bridge and

Re: I am looking for a minimal Map-Reduce task

2010-10-09 Thread Dave Viner
, and look for your distributed cache file in hdfs. Would that work?
Dave Viner
On Sat, Oct 9, 2010 at 1:21 PM, Steve Lewis wrote:
> For development purposes I need to run some code in a mapper and / or
> reducer ( imagine I am trying to verify that files in distributed cache are
>