Re: FAQ for New to Hadoop

2010-07-11 Thread Alex Baranau
Ken, You can also take a look at the FAQ section in the posts we publish periodically. It started with http://blog.sematext.com/2010/02/16/hadoop-digest-february-2010/. The frequently asked questions are mainly retrieved from the project's user mailing lists. We also cover HBase (you can find

FAQ for New to Hadoop

2010-07-08 Thread Ken Krugler
Hi all, I recently hosted an Intro to Hadoop session at the BigDataCamp unconference last week. I later wrote down questions from the audience that seemed useful to other Hadoop beginners, and then compared these to the Hadoop project FAQ at http://wiki.apache.org/hadoop/FAQ There was

Re: FAQ for New to Hadoop

2010-07-08 Thread Mark Kerzner
Cool, Ken, thank you, I think it is very useful. Mark On Thu, Jul 8, 2010 at 4:35 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi all, I recently hosted an Intro to Hadoop session at the BigDataCamp unconference last week. I later wrote down questions from the audience that seemed

new to hadoop

2010-05-04 Thread jamborta
in this situation? thanks for your help Tom -- View this message in context: http://old.nabble.com/new-to-hadoop-tp28454028p28454028.html Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: new to hadoop

2010-05-04 Thread Ravi Phulari
? Tom On 05/05/2010 00:12, Ravi Phulari wrote: Re: new to hadoop You can configure (conf/hadoop-env.sh) configuration files on each node to specify -Xmx values. You can use conf/mapred-site.xml to configure default mappers and reducers running on a node. <property> <name>mapred.map.tasks</name>
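
For illustration, the properties named in this thread would sit in conf/mapred-site.xml roughly as shown here. This is a minimal sketch, not a recommendation: every value is a placeholder assumed for a dual-core node, and the two mapred.tasktracker.*.tasks.maximum properties do not appear in the thread. They are the Hadoop 0.20-era per-node slot limits, included on the assumption that the question is about how many tasks may run on one machine at a time.

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: illustrative values only (assumed, not from the thread) -->
<configuration>
  <!-- Named in the thread: default number of map tasks per job (a hint to the framework) -->
  <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
  </property>
  <!-- Named in the thread: default number of reduce tasks per job -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
  <!-- Assumption: per-node limit on map tasks running at once -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Assumption: per-node limit on reduce tasks running at once -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Whatever numbers are chosen, the slot limits and the -Xmx heap size given to each task have to fit together within the RAM of the node.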

Re: new to hadoop

2010-05-04 Thread Tamas Jambor
task can run on a machine. or what is the optimal setting in this situation? thanks for your help Tom -- View this message in context: http://old.nabble.com/new-to-hadoop-tp28454028p28454028.html Sent from the Hadoop core-user mailing list archive at Nabble.com

Re: new to hadoop

2010-05-04 Thread Tamas Jambor
be the optimal setting for mapred.map.tasks and mapred.reduce.tasks, say, on a dual-core machine? Tom On 05/05/2010 00:12, Ravi Phulari wrote: Re: new to hadoop You can configure (conf/hadoop-env.sh) configuration files on each node to specify -Xmx values. You

Re: Advice on new Datacenter Hadoop Cluster?

2009-10-01 Thread Steve Loughran
Kevin Sweeney wrote: I really appreciate everyone's input. We've been going back and forth on the server size issue here. There are a few reasons we shot for the $1k price, one because we wanted to be able to compare our datacenter costs vs. the cloud costs. Another is that we have spec'd out a

Re: Advice on new Datacenter Hadoop Cluster?

2009-10-01 Thread Ryan Smith
I have a question that I feel I should ask on this thread. Let's say you want to build a cluster where you will be doing very little map/reduce, storage and replication of data only on hdfs. What would the hardware requirements be? No quad core? Less RAM? Thanks -Ryan On Thu, Oct 1, 2009 at

Re: Advice on new Datacenter Hadoop Cluster?

2009-10-01 Thread Steve Loughran
Ryan Smith wrote: I have a question that I feel I should ask on this thread. Let's say you want to build a cluster where you will be doing very little map/reduce, storage and replication of data only on hdfs. What would the hardware requirements be? No quad core? Less RAM? Servers with more

Re: Advice on new Datacenter Hadoop Cluster?

2009-10-01 Thread Brian Bockelman
On Oct 1, 2009, at 7:13 AM, Steve Loughran wrote: Ryan Smith wrote: I have a question that I feel I should ask on this thread. Let's say you want to build a cluster where you will be doing very little map/reduce, storage and replication of data only on hdfs. What would the hardware

Re: Advice on new Datacenter Hadoop Cluster?

2009-10-01 Thread Patrick Angeles
I wouldn't spec the worker nodes just to facilitate cloud cost comparison. There's enough variability out there and you'd have to deal with storage, network bandwidth and I/O. Not to mention a similarly spec'd virtual cloud server will never perform as well as a physical server because you don't

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread stephen mulcahy
Todd Lipcon wrote: Most people building new clusters at this point seem to be leaning towards dual quad core Nehalem with 4x1TB 7200RPM SATA and at least 8G RAM. We went with a similar configuration for a recently purchased cluster but opted for dual quad core Opterons (Shanghai) rather than

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Patrick Angeles
We went with 2 x Nehalems, 4 x 1TB drives and 24GB RAM. The RAM might be overkill... but it's DDR3 so you get either 12 or 24GB. Each box has 16 virtual cores so 12GB might not have been enough. These boxes are around $4k each, but can easily outperform any $1K box dollar for dollar (and

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Ted Dunning
2TB drives are just now dropping to parity with 1TB on a $/GB basis. If you want space rather than speed, this is a good option. If you want speed rather than space, more spindles and smaller disks are better. Ironically, 500GB drives now often cost more than 1TB drives (that is $, not $/GB).

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Ted Dunning
Depending on your needs and the size of your cluster, the out-of-band management can be of significant interest. It is a pretty simple cost/benefit analysis that trades your sysops time (which is probably about the equivalent of $50-150 per hour fully loaded and accounting for opportunity cost)

Advice on new Datacenter Hadoop Cluster?

2009-09-29 Thread ylx_admin
Hey all, I'm pretty new to hadoop in general and I've been tasked with building out a datacenter cluster of hadoop servers to process logfiles. We currently use Amazon but our heavy usage is starting to justify running our own servers. I'm aiming for less than $1k per box, and of course trying

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-29 Thread Todd Lipcon
machines than 3x as many $1k machines, assuming you can afford at least 4-5 of them. -Todd On Tue, Sep 29, 2009 at 10:57 AM, ylx_admin nek...@hotmail.com wrote: Hey all, I'm pretty new to hadoop in general and I've been tasked with building out a datacenter cluster of hadoop servers to process

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-29 Thread Amandeep Khurana
as many $1k machines, assuming you can afford at least 4-5 of them. -Todd On Tue, Sep 29, 2009 at 10:57 AM, ylx_admin nek...@hotmail.com wrote: Hey all, I'm pretty new to hadoop in general and I've been tasked with building out a datacenter cluster of hadoop servers to process