Ken,
You can also take a look at the FAQ section in the posts we publish
periodically. It started with
http://blog.sematext.com/2010/02/16/hadoop-digest-february-2010/. The
frequently asked questions are drawn mainly from the project's user
mailing lists.
We also cover HBase (you can find
Hi all,
I recently hosted an Intro to Hadoop session at the BigDataCamp
unconference last week. I later wrote down questions from the audience
that seemed useful to other Hadoop beginners, and then compared these to
the Hadoop project FAQ at http://wiki.apache.org/hadoop/FAQ
There was
Cool, Ken, thank you. I think it is very useful.
Mark
On Thu, Jul 8, 2010 at 4:35 PM, Ken Krugler kkrugler_li...@transpac.com wrote:
Hi all,
I recently hosted an Intro to Hadoop session at the BigDataCamp
unconference last week. I later wrote down questions from the audience that
seemed
?
Tom
On 05/05/2010 00:12, Ravi Phulari wrote:
Re: new to hadoop
You can edit the configuration file (conf/hadoop-env.sh) on each node to
specify -Xmx values.
You can use conf/mapred-site.xml to configure the default number of map and
reduce tasks running on a node:
<property>
  <name>mapred.map.tasks</name>
task can run on a machine, or
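For example, a minimal conf/mapred-site.xml along these lines (the values are
illustrative assumptions for a small dual-core node, not recommendations; note
that mapred.map.tasks and mapred.reduce.tasks are only per-job hints, while
the tasktracker *.maximum properties are the hard per-node caps):

<?xml version="1.0"?>
<configuration>
  <!-- Per-job hints for the default number of map and reduce tasks -->
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
  <!-- Hard caps on concurrent tasks per node, enforced by each TaskTracker;
       a common starting point is roughly one task per core -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>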
what is the
optimal setting in this situation?
thanks for your help
Tom
What would be the optimal setting for mapred.map.tasks and
mapred.reduce.tasks, say, on a dual-core machine?
Tom
On 05/05/2010 00:12, Ravi Phulari wrote:
Re: new to hadoop
You can edit the configuration file (conf/hadoop-env.sh) on each node to
specify -Xmx values.
You
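For the -Xmx part specifically: hadoop-env.sh (e.g. HADOOP_HEAPSIZE) governs
the heap of the Hadoop daemons themselves, while the heap of each spawned task
JVM is normally set in conf/mapred-site.xml via mapred.child.java.opts. A
minimal sketch, assuming a 512 MB task heap suits the machine:

<property>
  <!-- JVM options passed to every child (task) JVM; the -Xmx here,
       not hadoop-env.sh, bounds per-task heap -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>

On a dual-core box with two map slots and two reduce slots, that budgets
roughly 2GB of task heap in the worst case, on top of the DataNode and
TaskTracker daemons.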
Kevin Sweeney wrote:
I really appreciate everyone's input. We've been going back and forth on the
server size issue here. There are a few reasons we shot for the $1k price:
one is that we wanted to be able to compare our datacenter costs vs. the
cloud costs. Another is that we have spec'd out a
I have a question that I feel I should ask on this thread. Let's say you
want to build a cluster where you will be doing very little map/reduce,
just storage and replication of data on HDFS. What would the hardware
requirements be? No quad core? Less RAM?
Thanks
-Ryan
On Thu, Oct 1, 2009 at
Ryan Smith wrote:
I have a question that I feel I should ask on this thread. Let's say you
want to build a cluster where you will be doing very little map/reduce,
just storage and replication of data on HDFS. What would the hardware
requirements be? No quad core? Less RAM?
Servers with more
On Oct 1, 2009, at 7:13 AM, Steve Loughran wrote:
Ryan Smith wrote:
I have a question that I feel I should ask on this thread. Let's say you
want to build a cluster where you will be doing very little map/reduce,
just storage and replication of data on HDFS. What would the hardware
I wouldn't spec the worker nodes just to facilitate cloud cost comparison.
There's enough variability out there and you'd have to deal with storage,
network bandwidth and I/O. Not to mention a similarly spec'd virtual cloud
server will never perform as well as a physical server because you don't
Todd Lipcon wrote:
Most people building new clusters at this point seem to be leaning towards
dual quad core Nehalem with 4x1TB 7200RPM SATA and at least 8G RAM.
We went with a similar configuration for a recently purchased cluster
but opted for dual quad-core Opterons (Shanghai) rather than
We went with 2 x Nehalems, 4 x 1TB drives and 24GB RAM. The RAM might be
overkill... but it's DDR3, so you get either 12 or 24GB. Each box has 16
virtual cores, so 12GB might not have been enough. These boxes are around $4k
each, but can easily outperform any $1k box dollar for dollar (and
2TB drives are just now dropping to parity with 1TB on a $/GB basis.
If you want space rather than speed, this is a good option. If you want
speed rather than space, more spindles and smaller disks are better.
Ironically, 500GB drives now often cost more than 1TB drives (that is $, not
$/GB).
Depending on your needs and the size of your cluster, out-of-band
management can be of significant interest. It is a pretty simple
cost/benefit analysis that trades your sysops' time (which is probably about
the equivalent of $50-150 per hour fully loaded and accounting for
opportunity cost)
Hey all,
I'm pretty new to Hadoop in general and I've been tasked with building out a
datacenter cluster of Hadoop servers to process logfiles. We currently use
Amazon, but our heavy usage is starting to justify running our own servers.
I'm aiming for less than $1k per box, and of course trying
machines than
3x as many $1k machines, assuming you can afford at least 4-5 of them.
-Todd
On Tue, Sep 29, 2009 at 10:57 AM, ylx_admin nek...@hotmail.com wrote:
Hey all,
I'm pretty new to Hadoop in general and I've been tasked with building out a
datacenter cluster of Hadoop servers to process