I am looking for a compute-intensive benchmark (CPU usage around 60%) for my
Hadoop cluster. If there is something readily available, that would be great.
Thanks,
Matthew
On Tue, May 31, 2011 at 8:30 PM, Cristina Abad <cristina.a...@gmail.com> wrote:
You could try SWIM [1].
-Cristina
[1] Yanpei
Matthew,
You can use Gridmix (v1 and v2) to benchmark the Hadoop cluster. Gridmix v1
and v2 are scripts that run MapReduce jobs with varying load characteristics,
and you can modify them to suit your needs. Gridmix is part of Hadoop and can be
Hello Hadoop users,
I am doing a simple Hadoop single-node installation, but my datanode
is taking some time to start.
When I go through the namenode logs, I see a strange exception:
2011-06-02 03:59:59,959 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
Hi All,
I'm a bit confused about the values displayed on the 'jobtracker.jsp' page.
In particular, there's a section called 'Cluster Summary'.
I'm running a small 4-machine Hadoop cluster, and when I point a web browser
at my master machine (http://master:50030/jobtracker.jsp) it
The values show you the maximum heap size and currently used heap of the job
tracker, not running jobs. Furthermore, the HADOOP_HEAPSIZE setting only sets
the maximum heap for the daemons, not the tasks in your job.
If you're getting OOMEs, you should add a setting to your mapred-site.xml file
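The mapred-site.xml setting being referred to is presumably mapred.child.java.opts, the Hadoop 0.20/1.x property for the per-task JVM heap; a sketch, with the 1024m value as an example to tune rather than a recommendation:

```xml
<!-- mapred-site.xml: heap for each map/reduce task JVM (example value) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

Note this is per task: with many concurrent task slots per node, the total can add up quickly.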
Apparently the issue is more complicated than I first thought, and it is not
a Nutch issue.
When a MapReduce job is submitted to the JobTracker (through the JobClient
interface), the tasks are executed on other nodes, with stdout and stderr
different from those of the node the job was submitted from. Hence, no matter what
Joey,
I just tried it and it worked great. I configured the entire cluster (added
a couple more DataNodes) and I was able to run a simple map/reduce job.
Thanks for your help!
Pony
On Tue, May 31, 2011 at 6:26 PM, gordoslocos gordoslo...@gmail.com wrote:
:D i'll give that a try 1st thing in
btw, just to let you know that I am running my job in pseudo-distributed mode.
Thanks,
Neeral
From: neeral beladia neeral_bela...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Tue, May 31, 2011 10:00:00 PM
Subject: DistributedCache - getLocalCacheFiles
Newbie on hadoop clusters.
I have setup my two nodes conf as described by M. G. Noll
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
The data node has both the datanode and the tasktracker running (the jps
command shows them), which means start-dfs.sh and start-mapred.sh
All,
You know the story:
You have data files that are created every 5 minutes.
You have hundreds of servers.
You want to put those files in hadoop.
Eventually:
You get lots of files and blocks.
Your namenode and secondary namenode need more memory (and, by the way, JVMs
have issues at large Xmx values).
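The memory pressure above can be sized with a back-of-envelope calculation; a rule of thumb often quoted is roughly 150 bytes of namenode heap per HDFS object (file, directory, or block). A sketch, where the 200-server and one-file-per-5-minutes figures are assumptions for illustration:

```shell
# Rough namenode heap estimate (the 150 bytes/object figure is a
# commonly quoted approximation, not an exact number).
FILES=$((200 * 12 * 24))   # 200 servers, 1 file per 5 min, for one day = 57600
OBJECTS=$((FILES * 2))     # one inode plus (at least) one block per file
echo $((OBJECTS * 150))    # bytes of namenode heap consumed per day
```

That works out to about 17 MB of heap per day of ingest, so a year of such small files already approaches 6 GB of namenode heap, which is where the large-Xmx pain starts.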
To find out whether it has any positive performance impact, I am trying to
turn OFF speculative execution. Surprisingly, the job starts to fail in the
reduce phase with OOM errors when I disable speculative execution for both
map and reduce tasks. Has anybody noticed similar behavior? Is there a
Check whether passwordless SSH is working or not.
Regards,
Jagaran
From: MilleBii mille...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, 1 June, 2011 12:28:54 PM
Subject: Adding first datanode isn't working
Newbie on hadoop clusters.
I have setup my two
Thx, already did that
so I can ssh passphrase-less from master to master and from master to slave1.
Same as before: datanode and tasktracker are starting up/shutting down fine on
slave1
2011/6/1 jagaran das jagaran_...@yahoo.co.in
Check whether passwordless SSH is working or not.
Regards,
Jagaran
OK found my issue. Turned off ufw and it sees the datanode. So I need to fix
my ufw setup.
2011/6/1 MilleBii mille...@gmail.com
Thx, already did that
so I can ssh passphrase-less from master to master and from master to slave1.
Same as before: datanode and tasktracker are starting up/shutting down fine on
ufw
Some things that helped us include setting vm.swappiness to 0 and
mounting your disks with the noatime,nodiratime options.
Also make sure your disks aren't set up with RAID (JBOD is recommended).
You might want to run terasort as you tweak your environment. It's very
helpful when checking if
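A sketch of those two host-level tweaks (the device name and mount point below are placeholders, not a recommendation for your layout):

```
# /etc/sysctl.conf -- takes effect via 'sysctl -p'
vm.swappiness = 0

# /etc/fstab -- one JBOD data disk; device and mount point are examples
/dev/sdb1  /data1  ext3  defaults,noatime,nodiratime  0 0
```

The noatime/nodiratime options avoid a metadata write on every read, which matters for the many small accesses Hadoop daemons make.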
It is also worth using dd to verify your raw disk speeds.
Also, expressing disk transfer rates in bytes per second makes it a bit
easier for most of the disk people I know to figure out what is large or
small.
Each of these disks should do about 100MB/s when driven well. Hadoop
does OK,
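The dd check is quick to run; a minimal sketch, where DISK_DIR and the 256 MiB size are assumptions (point it at a mount on the disk under test, and raise the count for a more realistic number):

```shell
# Raw write throughput: conv=fsync forces the data to disk so the
# page cache doesn't flatter the result.
DISK_DIR=${DISK_DIR:-/tmp}
dd if=/dev/zero of="$DISK_DIR/dd.test" bs=1M count=256 conv=fsync
# Raw read throughput (drop caches first on a real test box for honest numbers).
dd if="$DISK_DIR/dd.test" of=/dev/null bs=1M
rm -f "$DISK_DIR/dd.test"
```

dd prints the transfer rate on its last line; per the advice above, a healthy single disk should come in somewhere around 100MB/s.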
Hi all,
I wanted to use an I/O benchmark that reads/writes data from/into HDFS
using MapReduce. TestDFSIO, I thought, does this. But my understanding is
that TestDFSIO merely creates the files in a temp folder on the local
filesystem of the TaskTracker nodes. Is this correct? How can such an
Usually the number of speculatively executed tasks is equal to the number of
killed tasks in the UI (as opposed to failed). When Hadoop runs a
speculative task, it ends up killing either the original or the speculative
task, depending on which one finishes first.
I don't think OOM errors would
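For reference, disabling speculative execution for both map and reduce tasks (as the original poster describes) is done with these properties in mapred-site.xml on Hadoop versions of that era:

```xml
<!-- mapred-site.xml: turn off speculative execution for maps and reduces -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

Both can also be set per job on the command line with -D, which makes A/B comparisons like the one described above easier.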