Re: Benchmarks with different workloads

2011-06-01 Thread Matthew John
I am looking for a compute-intensive benchmark (CPU usage > 60%) for my Hadoop cluster. If there is something readily available, it would be great. Thanks, Matthew

On Tue, May 31, 2011 at 8:30 PM, Cristina Abad cristina.a...@gmail.com wrote: You could try SWIM [1]. -Cristina [1] Yanpei

Re: Benchmarks with different workloads

2011-06-01 Thread Amar Kamat
Matthew, you can use Gridmix (v1 & v2) to benchmark the Hadoop cluster. Gridmix v1 and v2 are scripts that run MapReduce jobs with varying load characteristics, and you can modify them to suit your needs. Gridmix is part of Hadoop and can be
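For reference, a rough sketch of how Gridmix2 is typically run from a 0.20-era Hadoop source tree (the directory and script names here are from memory and may differ between versions):

    cd $HADOOP_HOME/src/benchmarks/gridmix2
    # edit gridmix-env-2 to point at your cluster and choose data sizes
    ./generateGridmix2data.sh   # generates the synthetic input data in HDFS
    ./rungridmix_2              # submits the mix of MapReduce jobs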

Data node is taking time to start.. Error register getProtocolVersion in namenode..!!

2011-06-01 Thread praveenesh kumar
Hello Hadoop users! I am doing a simple Hadoop single-node installation, but my datanode is taking some time to start. Going through the namenode logs, I see a strange exception:

2011-06-02 03:59:59,959 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:

Heap Size question.

2011-06-01 Thread Ken Williams
Hi All, I'm a bit confused about the values displayed on the 'jobtracker.jsp' page, in particular the section called 'Cluster Summary'. I'm running a small 4-machine Hadoop cluster, and when I point a web browser at my master machine (http://master:50030/jobtracker.jsp) it

Re: Heap Size question.

2011-06-01 Thread Joey Echeverria
The values show you the maximum heap size and the currently used heap of the job tracker, not of running jobs. Furthermore, the HADOOP_HEAPSIZE setting only sets the maximum heap for the daemons, not for the tasks in your job. If you're getting OOMEs, you should add a setting to your mapred-site.xml file
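As a sketch, the per-task heap is usually raised via mapred.child.java.opts in mapred-site.xml (the 512 MB value below is only an example, not a recommendation):

    <property>
      <name>mapred.child.java.opts</name>
      <!-- max heap for each spawned map/reduce task JVM -->
      <value>-Xmx512m</value>
    </property>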

Re: How to debug why I don't get hadoop logs?

2011-06-01 Thread Gabriele Kahlout
Apparently the issue is more complicated than I first thought, and it is not a Nutch issue. When submitting a MapReduce job to the JobTracker (through the JobClient interface), the task is executed on another node, with stdout and stderr different from those of the node the job is submitted from. Hence, no matter what
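For context, on 0.20-era clusters each task attempt's stdout/stderr live on the node that ran the attempt, under the tasktracker's userlogs directory; a sketch of where to look (the attempt ID below is made up, and the path assumes the default log layout):

    # on the node that executed the task attempt:
    ls $HADOOP_HOME/logs/userlogs/attempt_201106011200_0001_m_000000_0/
    # stdout  stderr  syslog -- also browsable through the tasktracker web UI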

Re: Starting JobTracker Locally but binding to remote Address

2011-06-01 Thread Juan P.
Joey, I just tried it and it worked great. I configured the entire cluster (added a couple more DataNodes) and I was able to run a simple map/reduce job. Thanks for your help! Pony

On Tue, May 31, 2011 at 6:26 PM, gordoslocos gordoslo...@gmail.com wrote: :D i'll give that a try 1st thing in

Re: DistributedCache - getLocalCacheFiles method returns null

2011-06-01 Thread neeral beladia
BTW, just to let you know, I am running my job in pseudo-distributed mode. Thanks, Neeral

From: neeral beladia neeral_bela...@yahoo.com To: common-user@hadoop.apache.org Sent: Tue, May 31, 2011 10:00:00 PM Subject: DistributedCache - getLocalCacheFiles
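For comparison, a minimal sketch of the old-API usage pattern; a common cause of getLocalCacheFiles returning null is adding the file to a Configuration object other than the one the running job actually sees (the file path below is made up):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    // In the driver, on the SAME conf the job is submitted with,
    // and before submission:
    Configuration conf = new Configuration();
    DistributedCache.addCacheFile(new URI("/user/neeral/lookup.txt"), conf);

    // In the mapper's configure()/setup(), using the task's conf:
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    if (cached == null) {
      // the file was never registered in THIS job's configuration
    }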

Adding first datanode isn't working

2011-06-01 Thread MilleBii
Newbie on Hadoop clusters here. I have set up my two-node configuration as described by M. G. Noll: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ The data node has the datanode and tasktracker daemons running (the jps command shows them), which means start-dfs.sh and start-mapred.sh

Hadoop Filecrusher! V2 Released!

2011-06-01 Thread Edward Capriolo
All, You know the story: you have data files that are created every 5 minutes, you have hundreds of servers, and you want to put those files in Hadoop. Eventually you get lots of files and blocks, and your namenode and secondary namenode need more memory (BTW, JVMs have issues at large Xmx values).

speculative execution

2011-06-01 Thread Shrinivas Joshi
To find out whether it has any positive performance impact, I am trying to turn OFF speculative execution. Surprisingly, the job starts to fail in the reduce phase with OOM errors when I disable speculative execution for both map and reduce tasks. Has anybody noticed similar behavior? Is there a
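For reference, on 0.20-era Hadoop speculative execution is normally toggled with these two properties, settable cluster-wide in mapred-site.xml or per job (a sketch):

    <property>
      <name>mapred.map.tasks.speculative.execution</name>
      <value>false</value>
    </property>
    <property>
      <name>mapred.reduce.tasks.speculative.execution</name>
      <value>false</value>
    </property>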

Re: Adding first datanode isn't working

2011-06-01 Thread jagaran das
Check whether passwordless SSH is working or not. Regards, Jagaran

From: MilleBii mille...@gmail.com To: common-user@hadoop.apache.org Sent: Wed, 1 June, 2011 12:28:54 PM Subject: Adding first datanode isn't working Newbie on hadoop clusters. I have setup my two
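A minimal sketch of setting that up, along the lines of the Noll tutorial (assumes a hadoop user on both machines; 'slave1' is a placeholder hostname):

    # on the master, as the hadoop user:
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # master -> master
    ssh-copy-id hadoop@slave1                         # master -> slave1
    ssh slave1 date   # should print the date without prompting for a password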

Re: Adding first datanode isn't working

2011-06-01 Thread MilleBii
Thanks, I already did that, so I can ssh passphrase-less from master to master and from master to slave1. Same as before: the datanode and tasktracker start up and shut down fine on slave1.

2011/6/1 jagaran das jagaran_...@yahoo.co.in Check whether passwordless SSH is working or not. Regards, Jagaran

Re: Adding first datanode isn't working

2011-06-01 Thread MilleBii
OK, found my issue: turned off ufw and now it sees the datanode. So I need to fix my ufw setup.

2011/6/1 MilleBii mille...@gmail.com Thanks, I already did that, so I can ssh passphrase-less from master to master and from master to slave1. Same as before: the datanode and tasktracker start up and shut down fine on
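For anyone hitting the same thing: rather than disabling ufw entirely, you can open the Hadoop ports between the nodes. A sketch (the subnet is a placeholder; the RPC ports depend on your fs.default.name and mapred.job.tracker settings -- the Noll tutorial uses 54310/54311 -- and 50010 is the default datanode data-transfer port):

    # on the master (namenode + jobtracker):
    sudo ufw allow proto tcp from 192.168.0.0/24 to any port 54310
    sudo ufw allow proto tcp from 192.168.0.0/24 to any port 54311
    # on each slave (datanode + tasktracker):
    sudo ufw allow proto tcp from 192.168.0.0/24 to any port 50010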

Re: Adding first datanode isn't working

2011-06-01 Thread jagaran das
ufw

From: MilleBii mille...@gmail.com To: common-user@hadoop.apache.org Sent: Wed, 1 June, 2011 3:37:23 PM Subject: Re: Adding first datanode isn't working OK, found my issue: turned off ufw and now it sees the datanode. So I need to fix my ufw setup.

Re: Poor IO performance on a 10 node cluster.

2011-06-01 Thread hadoopman
Some things which helped us include setting vm.swappiness to 0 and mounting your disks with the noatime,nodiratime options. Also make sure your disks aren't set up with RAID (JBOD is recommended). You might want to run terasort as you tweak your environment; it's very helpful when checking if
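Concretely, a sketch of those two tweaks on Linux (device name and mount point are examples):

    # reduce swapping now; add vm.swappiness=0 to /etc/sysctl.conf to persist:
    sudo sysctl -w vm.swappiness=0

    # /etc/fstab entry mounting a data disk without atime updates:
    /dev/sdb1  /data/1  ext3  defaults,noatime,nodiratime  0  0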

Re: Poor IO performance on a 10 node cluster.

2011-06-01 Thread Ted Dunning
It is also worth using dd to verify your raw disk speeds. Also, expressing disk transfer rates in bytes per second makes it a bit easier for most of the disk people I know to figure out what is large or small. Each of these disks should do about 100 MB/s when driven well. Hadoop does OK,
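For example, a rough dd check of sequential write and read speed on one data disk (the path is an example; conv=fdatasync keeps the write test from just measuring the page cache):

    # sequential write, ~1 GB:
    dd if=/dev/zero of=/data/1/ddtest bs=1M count=1024 conv=fdatasync
    # sequential read; drop caches first so it actually hits the disk:
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    dd if=/data/1/ddtest of=/dev/null bs=1M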

IO benchmark ingesting data into HDFS

2011-06-01 Thread Matthew John
Hi all, I want to use an IO benchmark that reads/writes data from/into HDFS using MapReduce. TestDFSIO, I thought, does this. But what I understand is that TestDFSIO merely creates the files in a temp folder in the local filesystem of the TaskTracker nodes. Is this correct? How can such an
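For reference, TestDFSIO is usually launched roughly like this on 0.20 (the test jar's exact name varies by release; -fileSize is in MB):

    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -clean   # remove test data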

Re: speculative execution

2011-06-01 Thread Matei Zaharia
Usually the number of speculatively executed tasks is equal to the number of killed tasks in the UI (as opposed to failed ones). When Hadoop runs a speculative task, it ends up killing either the original or the speculative copy, depending on which one finishes first. I don't think OOM errors would