Unable to run Gridmix_2 benchmark

2012-02-11 Thread ArunKumar
Hi guys! I have a single node set up as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ and was trying to run the Gridmix2 benchmark using real data as per its

Mumak with Capacity Scheduler: Submitting jobs to a particular queue

2012-01-29 Thread ArunKumar
Hi guys! I have run mumak with FIFO. It works fine. I am trying to run the job trace in test/data with the capacity scheduler. I have done: (1) built contrib/capacity-scheduler, (2) copied hadoop-*-capacity-jar from build/contrib/capacity_scheduler to lib/, (3) set mapred.jobtracker.taskScheduler to
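A minimal sketch of the scheduler switch described in the steps above, assuming the stock Hadoop 0.20-era property and class names (not verified against the mumak build in question):

```xml
<!-- mapred-site.xml: point the JobTracker at the capacity scheduler.
     Class name assumed from the stock Hadoop 0.20 contrib jar. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
```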

RE: How to find out whether a node is overloaded from CPU utilization?

2012-01-17 Thread ArunKumar
Guys! So can I say that if memory usage is more than, say, 90% the node is overloaded? If so, what can that threshold percent value be, or how can we find it? Arun

Re: How do I customize the name of the job during submission?

2012-01-05 Thread ArunKumar
' via the regular JobConf/Job APIs. On 04-Jan-2012, at 9:31 PM, ArunKumar wrote: Hi guys! When a Job is submitted it is given an ID, say job_200904211745_0018, in Hadoop. But for some reason I want to submit it with an ID, say job1. How can I do that? Arun

Re: network configuration (etc/hosts)?

2011-12-21 Thread ArunKumar
MirrorX, try adding the hostnames of your master and slave systems to /etc/hosts as well. That fixed the same error for me. master: 127.0.0.1 localhost6.localdomain6 localhost 127.0.1.1 localhost4.localdomain4 master-pc 192.168.7.110 master master-pc 192.168.7.157 slave lab-pc slave: 127.0.1.1
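For readability, the master's /etc/hosts entries quoted above, laid out one host per line (addresses and hostnames are the ones from the message; adapt to your own LAN):

```
# /etc/hosts on the master, per the message above (example addresses)
127.0.0.1      localhost6.localdomain6 localhost
127.0.1.1      localhost4.localdomain4 master-pc
192.168.7.110  master  master-pc
192.168.7.157  slave   lab-pc
```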

Can the reduce phase be CPU/IO bound?

2011-12-21 Thread ArunKumar
Hi guys! If we neglect the shuffle part, can the reduce phase be CPU/IO bound? Can anyone suggest some benchmark or example where we can see this? Arun -- View this message in context: http://lucene.472066.n3.nabble.com/Can-reduce-phase-be-CPU-IO-bound-tp3605913p3605913.html Sent from the

Generating job and topology traces from history folder of multinode cluster using Rumen

2011-12-15 Thread ArunKumar
Hi guys! I have set up a 5-node cluster with each node in a different rack. I have hadoop-0.20.2 set up in my Eclipse Helios. So I ran TraceBuilder using Main Class: org.apache.hadoop.tools.rumen.TraceBuilder. I ran some jobs on the cluster and used a copy of the /usr/local/hadoop/logs/history folder

Re: Analysing Completed Job info programmatically apart from Jobtracker GUI

2011-12-14 Thread ArunKumar
/hadoop_cluster_profiler Worth checking out, as discovering how to connect to and mine information from the JobTracker was quite fun. Edward On Wed, Dec 14, 2011 at 9:40 AM, ArunKumar [hidden email] wrote: Hi guys! I want to analyse

Re: How do I set the number of tasktrackers per node?

2011-12-14 Thread ArunKumar
Harsh, I want to generate huge log files with my existing 5-node cluster for mumak, so that I can group some nodes under racks and run more jobs on a single node. For that I want to increase the number of tasktrackers. I didn't get "just ensure that the TT's http port is different for every instance"

Re: How do I set the number of tasktrackers per node?

2011-12-14 Thread ArunKumar
Harsh, I have a multinode cluster and I set mapred.task.tracker.http.address to 0.0.0.0:0 in mapred-site.xml on the master node. I did: bin/start-all.sh, then bin/hadoop-daemon.sh start tasktracker, but it gives "tasktracker running as process 5945. Stop it first." I have set dfs.permissions to false
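A sketch of the mapred-site.xml entry mentioned above; binding port 0 asks each tasktracker instance to pick any free HTTP port so multiple instances on one node do not collide (property name as in stock Hadoop 0.20, not verified against this exact setup):

```xml
<property>
  <name>mapred.task.tracker.http.address</name>
  <!-- port 0 = let each tasktracker instance bind a free port -->
  <value>0.0.0.0:0</value>
</property>
```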

Where do I see sysout statements after building the example?

2011-12-13 Thread ArunKumar
Hi guys! I have a single node set up as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ (1) I have put some sysout statements in JobTracker and wordcount (src/examples/org/...) code, (2) ant build, (3) ran the example jar with wordcount. Where do I find the sysout

Re: Accessing Job counters displayed in WEB GUI in Hadoop Code

2011-12-12 Thread ArunKumar
Guys! How can I access the average map/reduce task run time for a job in JobClient code? Arun -- View this message in context: http://lucene.472066.n3.nabble.com/Accessing-Job-counters-displayed-in-WEB-GUI-in-Hadoop-Code-tp3576925p3578967.html Sent from the Hadoop lucene-users mailing list

RE: Grouping nodes into different racks in Hadoop Cluster

2011-12-12 Thread ArunKumar
Hi! I have a three-node cluster set up according to http://ankitasblogger.blogspot.com/2011/01/hadoop-cluster-setup.html I have written a topology script and it doesn't work. For testing purposes I have also put a simple script:
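For reference, a hypothetical rack-mapping script of the kind discussed above, sketched from the hostnames in this thread. Hadoop's script-based mapping (configured via topology.script.file.name) invokes the script with one or more IPs/hostnames as arguments and expects a rack path for each on stdout; the IP-to-rack table here is an assumption for illustration only.

```python
#!/usr/bin/env python
# Hypothetical topology script sketch; the mapping below is illustrative,
# not taken from any real cluster.
import sys

# Assumed IP-to-rack table (real deployments often read this from a file).
RACKS = {
    "192.168.7.110": "/rack1",   # master
    "192.168.7.157": "/rack2",   # slave
}

def rack_for(host):
    # Hadoop expects an answer for every argument, so fall back to a
    # default rack for unknown hosts.
    return RACKS.get(host, "/default-rack")

if __name__ == "__main__":
    # One rack path per argument, whitespace-separated on stdout.
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

Remember to make the script executable and point topology.script.file.name at its absolute path, or the JobTracker will silently place every node in the default rack.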

Build failed when Wordcount example code has been changed

2011-12-12 Thread ArunKumar
Hi guys! I have set up a single node cluster as per the link below: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#run-the-mapreduce-job I have tried to run the Wordcount example: $ bin/hadoop jar hadoop-*-examples.jar wordcount dfsinput dfsoutput. It works. I

Re: Choosing IO intensive and CPU intensive workloads

2011-12-09 Thread ArunKumar
Alex, thanks for the link. I have boxes with, say, 30 - 50 of free space. Obviously I can't run Terasort. What reasonable input size do I need to take to see the behaviour when Terasort and TestDFSIO are run? Is there any benchmark for a mixed workload? Arun

Re: Choosing IO intensive and CPU intensive workloads

2011-12-09 Thread ArunKumar
Alex, to see the behavior of a single node under a compute-intensive benchmark, which params other than finish time of the jobs are available or can be considered? Arun

Re: Choosing IO intensive and CPU intensive workloads

2011-12-09 Thread ArunKumar
of them match yours? # cd /usr/lib/hadoop-0.20/ hadoop jar hadoop-*test*.jar - Alex On Fri, Dec 9, 2011 at 10:58 AM, ArunKumar [hidden email] wrote: Alex, To see the behavior of a single node under compute intensive benchmark which

Grouping nodes into different racks in Hadoop Cluster

2011-12-09 Thread ArunKumar
Hi guys! I am able to set up Hadoop multinode clusters as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ . I have all my nodes in a LAN. How do I group them

Choosing IO intensive and CPU intensive workloads

2011-12-08 Thread ArunKumar
Hi guys! I want to see the behavior of a single node of a Hadoop cluster when an IO-intensive / CPU-intensive workload, or a mix of both, is submitted to that node alone. These workloads must stress the node. I see that the TestDFSIO benchmark is good for an IO-intensive workload. (1) Which benchmarks

Checksum error during trace generation using Rumen

2011-12-07 Thread ArunKumar
Hi guys! I was trying to generate a job trace and a topology trace. I have hadoop set up for hduser at /usr/local/hadoop and ran the wordcount program as hduser. I have the mapreduce component set up in Eclipse for user arun. I set for a configuration: Class: org.apache.hadoop.tools.rumen.TraceBuilder

Re: Availability of Job traces or logs

2011-12-04 Thread ArunKumar
in it. BTW, what is the new scheduler about? Regards, Praveen On Sun, Dec 4, 2011 at 10:19 AM, ArunKumar [hidden email] wrote: Amar, I am attempting to write a new scheduler for Hadoop and test it using Mumak. (1) I want to test its

Re: Availability of Job traces or logs

2011-12-03 Thread ArunKumar
. Amar On 12/1/11 8:48 AM, ArunKumar [hidden email] wrote: Hi guys! Apart from generating the job traces with Rumen, can I get logs or job traces of varied sizes from some organizations? How can I make sure that Rumen generates only say

Capturing Map and Reduce I/O time

2011-11-29 Thread ArunKumar
Hi guys! I see that Hadoop doesn't capture the map task I/O time and the reduce task I/O time, and captures only map runtime and reduce runtime. Am I right? By I/O time for a map task I meant the time taken by the map task to read the input chunk allocated to it for processing and the time for it to

Assignment of input splits across nodes in Hadoop

2011-11-16 Thread ArunKumar
Hi guys! Q: I see that the createCache() method of JobInProgress is involved in the assignment of input splits across nodes in Hadoop. Which classes are involved in the assignment of input splits of jobs to nodes? I am interested in modifying this assignment policy. How can I do it? Q: How can I access

How is the data of each job assigned to nodes in Mumak?

2011-11-15 Thread ArunKumar
Hi guys! Q: How can I assign the data of each job to mumak nodes, and what else do I need to do? In general, how can I use the pluggable block placement for HDFS in Mumak? Meaning, in my context I am using the 19-jobs-trace JSON file and a modified topology JSON file consisting of say 4 nodes. Since the

How is the data of each job assigned in Mumak?

2011-11-14 Thread ArunKumar
Hi guys! Q: How can I assign the data of each job to mumak nodes, and what else do I need to do? In general, how can I use the pluggable block placement for HDFS in Mumak? Meaning, in my context I am using the 19-jobs-trace JSON file and a modified topology JSON file consisting of say 4 nodes. Since the

Re: How is task scheduling done in Mumak?

2011-11-13 Thread ArunKumar
Hi guys! DOUBT CLEARED :) Mumak just uses the Hadoop schedulers for assigning tasks and doesn't have its own SimulatorTaskScheduler as such. The implementation of assignTasks() used may be from JobQueueTaskScheduler.java or CapacityScheduler.java, etc., and this will be based on the scheduler

Scheduling in Mumak

2011-11-10 Thread ArunKumar
Hi guys! I have gone through the Mumak code. I ran mumak.sh with the given job and topology trace files. In my understanding, I see that when a job is fetched from JobStoryProducer an event is associated with it, and the listener / node where it is assigned is fixed when these events are created. I have not

Making Mumak work with capacity scheduler

2011-09-21 Thread ArunKumar
Hi! I have set up mumak and am able to run it in the terminal and in Eclipse. I have modified mapred-site.xml and capacity-scheduler.xml as necessary. I tried to apply the patch MAPREDUCE-1253-20100804.patch from https://issues.apache.org/jira/browse/MAPREDUCE-1253

Re: Making Mumak work with capacity scheduler

2011-09-21 Thread ArunKumar
. Regards, Uma - Original Message - From: ArunKumar [hidden email] Date: Wednesday, September 21, 2011 11:33 am Subject: Making Mumak work with capacity scheduler To: [hidden email]

Re: Making Mumak work with capacity scheduler

2011-09-21 Thread ArunKumar
- Original Message - From: ArunKumar [hidden email] Date: Wednesday, September 21, 2011 12:01 pm Subject: Re: Making Mumak work with capacity scheduler To: [hidden email] Hi Uma! I am

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-20 Thread ArunKumar
Hi! I gave rwx permission recursively to everybody: drwxrwxrwx 3 root root 4096 2011-09-18 23:38 app Arun -- View this message in context: http://lucene.472066.n3.nabble.com/Submitting-Jobs-from-different-user-to-a-queue-in-capacity-scheduler-tp3345752p3351331.html Sent from the Hadoop

Building modified capacity scheduler and seeing console outputs

2011-09-20 Thread ArunKumar
Hi! I have set up hadoop as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ I am able to run jobs with the capacity scheduler. I am interested in extending the capacity scheduler, so first I wanted to see whether I can build it after making changes. I

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-19 Thread ArunKumar
Hi guys! Common things done by me: $ chmod -R 777 hadoop_extract; $ chmod -R 777 /app. @Joey I have created the dfs dir /user/arun, made arun the owner, and tried as below: (1) arun@arun-Presario-C500-RU914PA-ACJ:/$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-0.20.203.0.jar

Submitting Jobs from different user to a queue in capacity scheduler

2011-09-18 Thread ArunKumar
Hi! I have set up hadoop on my machine as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ I am able to run applications with the capacity scheduler by submitting jobs to a particular queue as the owner of hadoop, hduser. I tried this from another user: 1.

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-18 Thread ArunKumar
Hi! I gave permissions in the beginning: $ sudo chown -R hduser:hadoop hadoop. I also gave $ chmod -R 777 hadoop. When I try arun$ /home/hduser/hadoop203/bin/hadoop jar /home/hduser/hadoop203/hadoop-examples*.jar pi 1 1 I get: Number of Maps = 1 Samples per Map = 1

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-18 Thread ArunKumar
Hi Uma! I have added the following property to hdfs-site.xml: <property><name>dfs.permissions</name><value>false</value></property> and restarted the cluster. I tried: arun@arun-Presario-C500-RU914PA-ACJ:~$ /home/hduser/hadoop203/bin/hadoop jar /home/hduser/hadoop203/hadoop-examples*.jar pi 1 1 Number of

Hadoop on eclipse : Running jobs with capacity Scheduler

2011-09-18 Thread ArunKumar
Hi! I have set up hadoop in Eclipse as per http://www.mail-archive.com/common-dev@hadoop.apache.org/msg02531.html I could run the wordcount example. I have modified the site XMLs as necessary for the capacity scheduler. When I run I

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-18 Thread ArunKumar
Hi Uma! I have deleted the data in /app/hadoop/tmp, formatted the namenode, and restarted the cluster. I tried arun$ /home/hduser/hadoop203/bin/hadoop jar /home/hduser/hadoop203/hadoop-examples*.jar pi 1 1 Number of Maps = 1 Samples per Map = 1 org.apache.hadoop.security.AccessControlException:

Re: Running example application with capacity scheduler ?

2011-09-16 Thread ArunKumar
Hi all! Problem found! I had set the queue properties in mapred-site.xml instead of capacity-scheduler.xml. Arun -- View this message in context: http://lucene.472066.n3.nabble.com/Running-example-application-with-capacity-scheduler-tp3335471p3341934.html Sent from the Hadoop
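A sketch of the split the resolution above implies, assuming the stock Hadoop 0.20 capacity-scheduler property names: queue names are declared in mapred-site.xml, while per-queue settings belong in capacity-scheduler.xml (the "default" queue and 100% capacity are illustrative values):

```xml
<!-- mapred-site.xml: declare the queues -->
<property>
  <name>mapred.queue.names</name>
  <value>default</value>
</property>

<!-- capacity-scheduler.xml: per-queue properties go here,
     not in mapred-site.xml -->
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>100</value>
</property>
```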