Are you sure? AFAIK all mapred.xxx properties can be set via the job config. I also
read in the Yahoo tutorial that this property can be set either in hadoop-site.xml
or in the job config. Maybe someone who has actually used this property can
confirm.
Praveen
On Jul 1, 2011, at 4:46 PM, "ext Anthony Urs
Hi all,
I am using Hadoop 0.20.2. I am setting the property
mapred.tasktracker.map.tasks.maximum = 4 (same for reduce as well) in my job conf,
but I am still seeing a maximum of only 2 map and 2 reduce tasks on each node. I
know my machine can run 4 map and 4 reduce tasks in parallel. Is this a bug in
0.2
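(For what it's worth, mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum are read by each TaskTracker daemon at
startup, so as far as I know setting them in the job conf has no effect. Something
like the snippet below in every node's mapred-site.xml, followed by a TaskTracker
restart, is what I'd try; the value 4 is just the number from this thread.)

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>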
Hi David,
Thanks for the response. I didn't specify anything for the no. of concurrent
mappers, but I do see that it shows as 10 on port 50030 (for a 5-node cluster). So
I believe Hadoop is defaulting to the no. of cores in the cluster, which is 10.
That is why I want to set the map tasks to the same no. of
Hi David,
I think Hadoop is looking at the data size, not the no. of input files. If I
pass in .gz files, then yes, Hadoop chooses 1 map task per file, but if I
pass in a HUGE text file, or the same file split into 10 files, it chooses the same
no. of map tasks (191 in my case).
Thanks
Praveen
-
Hi there,
I know the client can set "mapred.reduce.tasks" to specify the no. of reduce tasks
and Hadoop honours it, but "mapred.map.tasks" is not honoured by Hadoop. Is there
any way to control the number of map tasks? What I noticed is that Hadoop is
choosing too many mappers and there is an extra overhead
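(A sketch of the usual workaround, assuming a splittable input format: the map
count is roughly input size divided by split size, so raising the minimum split
size lowers the number of maps. The property name is the 0.20-era one and the
256 MB value is only an example.)

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // "mapred.map.tasks" is only a hint to the InputFormat, not a hard limit;
    // the split size is what actually drives the number of map tasks.
    conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);  // ~256 MB per split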
Hello all,
How do I print the status of each job on the client with the % complete? I
am invoking the Hadoop jobs using the Java client (not the hadoop CLI) and I am not
seeing the map and reduce job status on the command line. Is there a property
that I can set in the Configuration?
Praveen
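(One way to get this, sketched from the 0.20 "new" API and not verified here:
Job.waitForCompletion(true) already logs progress, or the client can poll the Job
object itself after submit(). "ProgressReporter" is just a placeholder helper
name; job/conf setup is omitted.)

    import org.apache.hadoop.mapreduce.Job;

    public class ProgressReporter {
      public static void report(Job job) throws Exception {
        job.submit();
        while (!job.isComplete()) {
          System.out.printf("map %.0f%%  reduce %.0f%%%n",
              job.mapProgress() * 100, job.reduceProgress() * 100);
          Thread.sleep(5000);  // poll every 5 seconds
        }
        System.out.println(job.isSuccessful() ? "completed" : "failed");
      }
    }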
Hi there,
I think you got the run(String[] args) method right, but in the main method you
are calling ToolRunner.run rather than your own run method. You need to invoke your
method in order to point to localhost:54310; otherwise it will read those
properties from the default Hadoop conf.
Praveen
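(A minimal sketch of what I mean; the class name is a placeholder and the job
setup is omitted. The point is that run(String[]) on your own driver is what sets
fs.default.name before the job is submitted.)

    public static void main(String[] args) throws Exception {
      MyDriver driver = new MyDriver();      // your Configured/Tool implementation
      driver.setConf(new Configuration());   // so getConf() inside run() is not null
      int exitCode = driver.run(args);       // invoke *your* run(String[])
      System.exit(exitCode);
    }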
Hello all,
I installed Hadoop 0.20.2 on physical machines and everything works like a
charm. Then I installed Hadoop using the same hadoop-install gz file on the
cloud. The installation seems fine; I can even copy files to HDFS from the master
machine. But when I try to do it from another "non hadoop" mac
Hello all,
I have a few MapReduce jobs that I am calling from a Java driver. The problem I
am facing is that when there is an exception in a MapReduce job, the exception is
not propagated to the client, so even if the first job failed, it goes on to the
second job and so on. Is there another way of catching e
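(With the 0.20 API, one way to stop the chain is to check each job's return status
before starting the next. A rough sketch, with job setup omitted and job1/job2 as
placeholders:)

    if (!job1.waitForCompletion(true)) {
      throw new RuntimeException("job1 failed; not starting job2");
    }
    if (!job2.waitForCompletion(true)) {
      throw new RuntimeException("job2 failed");
    }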
I got this working when I bumped up the memory on the cloud to 8GB instead of
4GB. I guess with 4GB it was running out of resources.
Praveen
From: ext praveen.pe...@nokia.com [praveen.pe...@nokia.com]
Sent: Thursday, February 10, 2011 4:40 PM
To: common-u...@hadoo
Hello all,
I have been using Hadoop on physical machines for some time now. But recently I
tried to run the same Hadoop jobs on the Rackspace cloud and I am not yet
successful.
My input file has 150M transactions and all Hadoop jobs finish in less than 90
minutes on a 4 node 4GB hadoop cluster on
Hello all,
I am having issues with accessing HDFS and I figured it's due to a version
mismatch. I know my jar files have multiple copies of Hadoop (Pig has its own,
I have Hadoop 0.20.2, and Whirr had its own Hadoop copy). My question is how to
find the right version of Hadoop that matches the one
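(One thing that can help narrow this down is printing which Hadoop build the
client JVM actually loaded, and which jar it came from; a hedged sketch using
standard classes:)

    import org.apache.hadoop.util.VersionInfo;

    System.out.println("Hadoop " + VersionInfo.getVersion()
        + " loaded from " + VersionInfo.class.getProtectionDomain()
                                .getCodeSource().getLocation());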
Hello all,
I have set the Hadoop environment variable HADOOP_CONF_DIR and am trying to run a
Hadoop job from a Java application, but the job is not looking for the Hadoop
config in this HADOOP_CONF_DIR folder. If I copy the xml files from this folder
onto the Java application classpath, it works fine. Sinc
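(As far as I can tell, HADOOP_CONF_DIR is honoured by the hadoop shell script, not
by an arbitrary JVM, so the usual options are to put that folder on the
application's classpath, as above, or to add the files explicitly; the paths below
are illustrative.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    conf.addResource(new Path("/opt/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/opt/hadoop/conf/hdfs-site.xml"));
    conf.addResource(new Path("/opt/hadoop/conf/mapred-site.xml"));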
I found the solution. The problem was with ports. It looks like the cloud doesn't
have these ports open. For now I shut down iptables on all Hadoop machines and
things worked magically.
Thanks
Praveen
From: ext praveen.pe...@nokia.com [mailto:praveen.pe...@nokia.com]
S
Hello all,
I am trying to run Hadoop on Rackspace and I am having issues with starting up the
servers. I have configured everything on the cloud exactly the same as my local
Hadoop (which is working), but I can't start the servers. HDFS fails to start. Has
anyone had any luck installing and starting Hadoop o
Hello all,
I have a few MapReduce jobs that I am invoking from a GlassFish container. I am
not using the "hadoop" command line tool but calling the jobs directly from
GlassFish programmatically. I have a driver that runs on GlassFish and calls
these jobs sequentially. I was able to run the jobs as long
Hi Henning,
Thanks again.
Let me explain my scenario first so you can make better sense of my question.
I have a web application running on a GlassFish server. Every 24 hours a Quartz
job runs on the server, and I need to call a set of Hadoop jobs one after the
other, read the final output, and stor
Hi Henning,
Putting core-site.xml on the classpath worked. Thanks for the help. I need to
figure out how to submit a job as a different user than the user Hadoop is
configured for.
I have one more question related to job submission. Did anyone face a problem with
running a job that involves multiple jar files? I am
Hi Henning,
Adding Hadoop's conf folder didn't help fix the issue, but when I added the
two properties below, I was able to access the file system but cannot write
anything due to the different user. I have the following questions based on my
experiments.
1. How can I access HDFS or submit jobs as a different
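(On the different-user question: if I remember right, the pre-security 0.20 line
lets the client override the submitting user with the hadoop.job.ugi property;
worth verifying before relying on it. The user/group values below are only an
example.)

    Configuration conf = new Configuration();
    conf.set("hadoop.job.ugi", "hadoop,hadoop");  // user,group to act as (pre-security 0.20 only)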
Hi, thanks for your reply. In my case I have a driver that calls multiple jobs
one after the other. I am using the following code to submit each job, but it
uses the local Hadoop jar files that are in the classpath. It's not submitting the
job to the Hadoop cluster. I thought I would need to specify where t
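(A rough sketch of pointing the client conf at the cluster instead of the local
defaults, assuming 0.20-style property names; 54310 is the NameNode port mentioned
earlier in these threads, while the JobTracker address/port below is only an
example, and MyDriver is a placeholder class inside your job jar.)

    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://master:54310");  // NameNode
    conf.set("mapred.job.tracker", "master:54311");      // JobTracker (example port)
    Job job = new Job(conf, "my-job");
    job.setJarByClass(MyDriver.class);                   // ship the job jar, not the local classpath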
Hi all,
I am trying to figure out how I can start a Hadoop job programmatically from my
Java application running in an app server. I was able to run my MapReduce job
using the hadoop command from the Hadoop master machine, but my goal is to run the
same job from my Java program (running on a different machin
That's a good point. I was indeed using a gzip file that has a csv file in it. I
uncompressed it and used the csv file, and now I can see many mappers running
concurrently.
Thanks for the suggestion. This is an important piece of information many
people will miss, since a compressed format is a more logic
Hi all,
I have been trying to figure out why all mappers run on only one machine when I
have a 4-node cluster. The reduce part is running fine on all 4 nodes. I am
using 0.20.2. My input file is a single large file (10GB).
Here is my config in mapred-site.xml. I specified map.tasks as 30 but