Is there a way to know the total shuffle time of a map-reduce job? I mean,
is there some command or output that can tell me that?
I want to measure the total map, total shuffle, and total reduce time for my MR
job. How can I achieve this? I am using Hadoop 0.20.205.
Regards,
Praveenesh
Shuffle time is considered part of the reduce step; without a reduce phase,
there is no need for shuffling.
One way to measure it would be to take the full reduce time of the job with a
'/dev/null' reducer, i.e. one that discards everything it receives.
I am not aware of any more direct way to measure it.
Regards
Bertrand
On Mon, Aug 27, 2012 at 8:18 AM, praveenesh
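A minimal sketch of such a '/dev/null' reducer, assuming the 0.20 mapreduce
API (class name illustrative): it iterates over every value so the shuffle and
merge actually happen, but writes nothing.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // A "/dev/null" reducer: consume all shuffled values, emit nothing.
    public class DevNullReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        for (Text ignored : values) {
          // discard; the point is only to force the shuffle/merge to run
        }
      }
    }

Pairing it with org.apache.hadoop.mapreduce.lib.output.NullOutputFormat keeps
the output side from adding noise to the measurement.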
You can extract the shuffle time from the job log.
Take a look at
https://github.com/rajvish/hadoop-summary
Raj
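For 0.20.x there is also the history viewer, which prints per-phase timings
(including average shuffle times) from those same logs; assuming the job's
output directory still contains its _logs/history files:

    hadoop job -history <job-output-dir>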
Hi all,
I just want to know: based on what factors does the MapReduce framework decide
the number of reducers to launch for a job?
By default only one reducer will be launched for a given job; is this right, if
we do not explicitly set the number via the command line or the driver class?
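For reference, the default is indeed a single reduce task
(mapred.reduce.tasks = 1). A hedged driver sketch showing both ways to
override it (job name and values illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "reducer-count-example");
        // Explicitly ask for 4 reduce tasks; without this the job gets 1.
        job.setNumReduceTasks(4);
        // Command-line equivalent, if the driver uses ToolRunner /
        // GenericOptionsParser:
        //   hadoop jar my.jar MyDriver -D mapred.reduce.tasks=4 in out
      }
    }

Unlike the number of map tasks, which the framework derives from the input
splits, the reduce count is purely whatever the job requests.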
Hi
When I try to access the fsck report from the web browser directly, like
http://NNIP:HTTP_PORT/fsck
I get the following exception:
2012-08-27 14:21:57,591 WARN
org.apache.hadoop.security.authentication.server.AuthenticationFilter:
Authentication exception: GSSException:
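As an aside, a GSSException from AuthenticationFilter suggests the HTTP
endpoint is expecting SPNEGO/Kerberos credentials. On such a cluster the
command-line client is usually the easier route to the same report, e.g.:

    hadoop fsck /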
Hi All,
I was running a cluster of one master and 4 slaves. I copied the
hadoop_install folder from the master to all 4 slaves, and configured them
well.
However, when I run start-all.sh from the master machine, it shows the following:
starting namenode, logging to
Hi Charles,
map/reduce (jobtracker/tasktrackers, localhost:50030) runs on top of
hdfs (namenode/datanodes, localhost:50070) or the local file system.
It seems there is something wrong with the hdfs, so the map/reduce is
blocked and shows INITIALIZING; please check the log of the namenode(
Charles,
Can you check your NN logs to see if it is properly up?
On Mon, Aug 27, 2012 at 12:33 PM, Charles AI hadoo...@gmail.com wrote:
Yeah, thank you. Both NN log and DN log on the master machine are empty
files, having a size of 0.
On Mon, Aug 27, 2012 at 3:16 PM, Harsh J ha...@cloudera.com wrote:
On Mon, Aug 27, 2012 at 1:35 PM, Visioner Sadak visioner.sa...@gmail.comwrote:
Hello experts,
While creating a HAR file, sometimes the job executes successfully and
sometimes it throws an error. Any idea why this is happening? It is really
a weird error. I am running hadoop on
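For context, a HAR archive is built by a MapReduce job launched along these
lines (paths illustrative):

    hadoop archive -archiveName files.har -p /user/src/dir /user/dest/dir

Since it runs as an ordinary job, the intermittent failures should leave a
trace in the JobTracker/task logs like any other job.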
Hello Charles,
Have you added the dfs.name.dir and dfs.data.dir props in your
hdfs-site.xml file? Values of these props default to the /tmp dir, so at
each restart both data and meta info are lost.
On Monday, August 27, 2012, Charles AI hadoo...@gmail.com wrote:
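A hedged example of what those hdfs-site.xml entries might look like (the
paths are placeholders for real, persistent local directories):

    <property>
      <name>dfs.name.dir</name>
      <value>/data/hadoop/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/data/hadoop/data</value>
    </property>

With these set, namenode metadata and datanode blocks survive reboots, unlike
the /tmp defaults.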
thank you guys.
the logs say
In addition to what Atul mentioned, I recommend the pdf
securitydesign.pdf attached to this hadoop jira issue:
https://issues.apache.org/jira/browse/HADOOP-4487
It explains in depth what is implemented in Hadoop common security (used by
hdfs)
BR,
Ivan
2012/8/25 Atul Thapliyal
thanks a lot guys
On Mon, Aug 27, 2012 at 4:31 PM, Ivan Frain ivan.fr...@gmail.com wrote:
Hi guys!
I need some clarification on the expected behavior for a hadoop MapReduce
job.
Say I were to create a Mapper task which never ends: it reads the first line
of input and then reads data from an external service eternally. If the
service is empty, it will block until data is available.
It depends...
If you are in the Mapper.map() method and you block, then you will most
certainly time out after 10 minutes (the default mapred.task.timeout) and the
task dies. If enough tasks die, then your job dies.
If instead in the Mapper.setup() method you create a heartbeat thread where
every minute you wake up and update the
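A minimal sketch of that heartbeat idea, assuming the intent is to call
context.progress() periodically so the framework resets the task's timeout
clock (class name hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class HeartbeatMapper extends Mapper<LongWritable, Text, Text, Text> {

      private Thread heartbeat;
      private volatile boolean running = true;

      @Override
      protected void setup(final Context context) {
        heartbeat = new Thread(new Runnable() {
          public void run() {
            while (running) {
              context.progress();         // tell the framework we are alive
              try {
                Thread.sleep(60 * 1000L); // wake up once a minute
              } catch (InterruptedException e) {
                return;
              }
            }
          }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Imagine a long, possibly blocking read from the external service
        // here; the heartbeat thread keeps the task alive meanwhile.
        context.write(new Text("echo"), value);
      }

      @Override
      protected void cleanup(Context context) {
        running = false;
        heartbeat.interrupt();
      }
    }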
Hi,
On Mon, Aug 27, 2012 at 6:43 PM, Juan P. gordoslo...@gmail.com wrote:
Hi,
I've been looking for both the 32-bit and 64-bit hadoop native libraries, and
it looks like the existence and location of these libraries keep changing
between releases. I downloaded the following releases:
hadoop-0.22.0
hadoop-0.23.0
hadoop-0.23.1
hadoop-1.0.1
hadoop-1.0.2
hadoop-1.0.3
Hi,
I have a question concerning the execution of reducers.
To use the data locality of blocks effectively in my use case, I want to
control on which node a reducer will be executed.
In my scenario I have a chain of map-reduce jobs where each job will be
executed by exactly N reducers.
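On the "exactly N reducers" part: the count is fixed with
job.setNumReduceTasks(N), and which keys land on which reducer is decided by
the Partitioner; as far as I know, plain Hadoop 0.20/1.x has no supported knob
for pinning a reduce task to a particular node. A hedged sketch of a custom
Partitioner (name illustrative):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Fixes the key -> reducer mapping; reducer -> node placement is
    // still the scheduler's decision.
    public class ModuloPartitioner extends Partitioner<Text, Text> {
      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }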
Hi Steven,
You may also use the common-dev@ lists for development
discussions/issues around common elements :)
Just for some context, this was changed in 2.x by us via:
https://issues.apache.org/jira/browse/HADOOP-7874
On Mon, Aug 27, 2012 at 10:39 PM, Steven Willis swil...@compete.com wrote:
(moving to common-dev)
Thanks Harsh,
So what's the final outcome of these changes? Do we get both 32 and 64 bit
libraries in the release tarball? Will they be underneath an arch dir, or
directly under lib/native? I'm just a bit confused because the issue you
reference is:
native libs should
Hey Keith,
Pseudo-distributed isn't any different from fully-distributed,
operationally, except that nodes = 1, so don't let it limit your
thoughts :)
Stop the HDFS cluster, mv your existing dfs.name.dir and dfs.data.dir
directory contents onto the new storage mount. Reconfigure dfs.data.dir and
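A hedged sketch of that procedure (paths are placeholders for the current
directories and the new mount):

    stop-dfs.sh                                  # stop HDFS first
    mv /old/disk/dfs/name /new/mount/dfs/name    # namenode metadata
    mv /old/disk/dfs/data /new/mount/dfs/data    # datanode blocks
    # point dfs.name.dir and dfs.data.dir in hdfs-site.xml at the new
    # locations, then:
    start-dfs.sh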
Mahout is getting some very fast knn code in version 0.8.
The basic workflow is that you would first do a large-scale clustering of
the data. Then you would make a second pass, using the clustering to
facilitate a fast search for nearby points.
The clustering will require two map-reduce jobs, one