Measuring Shuffle time for MR job

2012-08-27 Thread praveenesh kumar
Is there a way to know the total shuffle time of a MapReduce job - some command or output that can tell me that? I want to measure the total map, total shuffle, and total reduce time for my MR job -- how can I achieve it? I am using Hadoop 0.20.205. Regards, Praveenesh

Re: Measuring Shuffle time for MR job

2012-08-27 Thread Bertrand Dechoux
Shuffle time is considered part of the reduce step; without a reduce phase, there is no need for shuffling. One way to measure it would be to take the full reduce time with a '/dev/null' reducer. I am not aware of any direct way to measure it. Regards, Bertrand On Mon, Aug 27, 2012 at 8:18 AM, praveenesh
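As an illustration of this suggestion (not from the original thread): with Hadoop Streaming, a '/dev/null' reducer can be a script that drains its input and emits nothing, so the measured reduce time is dominated by the shuffle/sort phase. A minimal sketch:

```python
#!/usr/bin/env python
# A "/dev/null" reducer for Hadoop Streaming: drains every shuffled
# record from stdin and produces no output. Illustrative sketch only.
import sys

def discard(stream):
    """Consume every shuffled record without producing output.

    Returns the number of records drained (useful only for local testing;
    the return value is ignored when run as a Streaming reducer)."""
    count = 0
    for _ in stream:
        count += 1
    return count

if __name__ == "__main__":
    discard(sys.stdin)
```

Run it as the `-reducer` of a Streaming job (the streaming jar path depends on your install); subtracting the known map time from the job's total then approximates shuffle plus sort.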

Re: Measuring Shuffle time for MR job

2012-08-27 Thread Raj Vishwanathan
You can extract the shuffle time from the job log. Take a look at  https://github.com/rajvish/hadoop-summary  Raj From: Bertrand Dechoux decho...@gmail.com To: common-user@hadoop.apache.org Sent: Monday, August 27, 2012 12:57 AM Subject: Re: Measuring
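For illustration (not code from the linked repo): in the 0.20/1.x job history format, ReduceAttempt lines carry START_TIME, SHUFFLE_FINISHED, and SORT_FINISHED timestamps in milliseconds, so per-attempt shuffle time can be pulled out with a small script. The attribute names below are the assumed 1.x ones - verify them against your own history files:

```python
import re

# Hedged sketch: extract shuffle time from ReduceAttempt records in a
# Hadoop 0.20/1.x job history file. Attributes appear as KEY="value" pairs.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def shuffle_millis(history_line):
    """Return SHUFFLE_FINISHED - START_TIME (ms) for one ReduceAttempt
    line, or None if the line lacks either timestamp."""
    attrs = dict(ATTR.findall(history_line))
    if "SHUFFLE_FINISHED" in attrs and "START_TIME" in attrs:
        return int(attrs["SHUFFLE_FINISHED"]) - int(attrs["START_TIME"])
    return None

if __name__ == "__main__":
    line = ('ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201208270001_r_000000" '
            'START_TIME="1346050000000" SHUFFLE_FINISHED="1346050042000" '
            'SORT_FINISHED="1346050043000"')
    print(shuffle_millis(line))  # 42000
```

Summing this over all reduce attempts gives the total shuffle time the original question asks about; SORT_FINISHED - SHUFFLE_FINISHED similarly gives sort time.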

Number of reducers

2012-08-27 Thread Abhishek
Hi all, I just want to know: based on what factors does the MapReduce framework decide the number of reducers to launch for a job? By default only one reducer will be launched for a given job - is this right, if we do not explicitly set the number via the command line or the driver class? If i
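For reference (illustrative, not from the thread): yes, the framework defaults to a single reduce task unless you override it - map count comes from the input splits, but reduce count is entirely up to the user. The override for the 0.20/1.x era looks like this (newer releases spell the property mapreduce.job.reduces):

```xml
<!-- mapred-site.xml, or per-job via: -D mapred.reduce.tasks=4 -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>
```

Equivalently, call job.setNumReduceTasks(4) in the driver class.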

Problem with fsck when security is enabled

2012-08-27 Thread Amith D K
Hi, When I try to access the fsck report from the web browser directly, like http://NNIP:HTTP_PORT/fsck, I am getting the following exception: 2012-08-27 14:21:57,591 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException:

Why cannot I start namenode or localhost:50070 ?

2012-08-27 Thread Charles AI
Hi All, I was running a cluster of one master and 4 slaves. I copied the hadoop_install folder from the master to all 4 slaves and configured them well. However, when I run sh start-all.sh from the master machine, it shows the below: starting namenode, logging to

Re: Why cannot I start namenode or localhost:50070 ?

2012-08-27 Thread TianYi Zhu
Hi Charles, MapReduce (jobtracker/tasktrackers, localhost:50030) runs on top of HDFS (namenode/datanodes, localhost:50070) or the local file system. It seems there is something wrong with HDFS, so MapReduce is blocked and shows INITIALIZING; please check the log of the namenode(

Re: Why cannot I start namenode or localhost:50070 ?

2012-08-27 Thread Harsh J
Charles, Can you check your NN logs to see if it is properly up? On Mon, Aug 27, 2012 at 12:33 PM, Charles AI hadoo...@gmail.com wrote: Hi All, I was running a cluster of one master and 4 slaves. I copied the hadoop_install folder from the master to all 4 slaves, and configured them well.

Re: Why cannot I start namenode or localhost:50070 ?

2012-08-27 Thread Charles AI
Yeah, thank you. Both NN log and DN log on the master machine are empty files, having a size of 0. On Mon, Aug 27, 2012 at 3:16 PM, Harsh J ha...@cloudera.com wrote: Charles, Can you check your NN logs to see if it is properly up? On Mon, Aug 27, 2012 at 12:33 PM, Charles AI

Re: Error while executing HAR job

2012-08-27 Thread Visioner Sadak
On Mon, Aug 27, 2012 at 1:35 PM, Visioner Sadak visioner.sa...@gmail.com wrote: Hello experts, While creating a HAR file, sometimes the job executes successfully and sometimes it throws an error. Any idea why this is happening? It is really a weird error. I am running hadoop on

Re: Why cannot I start namenode or localhost:50070 ?

2012-08-27 Thread Mohammad Tariq
Hello Charles, Have you added the dfs.name.dir and dfs.data.dir props in your hdfs-site.xml file? Values of these props default to the /tmp dir, so at each restart both the data and the meta info are lost. On Monday, August 27, 2012, Charles AI hadoo...@gmail.com wrote: thank you guys. the logs say
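For illustration (not from the thread), a minimal hdfs-site.xml along these lines; the paths are placeholders for any durable location outside /tmp:

```xml
<!-- hdfs-site.xml: persist NN metadata and DN block storage outside /tmp -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/dfs/data</value>
</property>
```

After setting these on a fresh cluster, reformat the namenode once (hadoop namenode -format) before starting HDFS.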

Re: hadoop security

2012-08-27 Thread Ivan Frain
In addition to what Atul mentioned, I recommend the PDF securitydesign.pdf attached to this Hadoop JIRA issue: https://issues.apache.org/jira/browse/HADOOP-4487 It explains in depth what is implemented in Hadoop Common security (used by HDFS). BR, Ivan 2012/8/25 Atul Thapliyal

Re: hadoop security

2012-08-27 Thread Sachin Aggarwal
thanks a lot, guys. On Mon, Aug 27, 2012 at 4:31 PM, Ivan Frain ivan.fr...@gmail.com wrote: In addition to what Atul mentioned, I recommend the PDF securitydesign.pdf attached to this Hadoop JIRA issue: https://issues.apache.org/jira/browse/HADOOP-4487 It explains in depth what is

Locked Job

2012-08-27 Thread Juan P.
Hi guys! I need some clarification on the expected behavior of a Hadoop MapReduce job. Say I were to create a Mapper task which never ends: it reads the first line of input and then reads data from an external service eternally. If the service is empty it will block until data is available.

Re: Locked Job

2012-08-27 Thread Michael Segel
It depends... If you are in the Mapper.map() method and you 'lock', then you will most certainly time out after 10 min (the default) and the task dies. If enough tasks die, then your job dies. If in the Mapper.setup() method you create a heartbeat thread where every minute you wake up and update the
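Sketching this heartbeat idea (illustrative, not from the thread): in a Java Mapper you would periodically call context.progress() or context.setStatus(); the same pattern phrased for Hadoop Streaming, where a reporter:status: line on stderr resets the task timeout (mapred.task.timeout, 10 minutes by default):

```python
import sys
import threading

# Hedged sketch of a heartbeat thread: periodically report liveness so the
# framework's task timeout does not kill a task that is legitimately
# blocked waiting on an external service.

def status_line(msg):
    """Format a Hadoop Streaming status update (written to stderr)."""
    return "reporter:status:%s\n" % msg

def start_heartbeat(interval_secs=60, out=sys.stderr):
    """Start a daemon thread that emits a status line every interval.

    Returns an Event; call .set() on it to stop the heartbeat once the
    real work resumes or finishes."""
    stop = threading.Event()

    def beat():
        n = 0
        while not stop.wait(interval_secs):
            n += 1
            out.write(status_line("alive, heartbeat %d" % n))
            out.flush()

    threading.Thread(target=beat, daemon=True).start()
    return stop
```

The function and parameter names here are illustrative; the only Hadoop-specific piece is the reporter:status: convention on stderr.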

Re: Locked Job

2012-08-27 Thread Harsh J
Hi, On Mon, Aug 27, 2012 at 6:43 PM, Juan P. gordoslo...@gmail.com wrote: Hi guys! I need some clarification on the expected behavior for a hadoop MapReduce job. Say I was to create a Mapper task which never ends. It reads the first line of input and then reads data from an external

hadoop native libs 32 and 64 bit

2012-08-27 Thread Steven Willis
Hi, I've been looking for both the 32 and 64 bit hadoop native libraries and it looks like the existence and location of these libraries keeps changing between releases. I downloaded the following releases: hadoop-0.22.0 hadoop-0.23.0 hadoop-0.23.1 hadoop-1.0.1 hadoop-1.0.2 hadoop-1.0.3
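As an aside (illustrative, not from the thread), a quick way to see where a given release tarball puts its native libraries is to walk the unpacked tree for libhadoop* - e.g. 1.x releases use per-arch dirs like lib/native/Linux-amd64-64, while later layouts flatten this:

```python
import fnmatch
import os

# Hedged sketch: report where the native libraries live under an
# unpacked Hadoop release directory, since the layout moved between
# releases.
def find_native_libs(root):
    """Return sorted relative paths of libhadoop* files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, "libhadoop*"):
            hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)

if __name__ == "__main__":
    for path in find_native_libs("."):
        print(path)
```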

Controlling on which node a reducer will be executed

2012-08-27 Thread Eduard Skaley
Hi, I have a question concerning the execution of reducers. To make effective use of the data locality of blocks in my use case, I want to control on which node a reducer will be executed. In my scenario I have a chain of map-reduce jobs where each job will be executed by exactly N reducers. I want to

Re: hadoop native libs 32 and 64 bit

2012-08-27 Thread Harsh J
Hi Steven, You may also use the common-dev@ lists for development discussions/issues around common elements :) Just for some context, this was changed in 2.x by us via: https://issues.apache.org/jira/browse/HADOOP-7874 On Mon, Aug 27, 2012 at 10:39 PM, Steven Willis swil...@compete.com wrote:

RE: hadoop native libs 32 and 64 bit

2012-08-27 Thread Steven Willis
(moving to common-dev) Thanks Harsh, So what's the final outcome of these changes? Do we get both 32 and 64 bit libraries in the release tarball? Will they be underneath an arch dir, or directly under lib/native? I'm just a bit confused because the issue you reference is: native libs should

Re: Adding additional storage

2012-08-27 Thread Harsh J
Hey Keith, Pseudo-distributed isn't any different from fully-distributed, operationally, except for nodes = 1 - so don't let it limit your thoughts :) Stop the HDFS cluster, mv your existing dfs.name.dir and dfs.data.dir dir contents onto the new storage mount. Reconfigure dfs.data.dir and

Re: best way to join?

2012-08-27 Thread Ted Dunning
Mahout is getting some very fast knn code in version 0.8. The basic work flow is that you would first do a large-scale clustering of the data. Then you would make a second pass using the clustering to facilitate fast search for nearby points. The clustering will require two map-reduce jobs, one