Custom error message and action for authentication failure

2013-09-05 Thread Benoy Antony
Hello All, We have a requirement to display a custom error message and do some bookkeeping if a user faces an authentication error when making a Hadoop call. We use Hadoop 1. How do we go about accomplishing this? benoy

Re: How to update the timestamp of a file in HDFS

2013-09-05 Thread murali adireddy
Hi, Try this touchz hadoop command: hadoop -touchz filename Thanks and Regards, Adi Reddy Murali On Thu, Sep 5, 2013 at 11:06 AM, Harsh J ha...@cloudera.com wrote: There's no shell command (equivalent to Linux's touch) but you can use the Java API:

Re: How to update the timestamp of a file in HDFS

2013-09-05 Thread murali adireddy
right usage of the command is: hadoop fs -touchz filename On Thu, Sep 5, 2013 at 12:14 PM, murali adireddy murali.adire...@gmail.com wrote: Hi, Try this touchz hadoop command: hadoop -touchz filename Thanks and Regards, Adi Reddy Murali On Thu, Sep 5, 2013 at 11:06 AM, Harsh J

how to observe spill operation during shuffle in mapreduce?

2013-09-05 Thread ch huang
hi, all: is there any MR spill count metric?

Re: how to observe spill operation during shuffle in mapreduce?

2013-09-05 Thread Ravi Kiran
Hi, You can look at the job metrics from your JobTracker Web UI. The Spilled Records counter under the group Map-Reduce Framework displays the number of records spilled in both map and reduce tasks. Regards, Ravi Magham. On Thu, Sep 5, 2013 at 12:23 PM, ch huang justlo...@gmail.com wrote:
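The same counter can also be read programmatically. A minimal sketch against the old (1.x) mapred API; the job ID argument and the internal counter group name ("org.apache.hadoop.mapred.Task$Counter", which backs the "Map-Reduce Framework" display group in 1.x) are assumptions to verify against your version:

```java
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class SpillCount {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // args[0] is a job ID such as job_201309050000_0001 (hypothetical)
        RunningJob job = client.getJob(JobID.forName(args[0]));
        Counters counters = job.getCounters();
        // Group name is an assumption for 1.x; check your JobTracker UI
        long spilled = counters.findCounter(
            "org.apache.hadoop.mapred.Task$Counter", "SPILLED_RECORDS").getValue();
        System.out.println("Spilled records: " + spilled);
    }
}
```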

Re: Multidata center support

2013-09-05 Thread Visioner Sadak
Hi friends, hello Baskar. I think rack awareness and data center awareness are different, and similarly nodes and data centers are different things from Hadoop's perspective, but ideally it should be the same, I mean nodes can be in different data centers, right? But I think Hadoop does not replicate data

Re: M/R API and Writable semantics in reducer

2013-09-05 Thread Jan Lukavský
Hi, is there anyone interested in this topic? Basically, what I'm trying to find out is whether it is 'safe' to rely on the side effect of the key being updated while iterating over the values. I believe there must be someone else interested in this; the secondary sort pattern is very common (at

Re: How to update the timestamp of a file in HDFS

2013-09-05 Thread Harsh J
Murali, The touchz command creates a zero-sized file. It does not allow modifying a timestamp like Linux's touch command does, which is what the OP seems to be asking about. On Thu, Sep 5, 2013 at 12:14 PM, murali adireddy murali.adire...@gmail.com wrote: right usage of the command is: hadoop fs -touchz
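Following up on Harsh's pointer to the Java API, a minimal sketch of updating a file's timestamp via FileSystem.setTimes (the path here is hypothetical, and a configured cluster is assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TouchFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/user/foo/bar.txt"); // hypothetical path
        long now = System.currentTimeMillis();
        // setTimes(path, mtime, atime); pass -1 to leave a field unchanged
        fs.setTimes(p, now, now);
        fs.close();
    }
}
```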

Re: what is the difference between mapper and identity mapper, reducer and identity reducer?

2013-09-05 Thread Shahab Yunus
The Identity Mapper and Reducer are just like the concept of the identity function in mathematics, i.e. they do not transform the input and return it as-is in the output. The Identity Mapper takes the input key/value pair and spits it out without any processing. The case of the identity reducer is a bit different. It
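A sketch of what "identity" means in code, using the new (org.apache.hadoop.mapreduce) API, where the stock Mapper base class already behaves exactly this way by default:

```java
import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;

// The base Mapper's default map() does precisely this: forward each
// input pair unchanged. Shown explicitly here for illustration.
public class IdentityStyleMapper<K, V> extends Mapper<K, V, K, V> {
    @Override
    protected void map(K key, V value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value); // emit the input pair as-is
    }
}
```

The base Reducer's default reduce() is analogous but, because values arrive grouped by key, it emits one (key, value) pair per value in the group, which flattens the grouping rather than being a pure pass-through.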

Re: Symbolic Link in Hadoop 1.0.4

2013-09-05 Thread Suresh Srinivas
The FileContext APIs and symlink functionality are not available in 1.0. They are only available in the 0.23 and 2.x releases. On Thu, Sep 5, 2013 at 8:06 AM, Gobilliard, Olivier olivier.gobilli...@cartesian.com wrote: Hi, I am using Hadoop 1.0.4 and need to create a symbolic link in HDFS. This
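For reference, a minimal sketch of the 2.x FileContext symlink call (the target and link paths are hypothetical, and symlink support must be available in the release you run):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class MakeSymlink {
    public static void main(String[] args) throws Exception {
        FileContext fc = FileContext.getFileContext(new Configuration());
        // createSymlink(target, link, createParent) - 2.x / 0.23 only
        fc.createSymlink(new Path("/data/current"),    // hypothetical target
                         new Path("/user/foo/latest"), // hypothetical link
                         false);
    }
}
```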

Question related to resource allocation in Yarn!

2013-09-05 Thread Rahul Bhattacharjee
Hi, I am trying to make a small POC on top of YARN. Within the launched application master, I am trying to request 50 containers and launch the same task on those allocated containers. My config: AM registration response minimumCapability { memory: 1024, virtual_cores: 1 },

Symbolic Link in Hadoop 1.0.4

2013-09-05 Thread Gobilliard, Olivier
Hi, I am using Hadoop 1.0.4 and need to create a symbolic link in HDFS. This feature was added in Hadoop 0.21.0 (https://issues.apache.org/jira/browse/HDFS-245) in the new FileContext API (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileContext.html). However, I cannot

RE: Multidata center support

2013-09-05 Thread Baskar Duraikannu
Thanks Mike. I am assuming that it is a poor idea due to network bandwidth constraints across data centers (the backplane speed of a ToR switch is typically greater than inter-data-center connectivity). From: michael_se...@hotmail.com Subject: Re: Multidata center support Date: Wed, 4 Sep 2013 20:15:08 -0500 To:

Re: Disc not equally utilized in hdfs data nodes

2013-09-05 Thread Harsh J
Please share your hdfs-site.xml. HDFS needs to be configured to use all 4 disk mounts - it does not auto-discover and use all drives today. On Thu, Sep 5, 2013 at 10:48 PM, Viswanathan J jayamviswanat...@gmail.com wrote: Hi, The data which are storing in data nodes are not equally utilized in
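A sketch of the relevant hdfs-site.xml entry for a 1.x DataNode, assuming hypothetical mount points /data1 through /data4 (a comma-separated list, with no spaces around the commas):

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/data1/dfs/dn,/data2/dfs/dn,/data3/dfs/dn,/data4/dfs/dn</value>
</property>
```

Each listed directory must exist and be writable by the DataNode user before the DataNode will use it.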

Disc not equally utilized in hdfs data nodes

2013-09-05 Thread Viswanathan J
Hi, The data stored on the data nodes is not equally distributed across all the data directories. We have 4x1 TB drives, but most of the data is stored on a single disk on every node. How do we balance usage across all the drives? This causes the HDFS storage usage to grow very quickly even though

RE: yarn-site.xml and aux-services

2013-09-05 Thread John Lilley
Harsh, Thanks as usual for your sage advice. I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble. FWIW, I would vote to be able to load YARN services

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Jitendra Yadav
Please share your Hadoop version and hdfs-site.xml conf. Also, I'm assuming that you already restarted your cluster after changing fs.checkpoint.dir. Thanks On 9/5/13, Munna munnava...@gmail.com wrote: Hi, I have configured fs.checkpoint.dir in hdfs-site.xml, but it is still writing to /tmp

RE: Multidata center support

2013-09-05 Thread Baskar Duraikannu
Currently there is no relation between weak consistency and Hadoop. I just spent more time thinking about the requirement (as outlined below): a) Maintain a total of 3 data centers b) Maintain 1 copy per data center c) If any data center goes down, don't create additional copies.

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Harsh J
These configs need to be present at the SNN, not just at the NN. On Thu, Sep 5, 2013 at 11:54 PM, Munna munnava...@gmail.com wrote: Hi Yadav, We are using CDH3 and I restarted after changing the configuration. <property> <name>fs.checkpoint.dir</name>
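Per Harsh's point, the hdfs-site.xml on the SNN host itself needs an entry along these lines (the directory path here is hypothetical):

```xml
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/dfs/snn</value>
</property>
```

Without this on the SNN, the checkpoint falls back to the default location under /tmp, which matches the symptom reported in this thread.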

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Jitendra Yadav
Hi, Well, I think you should only need to restart your SNN after the change. Also check the checkpoint directory for an 'in_use.lock' file. Thanks Jitendra On 9/6/13, Munna munnava...@gmail.com wrote: Thank you Jitendra. After changing these parameters on the SNN, is it required to restart the NN also? Please

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Munna
in_use.lock? On Fri, Sep 6, 2013 at 12:26 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, Well I think you should only restart your SNN after the change. Also refer the checkpoint directory for any 'in_use.lock' file. Thanks Jitendra On 9/6/13, Munna munnava...@gmail.com wrote:

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Jitendra Yadav
Hi, This means that your specified checkpoint directory has been locked by the SNN for use. Thanks Jitendra On 9/6/13, Munna munnava...@gmail.com wrote: in_use.lock? On Fri, Sep 6, 2013 at 12:26 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, Well I think you should only restart

Re: ContainerLaunchContext in 2.1.x

2013-09-05 Thread Omkar Joshi
Good question... There was a security problem earlier and to address that we removed it from ContainerLaunchContext. Today if you check the payload we are sending Container which contains ContainerToken. ContainerToken is the secured channel for RM to tell NM about 1) ContainerId 2) Resource 3)

Re: ContainerLaunchContext in 2.1.x

2013-09-05 Thread Jian He
Other than that, you can find all API incompatible changes from 2.0.x to 2.1.x in this link: http://hortonworks.com/blog/stabilizing-yarn-apis-for-apache-hadoop-2-beta-and-beyond/ Jian On Thu, Sep 5, 2013 at 10:44 AM, Omkar Joshi ojo...@hortonworks.com wrote: Good question... There was a

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Jitendra Yadav
Hi, If you are running the SNN on the same node as the NN then it's OK, otherwise you should add these properties on the SNN side too. Thanks Jitendra On 9/6/13, Munna munnava...@gmail.com wrote: you mean the same configuration is required on the SNN as on the NN On Thu, Sep 5, 2013 at 11:58 PM, Harsh J

Re: SNN not writing data fs.checkpoint.dir location

2013-09-05 Thread Munna
Thank you Jitendra. After changing these parameters on the SNN, is it required to restart the NN also? Please confirm... On Fri, Sep 6, 2013 at 12:10 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, If you are running the SNN on the same node as the NN then it's OK, otherwise you should add these properties

RE: yarn-site.xml and aux-services

2013-09-05 Thread John Lilley
https://issues.apache.org/jira/browse/YARN-1151 --john -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Thursday, September 05, 2013 12:14 PM To: user@hadoop.apache.org Subject: Re: yarn-site.xml and aux-services Please log a JIRA on

Re: Disc not equally utilized in hdfs data nodes

2013-09-05 Thread Harsh J
The spaces may be a problem if you are using the older 1.x releases. Please try to specify the list without spaces, and also check if all of these paths exist and have some DN owned directories under them. Please also keep the lists in CC/TO when replying. Clicking Reply-to-all usually helps do

How to support the (HDFS) FileSystem API of various Hadoop Distributions?

2013-09-05 Thread Christian Schneider
Hi, I started writing a small ncdu clone to browse HDFS from the CLI (http://nchadoop.org/). Currently I'm testing it against CDH4, but I'd like to make it available to a wider group of users (Hortonworks, ..). Is it enough to pick different vanilla versions (for IPC 5, 7)? Best Regards,

How to speed up Hadoop?

2013-09-05 Thread Sundeep Kambhampati
Hi all, I am looking for ways to configure Hadoop in order to speed up data processing. Assuming all my nodes are highly fault tolerant, will making the data replication factor 1 speed up processing? Is there some way to disable the failure monitoring done by Hadoop? Thank you for your

Re: How to speed up Hadoop?

2013-09-05 Thread Chris Embree
I think you just went backwards. More replicas (generally speaking) are better. I'd take 60 cheap 1U servers over 20 highly fault-tolerant ones for almost every problem. I'd get them for the same or less $, too. On Thu, Sep 5, 2013 at 8:41 PM, Sundeep Kambhampati

Re: How to speed up Hadoop?

2013-09-05 Thread Preethi Vinayak Ponangi
Solution 1: Throw more hardware at the cluster. That's the whole point of Hadoop. Solution 2: Try to optimize the MapReduce jobs. It depends on what kind of jobs you are running. I wouldn't suggest decreasing the replication factor as it kind of defeats the purpose of using Hadoop. You could

Re: Disc not equally utilized in hdfs data nodes

2013-09-05 Thread Viswanathan J
Thanks Harsh. I hope I don't have spaces in the list I specified in the last mail. Thanks, V On Sep 5, 2013 11:20 PM, Harsh J ha...@cloudera.com wrote: The spaces may be a problem if you are using the older 1.x releases. Please try to specify the list without spaces, and also check if all

Re: How to speed up Hadoop?

2013-09-05 Thread Peyman Mohajerian
How about this: http://hadoop.apache.org/docs/stable/vaidya.html I've never tried it myself; I was just reading about it today. On Thu, Sep 5, 2013 at 5:57 PM, Preethi Vinayak Ponangi vinayakpona...@gmail.com wrote: Solution 1: Throw more hardware at the cluster. That's the whole point of

Re: How to speed up Hadoop?

2013-09-05 Thread Sundeep Kambhampati
On 9/5/2013 8:57 PM, Preethi Vinayak Ponangi wrote: Solution 1: Throw more hardware at the cluster. That's the whole point of hadoop. Solution 2: Try to optimize the mapreduce jobs. It depends on what kind of jobs you are running. I wouldn't suggest decreasing the number of replications as it

Re: How to support the (HDFS) FileSystem API of various Hadoop Distributions?

2013-09-05 Thread Harsh J
Hello, There are a few additions to the FileSystem that may bite you across versions, but if you pick an old stable version such as Apache Hadoop 0.20.2, and stick to only its offered APIs, it would work better across different version dependencies as we try to maintain FileSystem as a stable

Re: How to support the (HDFS) FileSystem API of various Hadoop Distributions?

2013-09-05 Thread Harsh J
Oh and btw, nice utility! :) On Fri, Sep 6, 2013 at 7:50 AM, Harsh J ha...@cloudera.com wrote: Hello, There are a few additions to the FileSystem that may bite you across versions, but if you pick an old stable version such as Apache Hadoop 0.20.2, and stick to only its offered APIs, it

Re: How to speed up Hadoop?

2013-09-05 Thread Harsh J
I'd recommend reading Eric Sammer's Hadoop Operations (O'Reilly) book. It goes over a lot of this stuff: building, monitoring, tuning, optimizing, etc. If your goal is just speed and quicker results, and not retention or safety, by all means use a replication factor of 1. Note that it's difficult
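If one does go the replication-factor-1 route, the setting lives in hdfs-site.xml; note it only affects newly written files, while existing files keep their factor until changed with hadoop fs -setrep:

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```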

RE: Question related to resource allocation in Yarn!

2013-09-05 Thread Devaraj k
Hi Rahul, Could you tell me what version you are using? If you want a container, you need to issue 3 resource requests (1 node-local, 1 rack-local and 1 any (*)). If you are using
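In the 2.1+ client library, the AMRMClient helper hides this triple: one ContainerRequest is expanded into the underlying node-local, rack-local and any ResourceRequests. A rough sketch (the 2.0.x alpha API that Rahul is on differs slightly, so treat the exact signatures as assumptions):

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class AskForContainers {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();
        rm.registerApplicationMaster("", 0, "");

        Resource cap = Records.newRecord(Resource.class);
        cap.setMemory(1024);
        cap.setVirtualCores(1);
        Priority pri = Records.newRecord(Priority.class);
        pri.setPriority(0);

        // One ContainerRequest per wanted container; null nodes/racks
        // means "any host", and the client expands each request into the
        // node-local / rack-local / any ResourceRequests underneath.
        for (int i = 0; i < 50; i++) {
            rm.addContainerRequest(new ContainerRequest(cap, null, null, pri));
        }
        // Containers then arrive incrementally via rm.allocate(progress)
        // heartbeats, typically a few per heartbeat, not all 50 at once.
    }
}
```

The incremental arrival in the last comment also matches Rahul's observation of getting only a couple of containers per allocation round.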

Re: Question related to resource allocation in Yarn!

2013-09-05 Thread Rahul Bhattacharjee
Hi Devaraj, I am on Hadoop 2.0.4. I am able to get containers now and my YARN app runs properly. I am setting the hostname as * while requesting containers. There is no problem as of now; the only thing is I am allocated only 2 containers at a time, however I believe that the node manager can run