Re: Multiple outputs

2013-03-18 Thread Harsh J
MultipleOutputs is the way to go :) On Tue, Mar 12, 2013 at 12:48 PM, Fatih Haltas fatih.hal...@nyu.edu wrote: Hi Everyone, I would like to have 2 different outputs (each containing different columns of the same input text file). When I googled a bit, I found the MultipleOutputs class; is this the common
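For reference, a minimal sketch of wiring up MultipleOutputs with the new (org.apache.hadoop.mapreduce) API. The output names "colsA"/"colsB" and the pass-through value handling are hypothetical; a real job would project the relevant columns before writing. This is an illustrative sketch, not a complete job (it needs the Hadoop MapReduce jars on the classpath):

```java
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TwoColumnSetsReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text v : values) {
            // Route each record (or a column subset of it) to two named outputs.
            mos.write("colsA", NullWritable.get(), v);
            mos.write("colsB", NullWritable.get(), v);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush both named outputs
    }

    // In the driver, register the named outputs before submitting the job:
    public static void configure(Job job) {
        MultipleOutputs.addNamedOutput(job, "colsA", TextOutputFormat.class,
                NullWritable.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "colsB", TextOutputFormat.class,
                NullWritable.class, Text.class);
    }
}
```

Each named output lands in the job's output directory as colsA-r-NNNNN / colsB-r-NNNNN alongside the default part files.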

Re: access hadoop cluster from ubuntu on laptop

2013-03-18 Thread Harsh J
What George suggested is more of a hack. If you want to write proper impersonation code that works despite the security toggle, follow http://hadoop.apache.org/docs/stable/Secure_Impersonation.html. Or, in your case, alternatively create a local user hdfs and use it via sudo -u hdfs prefixes.
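For the "proper impersonation code" route, a hedged sketch using UserGroupInformation.doAs. The proxied user "hdfs" and the path are assumptions, and the calling user must be whitelisted via the hadoop.proxyuser.&lt;caller&gt;.hosts/groups settings in core-site.xml, as described at the linked page:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class AsHdfsUser {
    public static void main(String[] args) throws Exception {
        // Create a proxy UGI for "hdfs" on top of the actual login user.
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
                "hdfs", UserGroupInformation.getLoginUser());
        // All filesystem calls inside run() execute as the proxied user.
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                fs.mkdirs(new Path("/user/example")); // path is hypothetical
                return null;
            }
        });
    }
}
```

The same code works whether or not Kerberos security is enabled, which is the advantage over hard-coding a username.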

RE: Thanks!!

2013-03-18 Thread Brahma Reddy Battula
You need to disable the following property: <property> <name>dfs.permissions.enabled</name> <value>true</value> <description>If true, enable permission checking in HDFS. If false, permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value
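As it would appear in hdfs-site.xml if you did choose to turn checking off (note Harsh's caveat further down this digest; disabling permissions is rarely the right fix):

```xml
<!-- hdfs-site.xml: disable HDFS permission checking (not generally recommended) -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
```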

using test.org.apache.hadoop.fs.s3native.InMemoryNativeFileSystemStore class in hadoop

2013-03-18 Thread Agarwal, Nikhil
Hi All, I am pretty new to Hadoop. I noticed that there are many test classes in the Hadoop source code under the test.org.apache.hadoop packages. Can anyone explain, or provide some pointers to help me understand, what the purpose of these is and how to use them? In particular, I wanted to use

Re: Understand dfs.datanode.max.xcievers

2013-03-18 Thread Yanbo Liang
The dfs.datanode.max.xcievers value should be set across the cluster rather than on a particular DataNode. It is the upper bound on the number of files that the DataNode will serve at any one time. 2013/3/17 Dhanasekaran Anbalagan bugcy...@gmail.com Hi Guys, We are having a few data nodes in an
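For reference, the setting as it would appear in hdfs-site.xml on every DataNode (4096 is a commonly used value, e.g. for HBase workloads; the 0.20-era default was much lower):

```xml
<!-- hdfs-site.xml, deployed cluster-wide: upper bound on concurrent
     DataNode transfer threads (note the historical misspelling "xcievers") -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```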

On a small cluster can we double up namenode/master with tasktrackers?

2013-03-18 Thread David Parks
I wanted 20 servers but got 7, so I want to make the most of the 7 I have. Each of the 7 servers has: 24GB of RAM, 4TB, and 8 cores. Would it be terribly unwise of me to run such a configuration: . Server #1: NameNode + Master + TaskTracker (reduced slots) . Server
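The "reduced slots" idea on the master node can be expressed with the per-TaskTracker slot limits in mapred-site.xml; the values below are hypothetical and would be tuned to leave heap and CPU headroom for the NameNode/JobTracker daemons:

```xml
<!-- mapred-site.xml on server #1 only (illustrative values):
     fewer task slots here than on the pure worker nodes -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```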

Re: using test.org.apache.hadoop.fs.s3native.InMemoryNativeFileSystemStore class in hadoop

2013-03-18 Thread Yanbo Liang
These test classes are used for unit testing. You can run these cases to test a particular function of a class. But when we run these test cases, we need some additional classes and functions to simulate the underlying functions that they call. InMemoryNativeFileSystemStore is
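Assuming an Ant-based 1.x-era source tree (Hadoop's build.xml conventionally supports selecting a single JUnit class via the testcase property), a single test class can usually be run like this:

```shell
# From the top of the Hadoop source tree (Ant-based builds);
# -Dtestcase selects one JUnit class instead of the whole suite.
ant test -Dtestcase=TestInMemoryNativeS3FileSystemContract
```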

RE: using test.org.apache.hadoop.fs.s3native.InMemoryNativeFileSystemStore class in hadoop

2013-03-18 Thread Agarwal, Nikhil
Hi, Thanks for the quick reply. In order to test the class TestInMemoryNativeS3FileSystemContract and its functions, what should be the values of the parameters in my configuration files (core-site, mapred, etc.)? Regards, Nikhil From: Agarwal, Nikhil Sent: Monday, March 18, 2013 1:55 PM To:

namenode directory failure question

2013-03-18 Thread Brennon Church
Hello all, We have our dfs.name.dir configured to write to two local and one NFS directories. The NFS server in question had to be restarted a couple days back and that copy of the namenode data fell behind as a result. As I understand it, restarting hadoop will take the most recent copy of
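A setup like the one described would look roughly like this in hdfs-site.xml (the paths are hypothetical). The NameNode writes its image and edit log to every listed directory; a directory that fails is dropped from the active set until the NameNode is restarted, which is why the NFS copy fell behind:

```xml
<!-- hdfs-site.xml: two local directories plus one NFS mount -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```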

Re: namenode directory failure question

2013-03-18 Thread Bertrand Dechoux
You may want to check this JIRA: https://issues.apache.org/jira/browse/HADOOP-4885 It won't help you right now, but next time it could allow you to avoid restarting. Regards Bertrand On Mon, Mar 18, 2013 at 3:52 PM, Brennon Church bren...@getjar.com wrote: Hello all, We have our

hadoop file append

2013-03-18 Thread Tony Burton
Hi list, I'm using Hadoop 1.0.3 for a MapReduce task and I thought it might be a simple job to append a Counter value and some text to the end of a file (which ultimately will be in AWS S3). How wrong I was :) I've been reading about o.a.h.fs.FileSystem.append and whether it does or
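For what the append call itself looks like, a hedged Java sketch (the path and payload are hypothetical). Note that, as far as I know, the native S3 filesystem does not implement append, so this applies to HDFS only; for S3 the usual pattern is to write a new object and combine downstream:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendCounter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // On 1.x, append must also be enabled via dfs.support.append,
        // and even then it is not considered production-safe.
        FSDataOutputStream out = fs.append(new Path("/output/summary.txt")); // hypothetical path
        try {
            out.writeBytes("RECORDS_WRITTEN=12345\n"); // counter value is illustrative
        } finally {
            out.close();
        }
    }
}
```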

CDH4 installation along with MRv1 from tarball

2013-03-18 Thread rohit sarewar
Need some guidance on CDH4 installation from tarballs. I have downloaded two files from https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs 1) hadoop-0.20-mapreduce-0.20.2+1341 (has only MRv1) 2) hadoop-2.0.0+922 (has HDFS + YARN) I was able to install MRv1 from the first file

disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Tapas Sarangi
Hello, I am using one of the old legacy versions (0.20) of Hadoop for our cluster. We have scheduled an upgrade to a newer version within a couple of months, but I would like to understand a couple of things before moving towards the upgrade plan. We have about 200 datanodes and some of

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Bertrand Dechoux
Hi, It is not explicitly said but did you use the balancer? http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer Regards Bertrand On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi tapas.sara...@gmail.comwrote: Hello, I am using one of the old legacy version (0.20) of hadoop for
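For reference, a typical invocation (the threshold value is illustrative; this must run against a live cluster, so it is shown here only for context):

```shell
# Move blocks until every datanode's used% is within 5 percentage points
# of the cluster average (the default threshold is 10).
hadoop balancer -threshold 5
```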

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Tapas Sarangi
Hi, Sorry about that, had it written, but thought it was obvious. Yes, balancer is active and running on the namenode. -Tapas On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux decho...@gmail.com wrote: Hi, It is not explicitly said but did you use the balancer?

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Bertrand Dechoux
And by active, do you mean that it actually stops by itself? Otherwise it might mean that the throttling/limit is an issue with regard to the data volume or velocity. What threshold is used? About the small and big datanodes, how are they distributed with regard to racks? About files, how is

Re: hadoop file append

2013-03-18 Thread Harsh J
Appending on 1.x releases is available but not tested/supported and can be toggled to be disabled completely. Appending works better on 2.x releases. On Mon, Mar 18, 2013 at 9:14 PM, Tony Burton tbur...@sportingindex.com wrote: Hi list, I’m using Hadoop 1.0.3 for a MapReduce task and I

Re: Thanks!!

2013-03-18 Thread Harsh J
Just curious, why are we recommending one disables permissions rather than trying to make them understand it? On Mon, Mar 18, 2013 at 2:03 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote: you need to disable following property.. property namedfs.permissions.enabled/name

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread 李洪忠
Maybe you need to modify the rack-awareness script to balance the racks, i.e., make all the racks the same size: one rack of 6 small nodes, one rack of 1 large node. P.S. You need to restart the cluster for the rack-awareness script change to take effect. On 2013/3/19 7:17, Bertrand Dechoux wrote: And by active, it means that
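A rack-awareness (topology) script along the suggested lines could be sketched as below; the subnets and rack names are assumptions. Hadoop invokes the script with one or more IPs/hostnames and expects one rack path per argument, one per line:

```shell
#!/bin/sh
# Hypothetical topology script: map datanode IPs to rack paths.
rack_for() {
  case "$1" in
    10.0.1.*) echo /rack-small ;;   # the six small datanodes
    10.0.2.*) echo /rack-large ;;   # the one large datanode
    *)        echo /default-rack ;;
  esac
}
for host in "$@"; do rack_for "$host"; done
# Sample lookups, for illustration only (remove in a real script):
rack_for 10.0.1.5
rack_for 10.0.2.1
```

The script is wired in via the topology.script.file.name property, and (as the mail notes) the daemons must be restarted to pick up a change.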

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Tapas Sarangi
On Mar 18, 2013, at 6:17 PM, Bertrand Dechoux decho...@gmail.com wrote: And by active, do you mean that it actually stops by itself? Otherwise it might mean that the throttling/limit is an issue with regard to the data volume or velocity. The latter is probably what's happening. I just

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Tapas Sarangi
Hi, On Mar 18, 2013, at 8:21 PM, 李洪忠 lhz...@hotmail.com wrote: Maybe you need to modify the rack-awareness script to balance the racks, i.e., make all the racks the same size: one rack of 6 small nodes, one rack of 1 large node. P.S. You need to restart the cluster for the rack-awareness script change to take effect.

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-18 Thread Harsh J
What do you mean that the balancer is always active? It is to be used as a tool and it exits once it balances in a specific run (loops until it does, but always exits at end). The balancer does balance based on usage percentage so that is what you're probably looking for/missing. On Tue, Mar 19,
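The percentage-based classification Harsh describes can be sketched as follows; this is a deliberate simplification of the real Balancer code, using its documented semantics (nodes end up within `threshold` percentage points of the cluster-wide average utilization):

```java
// Simplified sketch of the balancer's per-node classification
// (not the actual org.apache.hadoop.hdfs.server.balancer code).
public class BalancerCriterion {
    // A node is a candidate source if its used% exceeds the cluster
    // average by more than the threshold, and a candidate target if
    // it falls below the average by more than the threshold.
    static boolean overUtilized(double nodeUsedPct, double avgPct, double threshold) {
        return nodeUsedPct > avgPct + threshold;
    }
    static boolean underUtilized(double nodeUsedPct, double avgPct, double threshold) {
        return nodeUsedPct < avgPct - threshold;
    }
    public static void main(String[] args) {
        double avg = 60.0, threshold = 10.0; // default threshold is 10
        System.out.println(overUtilized(90.0, avg, threshold));   // a 90%-full node: true
        System.out.println(underUtilized(45.0, avg, threshold));  // a 45%-full node: true
        System.out.println(overUtilized(65.0, avg, threshold));   // within band: false
    }
}
```

This also explains the asymmetry in the original question: the balancer equalizes percentages, not absolute bytes, so small and large datanodes legitimately hold very different amounts of data.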

Re: using test.org.apache.hadoop.fs.s3native.InMemoryNativeFileSystemStore class in hadoop

2013-03-18 Thread Yanbo Liang
It is just a unit test, so you don't need to set any parameters in the configuration files. 2013/3/18 Agarwal, Nikhil nikhil.agar...@netapp.com Hi, Thanks for the quick reply. In order to test the class TestInMemoryNativeS3FileSystemContract and its functions what should be the value

Re: Question regarding job execution

2013-03-18 Thread Harsh J
I am assuming you refer to YARN's CapacityScheduler. The CS in YARN does support parallel job execution (the right term is 'application', or 'app', not 'job' anymore, when speaking in YARN's context). If you look at the code of CapacityScheduler.java and LeafQueue.java, you can notice how it iterates
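Parallel execution under the CapacityScheduler is usually arranged through queues; a minimal capacity-scheduler.xml fragment might look like this (the queue names and capacities are hypothetical):

```xml
<!-- capacity-scheduler.xml: two leaf queues so apps can run side by side -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>interactive,batch</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.interactive.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.batch.capacity</name>
  <value>60</value>
</property>
```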