Re: Synchronization among Mappers in map-reduce task

2014-08-12 Thread Wangda Tan
Hi Saurabh, It's an interesting topic. >> So, here is the question: is it possible to make sure that when one of the mapper tasks is writing to a file, the others wait until the first one is finished? I read that the mapper tasks don't interact with each other. A simple way to do this i

Re: 100% CPU consumption by Resource Manager process

2014-08-12 Thread Wangda Tan
Hi Krishna, To get a better understanding of the problem, could you please share the following information: 1) Number of nodes and running apps in the cluster 2) What's the version of your Hadoop? 3) Have you set "yarn.scheduler.capacity.schedule-asynchronously.enable"=true? 4) What's the "yarn.resourcem

Re: Negative value given by getVirtualCores() or getAvailableResources()

2014-08-12 Thread Wangda Tan
By default, vcore = 1 for each resource request. If you don't like this behavior, you can set yarn.scheduler.minimum-allocation-vcores=0 Hope this helps, Wangda Tan On Thu, Aug 7, 2014 at 7:13 PM, Krishna Kishore Bonagiri < write2kish...@gmail.com> wrote: > Hi, > I am calling getAvailableRes
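
Wangda's suggestion corresponds to a yarn-site.xml entry like the following (a sketch; the property name is as given in the thread):

```xml
<!-- yarn-site.xml: allow resource requests with 0 vcores, so vcore
     accounting does not go negative when every request defaults to 1 -->
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>0</value>
</property>
```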

How to use docker in Hadoop, with patch of YARN-1964?

2014-08-12 Thread sam liu
Hi Experts, I am very interested in having Hadoop work with Docker and am doing some trials with the patch of YARN-1964. I applied patch yarn-1964-branch-2.2.0-docker.patch of jira YARN-1964 on branch 2.2 and am going to install a Hadoop cluster using the newly generated tarball including the patch. Then,

org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=EXECUTE

2014-08-12 Thread Ana Gillan
Hi, I ran a job in Hive and it got to this stage: Stage-1 map = 100%, reduce = 29%, seemed to start cleaning up the containers and stuff successfully, and then I got this series of errors: 2014-08-12 03:58:55,718 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException a

Why 2 different approach for deleting localized resources and aggregated logs?

2014-08-12 Thread Rohith Sharma K S
Hi, I see two different approaches for deleting localized resources and aggregated logs. 1. Localized resources are deleted based on the size of the localizer cache, per local directory. 2. Aggregated logs are deleted based on time (if enabled). Are there any specific thoughts f
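
For reference, the two deletion policies described above are controlled by separate properties; a sketch of the relevant yarn-site.xml entries (values are illustrative, not the defaults of any particular release):

```xml
<!-- Localized resources: cleaned up when the localizer cache
     exceeds this size, per local directory -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>
<!-- Aggregated logs: deleted after this many seconds; -1 disables deletion -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```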

Pseudo -distributed mode

2014-08-12 Thread sindhu hosamane
Can setting up 2 datanodes on the same machine be considered pseudo-distributed mode Hadoop? Thanks, Sindhu

Re: Pseudo -distributed mode

2014-08-12 Thread Sergey Murylev
Yes :) Pseudo-distributed mode is a configuration in which the Hadoop daemons all run on a single computer. On 12/08/14 18:25, sindhu hosamane wrote: > Can setting up 2 datanodes on the same machine be considered > pseudo-distributed mode Hadoop? > > Thanks, > Sindhu

Re: Pseudo -distributed mode

2014-08-12 Thread sindhu hosamane
I have read "By default, Hadoop is configured to run in a non-distributed mode, as a single Java process". But if my Hadoop is in pseudo-distributed mode, why does it still run as a single Java process and utilize only 1 CPU core even if there are many more? On Tue, Aug 12, 2014 at 4:32 PM, Se
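
The quoted sentence describes local mode, which is selected by mapreduce.framework.name: unless that property is set to yarn, MapReduce jobs run inside a single local JVM no matter how many daemons are up. A minimal mapred-site.xml sketch:

```xml
<!-- mapred-site.xml: run MR jobs on YARN instead of in one local JVM -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```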

Re: ulimit for Hive

2014-08-12 Thread Zhijie Shen
+ Hive user mailing list. That should be a better place for your questions. On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan wrote: > Hi, > > I’ve been reading a lot of posts about needing to set a high ulimit for > file descriptors in Hadoop and I think it’s probably the cause of a lot of > the error

Making All datanode down

2014-08-12 Thread Satyam Singh
Hi Users, In my cluster setup I am testing the case of taking all datanodes down while keeping the namenode running. In this case my application gets an error with remoteException: could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(

Re: ulimit for Hive

2014-08-12 Thread Chris MacKenzie
Hi Zhijie, There are two kinds of ulimit: hard and soft. The hard limit can only be set by a sysadmin; it exists to guard against things like fork-bomb DoS attacks. The sysadmin can set the hard ulimit per user, e.g. hadoop_user. A user can add a line to their .profile file setting a soft ulimit up to the hard limi
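
That split translates into entries like these (a sketch for a hypothetical hadoop_user; the numbers are placeholders, not recommendations):

```
# /etc/security/limits.conf — set by a sysadmin
hadoop_user  soft  nofile  32768
hadoop_user  hard  nofile  65536

# ~/.profile — a user may raise their own soft limit, up to the hard limit
ulimit -Sn 65536
```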

hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-12 Thread Calvin
Hi all, I've instantiated a Hadoop 2.4.1 cluster and I've found that running MapReduce applications will parallelize differently depending on what kind of filesystem the input data is on. Using HDFS, a MapReduce job will spawn enough containers to maximize use of all available memory. For example
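
On a non-HDFS filesystem the number of map tasks is driven by the filesystem's reported block size and the split settings rather than by HDFS block locations, so split sizes can be tuned explicitly. A sketch (property name from the Hadoop 2.x MapReduce defaults; the value is illustrative):

```xml
<!-- Cap the split size so more map tasks (and thus containers) are
     created when the input filesystem reports a large block size -->
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
```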

MR AppMaster unable to load native libs

2014-08-12 Thread Subroto Sanyal
Hi, I am running a single node hadoop cluster 2.4.1. When I submit a MR job it logs a warning: 2014-08-12 21:38:22,173 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable The problem doesn’t c

Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread mani kandan
Which distribution are you people using? Cloudera vs Hortonworks vs BigInsights?

I was wondering what could make these 2 variables different: HADOOP_CONF_DIR vs YARN_CONF_DIR

2014-08-12 Thread REYANE OUKPEDJO
Can someone explain what makes the above variables different? Most of the time they are set pointing to the same directory. Thanks, Reyane OUKPEDJO

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Adaryl "Bob" Wakefield, MBA
Hortonworks. Here is my reasoning: 1. Hortonworks is 100% open source. 2. MapR has stuff on their roadmap that Hortonworks has already accomplished and has moved on from to other things. 3. Cloudera has proprietary stuff in their stack. No. 4. Hortonworks makes training super accessible and there is a c

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Kai Voigt
3. seems a biased and incomplete statement. Cloudera’s distribution CDH is fully open source. The proprietary "stuff" you refer to is most likely Cloudera Manager, an additional tool to make deployment, configuration and monitoring easy. Nobody is required to use it to run a Hadoop cluster. Ka

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Adaryl "Bob" Wakefield, MBA
You fell into my trap sir. I was hoping someone would clear that up. :) Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Kai Voigt Sent: Tuesday, August 12, 2014 4:10 PM To: user@hadoop.apache.org Subject:

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Aaron Eng
On that note, 2 is also misleading/incomplete. You might want to explain which specific features you are referencing so the original poster can figure out if those features are relevant. The inverse of 2 is also true, things like consistent snapshots and full random read/write over NFS are in Map

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Jay Vyas
Also, consider Apache Bigtop. That is the Apache upstream Hadoop initiative, and it comes with smoke tests + Puppet recipes for setting up your own Hadoop distro from scratch. IMHO, if learning or building your own tooling around Hadoop, Bigtop is ideal. If interested in purchasing support

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Adaryl "Bob" Wakefield, MBA
Is this up to date? http://www.mapr.com/products/product-overview/overview Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Aaron Eng Sent: Tuesday, August 12, 2014 4:31 PM To: user@hadoop.apache.org Subj

Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager

2014-08-12 Thread Xuan Gong
Hey, Arthur: Could you show me the error message for rm2, please? Thanks, Xuan Gong On Mon, Aug 11, 2014 at 10:17 PM, arthur.hk.c...@gmail.com < arthur.hk.c...@gmail.com> wrote: > Hi, > > Thank you very much! > > At the moment if I run ./sbin/start-yarn.sh in rm1, the standby > Res

Hadoop 2.4 failed to launch job on aws s3n

2014-08-12 Thread Yue Cheng
Hi, I deployed Hadoop 2.4 on AWS EC2 using the S3 native file system as a replacement for HDFS. I tried several example apps; all gave me the following stack trace messages (an older thread on Jul 24 hung there w/o being resolved... so I attach the DEBUG info here...): hadoop jar share/hadoop/mapreduce/
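
For context, pointing Hadoop 2.4 at s3n as the default filesystem is typically done through core-site.xml entries like these (a sketch; the bucket name and keys are placeholders):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>s3n://your-bucket</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```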

fair scheduler not working as intended

2014-08-12 Thread Henry Hung
Hi Everyone, I'm using Hadoop-2.2.0 with the fair scheduler in my YARN cluster, but something is wrong with the fair scheduler. Here is what my fair-scheduler.xml looks like: 15360 mb, 5 vcores 0.5 2 5 1 I created a "longrun" queue to ensure that huge MR applications can only
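
The archive flattened the XML; a sketch of what an allocation file with such a queue might look like (the tag names are guesses from the values quoted in the message, not a recovery of the original file):

```xml
<allocations>
  <queue name="longrun">
    <maxResources>15360 mb, 5 vcores</maxResources>
    <weight>0.5</weight>
    <!-- the remaining flattened values (2, 5, 1) likely belong to limits
         such as maxRunningApps, but their original tags were lost -->
  </queue>
</allocations>
```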

Re: Synchronization among Mappers in map-reduce task

2014-08-12 Thread saurabh jain
Hi Wangda, I am not sure making overwrite=false will solve the problem. As per the javadoc, with overwrite=false it will throw an exception if the file already exists, so all the remaining mappers will get an exception. Also, I am very new to ZK and have very basic knowledge of it

Re: Making All datanode down

2014-08-12 Thread Gordon Wang
Did you try to close the file and reopen it for writing after the datanodes restart? I think if you close the file and reopen it, the exception might disappear. On Wed, Aug 13, 2014 at 2:21 AM, Satyam Singh wrote: > Hi Users, > > > > In my cluster setup i am doing one test case of making only all

Re: Synchronization among Mappers in map-reduce task

2014-08-12 Thread Wangda Tan
Hi Saurabh, >> I am not sure making overwrite=false will solve the problem. As per the javadoc, with overwrite=false it will throw an exception if the file already exists, so all the remaining mappers will get an exception. You can catch the exception and wait. >> Can you please refe
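
The create-if-absent locking pattern suggested here can be sketched on a local filesystem with Python's O_CREAT|O_EXCL flags. This is an analogue of HDFS's FileSystem.create(path, overwrite=false) for illustration only, not the Hadoop API itself; the function names and timeouts are invented for the example:

```python
import os
import tempfile
import time


def acquire_lock(lock_path, retry_interval=0.1, timeout=10.0):
    """Atomically create the lock file; wait and retry while another holder exists."""
    deadline = time.time() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes creation fail if the file already exists,
            # mirroring FileSystem.create(path, overwrite=false) on HDFS.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            if time.time() >= deadline:
                return False  # gave up waiting for the current holder
            time.sleep(retry_interval)


def release_lock(lock_path):
    os.remove(lock_path)


# Usage: the first "mapper" wins the lock; a second attempt waits, then times out.
lock = os.path.join(tempfile.gettempdir(), "mapper-demo.lock")
if os.path.exists(lock):
    os.remove(lock)
assert acquire_lock(lock)                      # first task creates the file
assert not acquire_lock(lock, timeout=0.3)     # second task times out while held
release_lock(lock)
```

On HDFS the same catch-and-retry loop would wrap FileSystem.create with overwrite=false, as suggested in the reply above.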

Re: How to use docker in Hadoop, with patch of YARN-1964?

2014-08-12 Thread sam liu
After applying this patch, I added following config in yarn-site.xml: yarn.nodemanager.container-executor.class org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor Then I can start NodeManager with enabling DockerContainerExecutor. But failed to execute a simple mr job, an
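
The flattened property above corresponds to a yarn-site.xml entry like this (a sketch; the executor class name is as given in the message, from the YARN-1964 patch):

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
</property>
```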

Re: fair scheduler not working as intended

2014-08-12 Thread Yehia Elshater
Hi Henry, Are any applications running at the same time on queues other than the longrun queue? I think the FairScheduler is going to assign more resources to your "longrun" queue as long as no other applications are running in the other queues. Thanks, Yehia On 12 August 2014 20

Re: MR AppMaster unable to load native libs

2014-08-12 Thread Susheel Kumar Gadalay
I have also got this message when running 2.4.1. I found the native libraries in $HADOOP_HOME/lib/native are 32-bit, not 64-bit. Recompile once again and build 64-bit shared objects, but it is a lengthy exercise. On 8/13/14, Subroto Sanyal wrote: > Hi, > > I am running a single node hadoo
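
One quick way to check for the 32-bit/64-bit mismatch described here (a sketch; the fallback HADOOP_HOME path is an assumption about your install):

```shell
# Inspect the native Hadoop library's architecture; "ELF 32-bit" on a
# 64-bit JVM explains the NativeCodeLoader warning.
NATIVE_DIR="${HADOOP_HOME:-/usr/local/hadoop}/lib/native"
if [ -d "$NATIVE_DIR" ]; then
  file "$NATIVE_DIR"/libhadoop.so*
else
  echo "no native library directory at $NATIVE_DIR"
fi
```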

RE: fair scheduler not working as intended

2014-08-12 Thread Henry Hung
Hi Yehia, Oh? I thought that by using maxResources = 15360 mb (3072 mb * 5), vcores = 5, and maxMaps = 5, I was already restricting the job to use at most 5 maps. The reason is my long-running job has 841 maps, and each map will process data for almost 2 hours. In the meantime there will be some s

Implementing security in hadoop

2014-08-12 Thread Chhaya Vishwakarma
Hi, I'm trying to implement security on my Hadoop data. I'm using Cloudera Hadoop. Below are the two specific things I'm looking for: 1. Role-based authorization and authentication 2. Encryption of data residing in HDFS I have looked into Kerberos but it doesn't provide encryption for data alre