Re: adding node(s) to Hadoop cluster

2014-12-11 Thread Rainer Toebbicke
On Dec 10, 2014, at 20:08, Vinod Kumar Vavilapalli wrote: > You don't need patterns for host-names, did you see the support for _HOST in > the principal names? You can specify the datanode principal to be, say, > datanodeUser/_HOST@realm, and Hadoop libraries interpret and replace _HOST on >

Re: Hadoop 2.4 + Hive 0.14 + Hbase 0.98.3 + snappy not working

2014-12-11 Thread peterm_second
Hi Hanish, Thanks for the link, it did help. Long story short: always recompile native libraries for your machine :) Thanks, Peter On 11.12.2014 05:46, Hanish Bansal wrote: Hope this may help you: http://blogs.impetus.com/big_data/big_data_technologies/SnappyCompressionInHBase.do On Thu, De

run yarn container as specific user

2014-12-11 Thread Tim Williams
I'm able to use the UGI.doAs(..) to launch a yarn app and, through the ResourceManager, both the ApplicationMaster and Containers are associated with the correct user. But the process on the node itself really runs as the yarn user. The problem is that the yarn app writes data to DFS and its bein

Re: run yarn container as specific user

2014-12-11 Thread Hitesh Shah
Is your app code running within the container also being run within a UGI.doAs()? You can use the following in your code to create a UGI for the “actual” user and run all the logic within that: actualUserUGI = UserGroupInformation.createRemoteUser(System.getenv(Applicati
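
A minimal, self-contained sketch of the pattern Hitesh describes: wrap the container-side HDFS work in a doAs() on a remote UGI built from the submitting user's name. The environment-variable lookup and the output path below are assumptions for illustration, not taken from the thread:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class RunAsSubmitter {
        public static void main(String[] args) throws Exception {
            // Assumption: YARN exposes the submitting user's name in the container
            // environment (e.g. via the USER variable).
            final String submitter = System.getenv("USER");
            UserGroupInformation actualUserUGI = UserGroupInformation.createRemoteUser(submitter);
            actualUserUGI.doAs(new PrivilegedExceptionAction<Void>() {
                @Override
                public Void run() throws Exception {
                    FileSystem fs = FileSystem.get(new Configuration());
                    // On a non-secure cluster, files written inside this doAs() are
                    // attributed to the submitting user rather than the "yarn" user.
                    fs.mkdirs(new Path("/user/" + submitter + "/app-output")); // hypothetical path
                    return null;
                }
            });
        }
    }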

DistributedCache

2014-12-11 Thread Srinivas Chamarthi
Hi, I want to cache the map/reduce temporary output files so that I can compare two map results coming from two different nodes as an integrity check. I am simulating this use case with speculative execution by rescheduling the first task as soon as it has started running. Now I want to

Re: adding node(s) to Hadoop cluster

2014-12-11 Thread Vinod Kumar Vavilapalli
I may be mistaken, but let me try again with an example to see if we are on the same page.

Principals:
- NameNode: nn/nn-h...@cluster.com
- DataNode: dn/_h...@cluster.com

Auth to local mappings:
- nn/nn-h...@cluster.com -> hdfs
- dn/.*@cluster.com -> hdfs

The combination of the above lets you
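
A minimal sketch of how the dn/_HOST principal plus a wildcard auth-to-local rule from Vinod's example could be wired up and checked. The property names are the standard Hadoop ones; the realm CLUSTER.COM, the rule text, and the class name are illustrative assumptions, not taken from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.HadoopKerberosName;

    public class AuthToLocalCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // DataNode principal with _HOST, expanded to the local hostname at service start-up.
            conf.set("dfs.datanode.kerberos.principal", "dn/_HOST@CLUSTER.COM");
            // Map every dn/<host>@CLUSTER.COM principal to the local user "hdfs".
            conf.set("hadoop.security.auth_to_local",
                "RULE:[2:$1/$2@$0](dn/.*@CLUSTER\\.COM)s/.*/hdfs/\n" +
                "DEFAULT");
            HadoopKerberosName.setConfiguration(conf);
            // Prints "hdfs" for any datanode host in the realm.
            System.out.println(new HadoopKerberosName("dn/some-datanode@CLUSTER.COM").getShortName());
        }
    }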

Re: DistributedCache

2014-12-11 Thread Shahab Yunus
Look at this thread. It has alternatives to DistributedCache. http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api Basically, you can use the new method job.addCacheFile to pass stuff on to the individual tasks. Regards, Shahab On Thu, Dec 1

Re: DistributedCache

2014-12-11 Thread unmesha sreeveni
On Fri, Dec 12, 2014 at 9:55 AM, Shahab Yunus wrote: > > job.addCacheFile Yes, you can use job.addCacheFile to cache the file. Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(conf); Path cachefile = new Path("path/to/file"); FileStatus[] list = fs.globStatus(cachefile
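
For completeness, a small sketch of the non-deprecated API end to end: add the file on the submit side with Job.addCacheFile(), then locate it in the mapper via context.getCacheFiles(). The HDFS path, job name, and mapper are hypothetical placeholders, not from the thread:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheFileExample {

        public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                URI[] cacheFiles = context.getCacheFiles();
                if (cacheFiles != null && cacheFiles.length > 0) {
                    // The cached file is localized and symlinked into the task's working
                    // directory under its base name.
                    String localName = new Path(cacheFiles[0].getPath()).getName();
                    try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            // e.g. load a lookup table to compare against map output
                        }
                    }
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cache-file-example");
            job.addCacheFile(new URI("/path/on/hdfs/lookup.txt")); // hypothetical path
            job.setJarByClass(CacheFileExample.class);
            job.setMapperClass(MyMapper.class);
            // ... set input/output formats and paths, then job.waitForCompletion(true)
        }
    }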

Where is the output of mappers saved?

2014-12-11 Thread Abdul Navaz
Hello, I am interested in efficiently managing the Hadoop shuffle traffic and utilizing the network bandwidth effectively. To do this, I want to know how much shuffle traffic is generated by each DataNode. Shuffle traffic is nothing but the output of the mappers. So where is this mapper output saved
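
As a hedged note beyond the quoted snippet: map outputs are spilled to local disk on each node (not to HDFS), under the NodeManager's local directories, and the shuffle then serves them to reducers over HTTP. The sketch below simply prints those configured directories; the class name is an illustrative assumption:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ShowShuffleLocalDirs {
        public static void main(String[] args) {
            Configuration conf = new YarnConfiguration();
            // yarn.nodemanager.local-dirs: where containers (including map-side spills)
            // write their intermediate data on each node.
            System.out.println(conf.get(YarnConfiguration.NM_LOCAL_DIRS,
                    YarnConfiguration.DEFAULT_NM_LOCAL_DIRS));
        }
    }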