On 10 Dec 2014, at 20:08, Vinod Kumar Vavilapalli wrote:
> You don't need patterns for host-names, did you see the support for _HOST in
> the principal names? You can specify the datanode principal to be say
> datanodeUser/_HOST@realm, and Hadoop libraries interpret and replace _HOST on
>
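For reference, the _HOST substitution described above is normally configured in hdfs-site.xml; a minimal sketch (the property names are the standard HDFS Kerberos keys, the realm is illustrative):

```xml
<!-- hdfs-site.xml: _HOST is replaced at runtime with each node's own FQDN -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@CLUSTER.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@CLUSTER.COM</value>
</property>
```

This way every DataNode can share one configuration file while still authenticating with its own host-specific principal.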
Hi Hanish,
Thanks for the link, it did help. Long story short: always recompile
native libraries for your machine :)
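For anyone hitting the same native-library mismatch: the native libraries can be rebuilt from the Hadoop source tree. A sketch of the standard build (assumes the native toolchain and snappy development headers are installed; paths are illustrative):

```shell
# Build Hadoop with native libraries, run from the Hadoop source root.
# -Drequire.snappy makes the build fail early if snappy headers are missing.
mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy

# Then point the cluster at the freshly built libs, e.g.:
#   export JAVA_LIBRARY_PATH=/path/to/hadoop-dist/target/hadoop-<version>/lib/native
```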
Thanks,
Peter
On 11.12.2014 05:46, Hanish Bansal wrote:
Hope this may help you:
http://blogs.impetus.com/big_data/big_data_technologies/SnappyCompressionInHBase.do
On Thu, De
I'm able to use UGI.doAs(..) to launch a YARN app and, through the
ResourceManager, both the ApplicationMaster and the containers are
associated with the correct user. But the process on the node itself
actually runs as the yarn user. The problem is that the YARN app writes
data to DFS and it's bein
Is your app code running within the container also being run within a
UGI.doAs()?
You can use the following in your code to create a UGI for the “actual” user
and run all the logic within that:
actualUserUGI = UserGroupInformation.createRemoteUser(System
.getenv(Applicati
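A fuller sketch of that pattern (assuming Hadoop's UserGroupInformation API; the environment-variable name and the output path are illustrative, not from the thread):

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class RunAsSubmitter {
    public static void main(String[] args) throws Exception {
        // The submitting user's name must reach the container somehow;
        // an environment variable set at submission time is one option.
        String submitter = System.getenv("SUBMITTING_USER"); // illustrative name

        UserGroupInformation actualUserUGI =
                UserGroupInformation.createRemoteUser(submitter);

        // Everything inside doAs() runs with the remote user's identity,
        // so the HDFS writes are attributed to that user, not to 'yarn'.
        actualUserUGI.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(new Configuration());
            fs.create(new Path("/user/" + submitter + "/output.txt")).close();
            return null;
        });
    }
}
```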
Hi,
I want to cache map/reduce temporary output files so that I can compare
two map results coming from two different nodes to perform an integrity
check.
I am simulating this use case with speculative execution by rescheduling
the first task as soon as it has started and is running.
Now I want to
I may be mistaken, but let me try again with an example to see if we are on the
same page.
Principals
- NameNode: nn/nn-h...@cluster.com
- DataNode: dn/_h...@cluster.com
Auth to local mappings
- nn/nn-h...@cluster.com -> hdfs
- dn/.*@cluster.com -> hdfs
The combination of the above lets you
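For concreteness, mappings like those above are usually written as auth_to_local rules in core-site.xml; a sketch (the realm and rule bodies are assumed from the example, not quoted from the thread):

```xml
<!-- core-site.xml: map Kerberos principals to local OS users -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](nn@CLUSTER.COM)s/.*/hdfs/
    RULE:[2:$1@$0](dn@CLUSTER.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>
```

The `[2:$1@$0]` part rewrites a two-component principal such as dn/host@CLUSTER.COM into dn@CLUSTER.COM before the parenthesized regex is matched, so every DataNode principal, whatever its host, maps to the hdfs user.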
Look at this thread. It has alternatives to DistributedCache.
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
Basically you can use the new method Job#addCacheFile to pass on stuff to
the individual tasks.
Regards,
Shahab
On Thu, Dec 1
On Fri, Dec 12, 2014 at 9:55 AM, Shahab Yunus
wrote:
>
> job.addCacheFiles
Yes, you can use Job#addCacheFile to cache the file:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
    job.addCacheFile(status.getPath().toUri());
}
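On the task side, the cached files can then be picked up from the task context; a minimal mapper sketch (the class name and types are illustrative):

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Files added with Job#addCacheFile are localized on each node
        // and listed here; open them with FileSystem or local IO as needed.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null) {
            for (URI uri : cacheFiles) {
                System.out.println("Localized cache file: " + uri);
            }
        }
    }
}
```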
Hello,
I am interested in efficiently managing the Hadoop shuffle traffic and
utilizing the network bandwidth effectively. To do this, I want to know how
much shuffle traffic is generated by each DataNode. Shuffle traffic is
nothing but the output of the mappers. So where is this mapper output saved