About getting proper track of learning hadoop

2016-05-27 Thread Manohar Vare
I am now looking for Hadoop learning classes. Please suggest where I can join Hadoop classes and how I should study the Hadoop framework.

Map Task Execution Time

2016-05-27 Thread Sultan Alamro
Hi there, By looking at the .jhist file of a job, I see that there are startTime and finishTime for each map task. My question is: is reading the input data (local or remote) included in the execution time? Thanks, Sultan
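For readers who want to pull those two timestamps out of a history file, here is a minimal sketch. It assumes the .jhist file holds one JSON event per line, with TASK_STARTED and TASK_FINISHED records carrying taskid, taskType, startTime and finishTime fields; that layout and the sample records are assumptions for illustration, so check them against your own file. The function name map_task_durations is hypothetical. It only measures the interval between the two timestamps and does not by itself settle whether input reading falls inside that interval.

```python
import json

def map_task_durations(lines):
    """Return {task_id: finishTime - startTime} in ms for MAP tasks.

    Assumes one JSON event per line (layout is an assumption, see above).
    """
    starts, durations = {}, {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as a format header
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        kind = record.get("type")
        if kind not in ("TASK_STARTED", "TASK_FINISHED"):
            continue  # ignore every other kind of record
        event = record.get("event") or {}
        # The payload is assumed to be the single value nested under "event".
        payload = next(iter(event.values()), {})
        if payload.get("taskType") != "MAP":
            continue
        task_id = payload.get("taskid")
        if kind == "TASK_STARTED":
            starts[task_id] = payload["startTime"]
        elif task_id in starts:
            durations[task_id] = payload["finishTime"] - starts[task_id]
    return durations

# Hypothetical sample records, not taken from a real job.
sample = [
    "Avro-Json",
    '{"type":"TASK_STARTED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskStarted":'
    '{"taskid":"task_1_0001_m_000000","taskType":"MAP","startTime":1000}}}',
    '{"type":"TASK_FINISHED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskFinished":'
    '{"taskid":"task_1_0001_m_000000","taskType":"MAP","finishTime":4500}}}',
]
print(map_task_durations(sample))
```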

RE: YARN cluster underutilization

2016-05-27 Thread Guttadauro, Jeff
Hi, all. Just wanted to provide an update, which is that I’m finally getting good YARN cluster utilization (consistently within the 90-100% range!). I believe the biggest change was to increase the min split size. Since our input is all in S3 and data locality is not really an issue, I bumpe
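To illustrate why a larger minimum split size changes the task count, here is a small sketch. It mirrors the rule Hadoop's FileInputFormat uses to pick a split size, max(minSize, min(maxSize, blockSize)); the minimum is normally set through the mapreduce.input.fileinputformat.split.minsize property. The file size, block size and minimum values below are hypothetical and are not the numbers from the message, and the split count is only an approximation of what the framework computes.

```python
import math

MB = 1024 * 1024

def compute_split_size(block_size, min_size, max_size):
    """Split size rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

def approx_num_splits(file_size, split_size):
    """Rough number of splits (and so map tasks) for one splittable file."""
    return math.ceil(file_size / split_size)

# Hypothetical values, chosen only for illustration.
file_size = 10 * 1024 * MB   # one 10 GiB input file
block_size = 64 * MB         # assumed block size reported for the input
max_size = 2**63 - 1         # effectively "no maximum"

default_split = compute_split_size(block_size, 1, max_size)
bumped_split = compute_split_size(block_size, 512 * MB, max_size)

print(approx_num_splits(file_size, default_split))  # many small map tasks
print(approx_num_splits(file_size, bumped_split))   # fewer, larger map tasks
```

With fewer, larger splits each map task runs longer, so less of the cluster's time goes to container startup and scheduling.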

Re: Reliability of Hadoop

2016-05-27 Thread Dejan Menges
Hi Deepak, Hadoop is just a platform (Hadoop and everything around it): a toolset to do what you want to do. If you write bad code, you can't blame the programming language; it's you not being able to write good code. There's also nothing bad in using commodity hardware (and not sure I understand what's co

Re: Reliability of Hadoop

2016-05-27 Thread Deepak Goel
Sorry once again if I am wrong, or if my comments are without significance. I am not saying Hadoop is bad or good... It is just that Hadoop might be indirectly encouraging commodity hardware and software to be developed, which is convenient but might not be very good (also the cost factor is unproven wi

Re: Reliability of Hadoop

2016-05-27 Thread J. Rottinghuis
We run several clusters of thousands of nodes (as do many companies); our largest one has over 10K nodes. Disks, machines, memory, and network fail all the time. The larger the scale, the higher the odds that some machine is bad on a given day. On the other hand, scale helps. If a single node our o
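The point about scale and failure odds can be made concrete with a short calculation. This is a sketch under a simplifying assumption that nodes fail independently, each with the same daily probability p, so the chance that at least one of n nodes is bad on a given day is 1 - (1 - p)^n. The value of p below is hypothetical and is not a measured failure rate from any cluster.

```python
def prob_any_node_bad(num_nodes, daily_failure_prob):
    """Chance that at least one node is bad in a day, assuming independence."""
    return 1.0 - (1.0 - daily_failure_prob) ** num_nodes

# Hypothetical per-node daily failure probability (assumption, not measured).
p = 0.001

for n in (1, 100, 1000, 10000):
    print(n, round(prob_any_node_bad(n, p), 4))
```

Under these assumptions a single node is almost never bad on a given day, while a cluster of 10K nodes almost always has at least one bad node, which is why the software has to treat failure as routine.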

Re: Performance Benchmarks on "Number of Machines"

2016-05-27 Thread Deepak Goel
My thoughts inline. You could see it the other way around, it is enabling everyone to solve problems that are too complex for one server. * Deepak * If one server (with scale up) would provide the same scalability as several hundred of them (scale out), then the effect computing

RE: Performance Benchmarks on "Number of Machines"

2016-05-27 Thread Matieu Bachant-Lagace
You could see it the other way around, it is enabling everyone to solve problems that are too complex for one server. Another way to look at it is that it reduces costs because scaling out is much cheaper than scaling up. You can actually (and usually you have to) be pretty ingenious and you ha

Re: Performance Benchmarks on "Number of Machines"

2016-05-27 Thread Deepak Goel
What I think, and I am sorry if I am wrong :-( In the cluster you are not only adding hardware (CPU, memory, disk) but you also have separate software (OS, JVM, application)... So the reason the cluster is scaling linearly is not due to hardware, but due to separate software on each machine [As comp

Re: Performance Benchmarks on "Number of Machines"

2016-05-27 Thread Arun Natva
Deepak, I believe Yahoo and Facebook have the largest clusters, with over 4-5 thousand nodes. If you add a new server to the cluster, you are simply adding to the CPU, memory, and disk space of the cluster. So, the capacity grows linearly as you add nodes except that network bandwidth is share

Re: Reliability of Hadoop

2016-05-27 Thread Arun Natva
Deepak, I have managed clusters where worker nodes crashed and disks failed. HDFS takes care of the data replication unless you lose so many nodes that there is not enough space to fit the replicas. Sent from my iPhone > On May 27, 2016, at 11:54 AM, Deepak Goel wrote: > > > Hey >

Re: Hadoop Security - NN fails to connect to JN on different servers

2016-05-27 Thread Gagan Brahmi
One potential problem might be the hostname configuration. If you are using the hosts file to resolve hostnames, please verify the hostnames set in the hosts file. The problem looks to be related to the invalid principal name, which can be due to a bad hostname-to-IP mapping. Regards, Gaga

Reliability of Hadoop

2016-05-27 Thread Deepak Goel
Hey Namaskara~Nalama~Guten Tag~Bonjour We are yet to see any server go down in our cluster nodes in the production environment. Has anyone seen reliability problems in their production environment? How many times? Thanks Deepak -- Keigu Deepak 73500 12833 www.simtree.net, dee...@simtree.net

Performance Benchmarks on "Number of Machines"

2016-05-27 Thread Deepak Goel
Hey Namaskara~Nalama~Guten Tag~Bonjour Are there any performance benchmarks as to how many machines Hadoop can scale up to? Is the growth linear (For 1 machine - growth x, for 2 machines - 2x growth, for 1 machines - 1x growth??) Also does the scaling depend on the type of jobs and amoun

Re: Mask value not shown in GETFACL using webhdfs

2016-05-27 Thread kumar r
Thank you for the detailed explanation. On Tue, May 24, 2016 at 10:48 PM, Chris Nauroth wrote: > Hello Kumar, > > I answered at the Stack Overflow link. I'll repeat the same information > here for everyone's benefit. > > HDFS implements the POSIX ACL model [1]. The linked documentation > expla
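As a small illustration of the POSIX ACL model mentioned in the quoted reply: in that model the mask entry limits the permissions actually granted to named users, named groups, and the owning group, so the effective permission is the intersection of the entry and the mask. The sketch below shows that intersection on rwx-style strings. The function name effective_permissions and the sample entries are hypothetical, and this is not how HDFS or WebHDFS computes or reports the mask internally.

```python
def effective_permissions(entry_perms, mask):
    """Intersect an 'rwx'-style permission string with an 'rwx'-style mask."""
    if len(entry_perms) != 3 or len(mask) != 3:
        raise ValueError("expected three-character strings such as 'r-x'")
    # A permission bit survives only if both the entry and the mask grant it.
    return "".join(p if p == m else "-" for p, m in zip(entry_perms, mask))

# Hypothetical ACL entries for illustration only.
mask = "r-x"
print(effective_permissions("rwx", mask))  # named user entry filtered by the mask
print(effective_permissions("rw-", mask))  # named group entry filtered by the mask
```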