Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Jonathan Aquilina
Where I work, we are running a transient (temporary) cluster on Amazon EMR. When I was reading up on how things work, they suggested using Ganglia to monitor memory usage, network usage, etc. That way, depending on how things are set up, be it using an Amazon S3 bucket

Re: hadoop learning

2015-02-21 Thread Fabio C.
Hi Rishabh, I didn't know anything about Hadoop a few months ago, and I started from the very beginning. I don't suggest starting with the online documentation, which is always fragmented, incomplete, and sometimes not even up to date. Also starting by directly using Hadoop is the fastest way to

Re: Scheduling in YARN according to available resources

2015-02-21 Thread R Nair
Hi Tariq, Glad to see that your issue is resolved, thank you. This re-affirms the compatibility issue with OpenJDK. Thanks Regards, Ravi On Sat, Feb 21, 2015 at 1:40 PM, tesm...@gmail.com tesm...@gmail.com wrote: Dear Nair, Your tip in your first email saved my day. Thanks once again. I am

Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Fang Zhou
Thank you for sharing. Much appreciated. Tim On Feb 22, 2015, at 1:23 AM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: Hi Tim, Not sure if this might be of any use in terms of improving overall cluster performance for you, but I hope that it might shed some ideas for you and

Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Jonathan Aquilina
Hi Tim, Not sure if this might be of any use in terms of improving overall cluster performance for you, but I hope that it might shed some ideas for you and others. https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf --- Regards, Jonathan Aquilina Founder Eagle Eye T On

Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Fang Zhou
Can anyone help me? Thanks, Tim On Feb 21, 2015, at 2:54 PM, Fang Zhou timchou@gmail.com wrote: Hi All, I want to test the memory usage on Namenode and Datanode. I have tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check the memory. The values I

Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Tim Chou
Hi Jonathan, Very useful information. I will look into Ganglia. However, I do not have administrative privileges on the cluster, so I don't know if I can install Ganglia there. Thank you for your information. Best, Tim 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina

Re: Hadoop - HTTPS communication between nodes - How to Confirm ?

2015-02-21 Thread Ulul
Hi, Be careful: HTTPS is for securing WebHDFS. If you want to protect all network streams, you need more than that: https://s3.amazonaws.com/dev.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_reference/content/reference_chap-wire-encryption.html If you're just interested in HTTPS an lsof -p

Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Jonathan Aquilina
I am rather new to Hadoop, but wouldn't the difference potentially be in how the files are split in terms of size? --- Regards, Jonathan Aquilina Founder Eagle Eye T On 2015-02-21 21:54, Fang Zhou wrote: Hi All, I want to test the memory usage on Namenode and Datanode. I have tried

Re: How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Fang Zhou
Hi Jonathan, Thank you. The number of files does affect memory usage in the Namenode. I just want to get the real memory usage of the Namenode. The heap usage changes constantly, so I have no idea which value is the right one. Thanks, Tim On Feb 22, 2015, at 12:22 AM,
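Editor's sketch on the changing heap value: JVM heap usage normally rises until a garbage collection and then drops, so a single reading says little. One option, if the NameNode web interface is reachable, is to sample the `java.lang:type=Memory` bean from its `/jmx` servlet and look at the range over time. The host name, the port (50070 is assumed here as the Hadoop 2.x default), and the helper names below are illustrative assumptions, not something taken from this thread.

```python
import json
from urllib.request import urlopen

# Assumption: hypothetical host, Hadoop 2.x default NameNode HTTP port.
JMX_URL = "http://namenode.example.com:50070/jmx?qry=java.lang:type=Memory"


def heap_usage(jmx_json):
    """Extract heap figures (in bytes) from the JSON text returned by /jmx."""
    beans = json.loads(jmx_json)["beans"]
    heap = beans[0]["HeapMemoryUsage"]
    return {key: heap[key] for key in ("used", "committed", "max")}


def fetch_heap_usage(url=JMX_URL, timeout=10):
    """Fetch one heap sample from a running daemon (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return heap_usage(response.read().decode("utf-8"))


def used_range(samples):
    """Return (lowest, highest) 'used' value across a list of samples."""
    used = [sample["used"] for sample in samples]
    return min(used), max(used)
```

Calling `fetch_heap_usage()` every few seconds and passing the collected samples to `used_range` gives a low value (a rough view of what survives collection) and a high value (the peak between collections), which may be more informative than any single number from jmap or top.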

hadoop learning

2015-02-21 Thread Rishabh Agrawal
Hello, Please tell me where I can learn the concepts of Big Data and Hadoop from scratch. Please provide some online links. Rishabh Agrawal

Re: hadoop learning

2015-02-21 Thread Bhupendra Gupta
I have been learning and trying to implement a Hadoop ecosystem for a POC for the last month or so, and I think the best way to learn is by doing. Hadoop as a concept has lots of implementations, and I picked the Hortonworks Sandbox for learning. This has helped me in gauging

Re: Time taken by -copyFromLocalHost for transferring data

2015-02-21 Thread Ranadip Chatterjee
$ time hadoop fs -put <local file> <hdfs path> For small files, I would expect the time to have a significant variance between runs. For larger files, it should be more consistent (since the throughput will be bound by the network bandwidth of the local machine). On 21 Feb 2015 08:43,
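Editor's sketch of the timing idea: because a single run can vary a lot for small files, it may help to repeat the upload and look at the mean and spread. The helper names and the file paths in the usage note are hypothetical; the `run` parameter exists only so the timing loop can be exercised without a cluster.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5, run=subprocess.run):
    """Run cmd `runs` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run(cmd, check=True)  # raises if the command exits non-zero
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the durations in seconds."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev
```

For example, `summarize(time_command(["hadoop", "fs", "-put", "-f", "/tmp/sample.dat", "/user/me/sample.dat"]))` would time five uploads of a hypothetical file; `-f` is used so that repeated runs overwrite the destination instead of failing because it already exists.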

Running MapReduce jobs in batch mode on different data sets

2015-02-21 Thread tesm...@gmail.com
Hi, Is it possible to run jobs on Hadoop in batch mode? I have 5 different datasets in HDFS and need to run the same MapReduce application on these datasets one after the other. Right now I am doing it manually. How can I automate this? How can I save the log of each execution in text
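Editor's sketch of one way to automate this: a small driver script that launches `hadoop jar` once per dataset, one after the other, and writes each run's console output to its own text file. The jar name, main class, paths, and function names are all hypothetical, and the sketch assumes the application takes an input path and an output path as its two arguments.

```python
import subprocess
from pathlib import Path


def build_command(jar, main_class, input_path, output_path):
    """Build the `hadoop jar` command line for one dataset.

    Assumption: the job accepts <input> <output> as its arguments.
    """
    return ["hadoop", "jar", jar, main_class, input_path, output_path]


def run_batch(jar, main_class, datasets, output_root, log_dir, run=subprocess.run):
    """Run the same job over each dataset in turn, one log file per run.

    Returns a dict mapping dataset name to the job's exit code.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for input_path in datasets:
        # Use the last path component as a short name for logs and output.
        name = input_path.rstrip("/").split("/")[-1]
        cmd = build_command(jar, main_class, input_path, f"{output_root}/{name}")
        with open(log_dir / f"{name}.log", "w") as log:
            # Send both stdout and stderr of the job to the log file.
            completed = run(cmd, stdout=log, stderr=subprocess.STDOUT)
        results[name] = completed.returncode
    return results
```

A call such as `run_batch("myjob.jar", "com.example.MyJob", ["/data/set1", "/data/set2"], "/results", "logs")` would process the listed datasets sequentially and leave `logs/set1.log` and `logs/set2.log` behind. Because each run blocks until it finishes, the jobs do not overlap, and a non-zero exit code in the returned dict flags a failed run.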

Re: hadoop learning

2015-02-21 Thread Ted Yu
Rishabh: You can start with: http://wiki.apache.org/hadoop/HowToContribute There are several components: common, hdfs, YARN, mapreduce, ... Which ones are you interested in? Cheers On Sat, Feb 21, 2015 at 12:18 AM, Bhupendra Gupta bhupendra1...@gmail.com wrote: I have been learning and trying

How can I get the memory usage in Namenode and Datanode?

2015-02-21 Thread Fang Zhou
Hi All, I want to test the memory usage on Namenode and Datanode. I have tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check the memory. The values I get from them are different. I also found that the memory always changes periodically. This is the first thing

RE: Yarn AM is abending job more information

2015-02-21 Thread Roland DePratti
Alex, Thanks for looking at the output and your feedback. I want to make sure I understand your input correctly. My cluster is a set of old dual core machines and my client is a virtual box VM with 10 GB mem allocated to it. I did some more testing (and will continue to do so to track