Re: Decommissioning a datanode takes forever

2013-01-22 Thread Ben Kim
UPDATE: the WARN about the edit log had nothing to do with the current problem. However, the replica placement warnings seem suspicious. Please have a look at the following logs. 2013-01-22 09:12:10,885 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still

Re: Decommissioning a datanode takes forever

2013-01-22 Thread Ben Kim
Impatient as I am, I just shut down the cluster and restarted it with an empty exclude file. When I added the datanode hostname back to the exclude file and ran hadoop dfsadmin -refreshNodes, *the datanode went straight to the dead node* without going through the decommission process. I'm done for today.
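
For reference, a minimal sketch of the decommission flow being attempted in this thread, assuming dfs.hosts.exclude already points at the exclude file in hdfs-site.xml (the hostname and path below are placeholders, not from the thread):

    # add the datanode to the exclude file the NN reads (path is an assumption)
    echo "datanode1.example.com" >> /etc/hadoop/conf/dfs.exclude
    # tell the NN to re-read its include/exclude lists
    hadoop dfsadmin -refreshNodes
    # the node should then show as "Decommission in progress" in the report
    hadoop dfsadmin -report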

CDH412/Hadoop 2.0.3 Upgrade instructions

2013-01-22 Thread Dheeren bebortha
I am trying to upgrade a Hadoop cluster with 0.20.X and MRv1 to a Hadoop cluster with CDH412 with HA+QJM+YARN (aka Hadoop 2.0.3), without any data loss and with minimal downtime. The documentation on the Cloudera site is OK, but very confusing. BTW I do not plan on using Cloudera Manager. Has anyone

RE: Using certificates to secure Hadoop

2013-01-22 Thread Fabio Pitzolu
Hi Nitin, thank you for the answer. Your second option would be the most feasible, and I think that this is not Hadoop-aware but rather a general Tomcat configuration, am I right? Could you please link me to some docs about this configuration? Thanks a lot! Fabio Pitzolu From: Nitin
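
For anyone following the thread, a hedged sketch of the kind of general Tomcat configuration being discussed: an HTTPS connector in server.xml with client certificate authentication enabled. Keystore paths and passwords below are placeholders, not anything from this thread:

    <!-- server.xml: HTTPS connector requiring client certificates (JSSE) -->
    <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
               scheme="https" secure="true" sslProtocol="TLS"
               clientAuth="true"
               keystoreFile="conf/keystore.jks" keystorePass="changeit"
               truststoreFile="conf/truststore.jks" truststorePass="changeit"/>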

Re: CDH412/Hadoop 2.0.3 Upgrade instructions

2013-01-22 Thread Harsh J
Moving to cdh-u...@cloudera.org as your question is CDH-related. My answers are inline: On Tue, Jan 22, 2013 at 4:35 AM, Dheeren bebortha dbebor...@salesforce.com wrote: I am trying to upgrade a Hadoop Cluster with 0.20.X and MRv1 to a hadoop Cluster with CDH412 with HA+QJM+YARN (aka Hadoop 2.0.3)

Re: Loading file to HDFS with custom chunk structure

2013-01-22 Thread Mohammad Tariq
First of all, the software will get just the block residing on that DN, not the entire file. What is your primary intention? To process the SEGY data using MR, or through the tool you are talking about? I had tried something similar through SU, but it didn't quite work for me and because of the

Re: Hadoop Cluster

2013-01-22 Thread bejoy . hadoop
Hi Savitha, HA is a new feature introduced in the Hadoop 2.x releases, so it is a new feature on top of the Hadoop cluster. Ganglia is one of the most widely used tools to monitor the cluster in detail. On a basic HDFS and MapReduce level, the JobTracker and NameNode web UIs would give you a good

Re: Hadoop Cluster

2013-01-22 Thread Mohammad Tariq
The most significant difference between the two, in my view, is that HA eliminates the problem of the 'NN as the single point of failure'. For detailed info I would suggest visiting the official web site. You might also find this link: http://blog.cloudera.com/blog/2011/02/hadoop-availability/

Where do/should .jar files live?

2013-01-22 Thread Chris Embree
Hi List, This should be a simple question, I think. Disclosure: I am not a Java developer. ;) We're getting ready to build our Dev and Prod clusters. I'm pretty comfortable with HDFS and how it sits atop several local file systems on multiple servers. I'm fairly comfortable with the concept of

Re: NameNode low on available disk space

2013-01-22 Thread Andy Isaacson
Moving from general@ to user@. The general list is not for technical questions; it's for discussing project-wide issues. On Tue, Jan 22, 2013 at 1:03 PM, Mohit Vadhera project.linux.p...@gmail.com wrote: Namenode switches into safemode when it has low disk space on the root fs /. I have to manually

Hadoop Yarn assembly error

2013-01-22 Thread blah blah
Hi All, I have 3 quick questions to ask. 1. I am following this single-node tutorial: http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html. Unfortunately, when I issue the Maven command mvn clean install assembly:assembly -Pnative I get the following error. [INFO]

Re: Spring for hadoop

2013-01-22 Thread Radim Kolar
I have a solution integrating Spring beans and Spring Batch directly into Hadoop core. It's far more advanced than the Spring Data Hadoop support with the POJO patch. In my solution every component of MapReduce can be a Hadoop bean. You will get Spring Batch integrated directly into the mapper, which means

Re: NameNode low on available disk space

2013-01-22 Thread Mohit Vadhera
OK Steve, I am forwarding my issue again to the list you suggested. The version is … Hi, Namenode switches into safemode when it has low disk space on the root fs /; I have to manually run a command to leave it. Below are the log messages for low space on the root / fs. Is there any parameter so that I can
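
The command being referred to for manually leaving safemode is presumably the dfsadmin one; a quick sketch (0.20/1.x-era syntax):

    hadoop dfsadmin -safemode get    # check whether the NN is in safemode
    hadoop dfsadmin -safemode leave  # force the NN out of safemode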

Re: Where do/should .jar files live?

2013-01-22 Thread Hemanth Yamijala
On top of what Bejoy said, I just wanted to add that when you submit a job to Hadoop using the hadoop jar command, the jars which you reference in the command on the edge/client node will be picked up by Hadoop and made available to the cluster nodes where the mappers and reducers run. Thanks
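
A minimal sketch of what Hemanth describes, with a hypothetical job jar and dependencies; the -libjars option assumes the job's main class runs through ToolRunner/GenericOptionsParser:

    # the job jar and any -libjars dependencies live on the edge/client node;
    # Hadoop ships them to the task nodes at submit time
    hadoop jar /home/me/myjob.jar com.example.MyJob \
        -libjars /home/me/lib/dep1.jar,/home/me/lib/dep2.jar \
        /input /output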

Re: NameNode low on available disk space

2013-01-22 Thread Harsh J
Edit your hdfs-site.xml (or wherever your NN's config lives) to lower the value of the property dfs.namenode.resource.du.reserved. Create the property if it does not exist, and set the value of space to a suitable level. The default itself is pretty low: 100 MB, in bytes. On Wed, Jan 23,
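
A minimal sketch of the property Harsh describes, placed inside the configuration element of hdfs-site.xml; the 50 MB value is purely illustrative:

    <property>
      <name>dfs.namenode.resource.du.reserved</name>
      <!-- bytes of free space the NN requires; default is 100 MB (104857600) -->
      <value>52428800</value>
    </property>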

Re: What is Heap Space in Hadoop Heap Size is 222.44 MB / 888.94 MB (25%)

2013-01-22 Thread Vikas Jadhav
Thanks for the reply. On Wed, Jan 23, 2013 at 9:16 AM, Robert Molina rmol...@hortonworks.com wrote: Hi Vikas, this is showing the total amount of memory currently used in the Java virtual machine for the namenode process versus the maximum amount of memory that the Java virtual machine will
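
Related, for anyone wanting to raise that maximum: a hedged sketch of the usual 1.x-era knobs in hadoop-env.sh (values illustrative only):

    # default max heap, in MB, for all Hadoop daemons started on this host
    export HADOOP_HEAPSIZE=2000
    # or override for the NameNode alone
    export HADOOP_NAMENODE_OPTS="-Xmx2g $HADOOP_NAMENODE_OPTS"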

Re: NameNode low on available disk space

2013-01-22 Thread Harsh J
Hi again, yes, you need to add it to hdfs-site.xml and restart the NN. Thanks Harsh, do I need to add the parameters in hdfs-site.xml and restart the namenode service? + public static final String DFS_NAMENODE_DU_RESERVED_KEY = dfs.namenode.resource.du.reserved; + public static final long

Re: NameNode low on available disk space

2013-01-22 Thread Mohit Vadhera
Thanks guys. As you said, the level is already pretty low, i.e. 100 MB, but in my case the root fs / has 14G available. What can be the root cause then? /dev/mapper/vg_operamast1-lv_root 50G 33G 14G 71% / As per the logs: 2013-01-21 01:22:52,217 WARN

io.sort.factor

2013-01-22 Thread Ajay Srivastava
Hi, io.sort.factor -- the number of streams to merge at once while sorting files. This determines the number of open file handles. How can I use this parameter to improve the performance of a mapreduce job? My understanding from the above description was: if there are many spill records then

Re: io.sort.factor

2013-01-22 Thread bharath vissapragada
Hi, From my understanding, increasing io.sort.mb decreases the number of disk flushes, as more data is spilled at once, and this boosts performance. This obviously decreases the number of spills. One possibility in this case is that the number of spills has become less than io.sort.factor
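
A hedged sketch of tuning both knobs at submit time (MRv1 property names; assumes the job runs through ToolRunner so the -D generic options are honored; values illustrative, and the jar/class names are placeholders):

    hadoop jar myjob.jar com.example.MyJob \
        -D io.sort.mb=256 \
        -D io.sort.factor=64 \
        /input /output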

Re: io.sort.factor

2013-01-22 Thread Ajay Srivastava
Hi Bharath, I am looking at these logs: 2013-01-22 07:35:42,923 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2 The number at the end of the string does not go beyond 6, so I assume you are correct. Regards, Ajay Srivastava On 23-Jan-2013, at 12:14 PM, bharath vissapragada wrote: Hi,