Re: Best Practices for Upgrading Hadoop Version?

2012-05-30 Thread Chris Smith
Michael Noll has a good description of the upgrade process here: http://www.michael-noll.com/blog/2011/08/23/performing-an-hdfs-upgrade-of-an-hadoop-cluster/ It may not quite reflect the versions of Hadoop you plan to upgrade, but it has some good pointers. Chris On 30 May 2012 09:12, wrote: >

Re: Moving blocks from a datanode

2012-05-22 Thread Chris Smith
M, See http://wiki.apache.org/hadoop/FAQ - "3.6. I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done?" This explains how to decommission nodes by moving the data off the existing node. It's fairly easy and painless (just add the nodename t

Re: collecting CPU, mem, iops of hadoop jobs

2012-01-03 Thread Chris Smith
Have a look at OpenTSDB (http://opentsdb.net/overview.html), as it does not have the same downsampling issue as Ganglia and stores the metrics in HBase, making it easier to access and process the data. It's also pretty easy to add your own metrics. Another useful utility is 'collectl' (http://col

Re: Distributed sorting using Hadoop

2011-11-29 Thread Chris Smith
Madhu, Try working your way through the MapReduce tutorial here: http://hadoop.apache.org/common/docs/r0.20.205.0/mapred_tutorial.html#Example%3A+WordCount+v1.0 which covers most of the concepts you require to do a distributed sort. Search for the word "combiner" in the tutorial to understand a

Re: Running more than one secondary namenode

2011-10-12 Thread Chris Smith
Jorn, If you've configured the Name Node fsimage and edit log replication to both NFS and Secondary Name Node, and regularly back up the fsimage and edit logs, you would do better investing time in understanding exactly how the Name Node builds up its internal database and how it applies its edit

Re: Block Size

2011-09-29 Thread Chris Smith
On 29 September 2011 18:39, lessonz wrote: > I'm new to Hadoop, and I'm trying to understand the implications of a 64M > block size in the HDFS. Is there a good reference that enumerates the > implications of this decision and its effects on files stored in the system > as well as map-reduce jobs?
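
One way to get a feel for the block size question above is to work out how many blocks a file occupies. The following is only a rough sketch in Python, not anything from the thread: the function name `block_count` is hypothetical, and it assumes the plain 64 MB block size mentioned in the question with files simply split into fixed-size blocks, the last of which may be partial.

```python
# Hypothetical helper for illustration; not part of Hadoop or of the thread.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size named in the question


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Return how many fixed-size blocks a file of this size is split into.

    The last block may be only partly filled, so this is a ceiling division.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size - 1) // block_size


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(block_count(one_gib))         # a 1 GiB file
    print(block_count(BLOCK_SIZE + 1))  # one byte over a single block
    print(block_count(1))               # a tiny file still needs one block
```

The count matters because each block is tracked separately, so many small files produce many blocks relative to the amount of data stored.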

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Chris Smith
Elton, Rapleaf's blog has an interesting posting on their experience that's worth a read: http://blog.rapleaf.com/dev/2010/08/26/analyzing-some-interesting-networks-for-mapreduce-clusters/ And if you want to get an idea of the interaction between CPU, Disk and Network, there's nothing like a pictu

Re: tips and tools to optimize cluster

2011-05-24 Thread Chris Smith
Worth a look at OpenTSDB ( http://opentsdb.net/ ) as it doesn't lose precision on the historical data. It also has some neat tricks around the collection and display of data. Another useful tool is 'collectl' ( http://collectl.sourceforge.net/ ) which is a lightweight Perl script that both captur

Re: the question of hadoop

2010-09-08 Thread Chris Smith
2010/9/6 褚 鵬兵 : > > hi, my hadoop friends: i have the 3 questions about hadoop. there are > > 1 the speed between the datanodes. Tera data in one datanodes, the data > transfers from one datanode to the another datanode. if the speed is bad, > Hadoop will be slow, i think. i heard t

RE: Question about disk space allocation in hadoop

2010-06-30 Thread Chris Smith
Some thoughts on how to restrict the temporary data, but I have only tried (a) in anger: a) Partition your disks into HDFS and intermediate temp partitions of the relevant size. This gives a fixed separation but is difficult/impossible to modify on a busy cluster, especially as there may be no