Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Edward Capriolo
So it turns out the issue was just the size of the filesystem:

2012-12-27 16:37:22,390 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4,354,340,042

Basically, if the NN image size hits ~5,000,000,000 bytes you get f'ed. So you need about 3x RAM as your

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas
You did free up a lot of old generation by reducing the young generation, right? The extra 5G of RAM for the old generation should have helped. Based on my calculation, for the current number of objects you have, you need roughly 12G of total heap with a young generation size of 1G. This assumes the

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Edward Capriolo
I am not sure GC was a factor. Even when I forced a GC, it cleared 0% of memory. One would think that since the entire NameNode image is stored in memory, the heap would not need to grow beyond that, but that sure does not seem to be the case. A 5GB image starts off using 10GB of memory and after

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas
I do not follow what you mean here: "Even when I forced a GC it cleared 0% memory." Is this with the new young gen setting? Because earlier, based on the calculation I posted, you need ~11G in the old generation. With 6G as the default young gen size, you actually had just enough memory to fit the namespace
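
To check how much old generation a given set of heap flags actually yields (for example -Xmx12g -Xmn1g for the 12G total / 1G young generation suggested above, which leaves roughly 11G for the old generation), a minimal sketch using the standard java.lang.management API can list each heap pool's maximum size. The class name HeapPools is hypothetical. Pool names depend on the collector: the parallel collector reports "PS Eden Space", "PS Survivor Space" and "PS Old Gen". Some collectors, such as G1, may report an undefined maximum (-1) for individual pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeapPools {
    private static final long MB = 1024L * 1024L;

    /** Max size in MB of each heap pool, keyed by pool name (-1 if undefined). */
    public static Map<String, Long> heapPoolMaxMb() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue; // skip non-heap pools (code cache, metaspace/perm gen)
            }
            long max = usage.getMax();
            result.put(pool.getName(), max < 0 ? -1L : max / MB);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example invocation (flags as discussed in the thread): java -Xmx12g -Xmn1g HeapPools
        heapPoolMaxMb().forEach((name, mb) ->
                System.out.printf("%-25s max=%d MB%n", name, mb));
        System.out.printf("Runtime max heap: %d MB%n", Runtime.getRuntime().maxMemory() / MB);
    }
}
```

Running it under the same flags as the NameNode is a quick way to confirm that the old generation pool is at least as large as the estimated namespace footprint.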

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Edward Capriolo
I tried your suggested setting and forced a GC from JConsole, and once memory crept up nothing was freeing up. So, just food for thought: you said the average file name size is 32 bytes. Well, most of my data sits in /user/hive/warehouse/, and then I have tables with partitions. Does it make sense to just
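
The forced-GC experiment can also be scripted. Below is a minimal sketch, assuming a HotSpot JVM. MemoryMXBean.gc() is the standard management call, equivalent to System.gc(), and roughly what JConsole's "Perform GC" button triggers. The class name ForcedGcCheck is hypothetical. If a forced full GC frees close to 0% of the used heap, that suggests the heap is filled with live objects (for a NameNode, the namespace itself) rather than garbage.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class ForcedGcCheck {
    /** Requests a full GC and returns the percentage of used heap it freed. */
    public static double percentFreedByGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long before = memory.getHeapMemoryUsage().getUsed();
        // Same effect as System.gc(); it is only a request, and the JVM
        // ignores it when started with -XX:+DisableExplicitGC.
        memory.gc();
        long after = memory.getHeapMemoryUsage().getUsed();
        if (before == 0) {
            return 0.0;
        }
        // Can be slightly negative if other threads allocated during the GC.
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        System.out.printf("Forced GC freed %.1f%% of used heap%n", percentFreedByGc());
    }
}
```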

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas
"I tried your suggested setting and forced GC from Jconsole and once it crept up nothing was freeing up."

That is very surprising. If possible, take a live dump when the namenode starts up (when memory used is low) and again when namenode memory consumption has gone up considerably, closer to the heap
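
Live heap dumps like the two suggested here are commonly taken from the command line with jmap -dump:live,format=b,file=<file> <pid>. From inside a JVM, the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean offers dumpHeap(outputFile, live) for the same purpose. Below is a minimal sketch, assuming a HotSpot JVM on Java 7 or later. The class name HeapDumper and the file names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    /**
     * Writes a heap dump to the given file. With live=true only reachable
     * objects are dumped (a full GC runs first), like jmap's "live" option.
     * The file must not already exist; recent JDKs also require a .hprof suffix.
     */
    public static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdumps");
        // Hypothetical naming: one dump at startup, another once memory has grown.
        Path startup = dir.resolve("startup.hprof");
        dump(startup, true);
        System.out.println("Wrote " + startup + " (" + Files.size(startup) + " bytes)");
    }
}
```

Comparing the startup dump with a later one in a heap analyzer should show which object types account for the growth.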

Retrieve node where a map task is running programmatically

2012-12-27 Thread Eduard Skaley
Hi, is there a way to find out, in the setup function of a mapper, on which node of the cluster the current mapper is running? Thank you very much, Eduard

Re: Retrieve node where a map task is running programmatically

2012-12-27 Thread Robert Evans
You don't need Hadoop to do this. Just use an InetAddress: http://docs.oracle.com/javase/6/docs/api/java/net/InetAddress.html

--Bobby

On 12/27/12 8:51 AM, Eduard Skaley e.v.ska...@gmail.com wrote:
> Hi, is there a way to find out in the setup function of a mapper on which node of the cluster
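
Following Bobby's suggestion, here is a minimal sketch of the InetAddress approach. The helper NodeName.localHostName() is a hypothetical name. It returns the host name of the machine the JVM is running on, and you could call it from a mapper's setup(Context) method, because the task code runs on the worker node. The sketch assumes the node's own host name is resolvable. If it is not, getLocalHost() throws UnknownHostException, and the sketch falls back to the loopback name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NodeName {
    /** Host name of the machine this JVM (e.g. a map task) is running on. */
    public static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Host name not resolvable (DNS or /etc/hosts misconfigured); fall back.
            return InetAddress.getLoopbackAddress().getHostName();
        }
    }

    public static void main(String[] args) {
        // In a mapper, the same call could go in setup(Context) and be logged or counted.
        System.out.println("Running on " + localHostName());
    }
}
```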

Re: Selecting a task for the tasktracker

2012-12-27 Thread Hemanth Yamijala
Hi, Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and trunk, the MapReduce framework has been completely revamped as YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html), and you may need to look at different interfaces for building your own

Re: setting hadoop for pseudo distributed mode.

2012-12-27 Thread Mohammad Tariq
What are those libraries, and how are they reading data from HDFS? You were trying with MR jobs, if I'm not wrong? In order to perform reads/writes on HDFS, we need the HDFS API with a Configuration object. How are you doing it here?

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/

On Fri, Dec

UNSUBSCRIBE

2012-12-27 Thread Ray Bagby
-- Ray Bagby Weatherford, OK