Re: The Case of a Long Running Hadoop System

2008-11-17 Thread Konstantin Shvachko
Bagri, According to the numbers you posted, your cluster has 6,000,000 block replicas and only 12 data-nodes. The blocks are small, on average about 78KB according to fsck, so each node contains about 40GB worth of block data. But the number of blocks is really huge: 500,000 per node. Is my math cor
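A quick sketch of the arithmetic behind Konstantin's figures (the numbers are taken from the thread; the rounding is mine):

```python
# Cluster figures quoted in the thread.
replicas = 6_000_000      # total block replicas reported
datanodes = 12            # number of data-nodes
avg_block_kb = 78         # average block size per fsck, in KB

# Blocks held by each data-node.
blocks_per_node = replicas // datanodes          # 500,000

# Approximate block data per node, in GB.
data_per_node_gb = blocks_per_node * avg_block_kb / (1024 * 1024)

print(blocks_per_node)           # 500000
print(round(data_per_node_gb))   # ~37 GB, close to the "about 40GB" cited
```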

Re: The Case of a Long Running Hadoop System

2008-11-16 Thread Abhijit Bagri
We do not have a secondary namenode because 0.15.3 has a serious bug which truncates the namenode image if there is a failure while the namenode fetches the image from the secondary namenode. See HADOOP-3069. I have a patched version of 0.15.3 for this issue. From the patch of HADOOP-3069, the changes are

Re: The Case of a Long Running Hadoop System

2008-11-15 Thread Billy Pearson
If I understand correctly, the secondary namenode merges the edit log into the fsimage and reduces the edit log size, which is likely the root of your problems. 8.5G seems large and is likely putting a strain on your master server's memory and IO bandwidth. Why do you not have a secondary namenode? If you do
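For context, in Hadoop of that era the secondary namenode's checkpoint cadence was driven by two cluster settings; a minimal hadoop-site.xml fragment might look like the following (property names from the 0.x-era defaults; the values here are only illustrative, not a recommendation for this cluster):

```xml
<!-- Illustrative checkpoint settings for the secondary namenode -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>Maximum seconds between two checkpoints.</description>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
  <description>Edit-log size in bytes that forces a checkpoint
  even if the period has not elapsed.</description>
</property>
```

With a checkpoint running regularly, the edit log is folded into the fsimage periodically, so it should never grow to anything like 8.5G.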

The Case of a Long Running Hadoop System

2008-11-15 Thread Abhijit Bagri
Hi, This is a long mail, as I have tried to put in as many details as might help any of the Hadoop devs/users to help us out. The gist is this: we have a long-running Hadoop system (masters not restarted for about 3 months). We have recently started seeing the DFS responding very slowly whi